Introduction to Audio Analysis

Follow Along


Slides: devonbryant.github.io/audioanalysis

About Me

  • Weekdays
    • Intelligent Software Solutions (ISS)
    • Predictive Analytics, Machine Learning, Anomaly Detection, Plan Monitoring
  • Evenings & Weekends

What is Audio Analysis?

Extracting information from audio signals

Leveraging techniques from Digital Signal Processing (DSP) and Machine Learning.

  • Music Information Retrieval
  • Speech Processing

Music Information Retrieval

  • Recommender Systems
  • Instrument Recognition
  • Music Transcription (Beats, Notes, Chords, Key, etc.)
  • Genre Classification
  • Score Following
  • Query by Singing/Humming
  • Similarity


See ISMIR and MIREX for more info on current research.

Speech Processing

  • Voice Recognition
  • Speaker Recognition
  • Speaker Diarization
  • Voice Analysis
  • Speech Enhancement

Tools of the Trade

Audio in the Time Domain

Sound Waveforms

Pulse-code Modulation (PCM)


Example 4-bit PCM Encoding

Sampling Rate & Nyquist Frequency


  • Nyquist Freq. = 2 times max frequency
  • CD Quality Audio (44.1 KHz, 16-bit stereo)

Audio in the Frequency Domain

Fourier Transforms

Signal represented as a sum of simple sine and cosine functions.
How do we move from time -> frequency and vice versa?

Fourier Transforms for Mathematicians


Fourier Transforms for the rest of us

Two periodic signals, A (input signal) and B (generated).
What happens when we multiply them together and sum the area underneath?
What if the signals share the same frequency, say A X A?

Discrete Fourier Transform (DFT)

Evenly spaced frequencies from 0 hz to the sampling frequency.


Frequency bins for 1024 samples @ 44.1 KHz

No temporal information in the frequency domain


  • What if you have two sounds at different times?
  • How do you deal with this?

The Short-Time Fourier Transform (STFT)

Additional Resources