MSC Data Science - 02 PDF

This document summarizes an audio processing and analysis lesson that discusses: 1) Representing audio signals in both the time and frequency domains, including using the discrete Fourier transform (DFT) and fast Fourier transform (FFT) to obtain a frequency-domain representation. 2) Computing spectrograms, which provide a two-dimensional time-frequency representation of audio by applying short-time FFTs over windows of the signal. 3) Code examples for calculating the FFT of a simple test signal, computing a spectrogram from a music audio file, and recording live audio input and visualizing its spectrogram.

Uploaded by

Rodrigo Sánchez Mariño


MSc in Data Science

NCSR Demokritos
http://msc-data-science.iit.demokritos.gr
Semester: 3rd
Course: Multimodal Information Processing And Analysis

Lesson 2
Audio Representations
and Feature Extraction
Theodoros Giannakopoulos

Audio Analysis Applications

Applications Audio Speech Music

Automatic Speech Recognition (ASR)

Virtual Assistants

Music Search

Surveillance

Environmental Monitoring

Recommendation

Audio Analysis Goal & Applications

- Goal: extract high-level descriptions from raw audio signals (sounds)
- Using:
- signal analysis: to extract features and representations
- machine learning (supervised or unsupervised) to train models and to discover patterns
- Speech / Music / Audio
- Also referred to as “machine listening”

Example (analyze + recognize):
# music
# group:arctic_monkeys
# genre:indie
# genre:post_punk
# singer:alex_turner
# emotion_high_arousal
# emotion_negative_valence
# bpm:170
...

Demo code

Course’s code samples will be available at this github repo:

https://github.com/tyiannak/multimodalAnalysis

What is sound?
- sound (physics):
- a travelling vibration (wave)
- through a medium (e.g. air)
- transfers energy (particle to particle)
- until “perceived” by our ears
- amplitude - loudness
- frequency - vibrations per sec
- analog sound x(t) → digital sound x(n)
- sampling (sampling freq fs, sampling period Ts)
- quantization (bits per sample)
- example:
- 44100 Hz
- 16 bits per sample (sample resolution)
- ~8 million integers for an average song! (single channel)
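As a rough check of the "~8 million integers" figure, the arithmetic can be sketched as below. The 3-minute duration is an assumption (the slide only says "average song"); the sampling frequency and sample resolution are the ones given in the example.

```python
# Back-of-the-envelope size of a digitized song (single channel).
fs = 44100            # sampling frequency in Hz (from the slide)
bits_per_sample = 16  # sample resolution (from the slide)
duration_sec = 180    # ASSUMPTION: an "average song" lasts 3 minutes

n_samples = fs * duration_sec                 # integers per channel
size_bytes = n_samples * bits_per_sample // 8

print("sampling period Ts = {:.2f} usec".format(1e6 / fs))
print("samples per channel: {}".format(n_samples))
print("uncompressed size: {:.1f} MB".format(size_bytes / 1e6))
```

A stereo recording doubles both numbers, since each channel is sampled separately.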

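DFT
- Discrete Fourier Transform (DFT)
- A frequency domain representation of the (time domain) signal
- FFT: efficient implementation of the DFT
- The DFT is:
- Defined in the range 0..fs
- Symmetric in magnitude for real signals (around fs/2), so the range 0..fs/2 is sufficient
- Given the signal x(n), n = 0, ..., N-1, the DFT is

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2 \pi k n / N}, \quad k = 0, \dots, N-1$$

- Inverse:

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2 \pi k n / N}$$

- I.e. the original signal can be written as a weighted average of complex exponentials (the weights are the DFT coefficients)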
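To make the relation between the DFT definition and the FFT concrete, here is a minimal sketch that evaluates the DFT sum directly and compares it with NumPy's FFT. The function name `naive_dft` and the test signal are illustrative choices, not part of the course code.

```python
import numpy as np


def naive_dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    x = np.asarray(x, dtype=complex)
    n_samples = len(x)
    n = np.arange(n_samples)
    k = n.reshape(-1, 1)
    # (N x N) matrix of complex exponentials times the signal vector
    return np.exp(-2j * np.pi * k * n / n_samples) @ x


fs = 8000
t = np.arange(80) / fs  # 10 msec of signal
x = np.cos(2 * np.pi * 500 * t) + np.cos(2 * np.pi * 2500 * t)

X_naive = naive_dft(x)
X_fft = np.fft.fft(x)            # same result, computed in O(N log N)
x_back = np.fft.ifft(X_fft).real  # inverse DFT recovers the signal

print("max |naive - FFT|:", np.max(np.abs(X_naive - X_fft)))
print("max reconstruction error:", np.max(np.abs(x_back - x)))
# for a real signal the magnitude is symmetric: |X(k)| == |X(N - k)|
print("symmetric magnitude:",
      np.allclose(np.abs(X_fft[1:]), np.abs(X_fft[1:][::-1])))
```

The symmetry check is the reason the later examples keep only the first half of the FFT magnitude.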

Representation: Time Vs Frequency

?

Frequency Representation: Spectrogram

- Spectrogram: Time - Frequency 2D representation
- 1st step: Windowing
- Signal broken into short-term windows (or frames)
- Typically 20 to 100 msec
- (Non)overlapping
- E.g.: 50 msec frame size, 10 msec step: 80% overlap
- 2nd step: FFT
- Fast implementation of the Discrete Fourier Transform (DFT)
- Not to be confused with the Discrete Time Fourier Transform (DTFT), which:
- Is continuous in the frequency domain
- Is periodic (one period spans 0..fs)
- FFT resolution & window length:
- longer windows → denser representations → better frequency resolution
- but also worse time resolution (because of the signals’ non-stationarity)
- trade-off between time and frequency resolution
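The windowing arithmetic and the time/frequency trade-off can be sketched as follows. The helper `frame_params` is hypothetical (not part of the course code) and assumes no zero padding, so the FFT bin spacing is simply fs divided by the frame length in samples.

```python
def frame_params(fs, win_sec, step_sec, signal_sec):
    """Return frame size, step, number of frames, overlap and FFT bin spacing."""
    win = int(round(win_sec * fs))           # samples per frame
    step = int(round(step_sec * fs))         # samples between frame starts
    n_samples = int(round(signal_sec * fs))
    n_frames = 1 + (n_samples - win) // step if n_samples >= win else 0
    overlap = 1.0 - step / win               # fraction shared by successive frames
    freq_res = fs / win                      # Hz between FFT bins (no zero padding)
    return win, step, n_frames, overlap, freq_res


# 1 second of audio at 16 kHz, 10 msec step, three window lengths
for win_sec in (0.020, 0.050, 0.100):
    win, step, n_frames, overlap, freq_res = frame_params(16000, win_sec,
                                                          0.010, 1.0)
    print("win={} samples, frames={}, overlap={:.0f}%, bin spacing={:.1f} Hz"
          .format(win, n_frames, 100 * overlap, freq_res))
```

Longer windows give finer bin spacing, but each frame then averages over a longer stretch of a non-stationary signal.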

Frequency Representation: FFT Example

"""!
@brief Example 01
@details FFT computation example
@author Theodoros Giannakopoulos
"""
import scipy.fftpack as scp
import numpy as np
import plotly
import plotly.graph_objs as go

if __name__ == '__main__':
    f1, f2, fs, duration = 500, 2500, 8000, 0.1
    # define time range
    t = np.arange(0, duration, 1.0/fs)
    # define signal as sum of cosines
    x = np.cos(2 * np.pi * t * f1) + np.cos(2 * np.pi * t * f2)
    # get mag of fft
    X = np.abs(scp.fft(x))
    # normalize FFT mag
    X = X / X.max()
    # frequency (Hz) of each FFT bin: k * fs / N
    freqs = np.arange(len(X)) * fs / float(len(X))
    # get 1st symmetric part
    freqs_1 = freqs[0:int(len(freqs)/2)]
    X_1 = X[0:int(len(X)/2)]
    figs = plotly.tools.make_subplots(rows=2, cols=1,
                                      subplot_titles=["FFT Mag",
                                                      "FFT Mag 1st sym part"])
    figs.append_trace(go.Scatter(x=freqs, y=X, showlegend=False), 1, 1)
    figs.append_trace(go.Scatter(x=freqs_1, y=X_1, showlegend=False), 2, 1)
    plotly.offline.plot(figs, filename="temp.html", auto_open=True)

Frequency Representation: Spectrogram Example

"""!
@brief Example 02
@details Example of spectrogram computation for a wav file, using only scipy
@author Theodoros Giannakopoulos
"""
import scipy.fftpack as scp
import numpy as np
import scipy.io.wavfile as wavfile
import plotly
import plotly.graph_objs as go
layout = go.Layout(title='Spectrogram Calculation Example',
                   xaxis=dict(title='time (sec)',),
                   yaxis=dict(title='Freqs (Hz)',))


def get_fft_spec(signal, fs, win):
    frame_size, signal_len, spec, times = int(win * fs), len(signal), [], []
    # break signal into non-overlapping short-term windows (frames)
    frames = np.array([signal[x:x + frame_size] for x in
                       np.arange(0, signal_len - frame_size, frame_size)])
    n_bins = int(frame_size / 2)
    # frequency (Hz) of each kept FFT bin: k * fs / frame_size (0..fs/2),
    # so that the frequency axis has one value per spectrogram row
    freqs = np.arange(n_bins) * fs / float(frame_size)
    for i_f, f in enumerate(frames):  # for each frame
        times.append(i_f * win)
        # append squared mag of fft (1st symmetric part, normalized per frame)
        X = np.abs(scp.fft(f)) ** 2
        spec.append(X[0:n_bins] / X.max())
    return np.array(spec).T, freqs, times

if __name__ == '__main__':
    [Fs, s] = wavfile.read("../data/sample_music.wav")
    S, f, t = get_fft_spec(s, Fs, 0.02)
    heatmap = go.Heatmap(z=S, y=f, x=t)
    plotly.offline.plot(go.Figure(data=[heatmap], layout=layout),
                        filename="temp.html", auto_open=True)

- scipy.fftpack used for fft

Frequency Representation: Spectrogram Recording Example

"""!
@brief Example 03
@details Example of audio recording and spectrogram computation
@author Theodoros Giannakopoulos
"""
import numpy as np
import scipy.io.wavfile as wavfile
import example02, pyaudio, struct, cv2, signal, sys

fs = 8000
bufSize = int(fs * 1.0)
aggregated_buf = np.array([])

def signal_handler(sig, frame):
    # on Ctrl+C: store everything recorded so far and exit
    wavfile.write("output.wav", fs, np.int16(aggregated_buf))
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)

if __name__ == '__main__':
    pa = pyaudio.PyAudio()  # initialize recording
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=fs,
                     input=True, frames_per_buffer=bufSize)
    while 1:
        # read recorded data, convert bytes to samples and then to numpy array
        block = stream.read(bufSize, exception_on_overflow=False)
        s = np.array(list(struct.unpack("%dh" % (len(block) // 2),
                                        block))).astype(float)
        aggregated_buf = np.concatenate((aggregated_buf, s))
        s /= (2 ** 15)
        # get spectrogram and visualize it using opencv
        specgram, f, t = example02.get_fft_spec(s, fs, 0.02)
        iSpec = np.array(specgram[::-1] * 255, dtype=np.uint8)
        iSpec2 = cv2.resize(iSpec, (600, 350), interpolation=cv2.INTER_CUBIC)
        iSpec2 = cv2.applyColorMap(iSpec2, cv2.COLORMAP_JET)
        cv2.imshow('Spectrogram', iSpec2)
        cv2.moveWindow('Spectrogram', 100, 100)
        ch = cv2.waitKey(10)

- scipy.fftpack used for fft (imported from example02)
- pyaudio used for data acquisition (recording)
- opencv used for fast online, non-blocking visualization (matplotlib slower)

Frequency Representation: Spectrogram Recording Example

Sampling freq / Aliasing?
Check the example03.py script on this sound:

- Fs = 8000 Hz → Fnyquist = 4000 Hz
- Frequencies above the Nyquist frequency are not captured
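What happens to a tone above the Nyquist frequency can be sketched with synthetic cosines (this is an illustration, not the example03.py script itself, and `dominant_freq` is a hypothetical helper). When an analog tone is sampled without an anti-aliasing filter, it is not just lost: it shows up mirrored around fs/2.

```python
import numpy as np


def dominant_freq(x, fs):
    """Frequency (Hz) of the strongest bin in the one-sided spectrum."""
    X = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(X)]


fs = 8000            # sampling frequency, Nyquist frequency is 4000 Hz
n = np.arange(fs)    # sample indices for 1 second of signal

for f_true in (1000, 3000, 5000, 7000):
    # samples of a cosine of frequency f_true taken at rate fs
    x = np.cos(2 * np.pi * f_true * n / fs)
    print("true: {} Hz, observed: {:.0f} Hz".format(f_true,
                                                    dominant_freq(x, fs)))
```

Tones below 4000 Hz are observed at their true frequency, while a tone at frequency f between fs/2 and fs is observed at fs - f.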

Open-source libraries for audio signal analysis

- librosa (Python)
- https://librosa.github.io
- Implements various audio features (mfccs, chroma, beat, etc)
- Audio fx (e.g. pitch shift)
- Some ML components
- pyAudioAnalysis (Python)
- https://github.com/tyiannak/pyAudioAnalysis
- Implements various audio features
- Built-in training/testing of classifiers (using scikit-learn)
- Clustering (speaker diarization)
- Visualization
- Opensmile (C++)
- https://audeering.com/technology/opensmile/
- Richest set of audio features

Spectrogram calculation using pyAudioAnalysis

"""!
@brief Example 04
@details pyAudioAnalysis spectrogram calculation and visualization example
@author Theodoros Giannakopoulos
"""
import numpy as np
import scipy.io.wavfile as wavfile
import plotly
import plotly.graph_objs as go
from pyAudioAnalysis import ShortTermFeatures as aF
layout = go.Layout(title='Spectrogram Extraction Example using pyAudioAnalysis',
                   xaxis=dict(title='time (sec)',),
                   yaxis=dict(title='Freqs (Hz)',))

def normalize_signal(signal):
    signal = np.double(signal)
    # scaling by 2^15 is valid only for 16-bit sample resolution
    signal = signal / (2.0 ** 15)
    return (signal - signal.mean()) / ((np.abs(signal)).max() + 0.0000000001)

if __name__ == '__main__':
    [Fs, s] = wavfile.read("../data/sample_music.wav")
    s = normalize_signal(s)
    # also returns the time and frequency scales (in secs and Hz respectively)
    [S, t, f] = aF.spectrogram(s, Fs, int(Fs * 0.020), int(Fs * 0.020))
    heatmap = go.Heatmap(z=S.T, y=f, x=t)
    plotly.offline.plot(go.Figure(data=[heatmap], layout=layout),
                        filename="temp.html", auto_open=True)

Spectrogram calculation using librosa

"""!
@brief Example 05
@details librosa spectrogram calculation and visualization example
@author Theodoros Giannakopoulos
"""
import numpy as np
import scipy.io.wavfile as wavfile
import plotly
import librosa
import plotly.graph_objs as go
layout = go.Layout(title='Spectrogram Extraction Example using librosa',
                   xaxis=dict(title='time (sec)',),
                   yaxis=dict(title='Freqs (Hz)',))

def normalize_signal(signal):
    signal = np.double(signal)
    signal = signal / (2.0 ** 15)
    return (signal - signal.mean()) / ((np.abs(signal)).max() + 0.0000000001)

if __name__ == '__main__':
    [Fs, s] = wavfile.read("../data/sample_music.wav")
    s = normalize_signal(s)
    n_fft = int(Fs * 0.020)
    # keyword arguments: recent librosa versions do not accept them positionally
    S = np.abs(librosa.stft(s, n_fft=n_fft, hop_length=n_fft))
    # create frequency and time axes (bin k corresponds to k * Fs / n_fft Hz)
    f = [float(k * Fs) / n_fft for k in range(S.shape[0])]
    t = [float(t * n_fft) / Fs for t in range(S.shape[1])]
    heatmap = go.Heatmap(z=S, y=f, x=t)
    plotly.offline.plot(go.Figure(data=[heatmap], layout=layout),
                        filename="temp.html", auto_open=True)

Melgram calculation using librosa

"""!
@brief Example 06
@details librosa melgram calculation and visualization example
@author Theodoros Giannakopoulos
"""
import numpy as np
import scipy.io.wavfile as wavfile
import plotly
import librosa
import plotly.graph_objs as go
layout = go.Layout(title='Melgram Extraction Example using librosa',
                   xaxis=dict(title='time (sec)',),
                   yaxis=dict(title='Mel Coefficient (index)',))

def normalize_signal(signal):
    signal = np.double(signal)
    signal = signal / (2.0 ** 15)
    return (signal - signal.mean()) / ((np.abs(signal)).max() + 0.0000000001)

if __name__ == '__main__':
    [Fs, s] = wavfile.read("../data/sample_music.wav")
    s = normalize_signal(s)
    # keyword arguments: recent librosa versions do not accept them positionally
    S = librosa.feature.melspectrogram(y=s, sr=Fs, n_fft=int(Fs * 0.020),
                                       hop_length=int(Fs * 0.020), power=2)
    # create frequency and time axes
    f = range(S.shape[0])
    t = [float(t * int(Fs * 0.020)) / Fs for t in range(S.shape[1])]
    heatmap = go.Heatmap(z=S, y=f, x=t)
    plotly.offline.plot(go.Figure(data=[heatmap], layout=layout),
                        filename="temp.html", auto_open=True)

mel-scale:
- Conforms with psychoacoustic observations
- The human auditory system can distinguish neighboring frequencies more easily in the low frequency regions
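The frequency warping behind the mel scale can be sketched with one widely used Hz-to-mel formula. This particular formula is an assumption for illustration: the slides do not state which variant is meant, and libraries may implement a different one.

```python
import numpy as np


def hz_to_mel(f_hz):
    """One common Hz-to-mel mapping (ASSUMED variant): 2595 * log10(1 + f/700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


# the same 100 Hz step covers fewer and fewer mels as frequency increases
steps = []
for f in (250, 1000, 3000):
    step = float(hz_to_mel(f + 100) - hz_to_mel(f))
    steps.append(step)
    print("{} Hz -> {} Hz spans {:.1f} mel".format(f, f + 100, step))
```

Equal steps on the mel axis therefore correspond to narrow bands at low frequencies and wide bands at high frequencies, which matches the observation that neighboring frequencies are easier to tell apart in the low frequency regions.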

Why Mel?

"""!
@brief Example 07
@details Frequency perceived discrimination experiment
@author Theodoros Giannakopoulos
"""
import os, time, scipy.io.wavfile as wavfile, numpy as np
from random import randint

def play_sound(freq, duration, fs):
    t = np.arange(0, duration, 1.0/fs)
    x = 0.5 * np.cos(2 * np.pi * t * freq)
    wavfile.write("temp.wav", fs, x)
    os.system("play temp.wav -q")  # relies on an external "play" command

if __name__ == '__main__':
    freqs, thres, n_exp, fs = [250, 500, 1000, 2000, 3000], [2, 5, 10, 20], 10, 16000
    answers = [[] for i in range(len(freqs))]
    for i_f, f in enumerate(freqs):
        for t in thres:
            answers[i_f].append(0)
            for i in range(n_exp):
                # randomly choose whether the higher tone is played 1st or 2nd
                sequel = randint(1, 2)
                if sequel == 2:
                    play_sound(f, 0.5, fs); time.sleep(0.5); play_sound(f+t, 0.5, fs)
                else:
                    play_sound(f+t, 0.5, fs); time.sleep(0.5); play_sound(f, 0.5, fs)
                ans = int(input('Which was higher (1/2):'))
                if ans == sequel: answers[i_f][-1] += 1
    # print the fraction of correct answers per frequency and threshold
    print("Freq\t", end='')
    for t in thres: print("{0:.1f}\t".format(t), end='')
    print("")
    for i_f, f in enumerate(freqs):
        print("{} Hz\t".format(f), end='')
        for i_t, t in enumerate(thres):
            print("{0:.1f}\t".format(answers[i_f][i_t] / float(n_exp)), end='')
        print("")

Thresholds (fraction of correct answers per frequency difference):

Freq      2 Hz   5 Hz   10 Hz   20 Hz
250 Hz    0.7    1      1       1
500 Hz    0.4    0.8    0.9     1
1000 Hz   0.6    0.8    1       0.9
2000 Hz   0.5    0.4    0.9     1
3000 Hz   0.5    0.5    0.6     1

Audio segment feature extraction

- Short-term windowing:
- “frames”
- extract features per frame (such as energy, or spectral centroid)
- result: sequence of vectors (one vector for each frame)
- Segment windowing:
- segments are either predefined or obtained by splitting long recordings (e.g. into fixed-size windows)
- each segment corresponds to a sequence of short-term feature vectors
- common practice
- extract segment (mid-term) statistics (μ, σ²)
- applied per short-term feature sequence (in the segment)

[Figure: a segment is framed into frames 1, 2, …, N; each short-term feature sequence (energy e1, e2, …, eN; spectral centroid c1, c2, …, cN) is summarized by its statistics (μ, σ²), which form the segment feature vector (f1, f2, …)]
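The step from short-term feature sequences to segment statistics can be sketched as below. This is a simplified illustration with non-overlapping segments and random stand-in features; `segment_statistics` is a hypothetical helper, not the pyAudioAnalysis implementation.

```python
import numpy as np


def segment_statistics(st_features, frames_per_segment):
    """Summarize short-term features per segment.

    st_features: (n_features x n_frames) matrix of short-term features.
    Returns a (2 * n_features x n_segments) matrix: per-segment means
    followed by per-segment variances of every short-term feature.
    """
    n_features, n_frames = st_features.shape
    n_segments = n_frames // frames_per_segment
    stats = []
    for i in range(n_segments):
        seg = st_features[:, i * frames_per_segment:
                          (i + 1) * frames_per_segment]
        stats.append(np.concatenate((seg.mean(axis=1), seg.var(axis=1))))
    return np.array(stats).T


# stand-in data: 2 short-term features (e.g. energy, centroid), 100 frames
rng = np.random.default_rng(0)
st = rng.random((2, 100))
mt = segment_statistics(st, frames_per_segment=20)
print(mt.shape)  # 4 statistics (2 means + 2 variances) for each of 5 segments
```

With 50 msec frames, 20 frames per segment corresponds to 1-second segments.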

Audio features: Segment Statistics

- Each feature is extracted on a short-term basis
- Segment feature statistics capture temporal changes in short-term feature sequences
- Statistics:
- mean value
- std/var
- percentiles
- max / min
- Skewness
- Examples:
- average zero crossing rate
- deviation of the spectral centroid

Time-domain features
- Energy
- usually normalized by window length
- high variation over successive speech frames (std statistic)
- Zero Crossing Rate
- rate of sign changes during the frame
- measure of noisiness
- high values for noisy signals
- Energy Entropy
- measure of abrupt changes in the signal’s energy
- divide frames to K sub-frames and compute (normalized) sub-energies (e_subframe_k)
- compute entropy of the e_subframe_k sequence
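The three time-domain features can be sketched per frame as follows. This is a minimal NumPy illustration under assumed normalizations (for instance K = 10 sub-frames and base-2 entropy); a library such as pyAudioAnalysis may differ in such details.

```python
import numpy as np

EPS = 1e-10  # avoids division by zero and log(0)


def energy(frame):
    """Signal energy, normalized by the window length."""
    return np.sum(frame ** 2) / len(frame)


def zero_crossing_rate(frame):
    """Rate of sign changes during the frame."""
    signs = np.sign(frame)
    return np.sum(np.abs(np.diff(signs))) / (2.0 * (len(frame) - 1))


def energy_entropy(frame, n_sub=10):
    """Entropy of the normalized sub-frame energies (n_sub is K)."""
    sub_len = len(frame) // n_sub
    sub = frame[:sub_len * n_sub].reshape(n_sub, sub_len)
    e = np.sum(sub ** 2, axis=1)
    e = e / (e.sum() + EPS)
    return -np.sum(e * np.log2(e + EPS))


fs = 8000
n = np.arange(400)                        # one 50 msec frame
tone = np.cos(2 * np.pi * 200 * n / fs)   # steady low-frequency tone
noise = np.random.default_rng(0).uniform(-1, 1, len(n))
click = np.zeros(len(n))
click[:40] = 1.0                          # abrupt burst at the frame start

for name, frame in (("tone", tone), ("noise", noise), ("click", click)):
    print("{}: energy={:.3f}, zcr={:.3f}, energy entropy={:.3f}".format(
        name, energy(frame), zero_crossing_rate(frame),
        energy_entropy(frame)))
```

The noise frame has the highest zero crossing rate, and the click (an abrupt energy change) has the lowest energy entropy.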

Time-domain features - Example

"""!
@brief Example 08 - using pyAudioAnalysis
@details pyAudioAnalysis feature extraction example 1
@author Theodoros Giannakopoulos
"""
import numpy as np
import plotly
import plotly.graph_objs as go
from pyAudioAnalysis import ShortTermFeatures as aF
from pyAudioAnalysis import audioBasicIO as aIO


if __name__ == '__main__':
    # read machine sound
    fs, s = aIO.read_audio_file("../data/general/objects/1-46744-A.ogg.wav")
    duration = len(s) / float(fs)
    # extract short term features and plot ZCR and Energy
    # f is an n_features x n_frames matrix,
    # fn is the list of short-term feature names
    [f, fn] = aF.feature_extraction(s, fs, int(fs * 0.050), int(fs * 0.050))
    figs = plotly.tools.make_subplots(rows=3, cols=1,
                                      subplot_titles=["signal", fn[0], fn[1]])
    time = np.arange(0, duration - 0.050, 0.050)
    time_s = np.arange(0, duration, 1/float(fs))
    figs.append_trace(go.Scatter(x=time_s, y=s, showlegend=False), 1, 1)
    figs.append_trace(go.Scatter(x=time, y=f[0, :], showlegend=False), 2, 1)
    figs.append_trace(go.Scatter(x=time, y=f[1, :], showlegend=False), 3, 1)
    plotly.offline.plot(figs, filename="temp.html", auto_open=True)

- short-term feature sequences for ZCR / Energy
- sound: vacuum cleaner
- zcr is higher for “noisy” sounds

Audio features: Frequency Domain (Spectral) Features

- Let X be the abs(FFT)
- Spectral Centroid
- Center of gravity of the spectrum
- Spectral spread
- 2nd central moment of the spectrum
- Spectral entropy
- Divide spectrum into L sub-bands
- Compute normalized sub-band energies (E_f)
- Compute entropy
- Spectral Flux
- Spectral change between two successive frames
- Spectral Rolloff
- Freq below which a percentage of the mag distribution of the spectrum is concentrated
- If the m-th DFT coefficient is the spectral rolloff, then

$$\sum_{k=1}^{m} X(k) = C \sum_{k=1}^{N} X(k)$$

where C is the chosen percentage and N the number of DFT coefficients
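The spectral features listed above can be sketched directly on the FFT magnitude X. This is a minimal illustration under assumptions: the rolloff percentage defaults to 90%, the spread is reported as the square root of the 2nd central moment (so it is in Hz), and the function names are hypothetical rather than a library API.

```python
import numpy as np

EPS = 1e-10


def spectral_centroid_spread(X, freqs):
    """Center of gravity of the spectrum and sqrt of its 2nd central moment."""
    w = X / (X.sum() + EPS)   # treat the magnitude as a distribution
    centroid = np.sum(freqs * w)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * w))
    return centroid, spread


def spectral_rolloff(X, freqs, c=0.90):
    """Frequency below which a fraction c of the magnitude is concentrated."""
    cumulative = np.cumsum(X)
    m = np.searchsorted(cumulative, c * cumulative[-1])
    return freqs[m]


def spectral_flux(X_prev, X_cur):
    """Squared difference between two successive normalized spectra."""
    a = X_prev / (X_prev.sum() + EPS)
    b = X_cur / (X_cur.sum() + EPS)
    return np.sum((b - a) ** 2)


fs = 8000
n = np.arange(400)   # one 50 msec frame
freqs = np.fft.rfftfreq(len(n), d=1.0 / fs)
X_low = np.abs(np.fft.rfft(np.cos(2 * np.pi * 500 * n / fs)))
X_high = np.abs(np.fft.rfft(np.cos(2 * np.pi * 3000 * n / fs)))

centroid_low, spread_low = spectral_centroid_spread(X_low, freqs)
print("centroid: {:.1f} Hz, spread: {:.1f} Hz".format(centroid_low,
                                                     spread_low))
print("rolloff: {:.1f} Hz".format(spectral_rolloff(X_low, freqs)))
print("flux low -> high: {:.2f}".format(spectral_flux(X_low, X_high)))
```

For a single pure tone the centroid and the rolloff both sit at the tone frequency, and the flux between two frames with different tones is large.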

Audio features: Frequency Domain (Spectral) Features - Example

"""!
@brief Example 09
@details pyAudioAnalysis feature extraction example for male / female speeches
@author Theodoros Giannakopoulos
"""
import numpy as np
import plotly
import plotly.graph_objs as go
from pyAudioAnalysis import ShortTermFeatures as aF
from pyAudioAnalysis import audioBasicIO as aIO


if __name__ == '__main__':
    win = 0.05
    fp1 = "../data/general/speech/m1_neu-m1-l1.wav.wav"  # male
    fp2 = "../data/general/speech/f1_neu-f1-l2.wav.wav"  # female
    # read the two speech signals
    fs1, s1 = aIO.read_audio_file(fp1)
    fs2, s2 = aIO.read_audio_file(fp2)
    dur1, dur2 = len(s1) / float(fs1), len(s2) / float(fs2)
    # extract short term features
    [f1, fn] = aF.feature_extraction(s1, fs1, int(fs1 * win), int(fs1 * win))
    [f2, fn] = aF.feature_extraction(s2, fs2, int(fs2 * win), int(fs2 * win))
    figs = plotly.tools.make_subplots(rows=2, cols=2,
                                      subplot_titles=["male sig", "female sig",
                                                      fn[3], fn[3]])
    t1 = np.arange(0, dur1 - win, win)
    ts_1 = np.arange(0, dur1, 1/float(fs1))
    t2 = np.arange(0, dur2 - win, win)
    ts_2 = np.arange(0, dur2, 1/float(fs2))
    figs.append_trace(go.Scatter(x=ts_1, y=s1, showlegend=False), 1, 1)
    figs.append_trace(go.Scatter(x=ts_2, y=s2, showlegend=False), 1, 2)
    figs.append_trace(go.Scatter(x=t1, y=f1[3, :], showlegend=False), 2, 1)
    figs.append_trace(go.Scatter(x=t2, y=f2[3, :], showlegend=False), 2, 2)
    plotly.offline.plot(figs, filename="temp.html", auto_open=True)

Audio features: Cepstral Domain

- Mel-Frequency Cepstral Coefficients
- Compute DFT
- Mel-scale filter bank application
- Compute O_k as the power of the output of each filter
- Compute MFCCs as the discrete cosine transform coefficients of the mel-scaled log-power spectrum
- Usually select the first 13 MFCCs (considered to carry enough discriminative information especially for speech classification tasks)
- Cepstrum in general (not mel):
- Inverse fft of the log fft

signal → fft → log → ifft → cepstrum
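The general (non-mel) cepstrum chain, signal → fft → log → ifft, can be sketched as follows. The test frame (harmonics of 200 Hz) and the function name `real_cepstrum` are illustrative assumptions; the small constant inside the log only guards against log(0).

```python
import numpy as np

EPS = 1e-10


def real_cepstrum(frame):
    """Inverse FFT of the log of the FFT magnitude."""
    log_spectrum = np.log(np.abs(np.fft.fft(frame)) + EPS)
    return np.fft.ifft(log_spectrum).real


fs = 8000
n = np.arange(1024)
f0 = 200  # fundamental of the synthetic harmonic frame (hypothetical)
frame = sum(np.cos(2 * np.pi * f0 * h * n / fs) / h for h in range(1, 20))
frame = frame * np.hanning(len(frame))

c = real_cepstrum(frame)

# for harmonic signals the cepstrum typically shows a peak at the quefrency
# fs / f0 (in samples), so we look for the largest value in 80..400 Hz
q_min, q_max = fs // 400, fs // 80
peak_q = q_min + int(np.argmax(c[q_min:q_max]))
print("cepstral peak at quefrency {} samples (fs / f0 = {})".format(
    peak_q, fs // f0))
```

MFCCs follow the same idea, but apply a mel filter bank before the log and a discrete cosine transform instead of the inverse FFT.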

Audio features: Cepstral Domain - Example

"""!
@brief Example 10
@details pyAudioAnalysis feature extraction example - mfccs for male/female
@author Theodoros Giannakopoulos
"""
import numpy as np
import plotly
import plotly.graph_objs as go
from pyAudioAnalysis import ShortTermFeatures as aF
from pyAudioAnalysis import audioBasicIO as aIO


if __name__ == '__main__':
    win = 0.05
    fp1 = "../data/general/speech/m1_neu-m1-l1.wav.wav"  # male
    fp2 = "../data/general/speech/f1_neu-f1-l2.wav.wav"  # female
    # read the two speech signals
    fs1, s1 = aIO.read_audio_file(fp1)
    fs2, s2 = aIO.read_audio_file(fp2)
    dur1, dur2 = len(s1) / float(fs1), len(s2) / float(fs2)
    # extract short term features
    [f1, fn] = aF.feature_extraction(s1, fs1, int(fs1 * win), int(fs1 * win))
    [f2, fn] = aF.feature_extraction(s2, fs2, int(fs2 * win), int(fs2 * win))
    figs = plotly.tools.make_subplots(rows=1, cols=2,
                                      subplot_titles=[fn[9], fn[10]])
    t1 = np.arange(0, dur1 - 0.050, 0.050)
    t2 = np.arange(0, dur2 - 0.050, 0.050)
    figs.append_trace(go.Scatter(x=t1, y=f1[9, :], name="male"), 1, 1)
    figs.append_trace(go.Scatter(x=t2, y=f2[9, :], name="female"), 1, 1)
    figs.append_trace(go.Scatter(x=t1, y=f1[10, :], name="male"), 1, 2)
    figs.append_trace(go.Scatter(x=t2, y=f2[10, :], name="female"), 1, 2)

    plotly.offline.plot(figs, filename="temp.html", auto_open=True)

Audio features: Chroma Vector

- 12-element frequency representation
- In music applications
- Group the DFT coefficients of a window into 12 bins
- Each bin represents one of the 12 equal-tempered pitch classes of western-type music
- Bins in semitone spacing
- S_k is the set of frequencies for the k-th bin (representing DFT coefficients)
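The grouping of DFT coefficients into 12 semitone-spaced bins can be sketched as below. Several details are assumptions made for illustration: the reference frequency (27.5 Hz, an A), averaging the magnitudes within each bin, and the Hanning window. The pyAudioAnalysis chromagram may be implemented differently.

```python
import numpy as np

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def chroma_vector(frame, fs, f_ref=27.5):
    """12-element chroma vector of one frame (f_ref is an ASSUMED reference A)."""
    X = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    valid = freqs >= f_ref   # skip DC and very low bins
    # semitone distance from f_ref, folded into one octave (12 classes)
    semitone = np.round(12 * np.log2(freqs[valid] / f_ref)).astype(int) % 12
    chroma = np.zeros(12)
    counts = np.zeros(12)
    np.add.at(chroma, semitone, X[valid])   # sum the magnitudes in each S_k
    np.add.at(counts, semitone, 1)          # number of coefficients in S_k
    chroma /= np.maximum(counts, 1)         # mean magnitude per bin
    return chroma / (chroma.sum() + 1e-10)


fs = 16000
n = np.arange(4096)
frame = np.cos(2 * np.pi * 261.63 * n / fs)   # middle C
c = chroma_vector(frame, fs)
print("strongest chroma class:", NOTE_NAMES[int(np.argmax(c))])
```

Because the classes are folded over octaves, a C one octave higher (523.25 Hz) would activate the same bin.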

Audio features: Chroma Vector - Example

"""!
@brief Example 11
@details pyAudioAnalysis chromagram example
@author Theodoros Giannakopoulos
"""
import plotly
import plotly.graph_objs as go
from pyAudioAnalysis import ShortTermFeatures as aF
from pyAudioAnalysis import audioBasicIO as aIO
layout = go.Layout(title='Chromagram example for doremi.wav signal',
                   xaxis=dict(title='time (sec)',),
                   yaxis=dict(title='Chroma Name',))


if __name__ == '__main__':
    win = 0.04
    fp = "../data/doremi.wav"  # music sample
    # read the music signal
    fs, s = aIO.read_audio_file(fp)
    fs = float(fs)
    dur1 = len(s) / float(fs)
    spec, time, freq = aF.chromagram(s, fs, int(fs * win),
                                     int(fs * win), False)
    heatmap = go.Heatmap(z=spec.T, y=freq, x=time)
    plotly.offline.plot(go.Figure(data=[heatmap], layout=layout),
                        filename="temp.html", auto_open=True)

Plotly histogram representation (function in utilities.py)

import plotly
import plotly.graph_objs as go
import numpy as np
import matplotlib.pyplot as plt

def plot_feature_histograms(list_of_feature_mtr, feature_names,
                            class_names, n_columns=5):
    '''
    Plots the histograms of all classes and features for a given
    classification task.
    :param list_of_feature_mtr: list of feature matrices
                                (n_samples x n_features) for each class
    :param feature_names:       list of feature names
    :param class_names:         list of class names, for each feature matr
    '''
    clr_map = plt.get_cmap('jet')
    n_features = len(feature_names)
    n_bins = 12
    n_classes = len(class_names)
    n_rows = int(n_features / n_columns) + 1
    figs = plotly.tools.make_subplots(rows=n_rows, cols=n_columns,
                                      subplot_titles=feature_names)
    figs['layout'].update(height=(n_rows * 250))
    range_cl = range(int(int(255/n_classes)/2), 255, int(255/n_classes))
    clr = []
    for i in range(n_classes):
        # the colormap returns values in 0..1, rgba() expects r, g, b in 0..255
        r, g, b, a = clr_map(range_cl[i])
        clr.append('rgba({},{},{},{})'.format(int(255 * r), int(255 * g),
                                              int(255 * b), a))
    for i in range(n_features):
        # for each feature get its bin range (min:(max-min)/n_bins:max)
        f = np.vstack([x[:, i:i + 1] for x in list_of_feature_mtr])
        bins = np.arange(f.min(), f.max(), (f.max() - f.min()) / n_bins)
        for fi, f in enumerate(list_of_feature_mtr):
            # load the color for the current class (fi)
            mark_prop = dict(color=clr[fi], line=dict(color=clr[fi], width=3))
            # compute the histogram of the current feature (i) and normalize:
            h, _ = np.histogram(f[:, i], bins=bins)
            h = h.astype(float) / h.sum()
            cbins = (bins[0:-1] + bins[1:]) / 2
            scatter_1 = go.Scatter(x=cbins, y=h, name=class_names[fi],
                                   marker=mark_prop, showlegend=(i == 0))
            # (show the legend only on the first line)
            figs.append_trace(scatter_1, int(i/n_columns)+1, i % n_columns+1)
    for i in figs['layout']['annotations']:
        i['font'] = dict(size=10, color='#224488')
    plotly.offline.plot(figs, filename="report.html", auto_open=True)

https://plot.ly/python/

Feature discrimination example: male vs female segments

"""!
@brief Example 12
@details pyAudioAnalysis feature extraction for classes organized in folders
         and feature histogram representation (per feature and class).
         Binary classification task: male vs female speech segments
@author Theodoros Giannakopoulos
"""
from pyAudioAnalysis import MidTermFeatures as aF
import os.path
import utilities as ut

if __name__ == '__main__':
    dirs = ["../data/gender/male",
            "../data/gender/female"]
    class_names = [os.path.basename(d) for d in dirs]
    m_win, m_step, s_win, s_step = 1, 1, 0.1, 0.05
    features = []
    for d in dirs:
        # get feature matrix for each directory (class)
        f, files, fn = aF.directory_feature_extraction(d, m_win, m_step, s_win,
                                                       s_step)
        features.append(f)
    ut.plot_feature_histograms(features, fn, class_names)

Feature discrimination example: 3-class task

"""!
@brief Example 13
@details pyAudioAnalysis feature extraction for classes organized in folders
         and feature histogram representation (per feature and class).
         3-class classification task: animals vs speech vs music segments
@author Theodoros Giannakopoulos
"""
from pyAudioAnalysis import MidTermFeatures as aF
import os.path
import utilities as ut

if __name__ == '__main__':
    dirs = ["../data/general/animals",
            "../data/general/speech",
            "../data/general/music"]
    class_names = [os.path.basename(d) for d in dirs]
    m_win, m_step, s_win, s_step = 1, 1, 0.1, 0.05
    features = []
    for d in dirs:
        # get feature matrix for each directory (class)
        f, files, fn = aF.directory_feature_extraction(d, m_win, m_step, s_win,
                                                       s_step)
        features.append(f)
    ut.plot_feature_histograms(features, fn, class_names)

Beat tracking (1)

- Tempo / beat: fundamental properties of music
- Beat: a steady “pulse” that provides the temporal framework of a song
- Beat tracking: “tapping the foot when music plays”
- onset detection:
- onset: time position of a significant signal change (e.g. a note)
- change in signal’s energy or frequency distribution
- onset → attack → decay
- tempo estimation & beat peaks selection:
- detect peaks that are (almost) equally spaced in time
- detected peaks are (almost) consistent with est. tempo
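The two stages, onset detection followed by tempo estimation, can be sketched on a synthetic click track. This is a deliberately simple illustration (energy increases as onset strength, autocorrelation for the period) and not the method used by librosa or by the course demo; `estimate_tempo` and its parameters are hypothetical.

```python
import numpy as np


def estimate_tempo(signal, fs, win_sec=0.01, bpm_range=(70, 200)):
    """Estimate tempo (bpm) from the periodicity of energy increases."""
    win = int(win_sec * fs)
    n_frames = len(signal) // win
    frames = signal[:n_frames * win].reshape(n_frames, win)
    frame_energy = np.sum(frames ** 2, axis=1)
    # onset strength: keep only increases of the short-term energy
    onset_env = np.maximum(np.diff(frame_energy), 0)
    onset_env = onset_env - onset_env.mean()
    # autocorrelation peaks at lags where the onsets repeat
    ac = np.correlate(onset_env, onset_env, mode="full")[len(onset_env) - 1:]
    frame_rate = fs / float(win)
    lag_min = int(frame_rate * 60.0 / bpm_range[1])
    lag_max = int(frame_rate * 60.0 / bpm_range[0])
    period = (lag_min + np.argmax(ac[lag_min:lag_max + 1])) / frame_rate
    return 60.0 / period   # tempo = 60 / period (period in seconds)


fs = 8000
true_bpm = 120
signal = np.zeros(10 * fs)
beat_step = int(fs * 60.0 / true_bpm)
for start in range(0, len(signal) - 400, beat_step):
    # a 50 msec 220 Hz burst on every beat
    signal[start:start + 400] = np.cos(2 * np.pi * 220 * np.arange(400) / fs)

tempo = estimate_tempo(signal, fs)
print("estimated tempo: {:.1f} bpm".format(tempo))
```

The allowed tempo range matters: the autocorrelation also has peaks at multiples of the beat period, so widening `bpm_range` can let the estimate lock onto half the tempo.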

Beat tracking (2)

[Figure: two-stage pipeline: feature extraction & onset detection, followed by tempo estimation / beats detection]

- tempo = 60 / period bpms (period in seconds)
- Certain peaks that are not consistent with the estimated tempo are discarded

*check example14.py for this demo

Beat tracking (3)

- Perception of beat is
- hierarchical
- ambiguous
- Estimated tempo can be a multiple of the “real” tempo
- E.g. rap song tracked at 170 bpms: true tempo is 170/2 = 85 bpms!
- External expert knowledge may be needed to put constraints in the last step of the tempo extraction method (see prev slide)

J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, M. B. Sandler, “A Tutorial on Onset Detection in Music Signals,” IEEE Trans. Speech and Audio Proc., vol. 13, no. 5, pp. 1035-1047, September 2005.

P. Desain, H. Honing, “Computational models of beat induction: The rule-based approach,” J. New Music Research, vol. 28, no. 1, pp. 29-42, 1999.

E. D. Scheirer, “Tempo and beat analysis of acoustic musical signals,” J. Acoust. Soc. Am., vol. 103, pp. 588-601, 1998.

M. E. Davies, M. D. Plumbley, “Context-dependent beat tracking of musical audio,” IEEE Trans. Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 1009-1020, 2007.

M. F. McKinney, D. Moelants, “Ambiguity in tempo perception: What draws listeners to different metrical levels?,” Music Perception, vol. 24, no. 2, pp. 155-166, 2006.

Beat tracking - Example

"""!
@brief Example 15
@details librosa beattracking example
@author Theodoros Giannakopoulos
"""
import numpy as np
import scipy.io.wavfile as wavfile
import sys
import librosa

if __name__ == '__main__':
    # needs filepath as main argument:
    if len(sys.argv) != 2:
        sys.exit()
    # load file and extract tempo and beats:
    [Fs, s] = wavfile.read(sys.argv[1])
    # (librosa expects floating-point samples)
    tempo, beats = librosa.beat.beat_track(y=s.astype(float), sr=Fs,
                                           units="time")
    beats = np.maximum(beats - 0.05, 0)
    # add small 220Hz sounds on the 2nd channel of the song on each beat
    s = s.reshape(-1, 1)
    s = np.array(np.concatenate((s, np.zeros(s.shape)), axis=1))
    for ib, b in enumerate(beats):
        t = np.arange(0, 0.2, 1.0 / Fs)
        amp_mod = 0.2 / (np.sqrt(t)+0.2) - 0.2
        amp_mod[amp_mod < 0] = 0
        x = s.max() * np.cos(2 * np.pi * t * 220) * amp_mod
        start = int(Fs * b)
        n = min(len(x), s.shape[0] - start)  # do not write past the end
        s[start:start + n, 1] = x[:n].astype('int16')
    wavfile.write("output.wav", Fs, np.int16(s))

Usage example:
python3 example15.py ../data/musical_genres_small/hiphop/nwa_straght_outta_campton.wav

Beat tracking - Discrimination Example

"""!
@brief Example 16
@details librosa beattracking example: extract tempo and spectral centroid for
         songs from different musical genres
@author Theodoros Giannakopoulos
"""
import scipy.io.wavfile as wavfile, utilities as ut
import glob, os, librosa, plotly, numpy as np, plotly.graph_objs as go
from pyAudioAnalysis.MidTermFeatures import mid_feature_extraction as mt

layout = go.Layout(title='Beat and spectral centroid distributions',
                   xaxis=dict(title='Tempo (bpms)',),
                   yaxis=dict(title='Spectral Centroid Mean',))

def get_dir_features(dir_name):
    feats = []
    for f in glob.glob(os.path.join(dir_name, "*.wav")):
        [Fs, s] = wavfile.read(f)
        # (librosa expects floating-point samples)
        tempo, _ = librosa.beat.beat_track(y=s.astype(float), sr=Fs)
        # tempo may be returned as a scalar or as a 1-element array
        tempo = float(np.atleast_1d(tempo)[0])
        f, _, fn = mt(s, Fs, int(1.0*Fs), int(1.0*Fs), int(0.1*Fs), int(0.1*Fs))
        feats.append([tempo, np.mean(f[fn.index("spectral_centroid_mean")],
                                     axis=0)])
    return np.array(feats)

if __name__ == '__main__':
    g_paths = glob.glob("../data/musical_genres_small/*/")
    g_names = [p.split('/')[-2] for p in g_paths]
    clr = ut.get_color_combinations(len(g_paths))
    features = [get_dir_features(g) for g in g_paths]
    plots = [go.Scatter(x=features[i][:, 0], y=features[i][:, 1],
                        mode='markers', name=g_names[i], marker=dict(
                            color=clr[i], size=15))
             for i in range(len(g_paths))]
    plotly.offline.plot(go.Figure(data=plots, layout=layout),
                        filename="temp.html", auto_open=True)

“True” tempo values:
- Triphop: 90-110
- Jungle: 160-170
- Hiphop: 80-100
- Techno: 120-130

- Tempo is not always discriminative
- Tempo estimation has errors (e.g. triphop estimated values are double)

Need for multiple features to achieve accurate discrimination

Pitch tracking
- f0:
- fundamental frequency
- a physical property of sound:
- speech: glottal pulses freq
- music: most dominant freq of a note (e.g. freq of vibration of a string)
- pitch:
- a subjective phenomenon (f0 open to measurement)
- perceptual
- follows f0
- Pitch tracking:
- Time / spectral domain
- Spectral:
- simple argmax?
- no! f0 is not always the freq with the max magnitude in the spectrogram
- speech:
- not always clear
- vad required
- music:
- note transcription
- polyphony
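Why a simple spectral argmax is not enough can be sketched with a synthetic note whose 2nd harmonic is stronger than its fundamental. The time-domain alternative shown here (autocorrelation peak picking) is a swapped-in illustrative technique, not the method used by librosa; the signal and the function names are hypothetical.

```python
import numpy as np


def spectral_argmax_freq(x, fs):
    """Frequency (Hz) of the strongest spectral peak."""
    X = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), d=1.0 / fs)[np.argmax(X)]


def autocorr_f0(x, fs, f_min=80, f_max=500):
    """f0 estimate from the strongest autocorrelation lag in [f_min, f_max]."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / float(lag)


fs = 8000
n = np.arange(800)   # 100 msec
f0 = 200
# harmonic signal: weak fundamental, strong 2nd harmonic
x = (0.3 * np.cos(2 * np.pi * f0 * n / fs)
     + 1.0 * np.cos(2 * np.pi * 2 * f0 * n / fs)
     + 0.5 * np.cos(2 * np.pi * 3 * f0 * n / fs))

print("spectral argmax: {:.0f} Hz".format(spectral_argmax_freq(x, fs)))
print("autocorrelation f0: {:.0f} Hz".format(autocorr_f0(x, fs)))
```

The spectral argmax reports the 2nd harmonic, while the autocorrelation finds the period shared by all harmonics, which is the fundamental.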

Pitch Tracking - Example using librosa

"""!
@brief Example 17
@details librosa pitch tracking example
@author Theodoros Giannakopoulos {[email protected]}
"""
import scipy.io.wavfile as wavfile
import librosa
import plotly
import numpy as np
import plotly.graph_objs as go
from scipy.signal import medfilt as mf

layout = go.Layout(title='Librosa pitch estimation',
                   xaxis=dict(title='time frame',),
                   yaxis=dict(title='freq (Hz)',))

def get_librosa_pitch(signal, fs, window):
    # piptrack expects floating-point audio; wavfile.read may return integers
    pitches, magnitudes = librosa.piptrack(y=signal.astype(np.float32), sr=fs,
                                           n_fft=int(window),
                                           hop_length=int(window/10))
    # for each frame, keep the pitch candidate with the largest magnitude
    pitch_pos = np.argmax(magnitudes, axis=0)
    pitches_final = []
    for i in range(len(pitch_pos)):
        pitches_final.append(pitches[pitch_pos[i], i])
    pitches_final = np.array(pitches_final)
    pitches_final[pitches_final > 2000] = 0  # cut high pitches
    return mf(pitches_final, 3)  # use medfilt for smoothing

if __name__ == '__main__':
    # the input file is assumed to be mono
    [fs, s] = wavfile.read("../data/acapella.wav")
    p = get_librosa_pitch(s, fs, fs/20)  # window of 50 ms (fs/20 samples)
    plt = go.Scatter(x=np.arange(len(p)), y=p, mode='markers', showlegend=False)
    plotly.offline.plot(go.Figure(data=[plt], layout=layout),
                        filename="temp.html", auto_open=True)
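The last two steps of `get_librosa_pitch` (zeroing values above 2000 Hz, then a 3-point median filter) can be checked on a toy pitch track. The sketch below uses numpy only and mimics the zero-padded behaviour of `scipy.signal.medfilt`. The helper name `clean_pitch_track` and the sample values are assumptions made for illustration:

```python
import numpy as np

def clean_pitch_track(pitches, max_hz=2000.0, kernel=3):
    """Zero out implausibly high pitches, then median-filter the track.

    Hypothetical helper: edges are zero-padded, as scipy.signal.medfilt does.
    """
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd")
    pitches = np.asarray(pitches, dtype=float).copy()
    pitches[pitches > max_hz] = 0.0               # cut high pitches
    half = kernel // 2
    padded = np.pad(pitches, half, mode="constant")
    # Row i holds the track shifted by i samples, so each column is one window.
    windows = np.stack([padded[i:i + len(pitches)] for i in range(kernel)])
    return np.median(windows, axis=0)

# Toy track (Hz): one spurious high value and one dropped frame.
raw = [220, 221, 2500, 222, 223, 0, 224, 225]
cleaned = clean_pitch_track(raw)
print(cleaned)
```

Both isolated outliers are replaced by neighbouring values. Two or more consecutive bad frames would survive a 3-point filter, so a longer kernel may be needed on noisier tracks.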
