E9 205 - Machine Learning For Signal Processing
Homework #2
Due date: Sept. 26, 2018 (3:45 PM for the analytical part; end of the day for the coding part).
The coding assignment should be submitted to mlsp18 doT iisc aT gmail doT com.
\[ y = Ax + b + \epsilon \]
where the source features $x$ are assumed to be non-random and $\epsilon$ represents i.i.d. channel noise, $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$. Given this model, his advisor Mohan recommends the maximum likelihood (ML) method to estimate the channel parameters $A, b, \sigma$. How will you solve the problem if you were Kiran? (A sketch of one possible route appears after part (b).) (Points 10)
(b) While Kiran is successful in estimating the parameters of his model, Mohan is unhappy with the results when the model is used to approximate a cell phone transmission. Mohan proposes a more complex model where the source ultrasound data $x_i, \; i = 1, \ldots, N$ is modeled as a Gaussian mixture model (GMM),
\[ x \sim \sum_{m=1}^{M} \alpha_m \, \mathcal{N}(x;\, \mu_m, \Sigma_m). \]
Further, the channel is modeled as a linear transformation of the GMM mean components, $\hat{\mu}_m = A_m \mu_m + b_m$; the covariances are not affected in this model. Given the channel outputs $y_i, \; i = 1, \ldots, N$, how will you help Kiran achieve his PhD faster by solving for the channel parameters $A_m, b_m, \; m = 1, \ldots, M$, assuming that the source signal GMM is already estimated? Simplify your result. (See the EM sketch below.) (Points 15)
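For part (a), one possible route (a sketch, not necessarily the expected derivation), assuming the source features $x_i$ are observed alongside the outputs $y_i$: the Gaussian noise model makes ML estimation of $A, b$ a least-squares problem,
\[
\log p(y_1, \ldots, y_N) = -\frac{Nd}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\|y_i - Ax_i - b\|^2,
\]
so $\hat{A}, \hat{b}$ minimize the sum of squared residuals (e.g., by regressing $y_i$ on the augmented vector $[x_i^\top \; 1]^\top$), and the noise variance follows as
\[
\hat{\sigma}^2 = \frac{1}{Nd}\sum_{i=1}^{N}\|y_i - \hat{A}x_i - \hat{b}\|^2,
\]
with $d$ the dimension of $y$.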
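For part (b), a sketch of one natural setup (assuming the outputs follow the transformed mixture below): with the source GMM fixed, EM over the component responsibilities applies,
\[
y \sim \sum_{m=1}^{M} \alpha_m\, \mathcal{N}\!\left(y;\, A_m\mu_m + b_m,\, \Sigma_m\right), \qquad
\gamma_{im} = \frac{\alpha_m\, \mathcal{N}(y_i;\, A_m\mu_m + b_m, \Sigma_m)}{\sum_{m'} \alpha_{m'}\, \mathcal{N}(y_i;\, A_{m'}\mu_{m'} + b_{m'}, \Sigma_{m'})}.
\]
The M-step maximizes $\sum_{i,m} \gamma_{im} \log \mathcal{N}(y_i;\, A_m\mu_m + b_m, \Sigma_m)$; note that per component only the combination $\hat{\mu}_m = A_m\mu_m + b_m$ enters the likelihood, and setting the gradient to zero gives the weighted-mean update $\hat{\mu}_m = \sum_i \gamma_{im}\, y_i \,/\, \sum_i \gamma_{im}$.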
3. Implementing GMM - A set of training and test examples of music and speech is provided at
http://www.leap.ee.iisc.ac.in/sriram/teaching/MLSP18/assignments/speech_music_data.tar.gz
(a) Generate spectrogram features - Use the log magnitude spectrogram as before, with a 64-point magnitude FFT (NFFT = 64). In this case, the spectrogram will have dimension 32 times the number of frames (using 25 ms windows with a shift of 10 ms). (A feature-extraction sketch appears after this problem.)
(b) Train two GMMs (one per class) with K-means initialization, separately for each of the following configurations: (i) 2 mixture components with diagonal covariance, (ii) 2 mixture components with full covariance, and (iii) 5 mixture components with diagonal and with full covariance. Plot the log-likelihood as a function of the EM iteration. (A training sketch appears after this problem.)
(c) Classify the test samples using the trained classifiers and report the performance in terms of error rate (percentage of misclassified samples) on the test data.
(d) Discuss the impact on performance of the number of mixture components and of diagonal versus full covariance.
(Points 30)
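A minimal sketch of the feature extraction in (a), assuming 16 kHz WAV input; numpy/scipy based, and the function name and file handling are illustrative rather than prescribed by the assignment:

```python
import numpy as np
from scipy.io import wavfile

def log_spectrogram(path, nfft=64, win_ms=25, shift_ms=10):
    """Log magnitude spectrogram: one 32-dim column per 25 ms frame, 10 ms shift."""
    fs, x = wavfile.read(path)
    x = x.astype(np.float64)
    win = int(fs * win_ms / 1000)       # samples per frame
    shift = int(fs * shift_ms / 1000)   # samples per frame shift
    n_frames = 1 + (len(x) - win) // shift
    frames = np.stack([x[i * shift : i * shift + win] * np.hamming(win)
                       for i in range(n_frames)])
    # rfft crops/zero-pads each frame to nfft points; keep the first nfft/2 bins
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))[:, :nfft // 2]
    return np.log(mag + 1e-10).T        # shape: 32 x n_frames
```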
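For (b) and (c), the assignment presumably expects your own EM implementation, but the bookkeeping can be illustrated with scikit-learn's GaussianMixture, using warm_start with max_iter=1 so the log-likelihood can be recorded after every EM iteration (class labels and the test-set layout below are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(X, n_mix, cov_type, n_iter=30):
    """Fit a GMM (K-means init); X is frames x 32. Returns the model and
    the per-iteration average log-likelihood for plotting."""
    gmm = GaussianMixture(n_components=n_mix, covariance_type=cov_type,
                          init_params='kmeans', max_iter=1, warm_start=True)
    ll = []
    for _ in range(n_iter):
        gmm.fit(X)               # warm_start: each call runs one more EM step
        ll.append(gmm.score(X))  # mean per-frame log-likelihood
    return gmm, ll

def error_rate(gmm_speech, gmm_music, test_set):
    """test_set: list of (features, label) pairs, label in {'speech', 'music'}."""
    errors = 0
    for X, label in test_set:
        pred = 'speech' if gmm_speech.score(X) > gmm_music.score(X) else 'music'
        errors += (pred != label)
    return 100.0 * errors / len(test_set)
```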
(a) Write code to implement the likelihood computation using the forward variable, assuming a uniform flat-start initialization, with 3 and 5 states per HMM and a GMM with 2 mixture components per state. (A sketch of the scaled forward recursion appears after this problem.)
(b) Write code to implement the Viterbi algorithm to decode the best state sequence using the existing model.
(c) Use the Baum-Welch re-estimation method to train HMMs on the music and speech features. (The standard update formulas are sketched below.)
(d) Classify the test examples and report the performance. How does the performance change with the number of states per HMM and the number of mixture components per GMM? Would a diagonal or full covariance GMM be a good choice?
Hint - Use a flat start to initialize the model and use the scaling technique in the implementation.
(Points 35)
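A compact sketch of (a) and (b) for an HMM with transition matrix A, initial distribution pi, and per-frame log emission likelihoods logB (which would come from the per-state GMMs); all names are illustrative. The forward pass uses the per-frame scaling the hint refers to:

```python
import numpy as np

def forward_loglik(logB, A, pi):
    """Scaled forward algorithm. logB: T x S log emission likelihoods,
    A: S x S transition matrix, pi: initial state distribution.
    Returns log p(observations) using per-frame scaling."""
    T, S = logB.shape
    B = np.exp(logB - logB.max(axis=1, keepdims=True))  # stabilized emissions
    loglik = logB.max(axis=1).sum()                     # add the factors back
    alpha = pi * B[0]
    c = alpha.sum(); alpha /= c; loglik += np.log(c)
    for t in range(1, T):
        alpha = (alpha @ A) * B[t]
        c = alpha.sum(); alpha /= c; loglik += np.log(c)
    return loglik

def viterbi(logB, A, pi):
    """Best state sequence by dynamic programming in the log domain."""
    T, S = logB.shape
    logA = np.log(A + 1e-300)
    delta = np.log(pi + 1e-300) + logB[0]
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA    # S x S: previous state -> state
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```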
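For (c), a sketch of the standard Baum-Welch (Rabiner-style) updates, with $\gamma_t(j, m)$ the posterior of occupying state $j$, mixture $m$ at time $t$, $\xi_t(i, j)$ the transition posterior, and $o_t$ the observation:
\[
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{c}_{jm} = \frac{\sum_t \gamma_t(j,m)}{\sum_t \gamma_t(j)}, \qquad
\hat{\mu}_{jm} = \frac{\sum_t \gamma_t(j,m)\, o_t}{\sum_t \gamma_t(j,m)},
\]
\[
\hat{\Sigma}_{jm} = \frac{\sum_t \gamma_t(j,m)\,(o_t - \hat{\mu}_{jm})(o_t - \hat{\mu}_{jm})^\top}{\sum_t \gamma_t(j,m)},
\]
with $\gamma$ and $\xi$ computed from the scaled forward and backward variables.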