Lab 9a. Linear Predictive Coding For Speech Processing: Vocal Tract Parameters Pitch Period Voiced/Unvoiced Speech Switch
$$s[n] = \sum_{k=1}^{p} a_k\, s[n-k] + G\,u[n] \tag{2}$$
To obtain the model coefficients, we proceed as follows. Suppose we try to predict the signal $s[n]$
at time $n$ from its previous values at times $n-1, n-2, \ldots$ A linear predictor with prediction coefficients
$\alpha_k$ is defined as a system whose output is
$$\hat{s}[n] = \sum_{k=1}^{p} \alpha_k\, s[n-k] \tag{3}$$
The transfer function of the $p$th-order linear predictor of equation (3) is the polynomial
$$P(z) = \sum_{k=1}^{p} \alpha_k\, z^{-k}$$
The prediction error $e[n]$ is defined as
$$e[n] = s[n] - \hat{s}[n] = s[n] - \sum_{k=1}^{p} \alpha_k\, s[n-k] \tag{4}$$
Equivalently,
$$E(z) = A(z)\,S(z)$$
where
$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k\, z^{-k}$$
Comparing equations (2) and (4), we see that when the speech signal obeys the model of (2) exactly,
$\alpha_k = a_k$. Then $e[n] = G\,u[n]$ and $E(z) = G\,U(z)$. Thus the prediction error filter $A(z)$ is,
up to the gain $G$, the inverse filter of the system $H(z)$ of (1). That is,
$$E(z) = G\,U(z) = A(z)\,S(z)$$
Hence,
$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{A(z)}$$
So we have $A(z)$, the analysis filter, and $H(z)$, the synthesis filter.
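The inverse relationship between the analysis and synthesis filters can be checked numerically. The following is a minimal sketch in plain Python, not the lab's MATLAB code: the function names, the test signal, and the coefficient values are all made up for illustration, and samples before time 0 are assumed to be zero. The gain G is absorbed into the residual, so the synthesis filter here is 1/A(z).

```python
def analysis_filter(s, alpha):
    """Apply A(z): e[n] = s[n] - sum_{k=1..p} alpha[k-1] * s[n-k] (s[n] = 0 for n < 0)."""
    e = []
    for n in range(len(s)):
        pred = sum(a * s[n - k] for k, a in enumerate(alpha, start=1) if n - k >= 0)
        e.append(s[n] - pred)
    return e


def synthesis_filter(e, alpha):
    """Apply 1/A(z): s[n] = e[n] + sum_{k=1..p} alpha[k-1] * s[n-k]."""
    s = []
    for n in range(len(e)):
        # s already holds the reconstructed samples 0..n-1
        pred = sum(a * s[n - k] for k, a in enumerate(alpha, start=1) if n - k >= 0)
        s.append(e[n] + pred)
    return s


# Hypothetical second-order predictor and short test signal
alpha = [0.5, -0.25]
speech = [1.0, 2.0, -1.0, 0.5, 0.0]

residual = analysis_filter(speech, alpha)    # all-zero (FIR) filtering
rebuilt = synthesis_filter(residual, alpha)  # all-pole (IIR) filtering
print(residual)
print(rebuilt)
```

Feeding the residual back through the synthesis filter with the same coefficients reproduces the input samples, which is the sense in which one filter is the inverse of the other.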
The basic problem of linear prediction analysis is to determine the set of predictor coefficients
$\alpha_k$ directly from the speech signal. Because speech is non-stationary, the coefficients are determined
for short segments of speech over which the signal is considered approximately stationary. They are found
by minimizing the mean-square prediction error. The resulting parameters are then assumed to
be the parameters of the system function $H(z)$, which is then used to synthesize that speech segment.
The method of determining these coefficients is outlined below.
0.3 Minimum Mean-Square Error and the Orthogonality Principle
We treat the linear prediction problem of equation (3) as that of predicting a random variable from a set of other
random variables. Given random variables (RVs) $x_1, x_2, \ldots, x_n$, we wish to find $n$ constants
$a_1, a_2, a_3, \ldots, a_n$ such that the sum
$$\hat{s} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \tag{5}$$
forms a linear estimate of a random variable $s$.
This is typically done by ensuring that the mean-square value
$$P = E\{\,|s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)|^2\,\}$$
of the resulting error
$$\varepsilon = s - \hat{s} = s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)$$
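is minimum. We do this by setting the partial derivative of $P$ with respect to each $a_i$ to zero:
$$\frac{\partial P}{\partial a_i} = E\{-2\,[s - (a_1 x_1 + a_2 x_2 + \cdots + a_n x_n)]\,x_i\} = 0 \tag{6}$$
In other words, the error must be orthogonal to each of the data $x_i$ (the orthogonality principle),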
which yields the so-called Yule-Walker equations.
Setting $i = 1, 2, \ldots, n$ in equation (6), we get
$$
\begin{aligned}
R_{11}a_1 + R_{12}a_2 + \cdots + R_{1n}a_n &= R_{01} \\
R_{21}a_1 + R_{22}a_2 + \cdots + R_{2n}a_n &= R_{02} \\
R_{31}a_1 + R_{32}a_2 + \cdots + R_{3n}a_n &= R_{03} \\
&\;\;\vdots \\
R_{n1}a_1 + R_{n2}a_2 + \cdots + R_{nn}a_n &= R_{0n}
\end{aligned} \tag{7}
$$
where
$$R_{ji} = E\{x_i x_j\}, \qquad R_{0j} = E\{s\,x_j\}$$
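As a small numerical illustration of equation (7), the sketch below replaces the expectations by sample averages, builds the matrix of $R_{ij}$ and the right-hand side $R_{0j}$, and solves the system by ordinary Gaussian elimination. This is only an assumption-laden example in plain Python: the helper names and the data are hypothetical, and a general-purpose solver is used here rather than any specialised algorithm.

```python
def normal_equations(x, s):
    """Build R[i][j] ~ E{x_i x_j} and r[i] ~ E{s x_i} from sample averages.

    x is a list of n data sequences, s is the target sequence (same length).
    Python indices are 0-based, so R[i][j] stands for R_{i+1, j+1}.
    """
    n, N = len(x), len(s)
    R = [[sum(x[i][t] * x[j][t] for t in range(N)) / N for j in range(n)]
         for i in range(n)]
    r = [sum(s[t] * x[i][t] for t in range(N)) / N for i in range(n)]
    return R, r


def solve_linear_system(R, r):
    """Solve R a = r by Gaussian elimination with partial pivoting."""
    n = len(r)
    M = [row[:] + [r[i]] for i, row in enumerate(R)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        acc = M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = acc / M[i][i]
    return a


# Hypothetical data in which s = 2*x1 - x2 holds exactly
x1 = [1.0, 0.0, 2.0, -1.0]
x2 = [0.0, 1.0, 1.0, 3.0]
s = [2.0 * u - v for u, v in zip(x1, x2)]

R, r = normal_equations([x1, x2], s)
a = solve_linear_system(R, r)
print(a)  # close to [2.0, -1.0]
```

Because the target is an exact linear combination of the data in this example, the minimum mean-square error is zero and the solution recovers the combining weights.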
If the data $x_i$ are linearly independent, then the determinant of the matrix of coefficients $R_{ij}$ is positive,
so the system has a unique solution. Equation (7) is solved for the unknown coefficients $a_k$, $k = 1, 2, \ldots, n$
(the $\alpha_k$ of equation (3)), by using the so-called Levinson-Durbin algorithm. Accordingly, the problem
essentially consists of determining, for a short segment of speech, the matrix of correlation coefficients $R_{ij}$
and then solving the system (in effect, inverting the matrix) to obtain the prediction
coefficients, which are then transmitted. All of this often has to be done in real time.
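A minimal sketch of the Levinson-Durbin recursion is given below. It rests on an assumption that the text above does not spell out: the data are the past samples of a stationary signal, so that $R_{ij}$ depends only on $|i - j|$ and can be written as an autocorrelation sequence r[0], r[1], ..., r[p]. The function name, argument layout, and the sign convention for the reflection coefficients are choices made for this example and may differ from those of MATLAB or other toolboxes.

```python
def levinson_durbin(r, p):
    """Solve sum_{k=1..p} alpha_k * r[|i-k|] = r[i] for i = 1..p.

    r is the autocorrelation sequence r[0..p]. Returns
    (alpha, refl, err): predictor coefficients alpha_1..alpha_p,
    reflection coefficients k_1..k_p, and the final prediction error energy.
    """
    alpha = [0.0] * (p + 1)  # alpha[0] is unused; alpha[j] holds alpha_j
    refl = []
    err = r[0]
    for i in range(1, p + 1):
        # Part of r[i] not yet explained by the order-(i-1) predictor
        acc = r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = alpha[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = alpha[j] - k * alpha[i - j]
        alpha = updated
        err *= (1.0 - k * k)
        refl.append(k)
    return alpha[1:], refl, err


# Autocorrelation of a first-order process with r[m] = 0.5**m (illustrative values)
alpha, refl, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(alpha, refl, err)  # [0.5, 0.0] [0.5, 0.0] 0.75
```

The recursion builds the order-p solution from the solutions of orders 1 to p-1, so it needs on the order of p squared operations instead of the p cubed of a general solver, which matters when the coefficients must be recomputed for every frame in real time.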
0.4 MATLAB LPC DEMO
Run the Demo as per instructions in Lab 9.
Demo Description
The demo consists of two parts: analysis and synthesis. The analysis portion is found in the transmitter
section of the system.
Analysis Section:
In this simulation, the speech signal is divided into frames of 20 ms (160 samples), with an overlap of 10
ms (80 samples). Each frame is windowed using a Hamming window. The original speech signal is passed
through an analysis filter, which is an all-zero filter. It is a so-called lattice filter whose coefficients, referred
to as reflection coefficients, are obtained from the linear prediction analysis of the windowed frame. The output
of the filter is called the residual signal. This is what is transmitted here, along with the filter coefficients.
Here, the analysis section output is simply connected to the synthesis portion.
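The framing and windowing step can be sketched as follows. This is an illustration in plain Python rather than the demo's own implementation: the function names are hypothetical, the window uses the common symmetric Hamming formula, and any trailing samples that do not fill a whole frame are simply dropped, which is an assumption of this sketch. A frame length of 160 samples with 80 samples of overlap gives a hop of 80 samples between frame starts.

```python
import math


def hamming(N):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def windowed_frames(signal, frame_len=160, overlap=80):
    """Split signal into overlapping frames and apply a Hamming window to each."""
    hop = frame_len - overlap
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * w[n] for n in range(frame_len)])
    return frames


# Hypothetical test input: 400 samples of a constant signal
frames = windowed_frames([1.0] * 400)
print(len(frames), len(frames[0]))  # 4 160
```

Each windowed frame would then be used to estimate the correlation values from which the predictor and reflection coefficients for that frame are computed.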
Synthesis Section:
This residual signal is passed through a synthesis filter, which is the inverse of the analysis filter. The output
of the synthesis filter is the original signal.
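To make the analysis/synthesis pairing concrete, here is a sketch of an all-zero lattice analysis filter and the matching all-pole lattice synthesis filter driven by the same reflection coefficients. It is not the demo's implementation: the function names, the coefficient values, and the sign convention (forward error f_i[n] = f_{i-1}[n] - k_i b_{i-1}[n-1]) are assumptions for this example, and the coefficients are held fixed rather than updated frame by frame.

```python
def lattice_analysis(s, k):
    """All-zero lattice: returns the residual e[n] = f_p[n]."""
    p = len(k)
    b_prev = [0.0] * p          # b_prev[i] holds b_i[n-1]
    e = []
    for x in s:
        f, b = x, x             # f_0[n] = b_0[n] = s[n]
        b_now = [0.0] * p
        for i in range(p):
            b_now[i] = b                        # remember b_i[n] for the next sample
            f_next = f - k[i] * b_prev[i]       # f_{i+1}[n]
            b = b_prev[i] - k[i] * f            # b_{i+1}[n], uses f_i[n]
            f = f_next
        b_prev = b_now
        e.append(f)
    return e


def lattice_synthesis(e, k):
    """All-pole lattice: rebuilds s[n] = f_0[n] from the residual."""
    p = len(k)
    b_prev = [0.0] * p          # b_prev[i] holds b_i[n-1]
    s = []
    for x in e:
        f = x                   # f_p[n]
        b_now = [0.0] * p
        for i in range(p - 1, -1, -1):
            f = f + k[i] * b_prev[i]            # f_i[n]
            if i + 1 < p:
                b_now[i + 1] = b_prev[i] - k[i] * f   # b_{i+1}[n]
        if p:
            b_now[0] = f        # b_0[n] = s[n]
        b_prev = b_now
        s.append(f)
    return s


# Hypothetical reflection coefficients and test signal
k = [0.5, -0.25]
speech = [1.0, 2.0, -1.0, 0.5]
residual = lattice_analysis(speech, k)
rebuilt = lattice_synthesis(residual, k)
print(residual)  # [1.0, 1.375, -2.0, 1.625]
print(rebuilt)
```

With these conventions the residual matches what a direct-form prediction error filter with the equivalent predictor coefficients would produce, and the synthesis lattice returns the input samples.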
0.5 LAB REPORT
Give a brief description of exactly what is happening in the analysis and synthesis portions of the MATLAB
LPC speech analysis and synthesis demo. Observe the residual signal and the filter coefficients that are generated
in the analysis section and then transmitted to the synthesis section.
Figure 2:
Ref: MATLAB Help, Linear Predicting & Coding of Speech.
Class notes:mirchand/ee276-2003