Linear Predictive Coding (LPC) is a key technique for estimating speech parameters like pitch and formants, widely used in low bit rate transmission and storage due to its accuracy and computational efficiency. LPC involves approximating speech samples as a linear combination of past samples, with various methods for analysis including covariance and autocorrelation methods. The document also discusses the stability and computational considerations of different LPC solution methods and applications such as pitch detection.

Uploaded by

pummyvarma
08-02-2025

SPEECH PROCESSING
Linear Predictive Coding of Speech

Introduction
• Linear Predictive Coding (LPC) is a predominant technique for
estimating the basic speech parameters such as
• Pitch
• Formants
• Spectra
• Vocal Tract Area function
Applications:
• Low bit rate transmission
• Low bit rate storage
Advantages:
• Accurate estimation
• Computation speed is high


Introduction
Basic concept:
• A speech sample is approximated as a linear combination of past speech
samples
• By minimizing the sum of squared differences (over a finite interval)
between the actual and the linearly predicted samples, a unique set of
predictor coefficients (weighting coefficients) is determined
Formulations of LP Analysis:
1. Covariance method
2. Autocorrelation method
3. Lattice method
4. Inverse filter formulation
5. Spectral estimation formulation
6. Maximum likelihood formulation
7. Inner product formulation

Basic Principles of LP Analysis


• In the speech production model (Fig. 8.1), the combined effects of
radiation, vocal tract and glottal excitation are represented by a
time-varying digital filter whose steady-state system function is
$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$ - (1)
• This system is excited by an impulse train for voiced speech and by
random noise for unvoiced speech


Basic Principles of LP Analysis


Parameters of this model are:
• Voiced/Unvoiced classification
• Pitch period for voiced speech
• Gain parameter 𝐺
• Coefficients $\{a_k\}$ of the digital filter
• These parameters vary slowly with time
• This simplified all-pole model is a natural representation of non-nasal
voiced sounds
• For nasals and fricatives, the acoustic theory calls for both poles
and zeros in the vocal tract (VT) transfer function
• However, if the order $p$ is high enough, an all-pole model is a
good representation for almost all speech sounds

Basic Principles of LP Analysis


• From the system in Fig. 8.1, the speech samples $s(n)$ are related to the
excitation $u(n)$ by the simple difference equation
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n)$ - (2)
• A linear predictor with prediction coefficients $\alpha_k$ is defined as a
system whose output is
$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k s(n-k)$ - (3)
• The system function of a $p$th-order linear predictor is
$P(z) = \sum_{k=1}^{p} \alpha_k z^{-k}$ - (4)
• The prediction error $e(n)$ is defined as
$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n-k)$ - (5)


Basic Principles of LP Analysis


• The prediction error sequence is the output of a system whose transfer
function is
$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$ - (6)
• Comparing Eqs.(2) and (5), if the speech signal obeys the model of Eq.(2)
and if $\alpha_k = a_k$, then $e(n) = G u(n)$
• The prediction error filter $A(z)$ is then an inverse filter for the system
$H(z)$ of Eq.(1), i.e.,
$H(z) = \frac{G}{A(z)}$ - (7)
• The basic problem of LPC is to determine the coefficients $\{\alpha_k\}$
directly from the speech signal so as to obtain a good estimate of the
spectral properties of the speech signal through the use of Eq.(7)
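The relation $e(n) = G u(n)$ can be checked numerically. The following is a minimal sketch, not part of the original slides: the second-order coefficients, the gain and the impulse-train excitation are illustrative assumptions, and the function names `synthesize` and `prediction_error` are hypothetical.

```python
def synthesize(a, gain, u):
    """Eq. (2): s(n) = sum_k a_k s(n-k) + G u(n), with zero initial conditions."""
    s = []
    for n, u_n in enumerate(u):
        past = sum(a[k - 1] * s[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        s.append(past + gain * u_n)
    return s


def prediction_error(alpha, s):
    """Eq. (5): e(n) = s(n) - sum_k alpha_k s(n-k), i.e. s(n) filtered by A(z)."""
    e = []
    for n in range(len(s)):
        pred = sum(alpha[k - 1] * s[n - k] for k in range(1, len(alpha) + 1) if n - k >= 0)
        e.append(s[n] - pred)
    return e


# Assumed model: p = 2, poles at radius sqrt(0.8) (inside the unit circle).
a = [1.2, -0.8]
G = 0.5
# Voiced-like excitation: one unit impulse every 20 samples.
u = [1.0 if n % 20 == 0 else 0.0 for n in range(60)]

s = synthesize(a, G, u)
e = prediction_error(a, s)  # predictor coefficients set equal to the model coefficients
max_dev = max(abs(e_n - G * u_n) for e_n, u_n in zip(e, u))
```

With the predictor coefficients equal to the model coefficients, `max_dev` stays at the level of floating-point round-off, which is the inverse-filter property stated in Eq.(7).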

Basic Principles of LP Analysis


• Because of the time-varying nature of the speech signals the predictor
coefficients must be estimated from short segments of speech signal
• The basic approach is to find a set of predictor coefficients that will minimize
the mean-squared prediction error over a short segment of the speech
waveform
• The resulting parameters are then assumed to be the parameters of the system
function, 𝐻(𝑧), in the model for speech production
• The short-time average prediction error is defined as
$E_n = \sum_m e_n^2(m)$ - (8)
$= \sum_m \left( s_n(m) - \tilde{s}_n(m) \right)^2$ - (9)
$= \sum_m \left[ s_n(m) - \sum_{k=1}^{p} \alpha_k s_n(m-k) \right]^2$ - (10) from Eq.(3)
where $s_n(m)$ is a segment of speech that has been selected in the vicinity of
sample $n$, i.e., $s_n(m) = s(m+n)$ - (11)


Basic Principles of LP Analysis


• The averaging constant in Eqs.(8) - (10) is irrelevant to the minimization
and is hence omitted
• The values of $\alpha_k$ that minimize $E_n$ in Eq.(10) are found by setting
$\partial E_n / \partial \alpha_i = 0$, $i = 1, 2, \ldots, p$, thereby obtaining
$\sum_m s_n(m-i) s_n(m) = \sum_{k=1}^{p} \hat{\alpha}_k \sum_m s_n(m-i) s_n(m-k)$, $1 \le i \le p$ - (12)
where $\hat{\alpha}_k$ are the values of $\alpha_k$ that minimize $E_n$ (since $\hat{\alpha}_k$ is unique, drop the
caret and use the notation $\alpha_k$ to denote the values that minimize $E_n$)
• If we define
$\phi_n(i,k) = \sum_m s_n(m-i) s_n(m-k)$ - (13)
then Eq.(12) is written more compactly as
$\sum_{k=1}^{p} \alpha_k \phi_n(i,k) = \phi_n(i,0)$, $i = 1, 2, \ldots, p$ - (14)
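The step from Eq.(10) to Eq.(12) is stated without the intermediate differentiation. A sketch of that step, using only the symbols already defined above, is:

```latex
\begin{align*}
E_n &= \sum_m \Bigl[ s_n(m) - \sum_{k=1}^{p} \alpha_k s_n(m-k) \Bigr]^2 \\
\frac{\partial E_n}{\partial \alpha_i}
  &= -2 \sum_m s_n(m-i) \Bigl[ s_n(m) - \sum_{k=1}^{p} \alpha_k s_n(m-k) \Bigr] = 0,
  \qquad 1 \le i \le p \\
\Rightarrow \quad \sum_m s_n(m-i)\, s_n(m)
  &= \sum_{k=1}^{p} \hat{\alpha}_k \sum_m s_n(m-i)\, s_n(m-k),
  \qquad 1 \le i \le p
\end{align*}
```

Because $E_n$ is quadratic in the $\alpha_k$, setting the $p$ partial derivatives to zero gives $p$ linear equations, which is why the problem reduces to solving a linear system.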

Basic Principles of LP Analysis


• This set of $p$ equations in $p$ unknowns can be solved efficiently for
the unknown predictor coefficients $\{\alpha_k\}$ that minimize the average
squared prediction error for the segment $s_n(m)$
• Using Eqs.(10) and (12), the minimum mean-squared prediction error can be
shown to be
$E_n = \sum_m s_n^2(m) - \sum_{k=1}^{p} \alpha_k \sum_m s_n(m) s_n(m-k)$ - (15)
and using the notation of Eq.(13), $E_n$ can be expressed as
$E_n = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k \phi_n(0,k)$ - (16)
• Thus the total minimum error consists of a fixed component and a component
that depends on the predictor coefficients
• To solve for the optimum predictor coefficients, first the quantities
$\phi_n(i,k)$ for $1 \le i \le p$ and $0 \le k \le p$ must be computed
• Second, Eq.(14) must be solved for the $\alpha_k$
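The minimum-error expression of Eq.(15) follows from expanding the square in Eq.(10) and applying Eq.(12) once. A sketch of the algebra, with the caret on the optimum coefficients dropped as in the text:

```latex
\begin{align*}
E_n &= \sum_m s_n^2(m)
      - 2 \sum_{k=1}^{p} \alpha_k \sum_m s_n(m)\, s_n(m-k)
      + \sum_{i=1}^{p} \alpha_i \sum_{k=1}^{p} \alpha_k \sum_m s_n(m-i)\, s_n(m-k) \\
    &= \sum_m s_n^2(m)
      - 2 \sum_{k=1}^{p} \alpha_k \sum_m s_n(m)\, s_n(m-k)
      + \sum_{i=1}^{p} \alpha_i \sum_m s_n(m-i)\, s_n(m)
      && \text{(inner sum replaced using Eq.(12))} \\
    &= \sum_m s_n^2(m) - \sum_{k=1}^{p} \alpha_k \sum_m s_n(m)\, s_n(m-k)
     = \phi_n(0,0) - \sum_{k=1}^{p} \alpha_k \phi_n(0,k)
\end{align*}
```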


Autocorrelation Method
• One approach to determining the limits on the sums in Eqs.(8) - (10) and
Eq.(12) is to assume that the waveform segment $s_n(m)$ is identically zero
outside the interval $0 \le m \le N-1$, which can be expressed as
$s_n(m) = s(m+n) w(m)$ - (17)
where $w(m)$ is a finite-length window that is zero outside the interval $0 \le m \le N-1$
• The prediction error $e_n(m)$ for a $p$th-order predictor is then nonzero only over
the interval $0 \le m \le N-1+p$, and hence the short-time average prediction
error is
$E_n = \sum_{m=0}^{N+p-1} e_n^2(m)$ - (18)
• The limits on the expression for $\phi_n(i,k)$ in Eq.(13) are identical to those of
Eq.(18). Hence,
$\phi_n(i,k) = \sum_{m=0}^{N+p-1} s_n(m-i) s_n(m-k)$, $1 \le i \le p$, $0 \le k \le p$ - (19a)

Autocorrelation Method
• Eq.(19a) can be expressed as
$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m) s_n(m+i-k)$, $1 \le i \le p$, $0 \le k \le p$ - (19b)
• From the above equation, $\phi_n(i,k)$ is identical to the short-time
autocorrelation function evaluated for $(i-k)$. That is,
$\phi_n(i,k) = R_n(i-k)$ - (20)
where
$R_n(k) = \sum_{m=0}^{N-1-k} s_n(m) s_n(m+k)$ - (21)
• Since $R_n(k)$ is an even function,
$\phi_n(i,k) = R_n(|i-k|)$, $i = 1, 2, \ldots, p$, $k = 0, 1, \ldots, p$ - (22)
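The windowing of Eq.(17) and the short-time autocorrelation of Eq.(21) can be sketched as follows. The slides do not say which window $w(m)$ is used; the Hamming window here is an assumption, and the function names are hypothetical.

```python
import math


def hamming(N):
    """Hamming window w(m), 0 <= m <= N-1 (an assumed choice for w(m) in Eq. (17))."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * m / (N - 1)) for m in range(N)]


def short_time_autocorrelation(s, n, N, p, window=None):
    """Return [R_n(0), ..., R_n(p)] for the N-sample frame starting at sample n."""
    w = hamming(N) if window is None else window
    # Eq. (17): windowed segment, taken as zero outside 0 <= m <= N-1.
    s_n = [s[m + n] * w[m] for m in range(N)]
    # Eq. (21): R_n(k) = sum_{m=0}^{N-1-k} s_n(m) s_n(m+k).
    return [sum(s_n[m] * s_n[m + k] for m in range(N - k)) for k in range(p + 1)]
```

For example, with a rectangular window passed explicitly, the frame `[1, 2, 3]` gives `R_n(0) = 14`, `R_n(1) = 8` and `R_n(2) = 3`. Only lags 0 to $p$ are needed because Eq.(22) uses $|i-k| \le p$.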


Autocorrelation Method
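• Therefore Eq.(14) is expressed as
$\sum_{k=1}^{p} \alpha_k R_n(|i-k|) = R_n(i)$, $1 \le i \le p$ - (23)
• Similarly, the minimum mean-squared prediction error of Eq.(16) takes the form
$E_n = R_n(0) - \sum_{k=1}^{p} \alpha_k R_n(k)$ - (24)
• Eq.(23) can be expressed in matrix form as
$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$ - (25)
• The $p \times p$ matrix of autocorrelation values is a Toeplitz matrix, i.e., it is
symmetric and all the elements along any given diagonal are equal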
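The Toeplitz structure allows the system of Eq.(23) to be solved recursively in order, without a general matrix inversion. The following is a minimal sketch of the Durbin recursion under the sign convention $A(z) = 1 - \sum_k \alpha_k z^{-k}$ used in these slides; the function name and return layout are hypothetical.

```python
def levinson_durbin(R, p):
    """Solve sum_k alpha_k R(|i-k|) = R(i), 1 <= i <= p (Eq. (23)).

    R is the list [R(0), ..., R(p)]. Returns (alpha, k, E): the predictor
    coefficients alpha_1..alpha_p, the reflection (PARCOR) coefficients
    k_1..k_p, and the minimum prediction error of Eq. (24).
    """
    alpha = []   # coefficients of the order-(i-1) predictor
    k = []
    E = R[0]     # error of the order-0 predictor
    for i in range(1, p + 1):
        if E <= 0.0:
            raise ValueError("non-positive error: R is not a valid autocorrelation")
        acc = R[i] - sum(alpha[j - 1] * R[i - j] for j in range(1, i))
        k_i = acc / E
        # alpha_j^(i) = alpha_j^(i-1) - k_i * alpha_{i-j}^(i-1), and alpha_i^(i) = k_i
        alpha = [alpha[j - 1] - k_i * alpha[i - j - 1] for j in range(1, i)] + [k_i]
        k.append(k_i)
        E *= 1.0 - k_i * k_i
    return alpha, k, E
```

As a check, the autocorrelation `R = [1.0, 0.5, 0.25]` (a first-order process) with `p = 2` gives `alpha = [0.5, 0.0]` and `E = 0.75`, which agrees with Eq.(24): $1 - 0.5 \times 0.5 = 0.75$.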

Covariance Method
• The second basic approach is to fix the interval over which the
mean-squared prediction error is computed and then to consider the effect on
the computation of $\phi_n(i,k)$
• Define
$E_n = \sum_{m=0}^{N-1} e_n^2(m)$ - (26)
and hence $\phi_n(i,k)$ becomes
$\phi_n(i,k) = \sum_{m=0}^{N-1} s_n(m-i) s_n(m-k)$, $1 \le i \le p$, $0 \le k \le p$ - (27)
• If the index of summation is changed, $\phi_n(i,k)$ can be expressed as
$\phi_n(i,k) = \sum_{m=-i}^{N-i-1} s_n(m) s_n(m+i-k)$ - (28a)
or $\phi_n(i,k) = \sum_{m=-k}^{N-k-1} s_n(m) s_n(m+k-i)$ - (28b)


Covariance Method
• Eqs.(28) look similar to Eq.(19b), but the limits of summation are not the same
• Eqs.(28) call for values of $s_n(m)$ outside the interval $0 \le m \le N-1$
• Evaluating $\phi_n(i,k)$ for all required values of $i$ and $k$ requires $s_n(m)$
over the interval $-p \le m \le N-1$
• This approach is similar to the modified autocorrelation function; as
pointed out earlier, it leads to a function which is not a true
autocorrelation but a cross-correlation between two very similar, but not
identical, finite-length segments of speech
• Hence Eq.(14), which in this case contains cross-correlation values, may be
used to compute the prediction coefficients
$\sum_{k=1}^{p} \alpha_k \phi_n(i,k) = \phi_n(i,0)$, $i = 1, 2, \ldots, p$ - (14)

Covariance Method
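• Eq.(14) can be expressed in matrix form as
$\begin{bmatrix} \phi_n(1,1) & \phi_n(1,2) & \cdots & \phi_n(1,p) \\ \phi_n(2,1) & \phi_n(2,2) & \cdots & \phi_n(2,p) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_n(p,1) & \phi_n(p,2) & \cdots & \phi_n(p,p) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} \phi_n(1,0) \\ \phi_n(2,0) \\ \vdots \\ \phi_n(p,0) \end{bmatrix}$ - (29)
• In this case, since $\phi_n(i,k) = \phi_n(k,i)$, the $p \times p$ matrix of cross-correlation
values is symmetric but not Toeplitz
• The elements along a diagonal are related by the equation
$\phi_n(i+1,k+1) = \phi_n(i,k) + s_n(-i-1) s_n(-k-1) - s_n(N-1-i) s_n(N-1-k)$ - (30)
• The matrix $\{\phi_n(i,k)\}$ has the properties of a covariance matrix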
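The covariance method can be sketched directly from Eqs.(27), (14) and (16). The slides do not name a solver for the symmetric, non-Toeplitz system; plain Gaussian elimination with partial pivoting is used here as an assumption, and the function name `covariance_lpc` is hypothetical.

```python
def covariance_lpc(s, n, N, p):
    """Covariance-method predictor for the frame s(n) .. s(n+N-1).

    The p samples before the frame are also used, so n >= p is required.
    Returns (alpha, E): alpha_1..alpha_p and the minimum error of Eq. (16).
    """
    if n < p:
        raise ValueError("need p samples of history before the frame")

    def phi(i, k):
        # Eq. (27): phi_n(i,k) = sum_{m=0}^{N-1} s_n(m-i) s_n(m-k), with s_n(m) = s(m+n)
        return sum(s[n + m - i] * s[n + m - k] for m in range(N))

    # Augmented system [Phi | psi] of Eq. (14), rows i = 1..p.
    A = [[phi(i, k) for k in range(1, p + 1)] + [phi(i, 0)] for i in range(1, p + 1)]

    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular or ill-conditioned")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]

    # Back substitution.
    alpha = [0.0] * p
    for r in range(p - 1, -1, -1):
        acc = A[r][p] - sum(A[r][c] * alpha[c] for c in range(r + 1, p))
        alpha[r] = acc / A[r][r]

    # Eq. (16): minimum prediction error.
    E = phi(0, 0) - sum(alpha[k - 1] * phi(0, k) for k in range(1, p + 1))
    return alpha, E
```

Because no window is applied and the error is summed only over samples that have a full $p$-sample history, a frame that obeys Eq.(2) exactly with zero excitation inside the frame yields the model coefficients with essentially zero error.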


Comparison of LPC solution methods


Considered issues are:
1. Computations
2. Stability
Two main issues in computation of predictor coefficients:
1. Storage
2. No. of multiplications
• In the autocorrelation method, the roots of $A(z)$ are theoretically guaranteed
to lie inside the unit circle and hence $H(z)$ is stable
• In practice this guarantee can be lost if the autocorrelation is not computed
with sufficient accuracy, because round-off in the autocorrelation values
can make the matrix ill-conditioned
• These undesirable effects can be reduced by pre-emphasizing the speech to
make its spectrum as flat as possible

Comparison of LPC solution methods


• The necessary and sufficient condition for stability using the Durbin algorithm
is that the partial correlation (PARCOR) coefficients $k_i$ satisfy
$-1 < k_i < 1$, $1 \le i \le p$
• For the covariance method, the stability of the predictor polynomial is not
guaranteed. If the number of samples in the frame is large, the autocorrelation
and covariance methods yield nearly identical results
• The lattice method guarantees stability because the predictor coefficients are
obtained from partial correlation coefficients. Stability is preserved even
when the computation is performed with finite word length arithmetic
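When the predictor coefficients come from a method without a stability guarantee, the condition on the $k_i$ can still be tested by running the Durbin update backwards (a step-down recursion). The following is a sketch under the convention $A(z) = 1 - \sum_k \alpha_k z^{-k}$; the function names are hypothetical and this check is not described in the slides.

```python
def reflection_coefficients(alpha):
    """Step-down recursion: alpha_1..alpha_p -> k_1..k_p.

    Inverts the Durbin update alpha_j^(i) = alpha_j^(i-1) - k_i * alpha_{i-j}^(i-1).
    Returns None as soon as some |k_i| >= 1, since the recursion cannot continue
    and the filter is not stable in that case.
    """
    a = list(alpha)
    k = [0.0] * len(a)
    for i in range(len(a), 0, -1):
        k_i = a[i - 1]                 # alpha_i^(i) = k_i
        if abs(k_i) >= 1.0:
            return None
        k[i - 1] = k_i
        denom = 1.0 - k_i * k_i
        # alpha_j^(i-1) = (alpha_j^(i) + k_i * alpha_{i-j}^(i)) / (1 - k_i^2)
        a = [(a[j - 1] + k_i * a[i - j - 1]) / denom for j in range(1, i)]
    return k


def is_stable(alpha):
    """True if every root of A(z) = 1 - sum_k alpha_k z^-k lies inside the unit circle."""
    return reflection_coefficients(alpha) is not None
```

For instance, `alpha = [1.2, -0.8]` (roots at radius $\sqrt{0.8}$) is reported stable with $k_1 = 2/3$ and $k_2 = -0.8$, while `alpha = [2.0, -0.99]` (real roots at 1.1 and 0.9) is reported unstable.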


Comparison of LPC solution methods

Applications of LPC
Pitch detection using LPC:


Pitch detection using LPC


• A method proposed by Markel, called the SIFT (Simplified Inverse Filter
Tracking) method, is used for pitch detection
• The input signal $s(n)$ is lowpass filtered with a cutoff frequency of about
900 Hz, and the sampling rate is then reduced from 10 kHz to 2 kHz by
decimation (dropping 4 out of every 5 samples)
• The decimated output is analyzed using LPC with a filter order of $p = 4$
• A fourth-order filter is sufficient to model the signal spectrum in the
frequency range 0-1 kHz because there are only 1-2 formants in this range
• The signal is then inverse filtered to give $y(n)$, which has an approximately
flat spectrum. Hence the purpose of LPC here is to spectrally flatten the signal
• The short-time autocorrelation of the inverse-filtered signal is computed and
the largest peak in the appropriate range is chosen as the pitch period
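The steps above can be sketched for a single frame as follows. This is an illustration under stated assumptions, not Markel's implementation: the lowpass filter is a crude five-point moving average (about 3 dB down at 900 Hz for a 10 kHz input), the LPC analysis uses a Hamming window with the Durbin recursion, the 50 to 400 Hz search range is an assumed value, and all function names are hypothetical.

```python
import math


def autocorr(x, max_lag):
    """Unnormalized autocorrelation of x for lags 0..max_lag."""
    return [sum(x[m] * x[m + k] for m in range(len(x) - k)) for k in range(max_lag + 1)]


def lpc_autocorrelation(frame, p):
    """Order-p predictor via a Hamming window and the Durbin recursion."""
    N = len(frame)
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * m / (N - 1)) for m in range(N)]
    R = autocorr([x * w_m for x, w_m in zip(frame, w)], p)
    alpha, E = [], R[0]
    for i in range(1, p + 1):
        k_i = (R[i] - sum(alpha[j - 1] * R[i - j] for j in range(1, i))) / E
        alpha = [alpha[j - 1] - k_i * alpha[i - j - 1] for j in range(1, i)] + [k_i]
        E *= 1.0 - k_i * k_i
    return alpha


def sift_pitch_period(s, fs=10000, factor=5, p=4, f_lo=50.0, f_hi=400.0):
    """Estimate the pitch period (in seconds) of one frame s sampled at fs Hz."""
    # 1. Lowpass filter: moving average over `factor` samples (assumed filter).
    lp = [sum(s[max(0, n - factor + 1): n + 1]) / factor for n in range(len(s))]
    # 2. Decimate: keep one sample out of every `factor`.
    x = lp[::factor]
    fs_d = fs / factor
    # 3. LPC analysis of order p, then inverse filtering with A(z).
    alpha = lpc_autocorrelation(x, p)
    y = [x[n] - sum(alpha[k - 1] * x[n - k] for k in range(1, p + 1))
         for n in range(p, len(x))]
    # 4. Largest autocorrelation value within the allowed range of pitch lags.
    lag_min = int(fs_d / f_hi)
    lag_max = int(fs_d / f_lo)
    R = autocorr(y, lag_max)
    best = max(range(lag_min, lag_max + 1), key=lambda k: R[k])
    return best / fs_d
```

The frame must be long enough that the decimated, inverse-filtered signal covers the largest lag searched; with the defaults, a 40 ms frame (400 samples at 10 kHz) leaves 76 residual samples for a maximum lag of 40.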

Pitch Detection using LPC


• To get additional resolution in the value of the pitch period,
the autocorrelation function is interpolated in
the region of its maximum value
• An unvoiced classification is chosen when the
level of the autocorrelation peak falls below a
given threshold
• This method is not suitable for high-pitched
speakers (e.g. children) because spectral flattening is
unsuccessful when there is no more than one pitch
harmonic in the band from 0 to 900 Hz (especially for
telephone line inputs)
• For such speakers and transmission conditions,
other pitch detection methods may be used
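The interpolation and the voicing decision can be sketched as below. The slides do not specify the interpolation formula or the threshold; three-point parabolic interpolation and normalization of the peak by the zero-lag value are assumptions, and the function names are hypothetical.

```python
def refine_peak(R, k):
    """Fit a parabola through R[k-1], R[k], R[k+1] and return the fractional peak lag."""
    left, mid, right = R[k - 1], R[k], R[k + 1]
    denom = left - 2.0 * mid + right
    if denom == 0.0:
        return float(k)   # the three points are collinear: no refinement possible
    return k + 0.5 * (left - right) / denom


def is_voiced(R, k, threshold):
    """Voiced if the peak at lag k, normalized by R[0], reaches the given threshold.

    The threshold has no default on purpose: its value is not given in the slides.
    """
    return R[0] > 0.0 and R[k] / R[0] >= threshold
```

At the decimated rate of 2 kHz one lag step is 0.5 ms, so the fractional lag returned by `refine_peak` divided by the sampling rate gives a finer estimate of the pitch period than the integer peak position.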


Formant Analysis using LPC


Formants can be estimated in the following two ways:
1. Factor the predictor polynomial and, based on the roots, decide which
correspond to formants and which correspond to spectral shaping poles
2. Obtain the spectrum and choose the formants by peak picking
Advantages of formant analysis using LPC:
• In the first method, as the predictor order $p$ is chosen a priori, the maximum
possible number of complex conjugate pole pairs obtained is $p/2$
• Each pole must still be labelled with the formant it corresponds to, but
this is less complicated because there are fewer candidates than with the
cepstral smoothing method
• The extraneous poles are easily isolated in LPC analysis as their bandwidths
are often very large compared to a typical formant bandwidth
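Both routes can be illustrated with a short sketch. For the first, a root $z = r e^{j\theta}$ of $A(z)$ is mapped to a frequency $\theta f_s / 2\pi$ and a bandwidth $-(f_s/\pi)\ln r$, a commonly used approximation that is assumed here rather than taken from the slides; the root finding itself is left out. For the second, $|1/A(e^{j\omega})|$ is evaluated on a uniform grid and its local maxima are picked. The function names and the grid size are hypothetical.

```python
import cmath
import math


def root_to_formant(z, fs):
    """Map a complex root z of A(z) (upper half-plane) to (frequency, bandwidth) in Hz."""
    freq = cmath.phase(z) * fs / (2.0 * math.pi)
    bandwidth = -math.log(abs(z)) * fs / math.pi
    return freq, bandwidth


def lpc_spectrum_peaks(alpha, fs, n_points=512):
    """Peak-pick |1/A(e^{jw})| on a uniform grid over 0 <= f < fs/2.

    Returns the frequencies (Hz) of the interior local maxima.
    """
    mags = []
    for i in range(n_points):
        w = math.pi * i / n_points
        A = 1.0 - sum(a_k * cmath.exp(-1j * w * (k + 1)) for k, a_k in enumerate(alpha))
        mags.append(1.0 / abs(A))
    return [i * fs / (2.0 * n_points) for i in range(1, n_points - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]
```

A root with a large bandwidth (small radius) produces a broad, low bump or no peak at all in the spectrum, which is one way to see why extraneous poles are easy to separate from formant poles.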

Formant Analysis using LPC


Disadvantages:
• An all-pole model is used to model the speech spectrum
• For nasals and nasalized vowels, although the analysis is adequate in its
spectral matching ability, the significance of the roots of the predictor
polynomial is not clear
• It is not clear whether the roots correspond to nasal zeros, additional nasal
poles, or the expected resonances of the vocal tract
• Although the bandwidth of a root is determined, it is not clear how it is
related to the actual formant bandwidth, as it is sensitive to frame duration,
frame position and method of analysis
