Speech Coding with Wavelet Packet Excitation Signal Compression*
Atanas Gotchev, Elena Rangelova, Zdravko Nikolov
Institute of Information Technologies, 1113 Sofia
I. Introduction
Digital speech coding is an important problem in network speech applications, where large amounts of data must be transmitted through channels of limited bandwidth. Research in this field aims at finding new methods for signal compression that are compatible with network transfer protocols.
Digital speech coding involves sampling and amplitude quantization of the speech signal using the minimum number of bits while preserving the quality of the reconstructed speech.
Speech coders rely on three basic approaches to achieve good compression rates:
+ representation of the frequency-domain characteristics of the speech signal
+ waveform matching
+ optimization of the coder according to the perceptual properties of the ear
According to the coding mechanism, there are two types of systems: waveform coders and vocoders. Vocoders achieve better compression rates and work with speech models, the best-known type being linear predictive vocoders.
II. Linear-predictive vocoders with differentiated glottal wave excitation
Linear prediction (LP) has been the most widely used method of speech processing during the last 20 years. It is based on Fant's linear voice production model [1]. According to this model, sounds are generated by exciting the vocal tract with a source signal, which is a periodic sequence of impulses for voiced sounds and random noise for unvoiced sounds. The vocal tract is modeled as an all-pole system, the glottal model is represented as a two-pole low-pass filter, and the lip radiation as a differentiator (Fig. 1a). The system can be reduced to an all-pole model by cancelling one pole against the zero of the lip radiation model (Fig. 1b). Figures 1c and 1d show the generation mechanism for voiced and unvoiced speech according to the described model.
The classical linear prediction method estimates the vocal tract parameters (LP coefficients). Depending on the form of the filter A(z) and the length of the data segment, different sets of coefficients are obtained by different methods. For long segments these are the autocorrelation coefficients and the related reflection coefficients [2]. For short speech segments there are algorithms that generate covariance coefficients [2]. Other representations of these coefficients are less sensitive to quantization errors, such as Log Area Ratios (LAR) and Line Spectrum Pairs (LSP). Estimating 8 to 14 LP parameters is usually enough for a good representation of the vocal tract.
Fig. 1. Speech production model: (a) V/UV excitation, glottal model, vocal tract, spectral correction, lip radiation, synthesized speech; (b) V/UV excitation, gain, A(z), synthesized speech; (c) voiced speech generation over time; (d) unvoiced speech generation over time.
* This research was supported by the National Science Fund under grant No I 625/96.
BULGARIAN ACADEMY OF SCIENCES
PROBLEMS OF ENGINEERING CYBERNETICS AND ROBOTICS, 50
Sofia, 2000
Systems of different complexity have been proposed, depending on how the excitation source is coded. Initially the source was represented by a two-state scheme: pulse sequences or white noise. For example, in the American Federal Standard FS1015 [3], 10 LP vocal tract parameters are excited by a source described by gain parameters, pitch, and a flag for the segment type (voiced/unvoiced). The main drawback of this method is misclassification of voiced and unvoiced segments. The ideal excitation is the residual signal, obtained as the difference between the speech signal and its LP model. The residual signal carries all the information that has not been captured by LP analysis: phase, pitch, zeros due to nasal sounds, etc. Vocoders with such a source are called Residual Excited Linear Prediction (RELP) vocoders and operate between 6 and 9.6 kbit/s. The bit rate is decreased by down-sampling the residual signal, whose bandwidth is restricted to 800 Hz; however, this lowers the speech quality. Excitation models derived through feedback loops, known as analysis-by-synthesis schemes, have been proposed to avoid this problem. Two linear predictors, long-term and short-term, represent the pitch and the formant structure, respectively. A weighting filter W(z) shapes the error so that the quantization noise is masked by the high-energy formants. The excitation source forms an excitation sequence, or selects one from a dictionary, so that the mean squared error (MSE) is minimized. This scheme was first proposed in [5]; it is a milestone in speech coding and brought a large quality improvement. According to the excitation model, these coders are classified as multipulse excitation (MPE), regular pulse excitation (RPE) [6] and code excitation (CELP) [7].
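The LP analysis and residual computation described in this section can be illustrated with a minimal sketch. It assumes the autocorrelation method solved by the Levinson-Durbin recursion; the function names (`autocorrelation`, `levinson_durbin`, `lp_residual`) and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are choices made for this illustration, not taken from the paper.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[k] = sum_n x[n] * x[n + k]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelations r.

    Returns (a, err): a[0] = 1 and a[1..order] are the coefficients of
    the prediction-error filter A(z); err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Filter x with A(z): e[n] = sum_j a[j] * x[n - j], zero initial state."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]
```

For a signal generated by a one-pole filter, for example `x = [0.9 ** n for n in range(200)]`, the estimated `a[1]` is close to -0.9 and the residual is concentrated in the first sample, which is the excitation impulse.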
An alternative approach to speech coding aims at exact modeling of the vocal tract features and at using them in inverse filtering (IF) to estimate the glottal waveform (GW), which represents the function of the voice source [8]. The GW carries information about the phonation type, emotional state and other individual characteristics of the speaker [9]. Restoring the coded speech signal with an excitation signal close to the glottal waveform leads to more natural-sounding speech. This approach makes it possible to separate phonation from speech quality for different speakers [10]. Transmission (or storage) of the exact GW requires additional resources. A Differentiated Glottal Wave (DGW) is used in speech encoding tasks. It represents the voice source function together with the lip radiation. Its typical shape is shown in Fig. 2.
The most important instants are the glottal closure instant (GCI) and the glottal opening instant (GOI). Between these instants the glottis is closed and the vocal tract is in a free oscillating state. Exact determination of these instants is very important for adequate vocal tract modeling [11].
III. Compression of DGW with wavelet packets
Wavelet packets and wavelet packet transform (WPT)
A wavelet packet (WP) w is an integrable function with finite energy and zero mean that is well localized in both space and frequency. It can be assigned three parameters: scale (time uncertainty), frequency and time position.
Fast wavelet packets can be defined by a pair of quadrature mirror filters (QMF) [12].
Let $h = \{h_j\}$ be a low-pass filter with the following properties:
(1)
(a) $\sum_j |h_j|\,|j|^c < \infty$ for some $c > 0$;
(b) $\sum_j h_{2j+i} = 1/\sqrt{2}$ for $i = 0$ and $1$;
(c) $\sum_j h_j h_{j+2k} = \delta_k$, where $\delta$ is the Kronecker symbol.
Property (a) concerns the decay of the filter coefficients, while (b) and (c) concern its orthogonality.
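Properties (b) and (c) can be checked numerically for a concrete filter. The sketch below uses the 4-tap Daubechies low-pass filter as an illustrative choice (the paper does not single out this filter at this point); the function names are hypothetical.

```python
import math


def daubechies4():
    """4-tap Daubechies low-pass filter h_0..h_3 (orthonormal scaling)."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]


def polyphase_sums(h):
    """Property (b): sums of the even-indexed and odd-indexed taps."""
    return sum(h[0::2]), sum(h[1::2])


def shifted_correlation(h, k):
    """Property (c): sum_j h[j] * h[j + 2k]; taps outside the support are zero."""
    return sum(h[j] * h[j + 2 * k]
               for j in range(len(h)) if 0 <= j + 2 * k < len(h))
```

With `h = daubechies4()`, both values returned by `polyphase_sums(h)` equal 1/sqrt(2) up to rounding, `shifted_correlation(h, 0)` is 1 and `shifted_correlation(h, 1)` is 0, as required by (b) and (c). Property (a) holds trivially for any finite filter.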
Let $g = \{g_j\}$ be defined by
(2) $g_j = (-1)^{1-j}\, h_{1-j}$.
The two sequences constitute a QMF pair. These filters define two convolution-decimation operators:
(3) $H x(t) = \sum_j h_j\, x(2t - j)$ and $G x(t) = \sum_j g_j\, x(2t - j)$
and their adjoints:
(4) $H^* x(t) = \frac{1}{2} \sum_j h_j\, x\!\left(\frac{t}{2} + \frac{j}{2}\right)$ and $G^* x(t) = \frac{1}{2} \sum_j g_j\, x\!\left(\frac{t}{2} + \frac{j}{2}\right)$.
Fig. 2 (typical DGW shape; marked instants: GOI, GCI, GOI)
Suppose that the sequences h and g are finite, and define
$\varphi = \lim_{n \to \infty} H^n S$,
where S is the indicator function of $[-1/2, 1/2]$. This is the only fixed point of the equation $\varphi = H\varphi$. A fast wavelet packet is the image of $\varphi$ under any finite composition of H and G, possibly translated by an integer and unitarily dilated by an integer power of 2. All wavelet packets are orthogonal to their dilated and translated versions. Ordering the wavelet packets $w_{f,s,p}$ by their frequency, scale and position parameters, we can write
$w_{0,0,0}(t) = \varphi(t)$; $w_{2f,0,0}(t) = H w_{f,0,0}(t)$; $w_{2f+1,0,0}(t) = G w_{f,0,0}(t)$, etc.
Wavelet packets allow a continuous function $x \in L^2(R)$ to be approximated to accuracy $O(2^{-L})$ by the $l^2$ sequence of inner products. From these we can recursively compute the other WP coefficients as follows:
(5) $\langle x, w_{2f,s+1,p} \rangle = \sum_j h_j \langle x, w_{f,s,2p+j} \rangle$,
    $\langle x, w_{2f+1,s+1,p} \rangle = \sum_j g_j \langle x, w_{f,s,2p+j} \rangle$.
The operators H and G and their adjoints also apply to discrete sequences (signals):
(6) $H: l^2 \to l^2$, $(Hx)_n = \sum_j h_j\, x_{2n+j}$,
    $G: l^2 \to l^2$, $(Gx)_n = \sum_j g_j\, x_{2n+j}$.
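The discrete operators in (6) can be sketched for a finite signal as follows. This is a minimal illustration under two assumptions that are not stated in the paper: the signal is extended periodically, and the 4-tap Daubechies filter is used as the low-pass filter h. The high-pass taps are built from relation (2); all function names are hypothetical.

```python
import math


def daubechies4():
    """4-tap Daubechies low-pass filter h_0..h_3 (illustrative choice)."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]


def mirror_filter(h):
    """Relation (2), g_i = (-1)^(1-i) h_(1-i), returned as {index: tap}.

    Writing i = 1 - j for each tap h_j gives g_(1-j) = (-1)^j h_j.
    """
    return {1 - j: (-1) ** j * hj for j, hj in enumerate(h)}


def split(x, h):
    """One convolution-decimation step: returns (Hx, Gx), each of length N/2.

    Implements (Hx)_n = sum_j h_j x_(2n+j) and (Gx)_n = sum_j g_j x_(2n+j)
    with indices taken modulo N (periodic extension, an assumption here).
    """
    n = len(x)
    low_taps = dict(enumerate(h))
    high_taps = mirror_filter(h)

    def apply(taps):
        return [sum(c * x[(2 * m + j) % n] for j, c in taps.items())
                for m in range(n // 2)]

    return apply(low_taps), apply(high_taps)
```

Because the filter pair is orthogonal, the energies of `Hx` and `Gx` add up to the energy of `x`, and a constant signal is mapped entirely into the low-pass branch. Applying `split` recursively to both outputs generates the coefficients of recursion (5).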
Wavelet packets form a dictionary of basis functions. Their approximations by vectors in $R^N$ form a set of $N \lg N$ vectors. The vectors and their coefficients are arranged in the nodes of a binary tree. Nodes on the same level correspond to the same scale and differ in frequency localization, while the coefficients within a single node differ in time position (Fig. 3).
Fig. 3
Each node is an orthogonal sum of its children. A basis is obtained from the dictionary by joining tree branches. Different QMF pairs yield different basis dictionaries, which together form a WP basis library.
The best basis is selected by minimizing an additive information measure over all bases of a dictionary in $R^N$. Usually the measure is of entropy type [12]. The same procedure can be repeated over the basis library.
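The best-basis search over one dictionary can be sketched as a bottom-up comparison between each node and its two children. The example below is a simplified illustration, not the authors' implementation: it assumes the Haar QMF pair, a signal length that is a power of 2, and the additive cost -sum(c^2 log c^2) as the entropy-type measure; all names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """Haar convolution-decimation: returns the (low, high) children of a node."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    high = [(x[2 * i + 1] - x[2 * i]) / SQRT2 for i in range(half)]
    return low, high


def entropy_cost(coeffs):
    """Additive entropy-type cost; zero coefficients contribute nothing."""
    return -sum(v * v * math.log(v * v) for v in coeffs if v != 0.0)


def best_basis(x, max_level):
    """Return (cost, leaves); leaves is a list of (level, frequency, coeffs)."""
    def search(node, level, freq):
        own = entropy_cost(node)
        if level == max_level or len(node) < 2:
            return own, [(level, freq, node)]
        low, high = haar_split(node)
        cost_low, leaves_low = search(low, level + 1, 2 * freq)
        cost_high, leaves_high = search(high, level + 1, 2 * freq + 1)
        if cost_low + cost_high < own:      # children are cheaper: keep the split
            return cost_low + cost_high, leaves_low + leaves_high
        return own, [(level, freq, node)]   # otherwise the node itself is a leaf

    return search(list(x), 0, 0)
```

For a constant signal the search descends to the deepest level, where all the energy sits in a single coefficient; for a single impulse the root node is already the cheapest representation and no split is kept.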
Wavelet packets are widely used in signal compression because of their good localization and the possibility of choosing an optimal decomposition. Compression is achieved by reconstructing the signal from the k WP coefficients that are largest in absolute value. This approximation is optimal in the mean square sense. If the analyzed signal has pronounced peculiarities, more coefficients are needed; fewer suffice for flat signals. In this way wavelet packets focus on the parts of the signal that are significant in the information sense. Selecting a threshold that eliminates the non-significant WP coefficients is very important.
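Keeping the k largest coefficients can be sketched in a few lines. The example assumes the coefficients come from an orthonormal basis, so that the squared reconstruction error equals the energy of the discarded coefficients; the function names are hypothetical.

```python
def keep_largest(coeffs, k):
    """Zero all but the k coefficients that are largest in absolute value."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def discarded_energy(coeffs, k):
    """Squared error of the k-term approximation in an orthonormal basis."""
    approx = keep_largest(coeffs, k)
    return sum((c - a) ** 2 for c, a in zip(coeffs, approx))
```

For example, `keep_largest([0.1, -5.0, 0.3, 2.0], 2)` keeps -5.0 and 2.0 and zeroes the other two entries. Any other choice of two coefficients would discard more energy, which is the sense in which the approximation is optimal.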
The minimum description length principle
Differentiated glottal waves are characterized by abrupt transitions around the closure instants and comparatively gently sloping sections in the closed phase. Hence we can expect wavelet transforms to represent the DGW with few coefficients, owing to their capability for singularity detection.
Let us consider the DGW as a discrete signal-plus-noise model:
(7) $y = x + \varepsilon$, where $y, x, \varepsilon \in R^N$, $N = 2^{n_0}$.
The vector y represents the noisy observed signal, x is the information signal, and $\varepsilon$ is white Gaussian noise with unknown variance $\sigma^2$:
(8) $\varepsilon \sim N(0, \sigma^2 I)$.
The noise component is generated by inadequacy of the vocal tract model or by rounding errors.
We can generate a library of M orthogonal WP bases, $\{A_1, A_2, A_3, \dots, A_M\}$, where the bases differ in the type of QMF and each $A_m$ is the best basis from dictionary m.
We suppose the signal can be completely represented by k coefficients of a basis $A_m$:
(9) $x = W_m\, \alpha_m^{(k)}$,
where $W_m \in R^{N \times N}$ is an orthogonal matrix whose columns are the basis vectors of $A_m$, and $\alpha_m^{(k)} \in R^N$ is the vector of expansion coefficients with only k non-zero elements. In expression (9) the actual values of k and m are not known.
The idea of determining k and m with a simultaneous noise suppression and signal compression algorithm was developed by Saito in [13]. One of the most suitable criteria for this purpose is the so-called Minimum Description Length Principle (MDLP) [14]. According to this principle, the minimal description length of numbers or vectors, i.e. the codelength in bits, is sought. In the Saito algorithm the codelengths needed to represent all components of model (7) are estimated.
Let L denote the operator that gives the codelength. The total codelength is composed of the following terms:
1. Codelength of the integers k and m: $L(k, m)$;
2. Codelength of the k real coefficients of the best basis: $L(\hat{\alpha}_m^{(k)} \mid k, m)$;
3. Codelength of the noise variance estimate: $L(\hat{\sigma}^2 \mid k, m)$;
4. Codelength of the deviation of the observed signal y from the estimated signal x (9): $L(y \mid \hat{\alpha}_m^{(k)}, \hat{\sigma}^2, k, m)$.
The total codelength to minimize is:
(10) $L(y, \hat{\alpha}_m^{(k)}, \hat{\sigma}^2, k, m) = L(k, m) + L(\hat{\alpha}_m^{(k)}, \hat{\sigma}^2 \mid k, m) + L(y \mid \hat{\alpha}_m^{(k)}, \hat{\sigma}^2, k, m)$.
Under the assumption of white Gaussian noise, the maximum likelihood estimate of the variance is obtained from the sum of the N - k smallest squared coefficients [13]:
(11) $\hat{\sigma}^2 = \frac{1}{N} \left\| \alpha_m^{(N)} - \alpha_m^{(k)} \right\|^2$.
Analysis of the terms according to the MDLP leads to the following expression:
(12) $L(k^*, m^*) = \min_{0 \le k < N,\; 0 \le m \le M} \left( \frac{3}{2}\, k \lg N + \frac{N}{2} \lg \left\| \alpha_m^{(N)} - \alpha_m^{(k)} \right\|^2 \right)$.
Minimizing this expression yields the best k* and m* simultaneously. The reconstructed signal is obtained by:
(13) $\hat{x}^{(k^*)} = W_{m^*}\, \alpha_{m^*}^{(k^*)}$.
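The joint search for k* and m* in (12) can be sketched as an exhaustive loop over the library. The example assumes that each basis is given by its full coefficient vector $\alpha_m^{(N)}$, that lg denotes the base-2 logarithm, and that bases are indexed from 0; the names `amdl` and `mdl_select` are hypothetical and the code is an illustration rather than the algorithm of [13] in full.

```python
import math


def amdl(coeffs, k):
    """Objective of (12) for one basis: (3/2) k lg N + (N/2) lg(residual energy)."""
    n = len(coeffs)
    squares = sorted((c * c for c in coeffs), reverse=True)
    residual = sum(squares[k:])     # energy of the N - k smallest coefficients
    if residual <= 0.0:
        return float("-inf")        # exact representation, nothing left to code
    return 1.5 * k * math.log2(n) + 0.5 * n * math.log2(residual)


def mdl_select(library):
    """Return (k_star, m_star) minimizing (12) over 0 <= k < N and all bases."""
    best = None
    for m, coeffs in enumerate(library):
        for k in range(len(coeffs)):
            score = amdl(coeffs, k)
            if best is None or score < best[0]:
                best = (score, k, m)
    return best[1], best[2]
```

With one basis that spreads the energy evenly over 16 coefficients and another that has two large coefficients plus small noise-like ones, the search selects the second basis with k* = 2: the first term of (12) penalizes every extra coefficient, while the second rewards the drop in residual energy.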
IV. Experimental results
The present investigations use compactly supported wavelets, which are represented by finite-length filters. The basis library consists of the Daubechies wavelet family, least asymmetric wavelets and coiflets [15, 16].
We apply the method described in Section III (equations (10)-(13)). Minimum entropy is used as the best-basis criterion. Each QMF pair from the library leads to a decomposition over a basis dictionary. From the current dictionary (with number m) the minimum-entropy basis is selected. For the obtained basis, the value of k that minimizes expression (12) is determined. Passing through all the bases in the library, we obtain the pair (k*, m*), where k* is the number of essential coefficients and m* is the number of the selected basis.
Synthesized signals
The set consists of 16 DGW (each 512 samples long) obtained by IF of the synthesized vowels /a/, /e/, /i/ and /u/ [8].
The results of processing the synthesized signals with the MDLP-based method are displayed in Table 1. The number of essential coefficients k* is shown.
Table 1                            Table 2
Signal   Wavelet   k*   RMSE       Signal   Wavelet   k*    RMSE
x 10^-3
mei1     S20       30    8.5       /a/      D20        94   0.1
mei2     S10       24    8.7       /e/      S6         78   0.2
bab1     D16       11   14.8       /i/      S5         97   0.19
bab2     D14       12   13.3       /u/      S5        159   0.1
babi3    S10       28   10.3       /aa/     C4         96   0.2
babi4    S10       29    6.2       /ee/     D14        79   0.2
                                   /ii/     S8        105   0.4
                                   /uu/     S4        156   0.6
                                   /uu1/    D8        118   0.9
                                   /ii1/    S4         78   1.9
The comparison reveals that the wavelet packet representation of the DGW uses few coefficients. Owing to the higher frequency resolution across scales, the information content is grouped more efficiently in the basis vectors. Entropy as an information measure also leads to finding the best basis. The minimum description length principle combines coding with noise suppression without the need for separate noise estimation.
Natural signals
The database consists of six DGW (the analyzed segments are 512 samples long), obtained via IF of voiced sounds from two speakers (male and female). The signals are shown in Table 2 together with the root mean squared error (RMSE) between the DGW after IF and the DGW after wavelet packet reconstruction from the reduced set of coefficients. The number of coefficients is much smaller than that obtained in the analysis of synthesized signals. The mean squared errors are comparable in the two cases because natural sounds contain more noise components, which influence the processing.
Fig. 4 shows the inverse filtered DGW, the DGW reconstructed after WPT compression, and the difference between them. The recovered signal is very close to the original one and is obtained with a low number of coefficients. The corresponding glottal waves have almost the same shape, owing to the ability of the WP transform to detect local features with good time-frequency resolution. The best WP basis also finds the essential high-frequency components.
Fig. 4
V. Conclusions
This paper reports the potential of the WPT to compress the DGW effectively. The transform was chosen for its ability to preserve the points in the DGW that enable natural-sounding reconstruction of the signal.
The results of DGW compression make it possible to construct low and medium bit-rate speech coders with quality equivalent to or higher than that of present CELP coders at similar transmission rates.
Fig. 4 panels: Inverse filtered DGW; Restored DGW; Residual. Horizontal axis: Time, samples.
References
1. Fant, G. Acoustic Theory of Speech Production. 's-Gravenhage, The Netherlands, Mouton and Co., 1960.
2. Markel, J., A. Gray. Linear Prediction of Speech. New York, Springer-Verlag, 1976.
3. Federal Standard 1015. Telecommunications: Analog to digital conversion of radio voice by 4800 bit/second code excited linear predictive coding. National Communication System, Office of Technology and Standards, Nov. 1984.
4. Atal, B., J. Remde. A new model for LPC excitation for producing natural sounding speech at low bit rates. In: Proc. ICASSP-82, Apr. 1982, 614-617.
5. Singhal, S., B. Atal. Improving the performance of multipulse coders at low bit rates. In: Proc. ICASSP-84, p. 1.3.1, 1984.
6. Kroon, P., E. Deprettere, R. J. Sluyter. Regular-pulse excitation - a novel approach to effective and efficient multipulse coding of speech. IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, No 5, Oct. 1986.
7. Schroeder, M. R., B. Atal. Code-excited linear prediction (CELP): High quality speech at very low bit rates. In: Proc. ICASSP-85 (Tampa, FL, Apr. 1985), p. 937.
8. Fujisaki, H., M. Ljungqvist. Proposal and evaluation of models for the glottal source waveform. In: Proc. ICASSP, 1986, 1605-1608.
9. Rosenberg, A. Effect of glottal pulse shape on the quality of natural vowels. J. Acoust. Soc. Am., 49, 1971, 583-590.
10. Riegelsberger, E. L., A. K. Krishnamurthy. Glottal source estimation: Methods of applying the LF-model to inverse filtering. In: Proc. ICASSP 1993, 542-545.
11. Gotchev, A. Determination of closure glottal instance via wavelet transform. Techn. Ideas, 1995, No 2, 28-42 (in Bulgarian).
12. Coifman, R. R., M. V. Wickerhauser. Entropy-based algorithms for best-basis selection. IEEE Trans. Info. Theory, vol. 38, 1992, 713-718.
13. Saito, N. Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion. In: Wavelets in Geophysics (E. Foufoula-Georgiou and P. Kumar, eds.), San Diego, CA, Academic Press, 1994, 299-324.
14. Rissanen, J. Universal coding, information, prediction, and estimation. IEEE Trans. Inf. Theory, 30, 1984, 629-636.
15. Daubechies, I. Orthonormal bases of compactly supported wavelets. Comm. in Pure and Applied Math., 41, 1988, 909-996.
16. Daubechies, I. Ten lectures on wavelets. Philadelphia, SIAM, 1992.

