Speech Coding With Wavelet Packet Excitation Signal Compression
2 for i = 0 and 1;
(c) $\sum_j h_j h_{j+2k} = \delta_k$, where $\delta$ is the Kronecker symbol.
Property (a) relates to the decay of the filter coefficients, while (b) and (c) concern the orthogonality of the filter.
Let $g = \{g_j\}$ be defined by:
(2) $g_j = (-1)^{1-j} h_{1-j}$.
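A minimal numerical check of definition (2) and of property (c) can be sketched as follows. The Daubechies 4-tap low-pass filter is used only as an example, and the dictionary representation of the filters and the helper names `qmf_pair` and `shifted_product` are assumptions made for illustration.

```python
import math

def qmf_pair(h):
    """Return g with g_j = (-1)**(1 - j) * h_{1 - j}; filters are dicts {index: value}."""
    g = {}
    for i, value in h.items():
        j = 1 - i                      # index of g, chosen so that h_{1 - j} = h_i
        sign = -1.0 if (1 - j) % 2 else 1.0
        g[j] = sign * value
    return g

def shifted_product(a, b, k):
    """Compute sum_j a_j * b_{j + 2k}, treating missing indices as zero."""
    return sum(v * b.get(j + 2 * k, 0.0) for j, v in a.items())

# Example filter (an assumption, not taken from the text): Daubechies 4-tap
# low-pass filter, normalized so that sum_j h_j**2 = 1.
s3, norm = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
h = {0: (1 + s3) / norm, 1: (3 + s3) / norm, 2: (3 - s3) / norm, 3: (1 - s3) / norm}
g = qmf_pair(h)

for k in range(-2, 3):
    # Columns: k, sum_j h_j h_{j+2k} (should equal delta_k), sum_j h_j g_{j+2k} (should vanish)
    print(k, round(shifted_product(h, h, k), 12), round(shifted_product(h, g, k), 12))
```

The second printed column illustrates property (c); the third shows that the low-pass and high-pass filters of the pair are mutually orthogonal under even shifts.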
The two sequences constitute a QMF pair. These filters allow the definition of two convolution-decimation operators:
(3) $H x(t) = \sum_j h_j\, x(2t - j)$ and $G x(t) = \sum_j g_j\, x(2t - j)$
and their adjoints:
(4) $H^* x(t) = \frac{1}{2} \sum_j h_j\, x\!\left(\frac{t}{2} + \frac{j}{2}\right)$ and $G^* x(t) = \frac{1}{2} \sum_j g_j\, x\!\left(\frac{t}{2} + \frac{j}{2}\right)$.
Fig. 2 (labels: GOI, GCI)
Let us suppose that the sequences $h$ and $g$ are finite and define:
$\varphi = \lim_{n \to \infty} H^n S$,
where $S$ is the indicator function of $[-1/2, 1/2]$. This is the only fixed point of the equation $\varphi = H\varphi$. A fast wavelet packet is the image of $\varphi$ under any finite composition of $H$ and $G$, possibly translated by an integer and unitarily dilated by an integer power of 2.
All wavelet packets are orthogonal to their dilated and translated versions. Indexing the wavelet packets $w_{f,s,p}$ by their frequency, scale and position parameters, we can write
$w_{0,0,0}(t) = \varphi(t)$; $w_{2f,0,0}(t) = H w_{f,0,0}(t)$; $w_{2f+1,0,0}(t) = G w_{f,0,0}(t)$, etc.
Wavelet packets allow a continuous function $x \in L^2(\mathbb{R})$ to be approximated to accuracy $O(2^{-L})$ by the $\ell^2$ sequence of inner products. From these we can recursively compute the other WP coefficients as follows:
(5) $\langle x, w_{2f,s+1,p} \rangle = \sum_j h_j \langle x, w_{f,s,2p+j} \rangle$,
$\langle x, w_{2f+1,s+1,p} \rangle = \sum_j g_j \langle x, w_{f,s,2p+j} \rangle$.
The operators $H$ and $G$ and their adjoints also apply to discrete sequences (signals):
(6) $H: \ell^2 \to \ell^2$, $(Hx)_n = \sum_j h_j x_{2n+j}$,
$G: \ell^2 \to \ell^2$, $(Gx)_n = \sum_j g_j x_{2n+j}$.
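The discrete operators in (6) and their adjoints can be sketched for a finite signal as below. Periodic extension of the signal (indices taken modulo its length) is an assumption introduced here to keep the example finite; the text does not specify boundary handling. The function names `analyze` and `adjoint` and the test signal are illustrative.

```python
import math

def analyze(f, x):
    """(F x)_n = sum_j f_j * x_{2n+j}, with indices taken modulo len(x) (assumed periodic)."""
    n_in = len(x)
    return [sum(c * x[(2 * n + j) % n_in] for j, c in f.items())
            for n in range(n_in // 2)]

def adjoint(f, y):
    """(F* y)_m = sum over (n, j) with 2n + j = m (mod N) of f_j * y_n, where N = 2 * len(y)."""
    n_out = 2 * len(y)
    x = [0.0] * n_out
    for n, value in enumerate(y):
        for j, c in f.items():
            x[(2 * n + j) % n_out] += c * value
    return x

# Example QMF pair (an assumption): Daubechies 4-tap filter and g_j = (-1)^(1-j) h_(1-j).
s3, norm = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
h = {0: (1 + s3) / norm, 1: (3 + s3) / norm, 2: (3 - s3) / norm, 3: (1 - s3) / norm}
g = {1 - j: (-1.0) ** j * c for j, c in h.items()}

x = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1, 1.4, -0.9]
low, high = analyze(h, x), analyze(g, x)

# For an orthogonal QMF pair, H*H + G*G acts as the identity on the signal.
rebuilt = [a + b for a, b in zip(adjoint(h, low), adjoint(g, high))]
print(max(abs(a - b) for a, b in zip(x, rebuilt)))
```

Each operator halves the number of samples, and the two half-length outputs together carry the same energy as the input, which is what makes the recursive splitting below lossless.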
Wavelet packets form a dictionary of basis functions. Their approximations by $2^N$ vectors in $\mathbb{R}^N$ form a set of $N \lg N$ vectors. The vectors and their coefficients are arranged in the nodes of a binary tree. Nodes of one level correspond to one scale and differ in frequency localization, while the coefficients within a single node differ in time position (Fig. 3).
Fig. 3
Each node is an orthogonal sum of its children. We can obtain a basis from the dictionary by connecting tree branches. Different dictionaries of bases are derived from different QMF pairs, and together they form a library of WP bases.
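The binary tree of coefficients can be sketched as follows. The Haar QMF pair is used only to keep the example short, and the node key `(scale, frequency)` and the function names are assumptions. The frequency index follows the recursion in (5), i.e. it is in natural (filter-path) order rather than ordered by actual frequency.

```python
import math

def split(x):
    """One Haar QMF step: returns (Hx, Gx) with (Fx)_n = sum_j f_j * x_{2n+j}."""
    r = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [r * (x[2 * n] + x[2 * n + 1]) for n in range(half)]
    high = [r * (x[2 * n + 1] - x[2 * n]) for n in range(half)]
    return low, high

def wp_tree(x, levels):
    """Return {(scale, frequency): coefficients}; node (s, f) has children (s+1, 2f) and (s+1, 2f+1)."""
    tree = {(0, 0): list(x)}
    for s in range(levels):
        for f in range(2 ** s):
            low, high = split(tree[(s, f)])
            tree[(s + 1, 2 * f)] = low        # H branch
            tree[(s + 1, 2 * f + 1)] = high   # G branch
    return tree

x = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0, -2.0, 0.5]   # N = 8, so lg N = 3 levels
tree = wp_tree(x, 3)
below_root = sum(len(c) for (s, f), c in tree.items() if s > 0)
print(below_root)  # N lg N coefficients below the root
```

Every level of the tree holds N coefficients in total, so a depth of lg N levels gives the N lg N count quoted above; any set of nodes that covers each root-to-leaf path exactly once yields an orthogonal basis.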
The best basis is selected by minimizing an additive information measure over all bases of a dictionary in $\mathbb{R}^N$. Usually the measure is of entropy type [12]. The same procedure can be repeated for each dictionary in the library of bases.
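A bottom-up best-basis search with an additive entropy-type measure can be sketched as below. The particular cost, $-\sum_i c_i^2 \ln c_i^2$, and the Haar splitting step are assumptions chosen for brevity; the text only states that the measure is additive and of entropy type.

```python
import math

def split(x):
    """One Haar QMF step (assumed filter pair): returns (Hx, Gx)."""
    r = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    return ([r * (x[2 * n] + x[2 * n + 1]) for n in range(half)],
            [r * (x[2 * n + 1] - x[2 * n]) for n in range(half)])

def cost(c):
    """Additive entropy-type measure: -sum c_i^2 * ln(c_i^2), with zero terms skipped."""
    return -sum(v * v * math.log(v * v) for v in c if v != 0.0)

def best_basis(x, levels):
    """Return (cost, coefficient blocks) of the cheapest basis in the subtree rooted at x."""
    if levels == 0 or len(x) < 2:
        return cost(x), [list(x)]
    low, high = split(x)
    cost_l, blocks_l = best_basis(low, levels - 1)
    cost_h, blocks_h = best_basis(high, levels - 1)
    # Keep the children only if together they are strictly cheaper than the parent.
    if cost_l + cost_h < cost(x):
        return cost_l + cost_h, blocks_l + blocks_h
    return cost(x), [list(x)]

x = [0.5, 0.5, 0.5, 0.5]            # unit-energy constant signal
total, blocks = best_basis(x, 2)
print(total, blocks)
```

Because the measure is additive, the comparison at each node depends only on the node and its two subtrees, so the whole dictionary is searched in a single pass over the tree.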
Wavelet packets are widely used in signal compression because of their good localization and the possibility of choosing an optimal decomposition. Compression is achieved by reconstructing the signal from the $k$ WP coefficients that are largest in absolute value. This approximation is optimal in the mean-square sense. If the analyzed signal has pronounced singularities, more coefficients are needed; flat (smooth) signals need fewer. In this way wavelet packets focus on the parts of the signal that are significant in the information sense. The choice of the threshold that eliminates non-significant WP coefficients is very important.
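Keeping the $k$ largest coefficients can be sketched as follows. The coefficient values are made up for illustration, and `keep_largest` is a hypothetical helper name. For an orthonormal basis the squared reconstruction error equals the energy of the discarded coefficients, which is why this choice minimizes the mean-square error for a given $k$.

```python
def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (ties broken by position)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

# Hypothetical coefficient vector in some orthonormal WP basis.
coeffs = [0.1, -3.0, 0.05, 2.0, -0.2, 0.0, 0.4, -0.01]
approx = keep_largest(coeffs, 2)

# Energy of the discarded coefficients = squared error of the reconstruction.
dropped_energy = sum((c - a) ** 2 for c, a in zip(coeffs, approx))
print(approx, dropped_energy)
```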
The minimum description length principle
Differentiated glottal waves (DGW) are characterized by abrupt transitions around the closure instants and comparatively gently sloping sections in the closed phase. Hence we can expect wavelet transforms to represent DGW with few coefficients, owing to their capability for singularity detection.
Let us consider the DGW as a discrete model of a signal-noise mixture:
(7) $y = x + c$, where $y, x, c \in \mathbb{R}^N$, $N = 2^{n_0}$.
The vector $y$ represents the noisy observed signal, $x$ is the information signal, and $c$ is white Gaussian noise with unknown variance $\sigma^2$:
(8) $c \sim N(0, \sigma^2 I)$.
The noise component is generated by the inadequacy of the vocal tract model or by rounding errors.
We can generate a library of $M$ orthogonal WP bases $\{A_1, A_2, A_3, \ldots, A_M\}$, where the bases differ in the type of QMF and each $A_m$ is the best basis from dictionary $m$.
We suppose that the signal can be completely represented by $k$ coefficients of a basis $A_m$:
(9) $x = W_m \alpha_m^{(k)}$,
where $W_m \in \mathbb{R}^{N \times N}$ is an orthogonal matrix whose columns are the basis vectors of $A_m$, and $\alpha_m^{(k)} \in \mathbb{R}^N$ is the vector of expansion coefficients with only $k$ non-zero elements. In expression (9) the actual values of $k$ and $m$ are not known.
The idea of determining $k$ and $m$ by an algorithm for simultaneous noise suppression and signal compression was developed by Saito in [13]. One of the most suitable criteria for this purpose is the so-called Minimum Description Length Principle (MDLP) [14]. According to this principle, the shortest description of the numbers or vectors, i.e. the minimal codelength in bits, is sought. In the Saito algorithm the codelengths needed to represent all components of model (7) are estimated.
Let $L$ denote the operator that gives the codelength. The total codelength is composed of the following terms:
1. Codelength of the integers $k$ and $m$: $L(k, m)$;
2. Codelength of the $k$ real coefficients of the best basis: $L(\hat{\alpha}_m^{(k)} \mid k, m)$;
3. Codelength of the noise variance estimate: $L(\hat{\sigma}^2 \mid k, m)$;
4. Codelength of the deviation of the observed signal $y$ from the estimated signal $x$ (9): $L(y \mid \hat{\alpha}_m^{(k)}, \hat{\sigma}^2, k, m)$.
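The total codelength to minimize is:
(10) $L(y, \hat{\alpha}_m^{(k)}, \hat{\sigma}^2, k, m) = L(k, m) + L(\hat{\alpha}_m^{(k)}, \hat{\sigma}^2 \mid k, m) + L(y \mid \hat{\alpha}_m^{(k)}, \hat{\sigma}^2, k, m)$,
where the second term combines the codelengths of the coefficients and of the variance estimate.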
Under the assumption of white Gaussian noise, the maximum likelihood estimate of the variance is obtained from the sum of the squares of the $N - k$ smallest coefficients [13]:
(11) $\hat{\sigma}^2 = (1/N)\, \| \alpha_m^{(N)} - \alpha_m^{(k)} \|^2$.
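The step from the Gaussian assumption (8) to the estimate (11) can be written out as follows. This is a sketch under the assumption, not stated explicitly in the text, that $\alpha_m^{(N)} = W_m^{T} y$ denotes the full vector of $N$ coefficients of the observed signal in the basis $A_m$, and that $\alpha_m^{(k)}$ retains its $k$ largest entries.

```latex
\begin{align*}
\ln p\bigl(y \mid \alpha_m^{(k)}, \sigma^2\bigr)
  &= -\frac{N}{2}\ln\bigl(2\pi\sigma^2\bigr)
     - \frac{1}{2\sigma^2}\,\bigl\| y - W_m \alpha_m^{(k)} \bigr\|^2, \\
\frac{\partial}{\partial \sigma^2}\ln p = 0
  \;&\Rightarrow\;
  \hat{\sigma}^2 = \frac{1}{N}\,\bigl\| y - W_m \alpha_m^{(k)} \bigr\|^2 \\
  &= \frac{1}{N}\,\bigl\| W_m^{T} y - \alpha_m^{(k)} \bigr\|^2
   = \frac{1}{N}\,\bigl\| \alpha_m^{(N)} - \alpha_m^{(k)} \bigr\|^2 .
\end{align*}
```

The last line uses the orthogonality of $W_m$, which preserves the Euclidean norm. The difference $\alpha_m^{(N)} - \alpha_m^{(k)}$ contains exactly the $N - k$ discarded coefficients, which gives the sum of squares described above.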
Analysis of the terms by the MDLP leads to the following expression:
(12) $L(k^*, m^*) = \min_{0 \le k < N,\; 1 \le m \le M} \left( \frac{3}{2}\, k \lg N + \frac{N}{2} \lg \| \alpha_m^{(N)} - \alpha_m^{(k)} \|^2 \right)$.
Minimizing the latter yields the best $k^*$ and $m^*$ simultaneously. The reconstructed signal is obtained by:
(13) $\hat{x}^{(k^*)} = W_{m^*} \alpha_{m^*}^{(k^*)}$.
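The joint search over $k$ and $m$ in (12) can be sketched as below. The two-basis library (the standard basis and a one-level Haar split), the toy signal and the function names are assumptions made for illustration, and `lg` is taken as the base-2 logarithm. An exact $k$-term fit (zero residual) is treated here as cost minus infinity, which is a choice of this sketch rather than something the text specifies.

```python
import math

def mdl_cost(coeffs, k):
    """(3/2) k lg N + (N/2) lg ||alpha^(N) - alpha^(k)||^2 for one basis, as in (12)."""
    n = len(coeffs)
    # Residual energy = sum of squares of the N - k smallest coefficients.
    residual = sum(sorted(c * c for c in coeffs)[: n - k])
    if residual == 0.0:
        return float("-inf")   # assumption: exact fit is treated as the minimum
    return 1.5 * k * math.log2(n) + 0.5 * n * math.log2(residual)

def select(library):
    """library: {basis label m: coefficients of y in that basis}. Returns (k*, m*)."""
    best = None
    for m, coeffs in library.items():
        for k in range(len(coeffs)):          # 0 <= k < N
            value = mdl_cost(coeffs, k)
            if best is None or value < best[0]:
                best = (value, k, m)
    return best[1], best[2]

# Toy observed signal: piecewise constant in pairs, plus a small perturbation.
r = 1.0 / math.sqrt(2.0)
y = [2.0, 2.02, -1.0, -0.99, 0.02, 0.01, 3.0, 2.98]
haar = ([r * (y[2 * n] + y[2 * n + 1]) for n in range(4)]
        + [r * (y[2 * n + 1] - y[2 * n]) for n in range(4)])
library = {"standard": y, "haar": haar}

k_star, m_star = select(library)
print(k_star, m_star)
```

The first term of the cost penalizes every retained coefficient, while the second rewards a small residual, so the minimum balances compression against fidelity without a separately chosen threshold.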
IV. Experimental results
For the present investigation compactly supported wavelets, which are represented by finite-length filters, are used. The basis library consists of the Daubechies wavelet family, the least asymmetric wavelets and coiflets [15, 16].
We apply the method described in Section III (equations (10)-(13)). Minimum entropy is used as the best-basis criterion. Each QMF pair from the library leads to a decomposition over a dictionary of bases. From the current dictionary (with number $m$) the minimum-entropy basis is selected. For the basis obtained, the value of $k$ that minimizes expression (12) is determined. Passing through all the bases of the library we obtain the pair $(k^*, m^*)$, where $k^*$ is the number of essential coefficients and $m^*$ is the number of the selected basis.
Synthesized signals
There is a set of 16 DGW (each 512 samples long) obtained after IF of the synthesized vowels /a/, /e/, /i/ and /u/ [8].
The results of processing the synthesized signals by the MDLP-based method are displayed in Table 1. The number of essential coefficients $k^*$ is shown.
Table 1                               Table 2
Signal  Wavelet  k*   RMSE            Signal  Wavelet  k*    RMSE
× 10^-3
mei1    S20      30   8.5             /a/     D20      94    0.1
mei2    S10      24   8.7             /e/     S6       78    0.2
bab1    D16      11   14.8            /i/     S5       97    0.19
bab2    D14      12   13.3            /u/     S5       159   0.1
babi3   S10      28   10.3            /aa/    C4       96    0.2
babi4   S10      29   6.2             /ee/    D14      79    0.2
                                      /ii/    S8       105   0.4
                                      /uu/    S4       156   0.6
                                      /uu1/   D8       118   0.9
                                      /ii1/   S4       78    1.9
The comparison reveals that the wavelet packet representation of DGW uses few coefficients. Owing to the higher frequency resolution across scales, the information content is grouped more efficiently in the basis vectors. Entropy as an information measure also serves to find the best basis. The minimum description length principle combines coding with noise suppression without the need for a separate noise estimate.
Problems of Engineering Cybernetics and Robotics, 50
Natural signals
The database consists of six DGW (each analyzed segment is 512 samples long), obtained via IF of voiced sounds from two speakers (male and female). The signals are listed in Table 2 together with the root mean squared error (RMSE) between the DGW after IF and the DGW after wavelet packet reconstruction from the reduced set of coefficients. The number of retained coefficients is much smaller than that obtained in the analysis of the synthesized signals. The mean squared errors are comparable in the two cases because natural sounds contain more noise components, which influence the processing.
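The RMSE used for this comparison can be sketched as below. The text does not state any amplitude normalization, so none is applied here; the function name `rmse` and the sample values are illustrative.

```python
import math

def rmse(reference, estimate):
    """Root mean squared error between two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(reference, estimate))
    return math.sqrt(squared / len(reference))

# Hypothetical samples: DGW after IF and after reconstruction from few coefficients.
original = [0.00, 0.10, 0.25, -0.40, -0.05]
restored = [0.01, 0.09, 0.24, -0.38, -0.05]
print(rmse(original, restored))
```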
Fig. 4 shows the inverse-filtered DGW, the DGW reconstructed after WPT compression, and the difference between them. The recovered signal is very close to the original one and is achieved with a low number of coefficients. The corresponding glottal waves have almost the same shape, owing to the ability of the WP transform to detect local features with good time-frequency resolution. The best WP basis also captures the essential high-frequency components.
Fig. 4
V. Conclusions
The potential of the WPT to compress DGW effectively is reported in the present paper. This transform was chosen with a view to preserving the points in the DGW that give the reconstructed signal its natural sound.
The results of DGW compression make possible the construction of low and medium bit-rate speech coders with quality equivalent to or higher than that of present CELP coders at similar transmission rates.
Fig. 4 panels: Inverse filtered DGW, Restored DGW, Residual; horizontal axis: Time, samples.
References
1. Fant, G. Acoustic Theory of Speech Production. Gravenhage, The Netherlands, Mouton and Co., 1960.
2. Markel, J., A. Gray. Linear Prediction of Speech. New York, Springer-Verlag, 1976.
3. Federal Standard 1015. Telecommunications: Analog to digital conversion of radio voice by 4800 bit/second code excited linear predictive coding, national communication system. National Communication System - Office of Technology and Standards, Nov. 1984.
4. Atal, B., J. Remde. A new model for LPC excitation for producing natural sounding speech at low bit rates. In: Proc. ICASSP-82, Apr. 1982, 614-617.
5. Singhal, S., B. Atal. Improving the performance of multipulse coders at low bit rates. In: Proc. ICASSP-84, p. 1.3.1, 1984.
6. Kroon, P., E. Deprettere, R. J. Sluyter. Regular-pulse excitation - a novel approach to effective and efficient multipulse coding of speech. IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, No 5, Oct. 1986.
7. Schroeder, M. R., B. Atal. Code-excited linear prediction (CELP): High quality speech at very low bit rates. In: Proc. ICASSP-85 (Tampa, FL, Apr. 1985), p. 937.
8. Fujisaki, H., M. Ljungqvist. Proposal and evaluation of models for the glottal source waveform. In: Proc. ICASSP, 1986, 1605-1608.
9. Rosenberg, A. Effect of glottal pulse shape on the quality of natural vowels. J. Acoust. Soc. Am., 49, 1971, 583-590.
10. Riegelsberger, E. L., A. K. Krishnamurthy. Glottal source estimation: Methods of applying the LF-model to inverse filtering. In: Proc. ICASSP 1993, 542-545.
11. Gothcev, . Determination of closure glottal instance via wavelet transform. chn. Ideas, 1995, No 2, 28-42 (in Bulgarian).
12. Coifman, R. R., M. V. Wickerhauser. Entropy-based algorithms for best-basis selection. IEEE Trans. Info. Theory, vol. 38, 1992, 713-718.
13. Saito, N. Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion. In: Wavelets in Geophysics (E. Foufoula-Georgiou and P. Kumar, eds.), San Diego, CA, Academic Press, 1994, 299-324.
14. Rissanen, J. Universal coding, information, prediction, and estimation. IEEE Trans. Inf. Theory, 30, 1984, 629-636.
15. Daubechies, I. Orthonormal bases of compactly supported wavelets. Comm. Pure and Applied Math., 41, 1988, 909-996.
16. Daubechies, I. Ten Lectures on Wavelets. Philadelphia, SIAM, 1992.