Vector Quantization

Robert M. Gray
A vector quantizer is a system for mapping a sequence of continuous or discrete vectors into a digital sequence suitable for communication over or storage in a digital channel. The goal of such a system is data compression: to reduce the bit rate so as to minimize communication channel capacity or digital storage memory requirements while maintaining the necessary fidelity of the data. The mapping for each vector may or may not have memory in the sense of depending on past actions of the coder, just as in well established scalar techniques such as PCM, which has no memory, and predictive quantization, which does. Even though information theory implies that one can always obtain better performance by coding vectors instead of scalars, scalar quantizers have remained by far the most common data compression system because of their simplicity and good performance when the communication rate is sufficiently large. In addition, relatively few design techniques have existed for vector quantizers.
During the past few years several design algorithms have been developed for a variety of vector quantizers and the performance of these codes has been studied for speech waveforms, speech linear predictive parameter vectors, images, and several simulated random processes. It is the purpose of this article to survey some of these design techniques and their applications.
Mathematically, a k-dimensional memoryless vector quantizer or, simply, a VQ (without modifying adjectives) consists of two mappings: an encoder γ which assigns to each input vector x = (x_0, x_1, ..., x_{k-1}) a channel symbol γ(x) in some channel symbol set M, and a decoder β assigning to each channel symbol u in M a value in a reproduction alphabet Â. The channel symbol set is often assumed to be a space of binary vectors for convenience, e.g., M may be the set of all 2^R binary R-dimensional vectors. The reproduction alphabet may or may not be the same as the input vector space; in particular, it may consist of real vectors of a different dimension.
If M has M elements, then the quantity R = log_2 M is called the rate of the quantizer in bits per vector and r = R/k is the rate in bits per symbol or, when the input is a sampled waveform, bits per sample.
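To make these definitions concrete, a minimal Python sketch of a memoryless VQ follows, assuming a small illustrative codebook, squared error as the distortion measure, and the codeword index itself as the channel symbol; the helper names are invented for the sketch.

import numpy as np

def vq_encode(x, codebook):
    # Encoder gamma: index of the minimum (squared error) distortion codeword.
    distortions = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(distortions))

def vq_decode(u, codebook):
    # Decoder beta: simple table lookup of the reproduction vector.
    return codebook[u]

# Hypothetical example: k = 2 and M = 4 codewords, so R = log2(4) = 2 bits
# per vector and r = R/k = 1 bit per sample.
codebook = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
x = np.array([0.3, -0.7])
u = vq_encode(x, codebook)        # channel symbol (an index here)
x_hat = vq_decode(u, codebook)    # reproduction vector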
The application of a quantizer to data compression is depicted in Fig. 1. The input data vectors might be consecutive samples of a waveform, consecutive parameter vectors in a voice coding system, or consecutive rasters or subrasters in an image coding system.
For integer values of R it is useful to think of the channel symbols, the encoded input vectors, as binary R-dimensional vectors. As is commonly done in information and communication theory, we assume that the channel is noiseless, that is, that Û_n = U_n. While real channels are rarely noiseless, the joint source and channel coding theorem of information theory implies that a good data compression system designed for a noiseless channel can be combined with a good error correction coding system for a noisy channel in order to produce a complete system. In other words, the assumption of a noiseless channel is made simply to focus on the problem of data compression system design and not to reflect any practical model.
Observe that unlike scalar quantization, general VQ permits fractional rates in bits per sample. For example, scalar PCM must have a bit rate of at least 1 bit per sample while a k-dimensional VQ can have a bit rate of only 1/k bits per sample by having only a single binary channel symbol for k-dimensional input vectors.
The goal of such a quantization system is to produce the "best" possible reproduction sequence for a given rate R. To quantify this idea, to define the performance of a quantizer, and to complete the definition of a quantizer, we require the idea of a distortion measure.
Distortion
"-1
lim L.2'
d(Xi,.Xi)
n-n ,=O
provided, of course, that the limitmakes sense. If the vector process is stationaryand ergodic, then, with probability
one, the limit exists and equals an expectation (d(X,X)).
For the moment we will assume that such conditions are
met and that suchlong termsample averages are given by
expectations. Later remarks will focus on the general assumptionsrequiredandtheirimplicationsforpractice.
Ideally a distortion measure should be tractable to permit analysis, computable so that it can be evaluated in real time and used in minimum distortion systems, and subjectively meaningful so that large or small quantitative distortion measures correlate with bad and good subjective quality. Here we do not consider the difficult and controversial issues of selecting a distortion measure; we assume that one has been selected and consider means of designing systems which yield small average distortion. For simplicity and to ease exposition, we focus on two important specific examples:
(1) The squared error distortion measure: Here the input and reproduction spaces are k-dimensional Euclidean space and

    d(x, x̂) = ||x - x̂||² = Σ_{i=0}^{k-1} (x_i - x̂_i)².
Figure 1. Data Compression System. The data or information source {X_n; n = 0, 1, ...} is a sequence of random vectors. The encoder produces a sequence of channel symbols {U_n; n = 0, 1, 2, ...}. The sequence {Û_n; n = 0, 1, 2, ...} is delivered to the receiver by the digital channel. The decoder then maps this sequence into the final reproduction sequence of vectors {X̂_n; n = 0, 1, 2, ...}.
A VQ is optimal if it minimizes an average distortion E d{X, β[γ(X)]}. Two necessary conditions for a VQ to be optimal follow easily using the same logic as in Lloyd's [9] classical development of the optimal scalar quantizer: the encoder must be a minimum distortion (nearest-neighbor) mapping for the decoder's codebook, and each decoder output must be the centroid of all input vectors encoded into it.
Figure 2. VQ Encoder. The distortion between the input
vector and each stored codeword is computed. The encoded output is then the binary representation of the
index of the minimum distortion codeword.
The Euclidean centroids of the example of Fig. 3 are depicted in Fig. 4. (The numerical values may be found in [25].) The new codewords better represent the training vectors mapping into the old codewords, but they yield a different minimum distortion partition of the input alphabet, as indicated by the broken line in Fig. 3. This is the key to the algorithm: iteratively optimize the codebook for the old encoder and then use a minimum distortion encoder for the new codebook.
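The iteration just described can be summarized in a short sketch, assuming squared error distortion, Euclidean centroids, and a fixed number of passes as the stopping rule; the function name and the stopping rule are illustrative only.

import numpy as np

def lloyd_iteration(training, codebook, n_iter=20):
    # Alternate between (1) re-partitioning the training sequence with a
    # minimum distortion (nearest neighbor) encoder for the current codebook
    # and (2) replacing each codeword by the Euclidean centroid (sample mean)
    # of the training vectors mapping into it.
    codebook = codebook.astype(float).copy()
    for _ in range(n_iter):
        dists = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(codebook)):
            members = training[labels == j]
            if len(members) > 0:          # empty cells keep the old codeword here
                codebook[j] = members.mean(axis=0)
    return codebook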
The Itakura-Saito distortion example is somewhat more complicated, but still easily computable. As with the squared error distortion, one groups all input vectors yielding a common channel symbol. Instead of averaging the vectors, however, the sample autocorrelation matrices for all of the vectors are averaged. The centroid is then given by the standard LPC all-pole model for this average autocorrelation, that is, the centroid is found by a standard Levinson's recursion run on the average autocorrelation.
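A sketch of this centroid computation, assuming the autocorrelation method of LPC analysis with the Toeplitz autocorrelation matrix represented by its lag values, is given below; gain handling, windowing, and numerical safeguards are omitted and the function names are invented for the sketch.

import numpy as np

def sample_autocorrelation(x, p):
    # First p+1 lags of the sample autocorrelation of one training vector.
    n = len(x)
    return np.array([np.dot(x[:n - lag], x[lag:]) for lag in range(p + 1)])

def levinson_durbin(r):
    # Levinson recursion: order-p all-pole (LPC) coefficients a[0..p]
    # (with a[0] = 1) and residual energy from autocorrelation lags r[0..p].
    p = len(r) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, p + 1):
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        a_prev = a.copy()
        a[m] = k
        for j in range(1, m):
            a[j] = a_prev[j] + k * a_prev[m - j]
        err *= (1.0 - k * k)
    return a, err

def itakura_saito_centroid(region_vectors, p):
    # Average the sample autocorrelations of all training vectors encoded
    # into the region, then fit the standard LPC all-pole model to the average.
    r_avg = np.mean([sample_autocorrelation(x, p) for x in region_vectors], axis=0)
    return levinson_durbin(r_avg)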
Figure 3. Two-Dimensional Minimum Distortion Partition. The four circles are the codewords of a two-dimensional codebook. The Voronoi regions are the quadrants containing the circles. The x's were produced by a training sequence of twelve two-dimensional Gaussian vectors. Each input vector is mapped into the nearest-neighbor codeword, that is, the circle in the same quadrant. (x = training vectors, o = codewords, P_i = region encoded into codeword i.)
The Euclidean centroid of a channel symbol v is the sample mean of the training vectors encoded into v:

    cent(v) = (1/i(v)) Σ_{x_j: γ(x_j)=v} x_j ,        (4)

where i(v) is the number of training vectors x_j for which γ(x_j) = v.
Random codes

Product codes
Splitting

Instead of constructing long codes from smaller dimensional codes, we can construct a sequence of bigger codes having a fixed dimension using a splitting technique [25,16]. This method can be used for any fixed dimension, including scalar codes. Here one first finds the optimum 0 rate code, the centroid of the entire training sequence, as depicted in Fig. 5a for a two-dimensional input alphabet. This single codeword is then split to form two codewords (Fig. 5b). For example, the energy can be perturbed slightly to form a second distinct word or one might purposefully find a word distant from the first. It is convenient to have the original codeword a member of the new pair to ensure that the distortion will not increase. The algorithm is then run to get a good rate 1 bit per vector code as indicated in Fig. 5c. The design continues in this way in stages as shown: the final code of one stage is split to form an initial code for the next.
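A sketch of the splitting design, reusing the lloyd_iteration helper from the earlier sketch, is shown below; the perturbation rule, its size, and the doubling schedule are illustrative choices, not the only ones possible.

import numpy as np

def design_by_splitting(training, final_size, eps=0.01, n_iter=20):
    # Start from the rate 0 code: the centroid of the entire training sequence.
    codebook = training.mean(axis=0, keepdims=True)
    while len(codebook) < final_size:
        # Keep each old codeword and add a slightly perturbed copy, so the
        # distortion cannot increase at the split, then re-run the iterative
        # improvement on the doubled codebook.
        codebook = np.concatenate([codebook, codebook + eps], axis=0)
        codebook = lloyd_iteration(training, codebook, n_iter)
    return codebook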
VARIATIONS OF MEMORYLESS VECTOR QUANTIZERS
Multistep VQ
Figure 7. Multistage VQ with 2 Stages. The input vector is first encoded by one VQ and an error vector is formed. The second VQ then encodes the error vector. The two channel symbols from the two VQs together form the complete channel symbol for the entire encoder. The decoder adds together the corresponding reproduction vectors.
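A two-stage sketch matching this description, reusing the vq_encode and vq_decode helpers from the earlier sketch, is given below; the pairing of the two indices as the complete channel symbol is one obvious choice.

def multistage_encode(x, codebook1, codebook2):
    # Quantize the input, form the error vector, then quantize the error.
    u1 = vq_encode(x, codebook1)
    error = x - vq_decode(u1, codebook1)
    u2 = vq_encode(error, codebook2)
    return (u1, u2)   # the pair of indices is the complete channel symbol

def multistage_decode(u, codebook1, codebook2):
    u1, u2 = u
    # The decoder adds together the corresponding reproduction vectors.
    return vq_decode(u1, codebook1) + vq_decode(u2, codebook2)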
Gain/shape VQ

Figure 8. Gain/Shape VQ. First a unit energy shape vector is chosen to match the input vector by maximizing the inner product over the codewords. Given the resulting shape vector, a scalar gain codeword is selected so as to minimize the indicated quantity. The encoder yields the product codeword a_i y_i with the minimum possible squared error distortion from the input vector. Thus this multistep encoder is optimum for the product codebook.
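The two-step search described in the caption can be sketched as follows for the squared error distortion, assuming the shape codewords have unit energy and the gain codewords are nonnegative; the helper names are invented for the sketch.

import numpy as np

def gain_shape_encode(x, shape_codebook, gain_codebook):
    # Step 1: choose the unit energy shape maximizing the inner product
    # with the input vector.
    inner = shape_codebook @ x
    j = int(np.argmax(inner))
    # Step 2: given the shape y, ||x - a*y||^2 is minimized over the gain
    # codewords by the gain a closest to the inner product <x, y>.
    i = int(np.argmin((gain_codebook - inner[j]) ** 2))
    return i, j

def gain_shape_decode(i, j, shape_codebook, gain_codebook):
    # The reproduction is the product codeword: gain times shape.
    return gain_codebook[i] * shape_codebook[j]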
Ideally, one would like to take a full search, unconstrained VQ and find some fast means of encoding having complexity more like the above techniques than that of the full search. For example, some form of multidimensional companding followed by a lattice quantizer as suggested by Gersho [24] would provide both good performance and efficient implementation. Unfortunately, however, no design methods accomplishing this goal have yet been found.
Separating mean VQ
FEEDBACK VECTOR QUANTIZERS

topic of this section. The name follows because the encoder output is "fed back" for use in selecting the new codebook. A feedback vector quantizer can be viewed as the vector extension of a scalar adaptive quantizer with backward estimation (AQB) [3]. The second approach is the vector extension of a scalar adaptive quantizer with forward estimation (AQF) and is called simply adaptive vector quantization. Adaptive VQ will be considered in a later section.
Figure 10. Feedback VQ. At time n both encoder and decoder are in a common state s_n. The encoder uses a state VQ γ_{s_n} to encode the input vector and then selects a new state for the next input vector. Knowing the VQ output u_n = γ_{s_n}(X_n), the decoder forms the reproduction X̂_n = β_{s_n}(u_n) and both coders compute the next state

    s_{n+1} = f(u_n, s_n).        (5)
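A sketch of this operation, reusing the vq_encode and vq_decode helpers from the earlier sketch, is given below; the state codebooks and the next-state function f are assumed given (for an FSVQ they would come from the design algorithm), and the names are illustrative.

def feedback_vq_encode(x_sequence, state_codebooks, next_state, s0):
    # state_codebooks[s] is the codebook used in state s;
    # next_state(u, s) implements s_{n+1} = f(u_n, s_n) of Eq. (5).
    s, symbols = s0, []
    for x in x_sequence:
        u = vq_encode(x, state_codebooks[s])   # u_n = gamma_{s_n}(X_n)
        symbols.append(u)
        s = next_state(u, s)
    return symbols

def feedback_vq_decode(symbols, state_codebooks, next_state, s0):
    # The decoder sees only the channel symbols, yet tracks the same states.
    s, out = s0, []
    for u in symbols:
        out.append(vq_decode(u, state_codebooks[s]))  # X_hat_n = beta_{s_n}(u_n)
        s = next_state(u, s)
    return out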
operation on successive vectors by the decoder at successive times while a tree-searched VQ uses a tree to construct a fast search for a single vector at a single time. A natural variation of the basic algorithm for designing FSVQs can be used to design trellis encoding systems: simply replace the FSVQ encoder, which finds the minimum distortion reproduction for a single input vector, by a Viterbi or other search algorithm which searches the decoder trellis to some fixed depth to find a good long term minimum distortion path. The centroid computation is accomplished exactly as with an FSVQ: each branch or transition label is replaced by the centroid of all training vectors causing that transition, that is, the centroid conditioned on the decoder state and channel symbol. Scalar and simple two dimensional vector trellis encoding systems were designed in [52] using this approach.

Trellis encoding systems are not really vector quantization systems as we have defined them since the encoder is permitted to search ahead to determine the effect on the decoder output of several input vectors while a vector quantizer is restricted to search only a single vector ahead. The two systems are intimately related, however, and a trellis encoder can always be used to improve the performance of a feedback vector quantizer. Very little work has yet been done on vector trellis encoding systems.
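A sketch of such a trellis search for the squared error distortion is given below; it performs the standard Viterbi recursion over a given decoder (branch labels and next-state function assumed supplied) and returns a minimum distortion channel symbol path for a block of input vectors. The structure and names are illustrative only.

import numpy as np

def trellis_encode(x_sequence, labels, next_state, n_states, s0):
    # labels[s][u] is the reproduction vector produced in state s by channel
    # symbol u; next_state(u, s) gives the successor state.
    INF = float("inf")
    cost = [INF] * n_states
    cost[s0] = 0.0
    paths = [[] for _ in range(n_states)]
    for x in x_sequence:
        new_cost = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if cost[s] == INF:
                continue
            for u in range(len(labels[s])):
                d = float(np.sum((np.asarray(x) - np.asarray(labels[s][u])) ** 2))
                t = next_state(u, s)
                if cost[s] + d < new_cost[t]:
                    new_cost[t] = cost[s] + d
                    new_paths[t] = paths[s] + [u]
        cost, paths = new_cost, new_paths
    return paths[int(np.argmin(cost))]   # best long term minimum distortion path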
ADAPTIVE VQ
As a final class of VQ we consider systems that use one VQ to adapt a waveform coder, which might be another VQ. The adaptation information is communicated to the receiver via a low rate side information channel.
The various forms of vector quantization using the Itakura-Saito family of distortion measures can be considered as model classifiers, that is, they fit an all-pole model to an observed sequence of sampled speech. When used alone in an LPC VQ system, the model is used to synthesize the speech at the receiver. Alternatively, one could use the model selected to choose a waveform coder designed to be good for sampled waveforms that produce that model. For example, analogous to the omniscient design of FSVQ one could design separate VQs for the subsequences of the training sequence encoding into common models. Both the model index and the waveform coding index are then sent to the receiver. Thus LPC VQ can be used to adapt a waveform coder, possibly also a VQ or related system. This will typically yield a system of much higher rate, but potentially of much better quality since the codebooks can be matched to local behavior of the data. The general structure is shown in Fig. 13. The model VQ typically operates on a much larger vector of samples and at a much lower rate in bits per sample than does the waveform coder and hence the bits spent on specifying the model through the side channel are typically much fewer than those devoted to the waveform coder.
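One simple way to realize this structure in code is sketched below: a model quantizer (abstracted here as a function returning a model index) selects one of several waveform codebooks, the index is sent as side information, and the frame's vectors are coded with the selected codebook. The helpers vq_encode and vq_decode are from the earlier sketch; everything else is an illustrative assumption, not a description of any particular published system.

def adaptive_vq_encode(frame_vectors, model_index_of, waveform_codebooks):
    # The model VQ (e.g. an LPC VQ) classifies the whole frame and its index
    # is sent over the low rate side information channel.
    m = model_index_of(frame_vectors)
    codebook = waveform_codebooks[m]
    symbols = [vq_encode(x, codebook) for x in frame_vectors]
    return m, symbols

def adaptive_vq_decode(m, symbols, waveform_codebooks):
    # The receiver uses the side information to pick the matching codebook.
    codebook = waveform_codebooks[m]
    return [vq_decode(u, codebook) for u in symbols]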
There are a variety of such possible systems since both the model quantizer and the waveform quantizer can take on many of the structures so far considered. In addition, as in speech recognition applications [55], the gain-independent variations of the Itakura-Saito distortion measure which either normalize or optimize gain may be better suited for the model quantization than the usual form. Few such systems have yet been studied in detail. We here briefly describe some systems of this type that have appeared in the literature to exemplify some typical combinations. All of them use some form of memoryless VQ for the model quantization, but a variety of waveform coders are used.
The first application of VQ to adaptive coding was by Adoul, Debray, and Dalle [32] who used an LPC VQ to choose a predictor for use in a scalar predictive waveform coder. Vector quantization was used only for the adaptation and not for the waveform coding. An adaptive VQ generalization of this system was later developed by Cuperman and Gersho [45,46] who used an alternative classification technique to pick one of three vector predictors and then used those predictors in a predictive vector quantizer. The predictive vector quantizer design algorithm previously described was used, except now the training sequence was broken up into subsequences corresponding to the selected predictor and a quantizer was designed for each resulting error sequence. Chang [47] used a similar scheme with an ordinary LPC VQ as the classifier and with a stochastic gradient algorithm run on each of the vector predictive quantizers in order to improve their performance.
Figure 14. RELP VQ. An LPC VQ is used for model selection and a single VQ to waveform encode the residuals formed by passing the original waveform through the inverse filter A/σ. The side information specifies to the decoder which of the model filters σ/A should be used for synthesis.
TABLE I. MEMORYLESS VQ FOR A GAUSS-MARKOV SOURCE. [SNR (dB), complexity n, and storage M for full search VQ, tree-searched VQ (TSVQ), multistage VQ (MVQ), and gain/shape VQ at vector dimensions k = 1 through 8.]
TABLE II. FEEDBACK VQ OF A GAUSS-MARKOV SOURCE. [SNR (dB), complexity n, and storage M for two FSVQ designs (FSVQ1, FSVQ2) and vector predictive quantization (VPQ) at vector dimensions k = 1 through 4.]
TABLE III. MEMORYLESS VQ OF SAMPLED SPEECH. [SNR inside (SNRin) and outside (SNRout) the training sequence, complexity n, and storage M for full search VQ and the tree-searched, multistage, and gain/shape variations at vector dimensions k = 1 through 8.]
TABLE IV. FEEDBACK VQ OF SAMPLED SPEECH. [SNR inside and outside the training sequence, complexity n, and storage M for FSVQ and VPQ at several vector dimensions.]
Table V presents a comparison of VQ and FSVQ for vector quantization of speech using the Itakura-Saito distortion measure or, equivalently, vector quantization of LPC speech models [16,14,53]. The training sequence and ... considered because the longer training sequence made them more trustworthy. Again for comparison, the best known (nonadaptive) scalar trellis encoding system for this source yields a performance of 9 dB [52]. Here the trellis encoder uses the M-algorithm with a search depth of 31 samples. The general comparisons are similar to those of the previous source, but there are several differences. The tree-searched VQ is now more degraded in comparison to the full search VQ and the multistage VQ is even worse, about 3 dB below the full search at the largest rate.

TABLE V. LPC VQ AND FSVQ WITH AND WITHOUT NEXT-STATE FUNCTION IMPROVEMENT. [SNR inside and outside the training sequence for full search VQ, FSVQ1, and FSVQ2 at rates R = 1 through 8 bits per vector (r = .008 through .062 bits per sample).]

TABLE VI. ADAPTIVE VPQ. [SNR inside and outside the training sequence for vector dimensions k = 1 through 4.]
Image coding
Figure 16. Image Training Sequence. The training sequence consisted of the sequence of 3 x 4 subblocks of the five 256 x 256 images shown.
In 1980-1982 four separate groups developed successful applications of VQ techniques to image coding [61,62,63,64,65,66,67,37]. The only real difference from waveform coding is that now the VQ operates on small rectangular blocks of from 9 to 16 pixels, that is, the vectors are really 2-dimensional subblocks of images, typically squares with 3 or 4 pixels on a side or 3 by 4 rectangles. We here consider both the basic technique and one variation. We consider only small codebooks of 6 bits per 4 x 3 block of 12 pixels for purposes of demonstration. Better quality pictures could be obtained at the same rate of 1/2 bit per pixel by using larger block sizes and hence larger rates of, say, 8 to 10 bits per block. Better quality could also likely be achieved with more complicated distortion measures than the simple squared error used.
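The basic scheme can be sketched as follows, reusing vq_encode and vq_decode from the earlier sketch: the image is partitioned into small blocks, each block is flattened into a vector and replaced by its minimum distortion codeword. The block and codebook sizes here (3 x 4 blocks, 64 codewords, hence 1/2 bit per pixel) follow the demonstration described in the text, but the helper itself is only an illustration.

import numpy as np

def image_vq(image, codebook, bh=3, bw=4):
    # codebook rows are flattened bh x bw blocks, e.g. 64 rows gives 6 bits
    # per 12 pixel block, i.e. 1/2 bit per pixel.
    H, W = image.shape
    recon = np.zeros((H, W), dtype=float)
    for r in range(0, H - bh + 1, bh):
        for c in range(0, W - bw + 1, bw):
            block = image[r:r + bh, c:c + bw].astype(float).ravel()
            u = vq_encode(block, codebook)
            recon[r:r + bh, c:c + bw] = vq_decode(u, codebook).reshape(bh, bw)
    return recon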
Fig. 16 gives the training sequence of five images. Fig. 17a shows a small portion of the fifth image, an eye, magnified. Fig. 17b is a picture of the 2^6 = 64 codewords. Fig. 17c shows the decoded eye. Fig. 18 shows the original, decoded image, and error image for the complete picture. The error image is useful for highlighting the problems encountered with the ordinary memoryless VQ. In particular, edges are poorly reproduced and the codeword edges make the picture appear "blocky." This problem was attacked by Ramamurthi and Gersho [62,67] by constructing segmented (or union or composite) codes, separate codebooks for the edge information and the texture information, where a simple classifier was used to distinguish the two in design. In [37] a feedback vector quantizer was developed by using a separating mean VQ with a predictive scalar quantizer to track the mean. Fig. 19 shows the original eye, ordinary VQ, and the feedback VQ. The improved ability to track edges is clearly discernible. Fig. 20 shows the full decoded image for feedback VQ together with the error pattern.
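A simplified, non-predictive version of the separating mean idea can be sketched as follows: the block's sample mean is removed and coded with a scalar quantizer while the mean-removed residual is coded with a VQ. The scheme of [37] additionally tracks the mean with a predictive scalar quantizer, which this sketch omits; vq_encode and vq_decode are from the earlier sketch and the other names are illustrative.

import numpy as np

def separating_mean_encode(block, mean_levels, shape_codebook):
    # block is a flattened pixel block; mean_levels are the output levels
    # of the scalar quantizer for the sample mean.
    m = block.mean()
    i = int(np.argmin((mean_levels - m) ** 2))    # scalar quantize the mean
    residual = block - mean_levels[i]
    j = vq_encode(residual, shape_codebook)       # VQ the mean-removed block
    return i, j

def separating_mean_decode(i, j, mean_levels, shape_codebook):
    return mean_levels[i] + vq_decode(j, shape_codebook)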
Figure 17. Basic Image VQ Example at 1/2 bit per pixel. (a) Original Eye Magnified. (b) 6 bit codebook VQ codebook for 4 x 3 blocks. (c) Decoded Image.

Although image coding using VQ is still in its infancy, these preliminary experiments using only fairly simple memoryless and feedback VQ techniques with small codebooks demonstrate that the general approach holds considerable promise for such applications.
COMMENTS
VQ" [Ill. Such schemes typically enforce additional structure on the code
such as preprocessing,transforming,
splitting intosubbands, and scalar quantization, however,
and hence the algorithms may not have the freedom to
do as well as the more unconstrained structures consideredhere. Even ifthetraditional schemes provemore
useful because of existing DSP chips or intuitive variations
well matched to particulardatasources,thevector
quantization systemscan proveausefulbenchmark
for comparison.
Recently VQ has also been successfully used in isolated word recognition systems without dynamic time warping by using either separate codebooks for each utterance or by mapping trajectories through one or more codebooks [68,69,70,71,55,72]. Vector quantization has also been used as a front end acoustic processor to isolated utterance recognition systems.