SlideShare a Scribd company logo
84 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-28, NO. 1, JANUARY 1980
An Algorithmfor Vector Quantizer Design
YOSEPHLINDE, MEMBER. IEEE. ANDRES BUZO, MEMBER, EEE, A m ROBERTM.GRAY,SENIORMEMBER. EEE
’ Abstract-An efficient,andintuitivealgorithm is presented for the design
of vector quantizersbased either on a known prohabitistic model or on a
long training sequence of d
a
t
a
.The basic propertiesof the algorithm are
discussed mid demonstrated by examples.
Quite
general
distoriion
measures and
long blocklengthsareallowed,asexemplifiedby the design
of
parameter vector quantizers of tendiensional vectors arising in Linear
Predictive Coded(LE)
speechcompression witha complicated distortion
measure arisiig in LPC analysis that does not depend only on the error
vector.
INTRODUCTION
A-
N efficient and intuitive algorithm for the design of good
blockor vector quantizerswith ‘quite general distortion
measures is developed for use oneitherknown probabilistic
source descriptions or ona long training sequenceof data. The
algorithm is based on in approach of Lloyd [I] ,is not a varia-
tional
technique,
and involves no differentiation; hence it
works well even whenthedistribution hasdiscrete compo-
nents, as is the case when a sample distribution obtained from
atrainingsequence is used. As with the common variational
techniques, the algorithm produces a quantizer meeting neces-
sary butnot sufficient conditions
for
optimality. Usually,
however, at least localoptimality is assured in both approaches.
We here motivate and describe the algorithm and relate it to
a number of similar algorithms for special cases that have
appeared in both the quantization and cluster analysis litera-
ture.
The basic operation of the algorithm is simple and
intuitive in the general case considered here and it is clear that
variational techniques are not required to develop nor to apply
the algorithm.
Several of the algorithm’s basic properties are developed
using heuristicargumentsand demonstratedby example. In
a companion theoretical paper [ 2 ] ,these properties are given
precise mathematicalstatementsand are proved using -argu-
mentsfromoptiniizationtheory andergodic theory. Those
results will occasionally be quoted here to characterize the
generality of certain properties.
Paper approved by the Editor for Data Communication Systems of
the IEEE Communications.Society for publication after presentation in
part at the 1979 International Telemetering Conference, Los Angeles,
CA, November 1978 and the 1979 International Symposium on hfor-
mation Theory, Gringano, Italy; June 1979. Manuscriptreceived May
22, 1978; revised August 21,1979.Thisworkwassupportedby Air
Force Contract F44620-73-0065, F49620-78C-0087, and F49620-79-
C-0058 andbytheJoint Services Electronics Program atStanford
University, Stanford, CA.
Y.Linde is with the Codex Corporation,
Mansfield, MA.
A. BuzowaswithStanford University, Stanford, CA and Signal
Technology Inc., Santa Barbara, CA.Heis nowwiththeInstitutode
Ingenieria, National University of Mexico, MexicoCity, Mexico.
R. M. Gray is with the Information Systems Laboratory, Stanford
University, Stanford, CA 94305.
In particular, the algorithm’s convergence properties are
demonstrated herein by several examples. We consider the
usual test cas. for such
algorithms-quantizer, design for
memoryless Gaussian sources with a mean-squared.error distor-
tion measure; but we design and evaluate block quantizers with
a rateof one bit per symbol and with blocklengths,of
1 through
6. Comparison with recently developed lower bounds to the
optimaldistortion of such block quantizers (which provide
strict improvement over the traditional bounds of rate-distor-
tion theory) indicate that the resulting quantizers are indeed
nearly optimalandnot simply locally optimal. We also con-
sider a scalar case where local optima arise and show how a
variation of the algorithm yields aglobal optimum. .
The algorithm is also used to design a quantizer for 1Odi-
mensional vectors- arising inspeech compression systems. A
complicated distortion measure is used that does not simply
depend on the errorvector. No probabilistic model is assumed,
and hence the quantizer must be designed based on a training
sequence of real speech. Here the convergence properties for
both length ofthe training
sequence
and the
number of
iterations of the algorithm are demonstratedexperimentally;
No theoretical optimumjs known for
this case,but our system
was used to compress the output of a traditional 6000 bit/s
Linear Predictive Coded (LPC) speech system down to a rate
of 1400 bit& with only a slight loss in quality as judged by
untrained listeners in informal subjective tests. To theauthors’
knowledge,directapplication of variationaltechniques have
not succeededin designing block quantizersforsuch large
block lengths and suchcomplicated distortion measures.
BLOCK QUANTIZERS
An N-level k-dimensionalquantizer is amapping, q;that
assigns to each input vector, x = (xo, -, X ~ L I ) , a reproduc-
tion
vector, i= q(x), drawn from a
finite reproduction
alphabet, A = bi;
i = 1, -, N}. The quantizer 4 is compl$ely
described bythereproduction alphabet(or codebook) A to-
gether with the partition, S = {Si;
i = 1, *-*, N), of the input
vector space into the sets Si= {x: q(x) =vi)of input vectors
mapping into the ith reproduction vector (or codeword)., Such
quantizers are also called block quantizers, vector quantizers,
and block source codes.
DISTORTION MEASURES
We assume thedistortion caused byreproducing an input
vector x by a reproduction vector iis giveri by a nonnegative
distortion measure d(x, 2). Many such distortion measures
have beenproposedin theliterature. Themost commonfor
0090-6778/80/0100-0084$00.75 0 1980 IEEE
LINDE et al. :ALGORITHM FOR VECTOR QUANTIZER DESIGN 85
reasons of mathematical convenience is the squarederror dis-
tortion.
k-1
i=0
Other common distortionmeasures are the l,, or Holder norm,
and its vth power, the uth-law distortion:
b-1
While both distortion measures (2) and (3) depend on theuth
power of the errorsin the separate coordinates, themeasure of
(2) is often more useful since it is adistance ormetricand
hence satisfies the triangle inequality, d(x, 2) d d(x, y ) +
d @ ,.?), for all y . The triangle inequality allows one to bound
the overall distortion easily in a multi-step system by the sum
of the individual distortions incurredin each step. The usual
vth-law distortionof (3) does not have this property. Other
distortion measures are the I,, or Minkowski norm,
the weighted-squares distortion,
k--3
where wi 2 0, i = 0 , .-e, k - 1,and the more general quadratic
distortion
d(x,i
)
= (x -i)B(x -i ) t
where B = {Bi,j}is a k X k positive definite symmetric matrix.
All of the previously described distortion measures have the
property that they depend on the
vectors x and P only through
the error vector x - i
.
Such distortion measures having the
form d(x, i
)
= L(x -2) are called difference distortion meas-
ures. Distortion measures not having this form but depending
on x and 2 in a more complicated fashion have also been pro-
posed for data compression systems. Of interest here is a dis-
tortion measure of Itakura and Saito [3, 41 and Chaffee
[5,321 which arises inspeechcompressionsystems andhas
the form
d(x ,i )= (x -i
)
R
@)(x -i)f
, (7)
where for each x , R ( x )is a positive definite k X k symmetric
matrix. This distortion resembles thequadraticdistortionof
(6),but here the weighting matrix depends on the inputvector
We are here concerned with the particular form andapplica-
tion of this distortion measure rather than its origins, which
are treated in depth in [3-91 and in a paper in preparation. For
motivation, however, we briefly describe the context in which
this distortion measure is used in speech systems. In the LPC
approach to speech compression [IO] ,each frame of sampled
speech is modeled as the output of a finite-order all-pole filter
driven by either white noise (unvoiced sounds) or aperiodic
pulse train (voiced sounds). LPC analysis has, as input, a frame
of speech and produces
parameters describing the model.
These parameters are thenquantized and transmitted. One
collection of suchparameters consists of a voiced/unvoiced
decision together witha pitch estimate for voiced sounds, a
gain term u (related to volume), and the sample response of
the normalized inverse filter (1, al, a2, . * a , aK), thatis,the
normalized all-pole model has transfer function or z-transform
(Zf=o Q ~ z - ~ } - ~ .
Other parameterdescriptionssuch as the
reflection coefficients are also possible [101.
In
traditional LPC systems, the various parameters are
quantized separately, but suchsystemshave effectively reached
their theoretical performance limits [1I] .Hence it is natural
to consider block quantization of these parameters and com-
pare the performance with the traditional scalar quantization
techniques. Here we consider the case where the pitch and gain
are (as usual) quantized separately, but the
parameters describ-
ing the normalizedmodel are to be quantized together as a
vector. Since the lead term is 1, we wish to quantize a vector
(al,u2, .e-, u K ) & x = (xo, - - e , x ~ - ~ ) .
A distortion measure,
d(x, i),
between x and a reproduction x , can then be viewed
as a distortion measure between two normalized (unit gain)
inverse filters or models. A distortion measure for such a case
has been proposed by Itakura and Saito [3,4] and by Chaffee
[5,321 and it has the form of (7) with R(x) the autocorrela-
tion matrix (r,.(k -j ) ; k = 0, 1, a s - , K - 1; j = 0, 1, e - , K -1)
defined by
X.
described by x when theinput has
a flatunit amplitude
spectrum.
Many properties and alternative formsfor thisparticular
distortion measure are developed in [3-91, where it is also
shown thatstandard LPC systems
implicitly minimize this
distortion, which suggests that it is also an appropriate distor-
tion measure for
subsequent
quantization. Here,
however,
theimportantfact is thatit is not adifference distortion
measure; it is one for which the dependence on x and i is
quite complicated.
We also observe that various functions of the previously
defined distortion measures have been proposed in thelit-
erature,forexample,distortion measures of theforms IIx -
i l l r and p( Ilx -ill), where p is a convex functionandthe
norm is any of the previously defined norms. The techniques
to be developed here are applicable to all of these distortion
measures.
86 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL.COM-28,
NO. 1, JANUARY 1980
PERFORMANCE
k t x = (xo,..*,Xk-l)
be a real random vector described
by a cumulative distribution function F(x)= Pr{Xi Qxi;i =
0, 1, a**, k - 1). A measure of the performance of a quantizer
4 applied tothe random vector X isgiven by the expected
distortion
where E denotestheexpectationwith respect to the under-
lying distribution F. This performance measure is physically
meaningful if the quantizer q is to be used toquantize a
sequence of vectors X , = ( X , K , e - , XnK+K-l)
that are sta-
tionary and ergodic, since then thetime-averaged distortion,
converges with probabilityone to D(4) as n -+ 00 (from the
ergodic theorem),that is, D(4) describes the long-run time-
averaged distortion.
An alternativeperformance measure is the maximum of
d(x, 4(x)) over all x in A , but we use only the expected dis-
tortion (9)since,in most problems of interest (to us), it is
the average distortion and not the peak distortion that deter-
mines subjective quality. In addition, the expected distortion
is more easily dealt with mathematically.
OPTIMAL QUANTIZATION
An N-level quantizer will be said to be optimal (or globally
optimal) if it minimizes theexpecteddistortion,that is, 4*
is optimal if for all other quantizers 4 having N reproduction
vectors D(q*) < D(q). A quantizer is said to be locally opti-
mum if D(q) is only a local minimum, that is, slight changes
in q cause an increase in distortion. The goal of block quan-
tizer design is to obtain an optimal quantizer if possible and,
if not,toobtain a locally optimal and hopefully
“good”
quantizer. .Several such algorithms have been proposed in the
literatureforthecomputer-aided design of locally optimal
quantizers. In a few special cases, it has
been possible to
demonstrate global optimalityeither analytically or byex-
hausting alllocal optima.In 1957,in a classic but unfortunately
unpublished Bell Laboratories’ paper, S. Lloyd [ l ] proposed
two methods for quantizer design for the scalar case (k = 1)
with a squarederror distortion criterion. His “Method II” was
a straightforward variational approach wherein he took deriva-
tives with respect tothereproductionsymbols,yi, andwith
respect totheboundarypoints defining the Si andsetthese
derivatives to zero.Thisin general yieldsonlya “stationary-
point”
quantizer (a multidimensional
zero derivative) that
satisfies necessary but not sufficient conditions for optimality.
By second derivative arguments, however, it is easy to establish
that such stationary-point quantizers are at least locally opti-
mum for
vth-power law distortion measures. In addition,
Lloyd also demonstrated global optimalityfor certaindistri-
butionsby a technique of exhaustively searching all local
optima. Essentially the same technique was also proposed and
used in the parallel problem of cluster analysis by Dalenius
[12] in 1950, Fisher [13] in 1953, andCox [14] in 1957.
The technique was also independently developed by Max [151
in1960 andthe resulting quantizer is commonlyknown as
the Ldoyd-Max quantizer.
This approach has proved quite
useful for designing scalar quantizers, with power-law distor-
tion criteria and withknown distributions that were suffi-
ciently well behaved to ensure the existence of the derivatives
in question. In addition, for this case, Fleischer [161 was able
to demonstrate analytically that the resulting quantizers were
globally optimum forseveral interesting probability densities.
In some situations, however, the direct variational approach
has not proved successful. First, if k is not equal to 1 or 2, the
computational requirements
become too
complex. Simple
combinations of one-dimensional differentiation will not work
because of the possibly complicated surface shapes of the
boundaries of the cells of the partition. In fact, the only suc-
cessful applications of a direct variational approach to multi-
dimensional quantization areforquantizers where theparti-
tion cells are required to have aparticular simple form such
as multidimensional “cubes” or, in two dimensions, “pie
slices,” each described only by a radius and two angles. These
shapes are amenable to differentiationtechniques, but only
yield a local optimum within the constrained class. Secondly,
if, in addition, more
complex distortion measures such as
those of (4)-(7) are desired, the required computation asso-
ciated with the variational equations can become exorbitant.
Thirdly, if the underlying probability distribution has discrete
components,
then
the required derivatives may not
exist,
causing furthercomputational problems.Lastly, if one lacks
a precise probabilisticdescription of the randomvector X
and must base the design instead on an observed long training
sequence of data, then there is no obvious way to apply the
variational approach. If the underlying unknown process is
stationaryand ergodic, then hopefullyasystem designed by
using asufficientlylongtrainingsequenceshould also work
well onfuturedata.To directlyapply the variational tech-
nique in this case, one would first have to estimate the under-
lying continuousdistribution based onthe observations and
then take the appropriate derivatives. Unfortunately, however,
most statisticaltechniquesfordensityestimation require an
underlyingassumption on the class of allowed densities, e.g.,
exponential families. Thusthesetechniques are inappropriate
when no such knowledge is available. Furthermore, a good fit
of a continuous model to a finite-sample histogram may have
ill-behaved differential behavior andhencemay not produce
a good quantizer. To our knowledge, no one has successfully
used such an approach nor has anyone demonstrated that this
approach will yield the correct quantizerin the limit of a
long training sequence.
Lloyd [l] also proposed an alternativenonvariational ap-
proach as his “Method I.” Not surprisingly, both approaches
yield the same quantizer forthe special cases he considered,
but we shall argue that a natural and intuitive extension of his
Method I provides an efficientalgorithmfor the design of
good vectorquantizers that overcomes the problems of the
variational approach. In fact, variations of Lloyd’s Method I
have been “discovered” several times in the
literature
for
LINDE et al.:ALGORITHM FOR VECTOR QUANTIZER DESIGN
. .
87
squared-error and magnitudeerror distortion criteria for both
scalar and multidimensional cases (e.g., [22],[23],[24],
[31]). Lloyd’s basic development, however, remains the sim-
plest, yet it extendseasily to thegeneral case considered here.
To describe Lloyd’s Method I in the general case, we first
assume that the distribution is known, but we allow it to be
eithercontinuousor discreteandmake no assumptions re-
quiring the existence of derivatives. piven a
quantizer 4
described by a reproduction alphabet A = bi;
i = 1 , -, N}
andpartitiof S = {Si;
i = 1 , e-, N}, then the expected dis-
tortion D((A,S}) 4D(4)can bewritten as
where E(d(X,vi)1 X E Si)
is the conditional expected distor-
tion, given X E Si,or,,equivalently,given 4(X) =yi.
Suppose that we are given a particular reproduction alpha-
bet A , but a partition is not specified. A partition that is opti-
mum for A is easily constructed by mapping each x into the
y i E A minimizing the distortion d(x,vi),
that is, by choosing
the minimum distortion
or nearest-neighbor
codeword for
each input. A tie-breaking rule such as choosing the reproduc-
tion with the lowest index is required if more than one :ode-
word minimizes thedistortion.Thepartition, say P(A) =
{Pi; i = 1 , e,.., N} constructed inthis manner is suchthat
x €Pi(or q(x) =y i ) only if d(x,yi)<d(x,yj),allj , and hence
which, in turn, implies for any partitionS that
Thusfor afixed reproduction alphabet A^, the best possible
partition is P(A).
Conversely, assume we are given a partition S = {Si;i =
1 , .-,N} describing a quantizer. For the moment, assume also
that the distortion measure and distribution are such that, for
each setSwith nonzero probability
in k-dimensionalEuclidean
space, there existsa minimum distortionvector A!($)for which
Analogous tothe case of a squarederrordistortion measure
and
a
uniform
probability distribution, we call the vector
i
(
S
)
thecentroid or center of gravity of the set S. If such
pointsexist,then clearly for afixed partition S = {Si; i =
1, -, N}, no reproduction alphabet A = bi;
i = 1, -, N} can
yield a smaller average distortion then the reproduction alpha-
bet j(S) 4 {i(Si);i = 1, -, N} containing the centroids of the
sets inS since
It is shown in [2] that the centroids of (13) ehst for all
sets S with nonzeroprobabilityforquite general distortion
measures including all of those considered here. In particular,
if d(x, y) is convex iny, then centroidscan be computed using
standard convex programming techniques.asdescribed, e.g., in
Luenberger [17, 181 or Rockafellar [19]. Incertain cases,
they canbe found easily using variational techniques. If the
probabiiity of a set S is zero, then the centroidcan be defined
in an arbitrary manner since then the conditional expectation
given that Sin (13) has nounique definition.
Equations (12) and (14) suggest anaturalalgorithm for
designing a good quantizer by taking any given quantizer and
iteratively improving it:
Algorithm (Known Distribution)
(0) Initialization: Given N = number s f levels, a distortion
threshold e 2 0, and an initial N-level reproduction alphabet
A , and a distributionF. Set m = 0 and DV1= -.
(1) Given A, bi;
i = 1, -a, N}, find its minimumdistor-
tion partition P(A,) = {Si;
i = 1,.-, N}:x E Siif d(x,yi)Q
d(x, y j ) for all j . Computethe resulting average distortion,
Dm =D({Am, P(Am)})
= E min,EAm d(X?y).
(2) If (Dm-1 - D,,,)/D, < e, halt with A, and P(&)
describing final quantizer. Otherwise continue.
(3) Find theoptimalreproduction alphabet d(P(&)) =
{$(si);
i = 1, -, N} fir ~(i,).
set Am+ e A?(P(A,,,)).
Replace m by m + 1 and go to (1).
If, at some point, the optimd partition P(A,) hasa cell
Sisuch that Pr(X E Si)= 0, thenthe algorithm, as,stated,
assigns an arbitrary vector as centroid and continues. Clearly,
alternative rules are possible and may perform better in prac-
tice. For exainple, one can simply remove the cell si.
and the
corresponding reproduction symbol from
the quantizer
without affectingperformance, andthen contiriue withan
(N - 1) level quantizer. Alternatively, one could assign to Si
its Euclidean centerof gravity orthe zth centroidfromthe
previous iteration. One could also simply reassign the repro-
duction vector corresponding to Sito another cell Sj axid con-
tinue the algorithm. The stated technique isgiven simply for
convenience, since zero probability cells were not aproblem
in the examplesconsideredhere.They can, however,occur
andin suchsituations alternativetechniquessuch as those
described may well work better. In practice, a simple alterna-
tive is that, if the final quantizer produced by the algorithm
hasa zero probability(hence useless) cell, simply rerunthe
algorithm with a different initialguess.
1
88 IEEE TRANSACTIONS
ON
COMMUNICATIONS, VOL.COM-28,NO. 1 , JANUARY 1980
From (12) and (14), D , <Dm-l and hence eachiteration
of the algorithm must either reduce the distortion or leave it
unchanged. We shall later mention‘some minor
additional
details of the algorithm and discuss.techniquesfor choosing
an initial guess, but the previously given description contains
the essential ideas.
Since Dm is nonincreasing and nonnegative, it must have a
limit, say D,, as m * 00. It is shoTn in [2] that if a limiting
quantizer A, exists in the sense A , +A, as m + in the
usual Euclidean sense, then D({A,, P(A,)}) = Dm and A,
has the property that A, = 3(P(A,)), that is, A, is exactly
the centroid of its own optimal partition. In the language of
optimizationtheory, {A,, P(A,)} is a fuced point under
further iterations of the algorithm [17, 181. Hence the limit
quantizer (if itexists) is called afixed-pointquantizer (in
contrastto a stationaiy-point quantizer obtained bya varia-
tional
approach). In this light,
the algorithm is simply a
standardtechniquefor findinga fixed point via themethod
of successive approximation (see, e.g., Luenberger [17,
p. 2721). If E = 0 and the algorithm halts for finite m, then
such a fixed point has been attained [2] .
It is shown in [2] that a necessary condition for a quantizer
to be optimal is that it be afixed-point quantizer.It is also
shown in [2] that, as in Lloyd’s case, if a fixed-pointquantizer
is suchthatthere is no probability ontheboundaryofthe
partition cells, that is if Pr(d(X, vi)= d(X,y j ) ,for some i #
j ) = 0, then the quantizer is locally optimum. This is always
the case withcontinuousdistributions,but can inprinciple
beviolated for discrete distributions.It was never found to
occur in ourexperiments, however. As Lloyd suggests, the
algorithm can easily be modified to test a fixed point for this
condition and if there is nonzero probability of a vector on a
boundary,the strategy would be to reassign the vector to
another cell of the partition and continue the iteration.
For theN = 1 case with a squared-error distortion criterion,
the algorithm is simply Lloyd’s Method I, and his arguments
apply immediately in the more general case considered herein.
A similar technique was earlier proposed in 1953by Fisher
[131 in a cluster analysis problem using Bayes decisions with
a squarederror cost. For larger dimensions
and distortion
measures of the formd(x,2)= Ilx -2 11; ,r> 1,the relations
(12) and (14) were observed by Zador [20] and Gersho [21]
in their
work
on
the
asymptotic performance of optimal
quantizers,and.hencethe algorithm is certainly implicit in
their work. They did not, however, actually propose or apply
the technique to design aquantizerfor fixed N. In 1965,
Forgy [31] proposed the algorithm for cluster analysis for the
multidimensional squarederrordistortion case anda sample
distribution (see the discussion in MacQueen [25]). In 1977,
Chen [22] proposed essentially the same algorithm forthe
multidimensional case with the squarederror distortion meas-
ure and used it
to design two-dimensional quantizers
for
vectors uniformly distributed in a circle.
Since the algorithm has nodifferentiability requirements, it
is valid for purely discrete distributions.This has an important
application to the case where onedoes not possess a priori a
probabilistic description of the source to be compressed, and
hence must base his design on an observed longtraining se-
quence of thedatato be compressed. One approach w.ould
be to use standarddensityestimationtechniques of statistics
toobtain a “smooth”distribution of F and tothenapply
variational techniques. As previously discussed, we donot
adopt this approach as it requires additionalassumptions on
the allowed densities. Instead we consider the following
approach: Use the training sequence, say {xk;k = 0, e - , n - 1)
to form thetime-average distortion
and observe that this is exactly the expected distortion
EGnd(X, 4(X)) with respect tothe sample distribution Gn
determined by the training sequence, i.e., the distribution that
assigns probability m/n to a vectorx that occurs in thetraining
sequence m times. Thus we can design a quantizer that mini-
mizes the time-average distortion for the training sequence by
running the algorithm onthe sample distribution Gnc1p2).
This yields the following variation of the algorithm:
Algorithm (Unknown Distribution)
(0) Initialization: Given N = number of levels, distortion
threshold E > 0, an initial N-level reproduction alphabet Ao,
and a training sequence {xj;j = 0, -, n - 1). Set m = 0 and
(1) GivenA , bi;
i = 1, - e , N}, find the minimum distor-
tionpartition P(A,) = {Si;
i = 1, e-, N} ofthe training
sequence: xj E Siif d(xj,yi)< d(xj, yl),for all 1. Compute
the average distortion
D-1 = 0
0
.
(2) If (Dm-1 -D,)/D, <E , halt with A, final reproduc-
tion alphabet. Otherwise continue.
(3) Find theoptimalreproduction alphabet i(P(Am))=
{?(si);
i = 1, .-, N) for P(A,). Set A,+ & ~(P(A,)).
Replace m by m + 1 and go to (1).
Observe above that while designing the
quantizer, only
partitions of the .training sequence (theinput alphabet) are
considered. Once the final codebook A, is obtained, however,
it is used on new data outside the training sequence with the
optimum nearest-neighbor rule, that is, an optimum partition
of k-dimensional Euclidean space.
( 1 ) Itwasobserved by a reviewerthat application of the algorithm
tothe sample distribution provides a“Monte Carlo”
design of the
quantizer for a vector with a known distribution, thatis,thedesign is
based on samples of the random vectors ratherthan on an explicit
distribution.
(2) During the period this paper was being reviewed for publication,
two similar techniques were reported for special cases. In 1978, Capria,
Westin,and Esposito (231 presented a similar technique for the scalar
caseusing dynamic programmingarguments.Theirapproachwas for
average squarederrordistortion and for maximum distortion over the
training sequence. In 1979, Menez, Boeri, andEsteban (241 proposed
a similar technique for scalar quantization using squarederror and
magnitudeerror distortionmeasures.
LINDE er al. :ALGORITHM
FOR
VECTOR
QUANTIZER
DESIGN 89
If the sequence of randomvectors is stationary andergodic,
then it follows from the ergodic theorem that, with probabil-
ity one, G, goes to the true underlying distribution F as n +
-. Thus if the training sequence is sufficiently long,hopefully
a good quantizer for thesample distribution G, should also be
good for the true distribution F, and hence should yield good
performance forfuturedata producedby the source. All of
these ideas are made precise in [2] where it is shown that,
subject to suitable mathematical assumptions, thequantizer
produced by
applying the algorithm to G
, converges, as
n + 00, to the quantizer produced by applying the algorithm
tothetrue underlying distribution F. We observe thatno
analogousresults are knowntotheauthorsforthe density
estimation/variational approach, and that independence of
successive blocks is not required for these results, only block
stationarity andergodicity.
We also pointoutthat,for finitealphabet distributions
such as sample distributions, thealgorithm always converges to
a fixed-point quantizerin a finitenumber of steps[2].
A similar technique used incluster analysis withsquared-
error cost functions was developed by MacQueen in 1967 [25]
and has been called the k-meansapproach.Amore involved
technique using the k-means approach is the “ISODATA”
approach of Ball and Hall [26]. The basic idea of finding
minimum distortion partitions and centroids is the same, but
the trainingsequence data is used ina different manner and
the resulting quantizers will, ingeneral, be different. Their
sequentialtechnique incorporatesthe trainingvectorsone at
atime and ends when the lastvector is incorporated. This is
in contrast to the previous algorithmwhichconsiders all of
the training ve,ctors at each iteration.The k-means method
can be described as follows: The goal is to produce a partition
SO = {So, . - a , S N - ~ }of the training alphabet, A = {xi;
i =
0, -, n - 1) consisting of all vectors in the !raining sequence.
Thecorresponding reproductionalphabet A will then be the
collection of the Euclidean centroidsofthe sets Si,
that is,
the final reproductionalphabet will be optimalforthe final
partition (but the final partition may not be optimal for the
final reproductionalphabet,except as n + -). Toobtain S ,
we first think of the each Sias a bin in which to place training
sequencevectorsuntil all are placed.Initially, we startby
placing the first N vectorsinseparatebins, i.e., xiE Si,
i =
0, a * - , N - 1. We then proceed as follows: at each iteration, a
new trainingvector x, is observed. We find the set Si for
which thedistortion between x, and thecentroid ;(Si)is
minimized andthen add x, to thisbin. Thus,ateachitera-
tion,the new vector is added to the bin withthe closest
centroid,and hence thenexttime, this bin will have a new
centroid. This operation is continued until all sample vectors
are incorporated.
Although similar in philosophy, the k-means algorithm has
some crucial differences. In particular, it is suited for the case
where only the trainingsequence is to be classified, that is,
wherealongsequence of vectors is to be groupedinalow
distortion manner. The sequentialprocedure is computation-
ally efficient for grouping, but a “quantizer” is not produced
until theprocedure is stopped. In other words,
in cluster analy-
sis, one wishes to groupthings and the groups can change with
time, but in quantization, one wishes to fix the groups (to get
a time-invariant quantizer), and then use these groups (or the
quantizer) on future data outside
of the training sequence.
An additionalproblem is thattheonlytheorems which
guarantee convergence, in the limit of a long training sequence,
require the assumption that successive vectors be independent
[25], unlike the more general case for the proposed algorithm
Recently Levenson et al. used avariation of the k-means
and ISODATA algorithms with a distortion measure proposed
by Itakura [4] to determine reference templatesfor speaker-
independent word recognition [27]. They used, as a distortion
measure, the logarithm of thedistortionof (7) (which is a
gain-optimized Itakura-Saitodistortion [7]-our use ofthe
distortion measure with unit-gain-norrnalizd models results
in no such logarithmic function). In their technique,however,
aminimax rule was used to select thereproduction vectors
(or cluster points) rather than finding the “optimum” centroid
vector. If instead, the distortion measure of (7) is used, then
the centroidsare easily found, as will be seen.
P I .
CHOICE OFA
^
,
There are several ways to choose the initial reproduction
alphabet do requiredby the algorithm. One methodfor use
on sample distributionsis that of thek-means method, namely
choosing the first N vectors in the training sequence. We did
nottry this approach as, intuitively, one would like these
vectors to be well-separated,and N consecutive samples may
not be. Twoothermethods were found to be useful in our
examples.Thefirst is to use a uniform quantizer over all or
most of the source alphabet (if it is bounded). For example,
if used on a sample distribution,one uses a k-dimensional
uniform quantizer on ak-dimensional Euclidean cube includ-
ing all or most of thepoints in the trainingsequence.This
technique was used in the Gaussian examples described later.
Thesecondtechnique is useful when one wishes to design
quantizers
of successively higher rates until achieving an
acceptable level of distortion. Here we consider M-level quan-
tizers with M = 2R, R = 0, 1, * . e , and continue until we
achieve an initial guess for an N-level quantizer as follows:
INITIAL GUESS BY “SPLITTING”
(0) Initialization: Set M = 1 anddefine Ao(l) = i(A),
the centroid of the entire alphabet (the centroid of the train-
ing sequence, if a sample distributionis u
s
e
!
)
.
(1) Given thereproductionalphabet Ao(M) containing M
vectors hi;
i = 1 , - e , M}, “split” each vector yiinto twoclose
vectors yi + e and yi - e, where e is afixed perturbation
vector. The collection 2of f y i + E , yi-E , i = 1, e-, M} has
21.1vectors. Replace M by q.
(2) Is M =N ? If so, set A . =AIM)and halt.Jois then the
initial reproduction
alphabet for the N-level quantization
algorithm. If not, run the algorithm for an M-level quantizer
on i
(
j
l
4
)
to produce a good reproduction alphabetAo(M),and
then return to step(1).
Using the splittingalgorithm on atrainingsequence, one
starts with a one-level quantizer consisting of the centroid of
the training sequence. This vector is then split into twovectors
90 IEEE
TRANSACTIONS
ON
COMMUNICATIONS, VOL. COM-28, NO. 1 , JANUARY 1980
andthe two-level quantizeralgorithm is run on this pair to
obtain a good (fixed-point) two-level quantizer. Each of these
two vectors is then split and the algorithm is run to produce
a good four-level quantizer. At the conclusion, one has fixed-
point quantizers for1,2,4, 8, .-,Nlevels.
EXAMPLES
GaussianSources
The algorithm was used initially to design quantizers for
the classical example of memoryless Gaussian random var-
iables with the squarederror distortion criterion of (l), based
on a training sequence of data. The training and sample data
were produced
by a
zero-mean, unit-variance memoryless
sequence of Gaussian random variables. The initial guess was
a unit quantizer on the k-dimensional cube, {x: Ixi I <4;i =
0, --,k - 1). A distortion threshold of 0.1%was used. The
overall algorithm can be describedas follows:
(0) Initialization: Fix N = number of levels, k = block
length, n = length of trainingsequence, E = .001. Given a
training sequence {xj;j = 0, . - a , n - 1): Let ,do be anN-level
uniform quantizer reproduction alphabet for the
k-dimensional
cube,{u:IuiI~4,i=0,--,k-l}.Setm=OandD~1=~.
(1) Given A , = Q i ; i = 1, * e . , N}, findtheminimum
distortion partition P(A,) = {Si; i = 1, --,N}. For example,
for each j = 0, * . a , n - 1,,compute d(xj,vi)for i = 1, * - a , N.
If d(xj,yi) <d(xj,yr) forall 1, thenxj ESi.Compute:
n-1
D, =D({A, ,P(A,)}) =n-1 min d(xj,y>.
j=O YEA,
(2) If (Dm-1 -D,)/D, <E = .001, halt with final quan-
tizer described by a,. Otherwise continue.
(3) Find the optimal rep5oduction alphabet i(P(km))
=
{$(Si); i = 1, -, N} for P(A,). For the squarederror crite-
rion, ;(Si) is the Euclidean center of gravity or centroid given
by
1
x(si) = - C xi,
II si It j:xjESi
where llSill denotes the number of training vectors in the cell
Si. If llSiII = 0, set ;(Si) = yi, the old codeword. Define
A,+1 =3(P(A,)), replace m by m + 1,and go to (1).
Table 1 presentsa simple but nontrivial example intended
to demonstrate the basic operation of the algorithm. A two-
dimensional quantizer with four levels is designed, based on a
short training sequence of twelve training vectors. Because of
the short training sequence in this case, the final distortion is
lower than one would expect and the final quantizer may not
work well on new data outside of the training sequence. The
tradeoffs between the length of the training sequence and the
performance inside and
outside
the training
sequence
are
developed more carefully in the speech example.
Observe that, in the example of Table 1, the algorithm
could actually halt in step (1) of the m = 1 iteration since, if
P(&) = P(,&,-l), it follows that A,+1 = 9
(
P
(
A
,
)
) =
$(P(A?m-l)) = A,, and hence A, is the desired fixed point.
Nore: Characterswith tildes underneath
appear boldface in text.
In other words, if the quantizer stays the same for two itera-
tions,thenthe.twodistortions are equalandan ‘‘E = 0”
threshold is satisfied.
As a more realistic example, the algorithm was run for the
scalar (k = 1) case with N = 2, 3, 4, 6 and 8, using a training
sequence of 10,000samples per quantizer output froma zero-
mean, unit-variance memoryless Gaussian source. The resulting
quantizer
outputs
and
distortion were within 1% of the
optimal values reported by Max [15]. No more
than 20
LINDE et al. :ALGORITHM FOR VECTOR QUANTIZER DESIGN 91
3L 0.37
L
OPTIMAL
---MINIMAL
'.lo 2 4 6 8
-
ITERATIONS
Fig. 1. TheBasicAlgorithm:GaussianSource N = 4.
iterations were required for N = 8 and,for smaller N, the
number of iterations was considerably smaller. Figure 1
describes the convergence rate of one of the tests forthe
caseN = 4.
'The algorithm was then tried for block
quantizers for
memoryless Gaussian variables withblocklengths k equal to
1, 2,3,4,5and 6 and a rate of onebit per sample,so that N =
2k. The distortion criterion was again the squared-error dis-
tortion measure of (1). The algorithm used a training sequence
of 100,000 samples. In each case, the algorithm converged in
fewer than 50 iterations and the resulting distortion is plotted
in Fig. 2, together with the onebit-per-symbol scalar case as
a function of block length. For comparison, the rate-distortion
bound [28, p. 991 D(R) = 2-2R for R = 1 bit-per-symbol
is also plotted. As expected and as shown in Fig. 2 , the block
quantizers outperform the scalar quantizer, butfor these block
lengths, the performance is still far from the rate distortion
bound (which is achievable, in principle, only in the limit as
k + -). A more favorable comparison is obtained using a
recent result of Yamada,Tazaki,andGray [29] which pro-
vides a lower bound to the performance of an optimalN-level
k-dimensionalquantizerwithadifference distortion measure
when N is large. This bound provides strict improvement over
the rate-distortion boundfor fixed kand tendstotherate-
distortion bound as k + -. In the current case, the bound has
the form
where I
' is the gamma function. This bound is theoretically
inappropriate for small k,yetit is surprisingly close forthe
k = 1 result, which is known tobe almost optimal. Fork = 6,
0.33
z
2 0.31 0
a
O QUANTIZER LOWER BOUND $d(k)
BLOCKLENGTH k
Fig. 2. BlockQuantizationRate I bit/symbolGaussianSource.
N = 26 = 64 is moderately large and the closeness of the
actual performance to the lower bound, compared to optimal
performance provided by D,ck)(l), suggests thatthe algo-
rithm is indeed providing a quantizer with block lengthsix and
rateonebit-per-symbol that is nearly optimal(within 6%of
the optimal).
LLOYD'S EXAMPLE
Lloyd [l] provides an example where both variational and
fixed-pointapproaches can yield locally optimalquantizers
instead of a globally optimum quantizer. We next proposea
slight modification of the futed-point algorithm that indeed
findsa globally optimum quantizer in Lloyd's example. We
conjecture that thistechnique will workmore generally, but
we have been unable to prove this theoretically. A similar tech-
nique can be used with the stationary-pointalgorithm.
Instead of using samples from the source that we wish to
quantize, we use samples corrupted by additive independent
noise, where the marginal distribution of the noise is such that
onlyone locally optimum quantizer exists for it. As an ex-
ample, for scalar quantization with thesquarederror distor-
tion measure, we use Gaussian noise. In this case, any locally
optimal quantizer is also globally optimum. Otherdistribu-
tions, such as the uniform or a discrete amplitude noise with
an alphabet size equal to the number of quantizer output
levels, can also be used.
When the noise power is much greater thanthe source
power, thedistribution of their sum is essentially the distri-
bution of the noise. We assume that, initially, the noise power
is so large that only one locally optimum quantizer exists for
the sum; hence, regardless of the initial guess, the algorithm
w
i
l
l converge to this optimum. On thenextstep,the noise
power is reduced slightly and the quantizer resulting from the
previous run is used as the initial guess. Intuitively, since the
noise has been reduced by a small amount, the global optimum
for the new sum should be close to that of the previous sum
(we use the same source and noise samples with reduced noise
power).Thus we expectthatthe algorithm will converge to
the global optimum even though new local optimumpoints
might have been introduced. We continue in the same manner
reducing the noise gradually to zero.
92 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL.COM-28,NO. 1 , JANUARY 1980
tOPTIMAL
QUANTIZER
‘/// I/
I : b STATIONARY
-1 I
QUANTIZER
I POINT
Fig. 3. TheProbabilityDensityFunction.
To illustratehow the algorithmworks, we use asource
with a probability density function as shown in Fig. 3. In this
case, there are two locally optimum two-level quantizers. One
has the outputlevels + O S and -0.5 and yields a mean-squared
error of 0.083, the second (which is the global optimum) has
output levels -0.71 and 0.13 and yields a mean-squared error
0.048. (This example is essentially the same as one of Lloyd’s
The modified algorithm was tested on a sequence of 2,000
samples chosen according to the probability density shown in
Fig. 3. Gaussian noise was added starting at unityvariance and
reducing the variance by approximately 50%on each succes-
sive run. The initial guess was+ 0.5 and-0.5 which is the non-
global optimum.Each run was stopped when thedistortion
was changed by less than 0.1%from itsprevious value.
The resultsare given in Fig. 4 and it is seen that, inspite
of the bad initial guess, the modifiedalgorithm converges to
the globally optimum quantizer.
P I J
SPEECH EXAMPLE
Inthenextexample, we consider the case of aspeech
compressionsystemconsisting of an LPC analysis of 20 ms-
long speech frames producing a voiced/unvoiced decision and
a pitch, again,andanormalized inverse filter as previously
described, followed. by quantization where the pitch and gain
are separately quantized as usual, butthe normalizedfilter
coefficients (al, --,aK) = (xo, xl, -, X K - ~ ) are quantized
as a vector with K = 10, using the distortion measure of (7)-
(8). Thetrainingsequenceconsisted of asequence ofnor-
malized inverse fdter parameter vectors(3). The LPC analysis
was digital, and hence the training sequence used was already
“finely quantized” to 10 bits per sample or 100 bits for each
vector. The original gain required 12 bitsper speech frame and
the pitch used 8 bits per speechframe. The total rate of the
LPC output (which we wish to further. compress by block
quantization) is 6000 bits/s. No further compression of gain
or pitch was attempted in these experiments as our goal was
(3)The training sequence and additional testdata of LPC reflection
coefficients were provided by Signal Technology Inc. of Santa Barbara
andwere produced usingstandard LPC techniques on a single male
speaker.
--- STATIONARY POINTS
0.8
0
.
6
x ITERATION RESULT
0 NOISE VARIANCE
CHANGE
LEVEL 1 (VARIANCE
INDICATED)
- l , ~ ~ d u = 0 . 3 5
~ ~ 0 . 5
-1.2
-1.4 LEVEL 2
Q= 0.7
Fig. 4. TheModifiedAlgorithm.
to study only potential improvementwhenblockquantizing
the normalized fiter parameters. The more complete problem
including gain andpitch involves manyother issues and is
the subject of a paperstill in preparation.
Forthedistortion measure of (7)-(8), thecentroid of a
subset S of a training sequence {xj,j = 0, 1, * e * , n - 1) is the
vector u minimizing
(Xj -U)R(Xj)(Xj -U ) f .
; : x j a
We observe that the autocorrelation matrix R(x) is a natural
byproduct of the LPC analysis and need not be recomputed.
This
minimization,
however, is a minimumenergy-residual
minimization problem in LPC analysis and it can be solved by
standard LPC algorithms such asLevinson’s algorithm [lo].
Alternatively, it is a muchstudied minimizationproblemin
Toeplitz matrixtheory[30] and thecentroid can be shown
via variational techniques to be
The splittingtechnique forthe initial guess anda distortion
threshold of 0.5%were used. The complete algorithm for this
example can thus bedescribed as follows:
(0) Initialization: Fix N = 2 R ,R an integer, where N is the
largest number of levels desired. Fix K = 10, n = length of
training sequence, E = .005. SetM = 1.
Given atrainingsequence {xi;j = 0, -*, n - l},set A =
{xj;j = 0, e-, n - l}, the training sequence alphabet. Define
A(1) = ;(A), the centroid of theentire trainingsequence
using (15) or Levinson’s algorithm.
LINDE et ai.:ALGORITHM FOR VECTOR QUANTIZER DESIGN 93
(1) (Splitting): Given A(w= bi,i = 1, *..
,M},split each
reproduction vector y i into yi + E and y i - e, where e is a
fixed perturbation vector. Set Ao(2M) = b
i + E , y i - E ,
i = 1, .-,M} and then replace M by 2 M .
(2) Set m = 0 and Del = 00.
(3) Given A,(M) = bl,* . e , y
"
)
, find its optimum parti-
tion P(A,(M)) = {Si; i = 1, e.., M},that is, xiE Si if d(xj,
vi)G&xi,ut),all 1. Compute theresulting distortion
D, =D<Cjrn(M), P d m (M))})
(4) If (Dm-1 -D,)/D, <E = .005, then go to step (6).
Otherwise continue.
(5) Findtheoptimalreproduction alphabet A,+l(M) =
9(P(A,(M)) = {2(Si);i = 1, -, 1)for P(A,(M)). Replace
m by m + 1 and go to (3).
(6)Set A(M) = A,(M). The final M-level quantizer is
described by A(w.If M = N, haltwith final quantizerde-
scribed by A(N).Otherwise go to step(I).
Table 2 describes the results of the algorithm for N = 64,
and hence forone- to eight-bit quantizers trained on n =
19,000 frames of LPC speech producedby a single speaker.
The distortion at the end of each iteration is given and, in all
cases, the algorithm converged in fewer than14 iterations.
When the resulting quantizers were applied to data from the
same speaker outside of the trainingsequence, the resulting
distortion was within 1%of that within the training sequence.
A total of three andone-half hours of computertimeon a
PDP 11/35 was required to obtain all of these codebooks.
Figure 5 depicts the rate of convergence of the algorithm
with a training sequence length for a 16-level quantizer. Note
the marked difference between the distortion for 2400 frames
inside the training se.quence and outside the training sequence
for short training sequences. For alongtrainingsequence of
over 12,000 frames, however,the distortionis nearly the same.
Tapes of the synthesized speech at 8 bits per frame for the
normalizedmodelsounded similar to those of the original
LPC speech with 100 bits per frame for the normalized model
(the gain and the pitch were both left at theoriginal LPC rate
of 12and 8 bits
per frame, respectively). While extensive
subjective tests were not attempted, all informal listening tests
judged the synthesized speech perfectly intelligible (when
heard before the original LPC!) and the quality only slightly
inferior when thetwo were compared. The overall compres-
sion was from 6000 bits/s to 1400 bits/s. This is not startling
as existing scalar quantizers thatoptimally allocate bits among
the parameters and optimally quantize each parameter using a
spectraldeviation distortion measures [111 also perform well
inthis range. It is,however, promising as these were pre-
liminaryresultswith noattempttofurther compress pitch
and gain (which, taken together in our system, hadmore than
twice the bit rate of the normalized model vector quantizer).
Further results on applications of the algorithm to the overall
speech compression system will be the subject of a forthcom-
ing paper [33].
TABLE 2
ITAKURA-SAITO DISTORTION VS. NUMBER OF ITERATIONS.
TRAINING SEQUENCE LENGTH = 19,000 FRAMES.
NUMBER
OF LEVELS DISTORTION
NUMBER
2
10.33476 1
1.98925
2
1.78301
3
1.67244
4
1.55983
5
1.49814
6
1.48493
7
1.48249
8
4
1.38765 1
1.07906
2
1.04223
3
1.03252
4
1.02709
5
8
0.96210 1
0.85183 2
0.81353
3
0.79191
4
0.77472
5
0.76188
6
0.75130
7
0.74383
8
0.73341
9
0.71999
10
0.71346 11
0.70908
12
0.70578
13
0.70347
14
16
0.64653 1
0.55665 2
0.51810
3
0.50146 4
0.49235
5
0.48761
6
0.48507
7
32
0.44277 1
0.40452
2
0.39388
3
0.38667
4
0.38128
5
0.37778
6
0.37574
7
0.37448
8
64
0.34579 1
0.3178
6 2
0.30850
3
0.30366
4
0.30086
5
0.29891
6
0.29746
7
128
0.27587 1
0.2562
8
2
0.24928
3
0.24550
4
0.24309
5
0.24142
6
0.2402 1 7
0.23933
8
2
56
0.22458 1
0.20830
2
0.20226
3
0.19849
4
0.19623
5
0.19479
6
0.19386 7
0.19319 8
94 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-28, NO. 1, JANUARY 1980
A
1.2
-
1.1
-
I
n
: -
0.9-
0
3 0.8-
N
IY
0.7-
0.6-
0
c
e 0.5-
0.4
-
0.1
-
0.2
-
0.3
-
2 OUTSIDETRAINING
SEQUENCE
SEQUENCE
INSIDE
TRAINING
I I I I
1
0 100 1000 10000
NUMBER OF VECTORS IN THE TRAININGSEQUENCE
Fig. 5. ConvergencewithTrainingSequence.
EPILOGUE
The Gaussian example of Figure 2, Lloyd’s example,and
the speech example were run on a PDP 11/34 minicomputer at
the Stanford University Information Systems Laboratory. The
simple example of Table 1 was run in BASIC on a Cromemco
System 3 microcomputer. As a check,themicrocomputer
program was also used to design quantizers for the Gaussian
case of Figure 2 using the splitting method, k = 1, 2, and 3,
and a training sequence of 10,000 vectors. The results agreed
with the PDP 11/34 run to within one percent.
ACKNOWLEDGMENT
Theauthors would like to acknowledge thehelp of J.
Markel of Signal Technology, Inc. of Santa Barbara and A. H.
Gray, Jr., of the University of California at Santa Barbara in
both theanalysis and synthesis of thespeech example.
REFERENCES
[I] Lloyd, S. P., “LeastSquaresQuantization inPCM’s,”BellTelephone
Laboratories Paper, Murray Hill, NJ, 1957.
121 Gray,R.M.,J.C. KiefferandY.Linde.“LocallyOptimalBlock
Quantization for Sources without a Statistical
Model,’’
Stanford
UniversityInformationSystemsLabTechnicalReportNo.L-904-1,
Stanford, CA, May 1979 (submittedfor publication).
131 Itakura. F. and S. Saito.“AnalysisSynthesisTelephonyBasedUpon
MaximumLikelihood Method,” Repfs. ofthe 6thInrernar’l. Cong.
Acousr.. Y.Kohasi, ed., Tokyo, C-5-5, C17-20. 1%8.
141 Itakura. F.,“Maximum Prediction Residual Principle Applied to Speech
Recognition,” IEEE Trans. ASSP,23, pp. 67-72, Feb. 1975.
151 Chaffee. D.
L.. “Applications of Rate
Distortion
Theory
to
the
of Calif. at Los Angeles.1975.
Bandwidth Compression of Speech
Signals,” Ph.D. Dissertation, Univ.
[6]Gray,R.M., A.Buzo. A. H.Gray, Jr., and J. D. Markel,“Source
Coding and Speech
Compression,” Proc. of rhe 1978
Inrernar’l.
Telemetering Conf., pp. 371-878, 1978.
Matsuyama, Y.,A.
Buzo
and
R. M. Gray, “Spectral
Distortion
Measures for SpeechCompression,”StanfordUniv.Inform.Systems
Lab. Tech. Rept. 6504-3, Stanford, CA, April, 1978.
Buzo, A., “OptimalVectorQuantization for LinearPredictedCoded
Speech,” Stanford Univ., Ph.D. Dissertation,Dept.ofElec. Engrg.,
August, 1978.
Matsuyama, Y. A,, “Process
Distortion
Measures
and
Signal
Processing,” Ph.D. Dissertation, Dept.of Elec. Engrg., StanfordUniv.,
1978.
Markel, J. D. and A. H.Gray, Jr., LinearPredicrion o
f Speech.
Springer-Verlag. NY 1976.
Gray, A. H.,
Jr., R. M.Gray andJ. D. Markel. “Comparison of Optimal
Quantizations of Speech Reflection Coefficients,” IEEE Trans. ASSP,
Vol. 24, pp. 4-23. Feb. 1977.
Dalenius. T., “The Problem of Optimum Stratification,” Skndinavisk
Akruuvielidskrifr.Vol. 33, pp. 203-213, 1950.
Fisher, W. D., “On a PoolingProblemfromtheStatisticalDecision
Viewpoint,” Econometrica, Vol. 21, pp. 567-585, 1953.
Cox, D. R.. “Note on Grouping,’’ J . o
f the Amer. Sraris. Assoc.,
Vol.
52. pp. 543-547, 1957.
Max, J., “Quantizing for Minimum Distortion.” IRETrans. on Inform.
Theory,IT-6, pp. 7-12, March 1960.
Fleischer, P., “Sufficient Conditionsfor Achieving Minimum Distortion
inaQuantizer,”IEEEInr. Conv. Rec.,pp. 104-111, 1964.
Luenberger, D. G.,Oprimizarionby Vector SpaceMethods, John Wiley
&Sons, NY, 1969.
Luenberger, D. G., IntroductiontoLinear andNonlinear Programming.
Addison-Wesley, Reading, MA, 1973.
NJ, 1970.
Rockafellar, R. T.,Convex Analysis. Princeton Univ. Press, Princeton,
Zador, P., “Topics in theAsymptoticQuantizationofContinuous
Random Variables,” Bell
Telephone
Laboratories
Technical
Memorandum, Feb. 1966.
Gersho, A., “Asymptotically
Optimal
Block
Quantization,” IEEE
Trans.on Inform. Theory, Vol. IT-25, pp. 373-380, 1979.
Chen. D. T. S., “On Two or More Dimensional Optimum Quantizers,”
Proc. 1977 IEEE Inrernat’l. Conf. on Acousrics.Speech, & Signal
Processing, pp. 640-643, 1977.
Caprio, J. R., N. Westin and J. Esposito, “Optimum Quantization for
Minimum Distortion,” Proc. of rhe Inrernar’l.TelemereringConf., pp.
315-323. NOV.1978.
LINDE et al :ALGORITHM
FOR
VECTOR
QUANTIZER DESIGN 95
Menez, J., F. Bceri, and D. J. Esteban,“Optimum QuantizerAlgorithm
for Real-Time Block Quantizing,” Proc. of the I979 IEEE Inrernat’l.
Conf. onAcoustics, Speech, & SignulProcessing. pp. 980-984, 1979.
MacQueen, J., “Some Methods for Classification and
Analysis
of
Multivariate Observations,” Proc.of rhe Fifh Berkeley Symposium on
Math., Srar. andProb., Vol. 1 , pp. 281-296, 1967.
Ball, G. H. and D. J.
Hall, “Isodata-An Iterative Method
of
Multivariate Analysis and Pattern Classification,” in Proc. IFIPS
Congr., 1965.
Levinson, S . E., L. R. Rabiner,A. E. Rosenbergand J. G . Wilson,
“Interactive Clustering Techniques for Selecting Speaker-Independent
Techniques forSelecting Speaker-IndependentReferenceTemplatesfor
IsolatedWord Recognition,” IEEE Trans. ASSP.Vol. 27, pp. 134-141,
1979.
Berger, T., Rate DistortionTheory, Prentice-Hall, Englewood Cliffs,
NJ, 1971.
Yamada, Y.,S. Tazaki and R. M. Gray, “Asymptotic Performanceof
Block Quantizers with Difference Distortion Measures,’’ to appear,
IEEE Trans.on Inform. Theory.
Grenander, U. and G. Szego, Toeplirz Forms and Their Applications,
Univ. ofcalif. Press, Berkeley, 1958.
Forgy, E., “Cluster Analysis of Multivarate Data: Efficiency vs.
Interpretability of Classifications,” Abstract, Biometrics, Vol. 21, p.
768, 1965.
Chaffee, D. L. and J. K. Omura, “A Very Low Rate Voice
Compression
System,‘’ Abstract in Abstracts o
fPapers, 1974,IEEE Intern. Symp.on
Inform. Theory,Notre Dame, Oct. 28-31,
IEEE, 1974.
Dr. Linde is currently with Codex Corporation, Mansfxld, Massachusetts
where he is involved in research and development in the areas of digital signal
processing, modems and data networks.
*
And& Buzo(S’7&M’78) was born
in MexicoCity,
’ Mexico, on November 30, 1949.
He
received
the
electrical and mechanical engineer degree from the
National University of Mexico, Mexico City, in
1974, and theM.S. and Ph.D. degreesfrom Stanford
University, Stanford, California in1975and1978,
respectively.
In 1978hewas at Signal Technology in Santa
Barbara,California. Nowhe is at the Instituto de
Ingenieria ofthe National University
of
Mexico
whereheis engaged inresearchon digital signal
processing of speech signalsand data compression.
*
Buzo. A., A. H.Gray, Jr., R. M. Gray, and J. D. Markel, “Speech - -- Robert M.Gray (S’68-M’69SM’77) was
born
in
Coding Based on Vector Quantization,” submitted for publication. San Diego,
California, on
November 1. 1943.
He
received the B.S. and M.S. degrees in electrical
engineering from the Massachusetts Institute of
research assistantat the
involved in research in
codes.
YosephLinde(S’7&M’78)was born in
Sendzislaw,
Polandon June 14, 1947. He received the B.Sc.
degree from the Technion, Israel Institute of
Technology in 1970, the M.Sc. degree from theTel-
Aviv University in 1975 and the Ph.D. degree from
Stanford University, Stanford,California, in1977,
all in Electrical Engineering.
From 1970to 1975 he was with theSignalCorps.,
Israeli Defense Forces, wherehewasinvolved in
research and development of military
communications systems. In 1976 and 1977 he wasa
Information SystemsLaboratoryat StanfordUniversity
data compression systems, in particular tree and trellis
Technology, Cambridge, in 1966,
and
the
Ph.D.
degree from the University of Southern California,
Los Angeles, in 1969.
Since 1969
he has been
with
the Electrical
Engineering Department and
the
Information
Systems Laboratories, Stanford University,
Stanford, California,
where he is engagedin teaching
and research in communicationsand informationtheory.
Dr. Grayisa member of Sigma Xi, Eta Kappa Nu, theMathematical
Association of America, the Society for Industrial and Applied Mathematics,
the Institute of Mathematical Statistics, the American Mathematical Society,
and the Soci&?des Ingdnieurset Scientifiquesde France. He hasbeen amember
Theory since 1975 and an Associate Editor of the IEEE TRANSACTIONS ON
of the Board of Governors oftheIEEE Professional Group on Information
INFORMATIONTHEORY since September1977.

More Related Content

Similar to An Algorithm For Vector Quantizer Design (20)

vector QUANTIZATION
vector QUANTIZATIONvector QUANTIZATION
vector QUANTIZATION
aniruddh Tyagi
 
vector QUANTIZATION
vector QUANTIZATIONvector QUANTIZATION
vector QUANTIZATION
aniruddh Tyagi
 
MDCT audio coding with pulse vector quantizers
MDCT audio coding with pulse vector quantizersMDCT audio coding with pulse vector quantizers
MDCT audio coding with pulse vector quantizers
Ericsson
 
Waveform_codingUNIT-II_DC_-PPT.pptx
Waveform_codingUNIT-II_DC_-PPT.pptxWaveform_codingUNIT-II_DC_-PPT.pptx
Waveform_codingUNIT-II_DC_-PPT.pptx
KIRUTHIKAAR2
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
tullaomamiki
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
azulaycelill
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
cooks2liepecr
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
coiamadala
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
nyzamdagion57
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
tullaomamiki
 
Lassy
LassyLassy
Lassy
anithabalaprabhu
 
Pyramid Vector Quantization
Pyramid Vector QuantizationPyramid Vector Quantization
Pyramid Vector Quantization
ShahDhruv21
 
Vector quantization
Vector quantizationVector quantization
Vector quantization
Rajani Sharma
 
presentation
presentationpresentation
presentation
jie ren
 
Distortion criteria,models,scalar quantization
Distortion criteria,models,scalar quantizationDistortion criteria,models,scalar quantization
Distortion criteria,models,scalar quantization
nivetha10042003
 
G010424248
G010424248G010424248
G010424248
IOSR Journals
 
Allerton
AllertonAllerton
Allerton
mustafa sarac
 
Vector Quantization Vs Scalar Quantization
Vector Quantization Vs Scalar Quantization Vector Quantization Vs Scalar Quantization
Vector Quantization Vs Scalar Quantization
ManasiKaur
 
coba dl.pdf
coba dl.pdfcoba dl.pdf
coba dl.pdf
AlexBukit
 
Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...
Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...
Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...
IJERA Editor
 
MDCT audio coding with pulse vector quantizers
MDCT audio coding with pulse vector quantizersMDCT audio coding with pulse vector quantizers
MDCT audio coding with pulse vector quantizers
Ericsson
 
Waveform_codingUNIT-II_DC_-PPT.pptx
Waveform_codingUNIT-II_DC_-PPT.pptxWaveform_codingUNIT-II_DC_-PPT.pptx
Waveform_codingUNIT-II_DC_-PPT.pptx
KIRUTHIKAAR2
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
tullaomamiki
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
azulaycelill
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
cooks2liepecr
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
coiamadala
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
nyzamdagion57
 
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions ManualFundamentals of Communication Systems 1st Edition Proakis Solutions Manual
Fundamentals of Communication Systems 1st Edition Proakis Solutions Manual
tullaomamiki
 
Pyramid Vector Quantization
Pyramid Vector QuantizationPyramid Vector Quantization
Pyramid Vector Quantization
ShahDhruv21
 
presentation
presentationpresentation
presentation
jie ren
 
Distortion criteria,models,scalar quantization
Distortion criteria,models,scalar quantizationDistortion criteria,models,scalar quantization
Distortion criteria,models,scalar quantization
nivetha10042003
 
Vector Quantization Vs Scalar Quantization
Vector Quantization Vs Scalar Quantization Vector Quantization Vs Scalar Quantization
Vector Quantization Vs Scalar Quantization
ManasiKaur
 
Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...
Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...
Investigative Compression Of Lossy Images By Enactment Of Lattice Vector Quan...
IJERA Editor
 

More from Angie Miller (20)

Writing Poetry In The Upper Grades Poetry Lessons,
Writing Poetry In The Upper Grades Poetry Lessons,Writing Poetry In The Upper Grades Poetry Lessons,
Writing Poetry In The Upper Grades Poetry Lessons,
Angie Miller
 
ReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts At
ReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts AtReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts At
ReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts At
Angie Miller
 
Printable Lined Paper For Kids That Are Soft Harper Blog
Printable Lined Paper For Kids That Are Soft Harper BlogPrintable Lined Paper For Kids That Are Soft Harper Blog
Printable Lined Paper For Kids That Are Soft Harper Blog
Angie Miller
 
Writing Your Introduction, Transitions, And Conclusion
Writing Your Introduction, Transitions, And ConclusionWriting Your Introduction, Transitions, And Conclusion
Writing Your Introduction, Transitions, And Conclusion
Angie Miller
 
Groundhog Day Writing Paper
Groundhog Day Writing PaperGroundhog Day Writing Paper
Groundhog Day Writing Paper
Angie Miller
 
5 Writing Tips To Help Overcome Anxiety Youn
5 Writing Tips To Help Overcome Anxiety Youn5 Writing Tips To Help Overcome Anxiety Youn
5 Writing Tips To Help Overcome Anxiety Youn
Angie Miller
 
How To Write An Essay In 6 Simple Steps ScoolWork
How To Write An Essay In 6 Simple Steps ScoolWorkHow To Write An Essay In 6 Simple Steps ScoolWork
How To Write An Essay In 6 Simple Steps ScoolWork
Angie Miller
 
Scroll Paper - Cliparts.Co
Scroll Paper - Cliparts.CoScroll Paper - Cliparts.Co
Scroll Paper - Cliparts.Co
Angie Miller
 
Hnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, Ngi
Hnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, NgiHnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, Ngi
Hnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, Ngi
Angie Miller
 
Recycling Essay Essay On Re
Recycling Essay Essay On ReRecycling Essay Essay On Re
Recycling Essay Essay On Re
Angie Miller
 
Pin On PAPER SHEETS
Pin On PAPER SHEETSPin On PAPER SHEETS
Pin On PAPER SHEETS
Angie Miller
 
Pin By Cloe Einam On Referencing Harvard Referencing, Essay, Essa
Pin By Cloe Einam On Referencing Harvard Referencing, Essay, EssaPin By Cloe Einam On Referencing Harvard Referencing, Essay, Essa
Pin By Cloe Einam On Referencing Harvard Referencing, Essay, Essa
Angie Miller
 
Pin Von Carmen Perez De La Cruz Auf German-BRIEF,
Pin Von Carmen Perez De La Cruz Auf German-BRIEF,Pin Von Carmen Perez De La Cruz Auf German-BRIEF,
Pin Von Carmen Perez De La Cruz Auf German-BRIEF,
Angie Miller
 
Powerful Quotes To Start Essays. QuotesGram
Powerful Quotes To Start Essays. QuotesGramPowerful Quotes To Start Essays. QuotesGram
Powerful Quotes To Start Essays. QuotesGram
Angie Miller
 
Can Essay Writing Services Be Trusted - UK Writing Experts Blog
Can Essay Writing Services Be Trusted - UK Writing Experts BlogCan Essay Writing Services Be Trusted - UK Writing Experts Blog
Can Essay Writing Services Be Trusted - UK Writing Experts Blog
Angie Miller
 
The SmARTteacher Resource Writing An Essa
The SmARTteacher Resource Writing An EssaThe SmARTteacher Resource Writing An Essa
The SmARTteacher Resource Writing An Essa
Angie Miller
 
Order Paper Writing Help 24
Order Paper Writing Help 24Order Paper Writing Help 24
Order Paper Writing Help 24
Angie Miller
 
How To Format A College Application Essay
How To Format A College Application EssayHow To Format A College Application Essay
How To Format A College Application Essay
Angie Miller
 
Thanksgiving Printable Worksheets Colorful Fall,
Thanksgiving Printable Worksheets Colorful Fall,Thanksgiving Printable Worksheets Colorful Fall,
Thanksgiving Printable Worksheets Colorful Fall,
Angie Miller
 
Writing Paper, Notebook Paper, , (2)
Writing Paper, Notebook Paper, ,  (2)Writing Paper, Notebook Paper, ,  (2)
Writing Paper, Notebook Paper, , (2)
Angie Miller
 
Writing Poetry In The Upper Grades Poetry Lessons,
Writing Poetry In The Upper Grades Poetry Lessons,Writing Poetry In The Upper Grades Poetry Lessons,
Writing Poetry In The Upper Grades Poetry Lessons,
Angie Miller
 
ReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts At
ReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts AtReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts At
ReMarkable 2 Is A 10.3-Inch E-Paper Tablet With A Stylus, Starts At
Angie Miller
 
Printable Lined Paper For Kids That Are Soft Harper Blog
Printable Lined Paper For Kids That Are Soft Harper BlogPrintable Lined Paper For Kids That Are Soft Harper Blog
Printable Lined Paper For Kids That Are Soft Harper Blog
Angie Miller
 
Writing Your Introduction, Transitions, And Conclusion
Writing Your Introduction, Transitions, And ConclusionWriting Your Introduction, Transitions, And Conclusion
Writing Your Introduction, Transitions, And Conclusion
Angie Miller
 
Groundhog Day Writing Paper
Groundhog Day Writing PaperGroundhog Day Writing Paper
Groundhog Day Writing Paper
Angie Miller
 
5 Writing Tips To Help Overcome Anxiety Youn
5 Writing Tips To Help Overcome Anxiety Youn5 Writing Tips To Help Overcome Anxiety Youn
5 Writing Tips To Help Overcome Anxiety Youn
Angie Miller
 
How To Write An Essay In 6 Simple Steps ScoolWork
How To Write An Essay In 6 Simple Steps ScoolWorkHow To Write An Essay In 6 Simple Steps ScoolWork
How To Write An Essay In 6 Simple Steps ScoolWork
Angie Miller
 
Scroll Paper - Cliparts.Co
Scroll Paper - Cliparts.CoScroll Paper - Cliparts.Co
Scroll Paper - Cliparts.Co
Angie Miller
 
Hnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, Ngi
Hnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, NgiHnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, Ngi
Hnh Nh Bn, S Tay, Vit, Cng Vic, Ang Lm Vic, Sch, Ngi
Angie Miller
 
Recycling Essay Essay On Re
Recycling Essay Essay On ReRecycling Essay Essay On Re
Recycling Essay Essay On Re
Angie Miller
 
Pin On PAPER SHEETS
Pin On PAPER SHEETSPin On PAPER SHEETS
Pin On PAPER SHEETS
Angie Miller
 
Pin By Cloe Einam On Referencing Harvard Referencing, Essay, Essa
Pin By Cloe Einam On Referencing Harvard Referencing, Essay, EssaPin By Cloe Einam On Referencing Harvard Referencing, Essay, Essa
Pin By Cloe Einam On Referencing Harvard Referencing, Essay, Essa
Angie Miller
 
Pin Von Carmen Perez De La Cruz Auf German-BRIEF,
Pin Von Carmen Perez De La Cruz Auf German-BRIEF,Pin Von Carmen Perez De La Cruz Auf German-BRIEF,
Pin Von Carmen Perez De La Cruz Auf German-BRIEF,
Angie Miller
 
Powerful Quotes To Start Essays. QuotesGram
Powerful Quotes To Start Essays. QuotesGramPowerful Quotes To Start Essays. QuotesGram
Powerful Quotes To Start Essays. QuotesGram
Angie Miller
 
Can Essay Writing Services Be Trusted - UK Writing Experts Blog
Can Essay Writing Services Be Trusted - UK Writing Experts BlogCan Essay Writing Services Be Trusted - UK Writing Experts Blog
Can Essay Writing Services Be Trusted - UK Writing Experts Blog
Angie Miller
 
The SmARTteacher Resource Writing An Essa
The SmARTteacher Resource Writing An EssaThe SmARTteacher Resource Writing An Essa
The SmARTteacher Resource Writing An Essa
Angie Miller
 
Order Paper Writing Help 24
Order Paper Writing Help 24Order Paper Writing Help 24
Order Paper Writing Help 24
Angie Miller
 
How To Format A College Application Essay
How To Format A College Application EssayHow To Format A College Application Essay
How To Format A College Application Essay
Angie Miller
 
Thanksgiving Printable Worksheets Colorful Fall,
Thanksgiving Printable Worksheets Colorful Fall,Thanksgiving Printable Worksheets Colorful Fall,
Thanksgiving Printable Worksheets Colorful Fall,
Angie Miller
 
Writing Paper, Notebook Paper, , (2)
Writing Paper, Notebook Paper, ,  (2)Writing Paper, Notebook Paper, ,  (2)
Writing Paper, Notebook Paper, , (2)
Angie Miller
 
Ad

Recently uploaded (20)

Cloud Computing ..PPT ( Faizan ALTAF )..
Cloud Computing ..PPT ( Faizan ALTAF )..Cloud Computing ..PPT ( Faizan ALTAF )..
Cloud Computing ..PPT ( Faizan ALTAF )..
faizanaltaf231
 
LET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSE
LET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSELET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSE
LET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSE
OlgaLeonorTorresSnch
 
LDMMIA Free Reiki Yoga S7 Weekly Workshops
LDMMIA Free Reiki Yoga S7 Weekly WorkshopsLDMMIA Free Reiki Yoga S7 Weekly Workshops
LDMMIA Free Reiki Yoga S7 Weekly Workshops
LDM & Mia eStudios
 
প্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdf
প্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdfপ্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdf
প্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdf
Pragya - UEM Kolkata Quiz Club
 
Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...
Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...
Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...
wygalkelceqg
 
POS Reporting in Odoo 18 - Odoo 18 Slides
POS Reporting in Odoo 18 - Odoo 18 SlidesPOS Reporting in Odoo 18 - Odoo 18 Slides
POS Reporting in Odoo 18 - Odoo 18 Slides
Celine George
 
Diana Enriquez Wauconda - A Wauconda-Based Educator
Diana Enriquez Wauconda - A Wauconda-Based EducatorDiana Enriquez Wauconda - A Wauconda-Based Educator
Diana Enriquez Wauconda - A Wauconda-Based Educator
Diana Enriquez Wauconda
 
Order: Odonata Isoptera and Thysanoptera.pptx
Order: Odonata Isoptera and Thysanoptera.pptxOrder: Odonata Isoptera and Thysanoptera.pptx
Order: Odonata Isoptera and Thysanoptera.pptx
Arshad Shaikh
 
LDMMIA Bonus GUEST GRAD Student Check-in
LDMMIA Bonus GUEST GRAD Student Check-inLDMMIA Bonus GUEST GRAD Student Check-in
LDMMIA Bonus GUEST GRAD Student Check-in
LDM & Mia eStudios
 
IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...
IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...
IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...
SweetytamannaMohapat
 
CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...
CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...
CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...
Sritoma Majumder
 
K-Circle-Weekly-Quiz-May2025_12345678910
K-Circle-Weekly-Quiz-May2025_12345678910K-Circle-Weekly-Quiz-May2025_12345678910
K-Circle-Weekly-Quiz-May2025_12345678910
PankajRodey1
 
Multicultural approach in education - B.Ed
Multicultural approach in education - B.EdMulticultural approach in education - B.Ed
Multicultural approach in education - B.Ed
prathimagowda443
 
Odoo 18 Point of Sale PWA - Odoo Slides
Odoo 18 Point of Sale PWA  - Odoo  SlidesOdoo 18 Point of Sale PWA  - Odoo  Slides
Odoo 18 Point of Sale PWA - Odoo Slides
Celine George
 
How to Manage Orders in Odoo 18 Lunch - Odoo Slides
How to Manage Orders in Odoo 18 Lunch - Odoo SlidesHow to Manage Orders in Odoo 18 Lunch - Odoo Slides
How to Manage Orders in Odoo 18 Lunch - Odoo Slides
Celine George
 
How to Setup Renewal of Subscription in Odoo 18
How to Setup Renewal of Subscription in Odoo 18How to Setup Renewal of Subscription in Odoo 18
How to Setup Renewal of Subscription in Odoo 18
Celine George
 
Order Lepidoptera: Butterflies and Moths.pptx
Order Lepidoptera: Butterflies and Moths.pptxOrder Lepidoptera: Butterflies and Moths.pptx
Order Lepidoptera: Butterflies and Moths.pptx
Arshad Shaikh
 
How to Create a Stage or a Pipeline in Odoo 18 CRM
How to Create a Stage or a Pipeline in Odoo 18 CRMHow to Create a Stage or a Pipeline in Odoo 18 CRM
How to Create a Stage or a Pipeline in Odoo 18 CRM
Celine George
 
PHYSIOLOGY & SPORTS INJURY by Diwakar Sir
PHYSIOLOGY & SPORTS INJURY by Diwakar SirPHYSIOLOGY & SPORTS INJURY by Diwakar Sir
PHYSIOLOGY & SPORTS INJURY by Diwakar Sir
Diwakar Kashyap
 
Swachata Quiz - Prelims - 01.10.24 - Quiz Club IIT Patna
Swachata Quiz - Prelims - 01.10.24 - Quiz Club IIT PatnaSwachata Quiz - Prelims - 01.10.24 - Quiz Club IIT Patna
Swachata Quiz - Prelims - 01.10.24 - Quiz Club IIT Patna
Quiz Club, Indian Institute of Technology, Patna
 
Cloud Computing ..PPT ( Faizan ALTAF )..
Cloud Computing ..PPT ( Faizan ALTAF )..Cloud Computing ..PPT ( Faizan ALTAF )..
Cloud Computing ..PPT ( Faizan ALTAF )..
faizanaltaf231
 
LET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSE
LET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSELET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSE
LET´S PRACTICE GRAMMAR USING SIMPLE PAST TENSE
OlgaLeonorTorresSnch
 
LDMMIA Free Reiki Yoga S7 Weekly Workshops
LDMMIA Free Reiki Yoga S7 Weekly WorkshopsLDMMIA Free Reiki Yoga S7 Weekly Workshops
LDMMIA Free Reiki Yoga S7 Weekly Workshops
LDM & Mia eStudios
 
প্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdf
প্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdfপ্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdf
প্রত্যুৎপন্নমতিত্ব - Prottutponnomotittwa 2025.pdf
Pragya - UEM Kolkata Quiz Club
 
Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...
Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...
Active Surveillance For Localized Prostate Cancer A New Paradigm For Clinical...
wygalkelceqg
 
POS Reporting in Odoo 18 - Odoo 18 Slides
POS Reporting in Odoo 18 - Odoo 18 SlidesPOS Reporting in Odoo 18 - Odoo 18 Slides
POS Reporting in Odoo 18 - Odoo 18 Slides
Celine George
 
Diana Enriquez Wauconda - A Wauconda-Based Educator
Diana Enriquez Wauconda - A Wauconda-Based EducatorDiana Enriquez Wauconda - A Wauconda-Based Educator
Diana Enriquez Wauconda - A Wauconda-Based Educator
Diana Enriquez Wauconda
 
Order: Odonata Isoptera and Thysanoptera.pptx
Order: Odonata Isoptera and Thysanoptera.pptxOrder: Odonata Isoptera and Thysanoptera.pptx
Order: Odonata Isoptera and Thysanoptera.pptx
Arshad Shaikh
 
LDMMIA Bonus GUEST GRAD Student Check-in
LDMMIA Bonus GUEST GRAD Student Check-inLDMMIA Bonus GUEST GRAD Student Check-in
LDMMIA Bonus GUEST GRAD Student Check-in
LDM & Mia eStudios
 
IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...
IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...
IDSP(INTEGRATED DISEASE SURVEILLANCE PROGRAMME...
SweetytamannaMohapat
 
CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...
CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...
CBSE - Grade 11 - Mathematics - Ch 2 - Relations And Functions - Notes (PDF F...
Sritoma Majumder
 
K-Circle-Weekly-Quiz-May2025_12345678910
K-Circle-Weekly-Quiz-May2025_12345678910K-Circle-Weekly-Quiz-May2025_12345678910
K-Circle-Weekly-Quiz-May2025_12345678910
PankajRodey1
 
Multicultural approach in education - B.Ed
Multicultural approach in education - B.EdMulticultural approach in education - B.Ed
Multicultural approach in education - B.Ed
prathimagowda443
 
Odoo 18 Point of Sale PWA - Odoo Slides
Odoo 18 Point of Sale PWA  - Odoo  SlidesOdoo 18 Point of Sale PWA  - Odoo  Slides
Odoo 18 Point of Sale PWA - Odoo Slides
Celine George
 
How to Manage Orders in Odoo 18 Lunch - Odoo Slides
How to Manage Orders in Odoo 18 Lunch - Odoo SlidesHow to Manage Orders in Odoo 18 Lunch - Odoo Slides
How to Manage Orders in Odoo 18 Lunch - Odoo Slides
Celine George
 
How to Setup Renewal of Subscription in Odoo 18
How to Setup Renewal of Subscription in Odoo 18How to Setup Renewal of Subscription in Odoo 18
How to Setup Renewal of Subscription in Odoo 18
Celine George
 
Order Lepidoptera: Butterflies and Moths.pptx
Order Lepidoptera: Butterflies and Moths.pptxOrder Lepidoptera: Butterflies and Moths.pptx
Order Lepidoptera: Butterflies and Moths.pptx
Arshad Shaikh
 
How to Create a Stage or a Pipeline in Odoo 18 CRM
How to Create a Stage or a Pipeline in Odoo 18 CRMHow to Create a Stage or a Pipeline in Odoo 18 CRM
How to Create a Stage or a Pipeline in Odoo 18 CRM
Celine George
 
PHYSIOLOGY & SPORTS INJURY by Diwakar Sir
PHYSIOLOGY & SPORTS INJURY by Diwakar SirPHYSIOLOGY & SPORTS INJURY by Diwakar Sir
PHYSIOLOGY & SPORTS INJURY by Diwakar Sir
Diwakar Kashyap
 
Ad

An Algorithm For Vector Quantizer Design

  • 1. 84 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-28, NO. 1, JANUARY 1980 An Algorithmfor Vector Quantizer Design YOSEPHLINDE, MEMBER. IEEE. ANDRES BUZO, MEMBER, EEE, A m ROBERTM.GRAY,SENIORMEMBER. EEE ’ Abstract-An efficient,andintuitivealgorithm is presented for the design of vector quantizersbased either on a known prohabitistic model or on a long training sequence of d a t a .The basic propertiesof the algorithm are discussed mid demonstrated by examples. Quite general distoriion measures and long blocklengthsareallowed,asexemplifiedby the design of parameter vector quantizers of tendiensional vectors arising in Linear Predictive Coded(LE) speechcompression witha complicated distortion measure arisiig in LPC analysis that does not depend only on the error vector. INTRODUCTION A- N efficient and intuitive algorithm for the design of good blockor vector quantizerswith ‘quite general distortion measures is developed for use oneitherknown probabilistic source descriptions or ona long training sequenceof data. The algorithm is based on in approach of Lloyd [I] ,is not a varia- tional technique, and involves no differentiation; hence it works well even whenthedistribution hasdiscrete compo- nents, as is the case when a sample distribution obtained from atrainingsequence is used. As with the common variational techniques, the algorithm produces a quantizer meeting neces- sary butnot sufficient conditions for optimality. Usually, however, at least localoptimality is assured in both approaches. We here motivate and describe the algorithm and relate it to a number of similar algorithms for special cases that have appeared in both the quantization and cluster analysis litera- ture. The basic operation of the algorithm is simple and intuitive in the general case considered here and it is clear that variational techniques are not required to develop nor to apply the algorithm. Several of the algorithm’s basic properties are developed using heuristicargumentsand demonstratedby example. In a companion theoretical paper [ 2 ] ,these properties are given precise mathematicalstatementsand are proved using -argu- mentsfromoptiniizationtheory andergodic theory. Those results will occasionally be quoted here to characterize the generality of certain properties. Paper approved by the Editor for Data Communication Systems of the IEEE Communications.Society for publication after presentation in part at the 1979 International Telemetering Conference, Los Angeles, CA, November 1978 and the 1979 International Symposium on hfor- mation Theory, Gringano, Italy; June 1979. Manuscriptreceived May 22, 1978; revised August 21,1979.Thisworkwassupportedby Air Force Contract F44620-73-0065, F49620-78C-0087, and F49620-79- C-0058 andbytheJoint Services Electronics Program atStanford University, Stanford, CA. Y.Linde is with the Codex Corporation, Mansfield, MA. A. BuzowaswithStanford University, Stanford, CA and Signal Technology Inc., Santa Barbara, CA.Heis nowwiththeInstitutode Ingenieria, National University of Mexico, MexicoCity, Mexico. R. M. Gray is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305. In particular, the algorithm’s convergence properties are demonstrated herein by several examples. We consider the usual test cas. for such algorithms-quantizer, design for memoryless Gaussian sources with a mean-squared.error distor- tion measure; but we design and evaluate block quantizers with a rateof one bit per symbol and with blocklengths,of 1 through 6. Comparison with recently developed lower bounds to the optimaldistortion of such block quantizers (which provide strict improvement over the traditional bounds of rate-distor- tion theory) indicate that the resulting quantizers are indeed nearly optimalandnot simply locally optimal. We also con- sider a scalar case where local optima arise and show how a variation of the algorithm yields aglobal optimum. . The algorithm is also used to design a quantizer for 1Odi- mensional vectors- arising inspeech compression systems. A complicated distortion measure is used that does not simply depend on the errorvector. No probabilistic model is assumed, and hence the quantizer must be designed based on a training sequence of real speech. Here the convergence properties for both length ofthe training sequence and the number of iterations of the algorithm are demonstratedexperimentally; No theoretical optimumjs known for this case,but our system was used to compress the output of a traditional 6000 bit/s Linear Predictive Coded (LPC) speech system down to a rate of 1400 bit& with only a slight loss in quality as judged by untrained listeners in informal subjective tests. To theauthors’ knowledge,directapplication of variationaltechniques have not succeededin designing block quantizersforsuch large block lengths and suchcomplicated distortion measures. BLOCK QUANTIZERS An N-level k-dimensionalquantizer is amapping, q;that assigns to each input vector, x = (xo, -, X ~ L I ) , a reproduc- tion vector, i= q(x), drawn from a finite reproduction alphabet, A = bi; i = 1, -, N}. The quantizer 4 is compl$ely described bythereproduction alphabet(or codebook) A to- gether with the partition, S = {Si; i = 1, *-*, N), of the input vector space into the sets Si= {x: q(x) =vi)of input vectors mapping into the ith reproduction vector (or codeword)., Such quantizers are also called block quantizers, vector quantizers, and block source codes. DISTORTION MEASURES We assume thedistortion caused byreproducing an input vector x by a reproduction vector iis giveri by a nonnegative distortion measure d(x, 2). Many such distortion measures have beenproposedin theliterature. Themost commonfor 0090-6778/80/0100-0084$00.75 0 1980 IEEE
  • 2. LINDE et al. :ALGORITHM FOR VECTOR QUANTIZER DESIGN 85 reasons of mathematical convenience is the squarederror dis- tortion. k-1 i=0 Other common distortionmeasures are the l,, or Holder norm, and its vth power, the uth-law distortion: b-1 While both distortion measures (2) and (3) depend on theuth power of the errorsin the separate coordinates, themeasure of (2) is often more useful since it is adistance ormetricand hence satisfies the triangle inequality, d(x, 2) d d(x, y ) + d @ ,.?), for all y . The triangle inequality allows one to bound the overall distortion easily in a multi-step system by the sum of the individual distortions incurredin each step. The usual vth-law distortionof (3) does not have this property. Other distortion measures are the I,, or Minkowski norm, the weighted-squares distortion, k--3 where wi 2 0, i = 0 , .-e, k - 1,and the more general quadratic distortion d(x,i ) = (x -i)B(x -i ) t where B = {Bi,j}is a k X k positive definite symmetric matrix. All of the previously described distortion measures have the property that they depend on the vectors x and P only through the error vector x - i . Such distortion measures having the form d(x, i ) = L(x -2) are called difference distortion meas- ures. Distortion measures not having this form but depending on x and 2 in a more complicated fashion have also been pro- posed for data compression systems. Of interest here is a dis- tortion measure of Itakura and Saito [3, 41 and Chaffee [5,321 which arises inspeechcompressionsystems andhas the form d(x ,i )= (x -i ) R @)(x -i)f , (7) where for each x , R ( x )is a positive definite k X k symmetric matrix. This distortion resembles thequadraticdistortionof (6),but here the weighting matrix depends on the inputvector We are here concerned with the particular form andapplica- tion of this distortion measure rather than its origins, which are treated in depth in [3-91 and in a paper in preparation. For motivation, however, we briefly describe the context in which this distortion measure is used in speech systems. In the LPC approach to speech compression [IO] ,each frame of sampled speech is modeled as the output of a finite-order all-pole filter driven by either white noise (unvoiced sounds) or aperiodic pulse train (voiced sounds). LPC analysis has, as input, a frame of speech and produces parameters describing the model. These parameters are thenquantized and transmitted. One collection of suchparameters consists of a voiced/unvoiced decision together witha pitch estimate for voiced sounds, a gain term u (related to volume), and the sample response of the normalized inverse filter (1, al, a2, . * a , aK), thatis,the normalized all-pole model has transfer function or z-transform (Zf=o Q ~ z - ~ } - ~ . Other parameterdescriptionssuch as the reflection coefficients are also possible [101. In traditional LPC systems, the various parameters are quantized separately, but suchsystemshave effectively reached their theoretical performance limits [1I] .Hence it is natural to consider block quantization of these parameters and com- pare the performance with the traditional scalar quantization techniques. Here we consider the case where the pitch and gain are (as usual) quantized separately, but the parameters describ- ing the normalizedmodel are to be quantized together as a vector. Since the lead term is 1, we wish to quantize a vector (al,u2, .e-, u K ) & x = (xo, - - e , x ~ - ~ ) . A distortion measure, d(x, i), between x and a reproduction x , can then be viewed as a distortion measure between two normalized (unit gain) inverse filters or models. A distortion measure for such a case has been proposed by Itakura and Saito [3,4] and by Chaffee [5,321 and it has the form of (7) with R(x) the autocorrela- tion matrix (r,.(k -j ) ; k = 0, 1, a s - , K - 1; j = 0, 1, e - , K -1) defined by X. described by x when theinput has a flatunit amplitude spectrum. Many properties and alternative formsfor thisparticular distortion measure are developed in [3-91, where it is also shown thatstandard LPC systems implicitly minimize this distortion, which suggests that it is also an appropriate distor- tion measure for subsequent quantization. Here, however, theimportantfact is thatit is not adifference distortion measure; it is one for which the dependence on x and i is quite complicated. We also observe that various functions of the previously defined distortion measures have been proposed in thelit- erature,forexample,distortion measures of theforms IIx - i l l r and p( Ilx -ill), where p is a convex functionandthe norm is any of the previously defined norms. The techniques to be developed here are applicable to all of these distortion measures.
  • 3. 86 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL.COM-28, NO. 1, JANUARY 1980 PERFORMANCE k t x = (xo,..*,Xk-l) be a real random vector described by a cumulative distribution function F(x)= Pr{Xi Qxi;i = 0, 1, a**, k - 1). A measure of the performance of a quantizer 4 applied tothe random vector X isgiven by the expected distortion where E denotestheexpectationwith respect to the under- lying distribution F. This performance measure is physically meaningful if the quantizer q is to be used toquantize a sequence of vectors X , = ( X , K , e - , XnK+K-l) that are sta- tionary and ergodic, since then thetime-averaged distortion, converges with probabilityone to D(4) as n -+ 00 (from the ergodic theorem),that is, D(4) describes the long-run time- averaged distortion. An alternativeperformance measure is the maximum of d(x, 4(x)) over all x in A , but we use only the expected dis- tortion (9)since,in most problems of interest (to us), it is the average distortion and not the peak distortion that deter- mines subjective quality. In addition, the expected distortion is more easily dealt with mathematically. OPTIMAL QUANTIZATION An N-level quantizer will be said to be optimal (or globally optimal) if it minimizes theexpecteddistortion,that is, 4* is optimal if for all other quantizers 4 having N reproduction vectors D(q*) < D(q). A quantizer is said to be locally opti- mum if D(q) is only a local minimum, that is, slight changes in q cause an increase in distortion. The goal of block quan- tizer design is to obtain an optimal quantizer if possible and, if not,toobtain a locally optimal and hopefully “good” quantizer. .Several such algorithms have been proposed in the literatureforthecomputer-aided design of locally optimal quantizers. In a few special cases, it has been possible to demonstrate global optimalityeither analytically or byex- hausting alllocal optima.In 1957,in a classic but unfortunately unpublished Bell Laboratories’ paper, S. Lloyd [ l ] proposed two methods for quantizer design for the scalar case (k = 1) with a squarederror distortion criterion. His “Method II” was a straightforward variational approach wherein he took deriva- tives with respect tothereproductionsymbols,yi, andwith respect totheboundarypoints defining the Si andsetthese derivatives to zero.Thisin general yieldsonlya “stationary- point” quantizer (a multidimensional zero derivative) that satisfies necessary but not sufficient conditions for optimality. By second derivative arguments, however, it is easy to establish that such stationary-point quantizers are at least locally opti- mum for vth-power law distortion measures. In addition, Lloyd also demonstrated global optimalityfor certaindistri- butionsby a technique of exhaustively searching all local optima. Essentially the same technique was also proposed and used in the parallel problem of cluster analysis by Dalenius [12] in 1950, Fisher [13] in 1953, andCox [14] in 1957. The technique was also independently developed by Max [151 in1960 andthe resulting quantizer is commonlyknown as the Ldoyd-Max quantizer. This approach has proved quite useful for designing scalar quantizers, with power-law distor- tion criteria and withknown distributions that were suffi- ciently well behaved to ensure the existence of the derivatives in question. In addition, for this case, Fleischer [161 was able to demonstrate analytically that the resulting quantizers were globally optimum forseveral interesting probability densities. In some situations, however, the direct variational approach has not proved successful. First, if k is not equal to 1 or 2, the computational requirements become too complex. Simple combinations of one-dimensional differentiation will not work because of the possibly complicated surface shapes of the boundaries of the cells of the partition. In fact, the only suc- cessful applications of a direct variational approach to multi- dimensional quantization areforquantizers where theparti- tion cells are required to have aparticular simple form such as multidimensional “cubes” or, in two dimensions, “pie slices,” each described only by a radius and two angles. These shapes are amenable to differentiationtechniques, but only yield a local optimum within the constrained class. Secondly, if, in addition, more complex distortion measures such as those of (4)-(7) are desired, the required computation asso- ciated with the variational equations can become exorbitant. Thirdly, if the underlying probability distribution has discrete components, then the required derivatives may not exist, causing furthercomputational problems.Lastly, if one lacks a precise probabilisticdescription of the randomvector X and must base the design instead on an observed long training sequence of data, then there is no obvious way to apply the variational approach. If the underlying unknown process is stationaryand ergodic, then hopefullyasystem designed by using asufficientlylongtrainingsequenceshould also work well onfuturedata.To directlyapply the variational tech- nique in this case, one would first have to estimate the under- lying continuousdistribution based onthe observations and then take the appropriate derivatives. Unfortunately, however, most statisticaltechniquesfordensityestimation require an underlyingassumption on the class of allowed densities, e.g., exponential families. Thusthesetechniques are inappropriate when no such knowledge is available. Furthermore, a good fit of a continuous model to a finite-sample histogram may have ill-behaved differential behavior andhencemay not produce a good quantizer. To our knowledge, no one has successfully used such an approach nor has anyone demonstrated that this approach will yield the correct quantizerin the limit of a long training sequence. Lloyd [l] also proposed an alternativenonvariational ap- proach as his “Method I.” Not surprisingly, both approaches yield the same quantizer forthe special cases he considered, but we shall argue that a natural and intuitive extension of his Method I provides an efficientalgorithmfor the design of good vectorquantizers that overcomes the problems of the variational approach. In fact, variations of Lloyd’s Method I have been “discovered” several times in the literature for
  • 4. LINDE et al.:ALGORITHM FOR VECTOR QUANTIZER DESIGN . . 87 squared-error and magnitudeerror distortion criteria for both scalar and multidimensional cases (e.g., [22],[23],[24], [31]). Lloyd’s basic development, however, remains the sim- plest, yet it extendseasily to thegeneral case considered here. To describe Lloyd’s Method I in the general case, we first assume that the distribution is known, but we allow it to be eithercontinuousor discreteandmake no assumptions re- quiring the existence of derivatives. piven a quantizer 4 described by a reproduction alphabet A = bi; i = 1 , -, N} andpartitiof S = {Si; i = 1 , e-, N}, then the expected dis- tortion D((A,S}) 4D(4)can bewritten as where E(d(X,vi)1 X E Si) is the conditional expected distor- tion, given X E Si,or,,equivalently,given 4(X) =yi. Suppose that we are given a particular reproduction alpha- bet A , but a partition is not specified. A partition that is opti- mum for A is easily constructed by mapping each x into the y i E A minimizing the distortion d(x,vi), that is, by choosing the minimum distortion or nearest-neighbor codeword for each input. A tie-breaking rule such as choosing the reproduc- tion with the lowest index is required if more than one :ode- word minimizes thedistortion.Thepartition, say P(A) = {Pi; i = 1 , e,.., N} constructed inthis manner is suchthat x €Pi(or q(x) =y i ) only if d(x,yi)<d(x,yj),allj , and hence which, in turn, implies for any partitionS that Thusfor afixed reproduction alphabet A^, the best possible partition is P(A). Conversely, assume we are given a partition S = {Si;i = 1 , .-,N} describing a quantizer. For the moment, assume also that the distortion measure and distribution are such that, for each setSwith nonzero probability in k-dimensionalEuclidean space, there existsa minimum distortionvector A!($)for which Analogous tothe case of a squarederrordistortion measure and a uniform probability distribution, we call the vector i ( S ) thecentroid or center of gravity of the set S. If such pointsexist,then clearly for afixed partition S = {Si; i = 1, -, N}, no reproduction alphabet A = bi; i = 1, -, N} can yield a smaller average distortion then the reproduction alpha- bet j(S) 4 {i(Si);i = 1, -, N} containing the centroids of the sets inS since It is shown in [2] that the centroids of (13) ehst for all sets S with nonzeroprobabilityforquite general distortion measures including all of those considered here. In particular, if d(x, y) is convex iny, then centroidscan be computed using standard convex programming techniques.asdescribed, e.g., in Luenberger [17, 181 or Rockafellar [19]. Incertain cases, they canbe found easily using variational techniques. If the probabiiity of a set S is zero, then the centroidcan be defined in an arbitrary manner since then the conditional expectation given that Sin (13) has nounique definition. Equations (12) and (14) suggest anaturalalgorithm for designing a good quantizer by taking any given quantizer and iteratively improving it: Algorithm (Known Distribution) (0) Initialization: Given N = number s f levels, a distortion threshold e 2 0, and an initial N-level reproduction alphabet A , and a distributionF. Set m = 0 and DV1= -. (1) Given A, bi; i = 1, -a, N}, find its minimumdistor- tion partition P(A,) = {Si; i = 1,.-, N}:x E Siif d(x,yi)Q d(x, y j ) for all j . Computethe resulting average distortion, Dm =D({Am, P(Am)}) = E min,EAm d(X?y). (2) If (Dm-1 - D,,,)/D, < e, halt with A, and P(&) describing final quantizer. Otherwise continue. (3) Find theoptimalreproduction alphabet d(P(&)) = {$(si); i = 1, -, N} fir ~(i,). set Am+ e A?(P(A,,,)). Replace m by m + 1 and go to (1). If, at some point, the optimd partition P(A,) hasa cell Sisuch that Pr(X E Si)= 0, thenthe algorithm, as,stated, assigns an arbitrary vector as centroid and continues. Clearly, alternative rules are possible and may perform better in prac- tice. For exainple, one can simply remove the cell si. and the corresponding reproduction symbol from the quantizer without affectingperformance, andthen contiriue withan (N - 1) level quantizer. Alternatively, one could assign to Si its Euclidean centerof gravity orthe zth centroidfromthe previous iteration. One could also simply reassign the repro- duction vector corresponding to Sito another cell Sj axid con- tinue the algorithm. The stated technique isgiven simply for convenience, since zero probability cells were not aproblem in the examplesconsideredhere.They can, however,occur andin suchsituations alternativetechniquessuch as those described may well work better. In practice, a simple alterna- tive is that, if the final quantizer produced by the algorithm hasa zero probability(hence useless) cell, simply rerunthe algorithm with a different initialguess. 1
  • 5. 88 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL.COM-28,NO. 1 , JANUARY 1980 From (12) and (14), D , <Dm-l and hence eachiteration of the algorithm must either reduce the distortion or leave it unchanged. We shall later mention‘some minor additional details of the algorithm and discuss.techniquesfor choosing an initial guess, but the previously given description contains the essential ideas. Since Dm is nonincreasing and nonnegative, it must have a limit, say D,, as m * 00. It is shoTn in [2] that if a limiting quantizer A, exists in the sense A , +A, as m + in the usual Euclidean sense, then D({A,, P(A,)}) = Dm and A, has the property that A, = 3(P(A,)), that is, A, is exactly the centroid of its own optimal partition. In the language of optimizationtheory, {A,, P(A,)} is a fuced point under further iterations of the algorithm [17, 181. Hence the limit quantizer (if itexists) is called afixed-pointquantizer (in contrastto a stationaiy-point quantizer obtained bya varia- tional approach). In this light, the algorithm is simply a standardtechniquefor findinga fixed point via themethod of successive approximation (see, e.g., Luenberger [17, p. 2721). If E = 0 and the algorithm halts for finite m, then such a fixed point has been attained [2] . It is shown in [2] that a necessary condition for a quantizer to be optimal is that it be afixed-point quantizer.It is also shown in [2] that, as in Lloyd’s case, if a fixed-pointquantizer is suchthatthere is no probability ontheboundaryofthe partition cells, that is if Pr(d(X, vi)= d(X,y j ) ,for some i # j ) = 0, then the quantizer is locally optimum. This is always the case withcontinuousdistributions,but can inprinciple beviolated for discrete distributions.It was never found to occur in ourexperiments, however. As Lloyd suggests, the algorithm can easily be modified to test a fixed point for this condition and if there is nonzero probability of a vector on a boundary,the strategy would be to reassign the vector to another cell of the partition and continue the iteration. For theN = 1 case with a squared-error distortion criterion, the algorithm is simply Lloyd’s Method I, and his arguments apply immediately in the more general case considered herein. A similar technique was earlier proposed in 1953by Fisher [131 in a cluster analysis problem using Bayes decisions with a squarederror cost. For larger dimensions and distortion measures of the formd(x,2)= Ilx -2 11; ,r> 1,the relations (12) and (14) were observed by Zador [20] and Gersho [21] in their work on the asymptotic performance of optimal quantizers,and.hencethe algorithm is certainly implicit in their work. They did not, however, actually propose or apply the technique to design aquantizerfor fixed N. In 1965, Forgy [31] proposed the algorithm for cluster analysis for the multidimensional squarederrordistortion case anda sample distribution (see the discussion in MacQueen [25]). In 1977, Chen [22] proposed essentially the same algorithm forthe multidimensional case with the squarederror distortion meas- ure and used it to design two-dimensional quantizers for vectors uniformly distributed in a circle. Since the algorithm has nodifferentiability requirements, it is valid for purely discrete distributions.This has an important application to the case where onedoes not possess a priori a probabilistic description of the source to be compressed, and hence must base his design on an observed longtraining se- quence of thedatato be compressed. One approach w.ould be to use standarddensityestimationtechniques of statistics toobtain a “smooth”distribution of F and tothenapply variational techniques. As previously discussed, we donot adopt this approach as it requires additionalassumptions on the allowed densities. Instead we consider the following approach: Use the training sequence, say {xk;k = 0, e - , n - 1) to form thetime-average distortion and observe that this is exactly the expected distortion EGnd(X, 4(X)) with respect tothe sample distribution Gn determined by the training sequence, i.e., the distribution that assigns probability m/n to a vectorx that occurs in thetraining sequence m times. Thus we can design a quantizer that mini- mizes the time-average distortion for the training sequence by running the algorithm onthe sample distribution Gnc1p2). This yields the following variation of the algorithm: Algorithm (Unknown Distribution) (0) Initialization: Given N = number of levels, distortion threshold E > 0, an initial N-level reproduction alphabet Ao, and a training sequence {xj;j = 0, -, n - 1). Set m = 0 and (1) GivenA , bi; i = 1, - e , N}, find the minimum distor- tionpartition P(A,) = {Si; i = 1, e-, N} ofthe training sequence: xj E Siif d(xj,yi)< d(xj, yl),for all 1. Compute the average distortion D-1 = 0 0 . (2) If (Dm-1 -D,)/D, <E , halt with A, final reproduc- tion alphabet. Otherwise continue. (3) Find theoptimalreproduction alphabet i(P(Am))= {?(si); i = 1, .-, N) for P(A,). Set A,+ & ~(P(A,)). Replace m by m + 1 and go to (1). Observe above that while designing the quantizer, only partitions of the .training sequence (theinput alphabet) are considered. Once the final codebook A, is obtained, however, it is used on new data outside the training sequence with the optimum nearest-neighbor rule, that is, an optimum partition of k-dimensional Euclidean space. ( 1 ) Itwasobserved by a reviewerthat application of the algorithm tothe sample distribution provides a“Monte Carlo” design of the quantizer for a vector with a known distribution, thatis,thedesign is based on samples of the random vectors ratherthan on an explicit distribution. (2) During the period this paper was being reviewed for publication, two similar techniques were reported for special cases. In 1978, Capria, Westin,and Esposito (231 presented a similar technique for the scalar caseusing dynamic programmingarguments.Theirapproachwas for average squarederrordistortion and for maximum distortion over the training sequence. In 1979, Menez, Boeri, andEsteban (241 proposed a similar technique for scalar quantization using squarederror and magnitudeerror distortionmeasures.
  • 6. LINDE er al. :ALGORITHM FOR VECTOR QUANTIZER DESIGN 89 If the sequence of randomvectors is stationary andergodic, then it follows from the ergodic theorem that, with probabil- ity one, G, goes to the true underlying distribution F as n + -. Thus if the training sequence is sufficiently long,hopefully a good quantizer for thesample distribution G, should also be good for the true distribution F, and hence should yield good performance forfuturedata producedby the source. All of these ideas are made precise in [2] where it is shown that, subject to suitable mathematical assumptions, thequantizer produced by applying the algorithm to G , converges, as n + 00, to the quantizer produced by applying the algorithm tothetrue underlying distribution F. We observe thatno analogousresults are knowntotheauthorsforthe density estimation/variational approach, and that independence of successive blocks is not required for these results, only block stationarity andergodicity. We also pointoutthat,for finitealphabet distributions such as sample distributions, thealgorithm always converges to a fixed-point quantizerin a finitenumber of steps[2]. A similar technique used incluster analysis withsquared- error cost functions was developed by MacQueen in 1967 [25] and has been called the k-meansapproach.Amore involved technique using the k-means approach is the “ISODATA” approach of Ball and Hall [26]. The basic idea of finding minimum distortion partitions and centroids is the same, but the trainingsequence data is used ina different manner and the resulting quantizers will, ingeneral, be different. Their sequentialtechnique incorporatesthe trainingvectorsone at atime and ends when the lastvector is incorporated. This is in contrast to the previous algorithmwhichconsiders all of the training ve,ctors at each iteration.The k-means method can be described as follows: The goal is to produce a partition SO = {So, . - a , S N - ~ }of the training alphabet, A = {xi; i = 0, -, n - 1) consisting of all vectors in the !raining sequence. Thecorresponding reproductionalphabet A will then be the collection of the Euclidean centroidsofthe sets Si, that is, the final reproductionalphabet will be optimalforthe final partition (but the final partition may not be optimal for the final reproductionalphabet,except as n + -). Toobtain S , we first think of the each Sias a bin in which to place training sequencevectorsuntil all are placed.Initially, we startby placing the first N vectorsinseparatebins, i.e., xiE Si, i = 0, a * - , N - 1. We then proceed as follows: at each iteration, a new trainingvector x, is observed. We find the set Si for which thedistortion between x, and thecentroid ;(Si)is minimized andthen add x, to thisbin. Thus,ateachitera- tion,the new vector is added to the bin withthe closest centroid,and hence thenexttime, this bin will have a new centroid. This operation is continued until all sample vectors are incorporated. Although similar in philosophy, the k-means algorithm has some crucial differences. In particular, it is suited for the case where only the trainingsequence is to be classified, that is, wherealongsequence of vectors is to be groupedinalow distortion manner. The sequentialprocedure is computation- ally efficient for grouping, but a “quantizer” is not produced until theprocedure is stopped. In other words, in cluster analy- sis, one wishes to groupthings and the groups can change with time, but in quantization, one wishes to fix the groups (to get a time-invariant quantizer), and then use these groups (or the quantizer) on future data outside of the training sequence. An additionalproblem is thattheonlytheorems which guarantee convergence, in the limit of a long training sequence, require the assumption that successive vectors be independent [25], unlike the more general case for the proposed algorithm Recently Levenson et al. used avariation of the k-means and ISODATA algorithms with a distortion measure proposed by Itakura [4] to determine reference templatesfor speaker- independent word recognition [27]. They used, as a distortion measure, the logarithm of thedistortionof (7) (which is a gain-optimized Itakura-Saitodistortion [7]-our use ofthe distortion measure with unit-gain-norrnalizd models results in no such logarithmic function). In their technique,however, aminimax rule was used to select thereproduction vectors (or cluster points) rather than finding the “optimum” centroid vector. If instead, the distortion measure of (7) is used, then the centroidsare easily found, as will be seen. P I . CHOICE OFA ^ , There are several ways to choose the initial reproduction alphabet do requiredby the algorithm. One methodfor use on sample distributionsis that of thek-means method, namely choosing the first N vectors in the training sequence. We did nottry this approach as, intuitively, one would like these vectors to be well-separated,and N consecutive samples may not be. Twoothermethods were found to be useful in our examples.Thefirst is to use a uniform quantizer over all or most of the source alphabet (if it is bounded). For example, if used on a sample distribution,one uses a k-dimensional uniform quantizer on ak-dimensional Euclidean cube includ- ing all or most of thepoints in the trainingsequence.This technique was used in the Gaussian examples described later. Thesecondtechnique is useful when one wishes to design quantizers of successively higher rates until achieving an acceptable level of distortion. Here we consider M-level quan- tizers with M = 2R, R = 0, 1, * . e , and continue until we achieve an initial guess for an N-level quantizer as follows: INITIAL GUESS BY “SPLITTING” (0) Initialization: Set M = 1 anddefine Ao(l) = i(A), the centroid of the entire alphabet (the centroid of the train- ing sequence, if a sample distributionis u s e ! ) . (1) Given thereproductionalphabet Ao(M) containing M vectors hi; i = 1 , - e , M}, “split” each vector yiinto twoclose vectors yi + e and yi - e, where e is afixed perturbation vector. The collection 2of f y i + E , yi-E , i = 1, e-, M} has 21.1vectors. Replace M by q. (2) Is M =N ? If so, set A . =AIM)and halt.Jois then the initial reproduction alphabet for the N-level quantization algorithm. If not, run the algorithm for an M-level quantizer on i ( j l 4 ) to produce a good reproduction alphabetAo(M),and then return to step(1). Using the splittingalgorithm on atrainingsequence, one starts with a one-level quantizer consisting of the centroid of the training sequence. This vector is then split into twovectors
  • 7. 90 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-28, NO. 1 , JANUARY 1980 andthe two-level quantizeralgorithm is run on this pair to obtain a good (fixed-point) two-level quantizer. Each of these two vectors is then split and the algorithm is run to produce a good four-level quantizer. At the conclusion, one has fixed- point quantizers for1,2,4, 8, .-,Nlevels. EXAMPLES GaussianSources The algorithm was used initially to design quantizers for the classical example of memoryless Gaussian random var- iables with the squarederror distortion criterion of (l), based on a training sequence of data. The training and sample data were produced by a zero-mean, unit-variance memoryless sequence of Gaussian random variables. The initial guess was a unit quantizer on the k-dimensional cube, {x: Ixi I <4;i = 0, --,k - 1). A distortion threshold of 0.1%was used. The overall algorithm can be describedas follows: (0) Initialization: Fix N = number of levels, k = block length, n = length of trainingsequence, E = .001. Given a training sequence {xj;j = 0, . - a , n - 1): Let ,do be anN-level uniform quantizer reproduction alphabet for the k-dimensional cube,{u:IuiI~4,i=0,--,k-l}.Setm=OandD~1=~. (1) Given A , = Q i ; i = 1, * e . , N}, findtheminimum distortion partition P(A,) = {Si; i = 1, --,N}. For example, for each j = 0, * . a , n - 1,,compute d(xj,vi)for i = 1, * - a , N. If d(xj,yi) <d(xj,yr) forall 1, thenxj ESi.Compute: n-1 D, =D({A, ,P(A,)}) =n-1 min d(xj,y>. j=O YEA, (2) If (Dm-1 -D,)/D, <E = .001, halt with final quan- tizer described by a,. Otherwise continue. (3) Find the optimal rep5oduction alphabet i(P(km)) = {$(Si); i = 1, -, N} for P(A,). For the squarederror crite- rion, ;(Si) is the Euclidean center of gravity or centroid given by 1 x(si) = - C xi, II si It j:xjESi where llSill denotes the number of training vectors in the cell Si. If llSiII = 0, set ;(Si) = yi, the old codeword. Define A,+1 =3(P(A,)), replace m by m + 1,and go to (1). Table 1 presentsa simple but nontrivial example intended to demonstrate the basic operation of the algorithm. A two- dimensional quantizer with four levels is designed, based on a short training sequence of twelve training vectors. Because of the short training sequence in this case, the final distortion is lower than one would expect and the final quantizer may not work well on new data outside of the training sequence. The tradeoffs between the length of the training sequence and the performance inside and outside the training sequence are developed more carefully in the speech example. Observe that, in the example of Table 1, the algorithm could actually halt in step (1) of the m = 1 iteration since, if P(&) = P(,&,-l), it follows that A,+1 = 9 ( P ( A , ) ) = $(P(A?m-l)) = A,, and hence A, is the desired fixed point. Nore: Characterswith tildes underneath appear boldface in text. In other words, if the quantizer stays the same for two itera- tions,thenthe.twodistortions are equalandan ‘‘E = 0” threshold is satisfied. As a more realistic example, the algorithm was run for the scalar (k = 1) case with N = 2, 3, 4, 6 and 8, using a training sequence of 10,000samples per quantizer output froma zero- mean, unit-variance memoryless Gaussian source. The resulting quantizer outputs and distortion were within 1% of the optimal values reported by Max [15]. No more than 20
  • 8. LINDE et al. :ALGORITHM FOR VECTOR QUANTIZER DESIGN 91 3L 0.37 L OPTIMAL ---MINIMAL '.lo 2 4 6 8 - ITERATIONS Fig. 1. TheBasicAlgorithm:GaussianSource N = 4. iterations were required for N = 8 and,for smaller N, the number of iterations was considerably smaller. Figure 1 describes the convergence rate of one of the tests forthe caseN = 4. 'The algorithm was then tried for block quantizers for memoryless Gaussian variables withblocklengths k equal to 1, 2,3,4,5and 6 and a rate of onebit per sample,so that N = 2k. The distortion criterion was again the squared-error dis- tortion measure of (1). The algorithm used a training sequence of 100,000 samples. In each case, the algorithm converged in fewer than 50 iterations and the resulting distortion is plotted in Fig. 2, together with the onebit-per-symbol scalar case as a function of block length. For comparison, the rate-distortion bound [28, p. 991 D(R) = 2-2R for R = 1 bit-per-symbol is also plotted. As expected and as shown in Fig. 2 , the block quantizers outperform the scalar quantizer, butfor these block lengths, the performance is still far from the rate distortion bound (which is achievable, in principle, only in the limit as k + -). A more favorable comparison is obtained using a recent result of Yamada,Tazaki,andGray [29] which pro- vides a lower bound to the performance of an optimalN-level k-dimensionalquantizerwithadifference distortion measure when N is large. This bound provides strict improvement over the rate-distortion boundfor fixed kand tendstotherate- distortion bound as k + -. In the current case, the bound has the form where I ' is the gamma function. This bound is theoretically inappropriate for small k,yetit is surprisingly close forthe k = 1 result, which is known tobe almost optimal. Fork = 6, 0.33 z 2 0.31 0 a O QUANTIZER LOWER BOUND $d(k) BLOCKLENGTH k Fig. 2. BlockQuantizationRate I bit/symbolGaussianSource. N = 26 = 64 is moderately large and the closeness of the actual performance to the lower bound, compared to optimal performance provided by D,ck)(l), suggests thatthe algo- rithm is indeed providing a quantizer with block lengthsix and rateonebit-per-symbol that is nearly optimal(within 6%of the optimal). LLOYD'S EXAMPLE Lloyd [l] provides an example where both variational and fixed-pointapproaches can yield locally optimalquantizers instead of a globally optimum quantizer. We next proposea slight modification of the futed-point algorithm that indeed findsa globally optimum quantizer in Lloyd's example. We conjecture that thistechnique will workmore generally, but we have been unable to prove this theoretically. A similar tech- nique can be used with the stationary-pointalgorithm. Instead of using samples from the source that we wish to quantize, we use samples corrupted by additive independent noise, where the marginal distribution of the noise is such that onlyone locally optimum quantizer exists for it. As an ex- ample, for scalar quantization with thesquarederror distor- tion measure, we use Gaussian noise. In this case, any locally optimal quantizer is also globally optimum. Otherdistribu- tions, such as the uniform or a discrete amplitude noise with an alphabet size equal to the number of quantizer output levels, can also be used. When the noise power is much greater thanthe source power, thedistribution of their sum is essentially the distri- bution of the noise. We assume that, initially, the noise power is so large that only one locally optimum quantizer exists for the sum; hence, regardless of the initial guess, the algorithm w i l l converge to this optimum. On thenextstep,the noise power is reduced slightly and the quantizer resulting from the previous run is used as the initial guess. Intuitively, since the noise has been reduced by a small amount, the global optimum for the new sum should be close to that of the previous sum (we use the same source and noise samples with reduced noise power).Thus we expectthatthe algorithm will converge to the global optimum even though new local optimumpoints might have been introduced. We continue in the same manner reducing the noise gradually to zero.
  • 9. 92 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL.COM-28,NO. 1 , JANUARY 1980 tOPTIMAL QUANTIZER ‘/// I/ I : b STATIONARY -1 I QUANTIZER I POINT Fig. 3. TheProbabilityDensityFunction. To illustratehow the algorithmworks, we use asource with a probability density function as shown in Fig. 3. In this case, there are two locally optimum two-level quantizers. One has the outputlevels + O S and -0.5 and yields a mean-squared error of 0.083, the second (which is the global optimum) has output levels -0.71 and 0.13 and yields a mean-squared error 0.048. (This example is essentially the same as one of Lloyd’s The modified algorithm was tested on a sequence of 2,000 samples chosen according to the probability density shown in Fig. 3. Gaussian noise was added starting at unityvariance and reducing the variance by approximately 50%on each succes- sive run. The initial guess was+ 0.5 and-0.5 which is the non- global optimum.Each run was stopped when thedistortion was changed by less than 0.1%from itsprevious value. The resultsare given in Fig. 4 and it is seen that, inspite of the bad initial guess, the modifiedalgorithm converges to the globally optimum quantizer. P I J SPEECH EXAMPLE Inthenextexample, we consider the case of aspeech compressionsystemconsisting of an LPC analysis of 20 ms- long speech frames producing a voiced/unvoiced decision and a pitch, again,andanormalized inverse filter as previously described, followed. by quantization where the pitch and gain are separately quantized as usual, butthe normalizedfilter coefficients (al, --,aK) = (xo, xl, -, X K - ~ ) are quantized as a vector with K = 10, using the distortion measure of (7)- (8). Thetrainingsequenceconsisted of asequence ofnor- malized inverse fdter parameter vectors(3). The LPC analysis was digital, and hence the training sequence used was already “finely quantized” to 10 bits per sample or 100 bits for each vector. The original gain required 12 bitsper speech frame and the pitch used 8 bits per speechframe. The total rate of the LPC output (which we wish to further. compress by block quantization) is 6000 bits/s. No further compression of gain or pitch was attempted in these experiments as our goal was (3)The training sequence and additional testdata of LPC reflection coefficients were provided by Signal Technology Inc. of Santa Barbara andwere produced usingstandard LPC techniques on a single male speaker. --- STATIONARY POINTS 0.8 0 . 6 x ITERATION RESULT 0 NOISE VARIANCE CHANGE LEVEL 1 (VARIANCE INDICATED) - l , ~ ~ d u = 0 . 3 5 ~ ~ 0 . 5 -1.2 -1.4 LEVEL 2 Q= 0.7 Fig. 4. TheModifiedAlgorithm. to study only potential improvementwhenblockquantizing the normalized fiter parameters. The more complete problem including gain andpitch involves manyother issues and is the subject of a paperstill in preparation. Forthedistortion measure of (7)-(8), thecentroid of a subset S of a training sequence {xj,j = 0, 1, * e * , n - 1) is the vector u minimizing (Xj -U)R(Xj)(Xj -U ) f . ; : x j a We observe that the autocorrelation matrix R(x) is a natural byproduct of the LPC analysis and need not be recomputed. This minimization, however, is a minimumenergy-residual minimization problem in LPC analysis and it can be solved by standard LPC algorithms such asLevinson’s algorithm [lo]. Alternatively, it is a muchstudied minimizationproblemin Toeplitz matrixtheory[30] and thecentroid can be shown via variational techniques to be The splittingtechnique forthe initial guess anda distortion threshold of 0.5%were used. The complete algorithm for this example can thus bedescribed as follows: (0) Initialization: Fix N = 2 R ,R an integer, where N is the largest number of levels desired. Fix K = 10, n = length of training sequence, E = .005. SetM = 1. Given atrainingsequence {xi;j = 0, -*, n - l},set A = {xj;j = 0, e-, n - l}, the training sequence alphabet. Define A(1) = ;(A), the centroid of theentire trainingsequence using (15) or Levinson’s algorithm.
  • 10. LINDE et ai.:ALGORITHM FOR VECTOR QUANTIZER DESIGN 93 (1) (Splitting): Given A(w= bi,i = 1, *.. ,M},split each reproduction vector y i into yi + E and y i - e, where e is a fixed perturbation vector. Set Ao(2M) = b i + E , y i - E , i = 1, .-,M} and then replace M by 2 M . (2) Set m = 0 and Del = 00. (3) Given A,(M) = bl,* . e , y " ) , find its optimum parti- tion P(A,(M)) = {Si; i = 1, e.., M},that is, xiE Si if d(xj, vi)G&xi,ut),all 1. Compute theresulting distortion D, =D<Cjrn(M), P d m (M))}) (4) If (Dm-1 -D,)/D, <E = .005, then go to step (6). Otherwise continue. (5) Findtheoptimalreproduction alphabet A,+l(M) = 9(P(A,(M)) = {2(Si);i = 1, -, 1)for P(A,(M)). Replace m by m + 1 and go to (3). (6)Set A(M) = A,(M). The final M-level quantizer is described by A(w.If M = N, haltwith final quantizerde- scribed by A(N).Otherwise go to step(I). Table 2 describes the results of the algorithm for N = 64, and hence forone- to eight-bit quantizers trained on n = 19,000 frames of LPC speech producedby a single speaker. The distortion at the end of each iteration is given and, in all cases, the algorithm converged in fewer than14 iterations. When the resulting quantizers were applied to data from the same speaker outside of the trainingsequence, the resulting distortion was within 1%of that within the training sequence. A total of three andone-half hours of computertimeon a PDP 11/35 was required to obtain all of these codebooks. Figure 5 depicts the rate of convergence of the algorithm with a training sequence length for a 16-level quantizer. Note the marked difference between the distortion for 2400 frames inside the training se.quence and outside the training sequence for short training sequences. For alongtrainingsequence of over 12,000 frames, however,the distortionis nearly the same. Tapes of the synthesized speech at 8 bits per frame for the normalizedmodelsounded similar to those of the original LPC speech with 100 bits per frame for the normalized model (the gain and the pitch were both left at theoriginal LPC rate of 12and 8 bits per frame, respectively). While extensive subjective tests were not attempted, all informal listening tests judged the synthesized speech perfectly intelligible (when heard before the original LPC!) and the quality only slightly inferior when thetwo were compared. The overall compres- sion was from 6000 bits/s to 1400 bits/s. This is not startling as existing scalar quantizers thatoptimally allocate bits among the parameters and optimally quantize each parameter using a spectraldeviation distortion measures [111 also perform well inthis range. It is,however, promising as these were pre- liminaryresultswith noattempttofurther compress pitch and gain (which, taken together in our system, hadmore than twice the bit rate of the normalized model vector quantizer). Further results on applications of the algorithm to the overall speech compression system will be the subject of a forthcom- ing paper [33]. TABLE 2 ITAKURA-SAITO DISTORTION VS. NUMBER OF ITERATIONS. TRAINING SEQUENCE LENGTH = 19,000 FRAMES. NUMBER OF LEVELS DISTORTION NUMBER 2 10.33476 1 1.98925 2 1.78301 3 1.67244 4 1.55983 5 1.49814 6 1.48493 7 1.48249 8 4 1.38765 1 1.07906 2 1.04223 3 1.03252 4 1.02709 5 8 0.96210 1 0.85183 2 0.81353 3 0.79191 4 0.77472 5 0.76188 6 0.75130 7 0.74383 8 0.73341 9 0.71999 10 0.71346 11 0.70908 12 0.70578 13 0.70347 14 16 0.64653 1 0.55665 2 0.51810 3 0.50146 4 0.49235 5 0.48761 6 0.48507 7 32 0.44277 1 0.40452 2 0.39388 3 0.38667 4 0.38128 5 0.37778 6 0.37574 7 0.37448 8 64 0.34579 1 0.3178 6 2 0.30850 3 0.30366 4 0.30086 5 0.29891 6 0.29746 7 128 0.27587 1 0.2562 8 2 0.24928 3 0.24550 4 0.24309 5 0.24142 6 0.2402 1 7 0.23933 8 2 56 0.22458 1 0.20830 2 0.20226 3 0.19849 4 0.19623 5 0.19479 6 0.19386 7 0.19319 8
  • 11. 94 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-28, NO. 1, JANUARY 1980 A 1.2 - 1.1 - I n : - 0.9- 0 3 0.8- N IY 0.7- 0.6- 0 c e 0.5- 0.4 - 0.1 - 0.2 - 0.3 - 2 OUTSIDETRAINING SEQUENCE SEQUENCE INSIDE TRAINING I I I I 1 0 100 1000 10000 NUMBER OF VECTORS IN THE TRAININGSEQUENCE Fig. 5. ConvergencewithTrainingSequence. EPILOGUE The Gaussian example of Figure 2, Lloyd’s example,and the speech example were run on a PDP 11/34 minicomputer at the Stanford University Information Systems Laboratory. The simple example of Table 1 was run in BASIC on a Cromemco System 3 microcomputer. As a check,themicrocomputer program was also used to design quantizers for the Gaussian case of Figure 2 using the splitting method, k = 1, 2, and 3, and a training sequence of 10,000 vectors. The results agreed with the PDP 11/34 run to within one percent. ACKNOWLEDGMENT Theauthors would like to acknowledge thehelp of J. Markel of Signal Technology, Inc. of Santa Barbara and A. H. Gray, Jr., of the University of California at Santa Barbara in both theanalysis and synthesis of thespeech example. REFERENCES [I] Lloyd, S. P., “LeastSquaresQuantization inPCM’s,”BellTelephone Laboratories Paper, Murray Hill, NJ, 1957. 121 Gray,R.M.,J.C. KiefferandY.Linde.“LocallyOptimalBlock Quantization for Sources without a Statistical Model,’’ Stanford UniversityInformationSystemsLabTechnicalReportNo.L-904-1, Stanford, CA, May 1979 (submittedfor publication). 131 Itakura. F. and S. Saito.“AnalysisSynthesisTelephonyBasedUpon MaximumLikelihood Method,” Repfs. ofthe 6thInrernar’l. Cong. Acousr.. Y.Kohasi, ed., Tokyo, C-5-5, C17-20. 1%8. 141 Itakura. F.,“Maximum Prediction Residual Principle Applied to Speech Recognition,” IEEE Trans. ASSP,23, pp. 67-72, Feb. 1975. 151 Chaffee. D. L.. “Applications of Rate Distortion Theory to the of Calif. at Los Angeles.1975. Bandwidth Compression of Speech Signals,” Ph.D. Dissertation, Univ. [6]Gray,R.M., A.Buzo. A. H.Gray, Jr., and J. D. Markel,“Source Coding and Speech Compression,” Proc. of rhe 1978 Inrernar’l. Telemetering Conf., pp. 371-878, 1978. Matsuyama, Y.,A. Buzo and R. M. Gray, “Spectral Distortion Measures for SpeechCompression,”StanfordUniv.Inform.Systems Lab. Tech. Rept. 6504-3, Stanford, CA, April, 1978. Buzo, A., “OptimalVectorQuantization for LinearPredictedCoded Speech,” Stanford Univ., Ph.D. Dissertation,Dept.ofElec. Engrg., August, 1978. Matsuyama, Y. A,, “Process Distortion Measures and Signal Processing,” Ph.D. Dissertation, Dept.of Elec. Engrg., StanfordUniv., 1978. Markel, J. D. and A. H.Gray, Jr., LinearPredicrion o f Speech. Springer-Verlag. NY 1976. Gray, A. H., Jr., R. M.Gray andJ. D. Markel. “Comparison of Optimal Quantizations of Speech Reflection Coefficients,” IEEE Trans. ASSP, Vol. 24, pp. 4-23. Feb. 1977. Dalenius. T., “The Problem of Optimum Stratification,” Skndinavisk Akruuvielidskrifr.Vol. 33, pp. 203-213, 1950. Fisher, W. D., “On a PoolingProblemfromtheStatisticalDecision Viewpoint,” Econometrica, Vol. 21, pp. 567-585, 1953. Cox, D. R.. “Note on Grouping,’’ J . o f the Amer. Sraris. Assoc., Vol. 52. pp. 543-547, 1957. Max, J., “Quantizing for Minimum Distortion.” IRETrans. on Inform. Theory,IT-6, pp. 7-12, March 1960. Fleischer, P., “Sufficient Conditionsfor Achieving Minimum Distortion inaQuantizer,”IEEEInr. Conv. Rec.,pp. 104-111, 1964. Luenberger, D. G.,Oprimizarionby Vector SpaceMethods, John Wiley &Sons, NY, 1969. Luenberger, D. G., IntroductiontoLinear andNonlinear Programming. Addison-Wesley, Reading, MA, 1973. NJ, 1970. Rockafellar, R. T.,Convex Analysis. Princeton Univ. Press, Princeton, Zador, P., “Topics in theAsymptoticQuantizationofContinuous Random Variables,” Bell Telephone Laboratories Technical Memorandum, Feb. 1966. Gersho, A., “Asymptotically Optimal Block Quantization,” IEEE Trans.on Inform. Theory, Vol. IT-25, pp. 373-380, 1979. Chen. D. T. S., “On Two or More Dimensional Optimum Quantizers,” Proc. 1977 IEEE Inrernat’l. Conf. on Acousrics.Speech, & Signal Processing, pp. 640-643, 1977. Caprio, J. R., N. Westin and J. Esposito, “Optimum Quantization for Minimum Distortion,” Proc. of rhe Inrernar’l.TelemereringConf., pp. 315-323. NOV.1978.
  • 12. LINDE et al :ALGORITHM FOR VECTOR QUANTIZER DESIGN 95 Menez, J., F. Bceri, and D. J. Esteban,“Optimum QuantizerAlgorithm for Real-Time Block Quantizing,” Proc. of the I979 IEEE Inrernat’l. Conf. onAcoustics, Speech, & SignulProcessing. pp. 980-984, 1979. MacQueen, J., “Some Methods for Classification and Analysis of Multivariate Observations,” Proc.of rhe Fifh Berkeley Symposium on Math., Srar. andProb., Vol. 1 , pp. 281-296, 1967. Ball, G. H. and D. J. Hall, “Isodata-An Iterative Method of Multivariate Analysis and Pattern Classification,” in Proc. IFIPS Congr., 1965. Levinson, S . E., L. R. Rabiner,A. E. Rosenbergand J. G . Wilson, “Interactive Clustering Techniques for Selecting Speaker-Independent Techniques forSelecting Speaker-IndependentReferenceTemplatesfor IsolatedWord Recognition,” IEEE Trans. ASSP.Vol. 27, pp. 134-141, 1979. Berger, T., Rate DistortionTheory, Prentice-Hall, Englewood Cliffs, NJ, 1971. Yamada, Y.,S. Tazaki and R. M. Gray, “Asymptotic Performanceof Block Quantizers with Difference Distortion Measures,’’ to appear, IEEE Trans.on Inform. Theory. Grenander, U. and G. Szego, Toeplirz Forms and Their Applications, Univ. ofcalif. Press, Berkeley, 1958. Forgy, E., “Cluster Analysis of Multivarate Data: Efficiency vs. Interpretability of Classifications,” Abstract, Biometrics, Vol. 21, p. 768, 1965. Chaffee, D. L. and J. K. Omura, “A Very Low Rate Voice Compression System,‘’ Abstract in Abstracts o fPapers, 1974,IEEE Intern. Symp.on Inform. Theory,Notre Dame, Oct. 28-31, IEEE, 1974. Dr. Linde is currently with Codex Corporation, Mansfxld, Massachusetts where he is involved in research and development in the areas of digital signal processing, modems and data networks. * And& Buzo(S’7&M’78) was born in MexicoCity, ’ Mexico, on November 30, 1949. He received the electrical and mechanical engineer degree from the National University of Mexico, Mexico City, in 1974, and theM.S. and Ph.D. degreesfrom Stanford University, Stanford, California in1975and1978, respectively. In 1978hewas at Signal Technology in Santa Barbara,California. Nowhe is at the Instituto de Ingenieria ofthe National University of Mexico whereheis engaged inresearchon digital signal processing of speech signalsand data compression. * Buzo. A., A. H.Gray, Jr., R. M. Gray, and J. D. Markel, “Speech - -- Robert M.Gray (S’68-M’69SM’77) was born in Coding Based on Vector Quantization,” submitted for publication. San Diego, California, on November 1. 1943. He received the B.S. and M.S. degrees in electrical engineering from the Massachusetts Institute of research assistantat the involved in research in codes. YosephLinde(S’7&M’78)was born in Sendzislaw, Polandon June 14, 1947. He received the B.Sc. degree from the Technion, Israel Institute of Technology in 1970, the M.Sc. degree from theTel- Aviv University in 1975 and the Ph.D. degree from Stanford University, Stanford,California, in1977, all in Electrical Engineering. From 1970to 1975 he was with theSignalCorps., Israeli Defense Forces, wherehewasinvolved in research and development of military communications systems. In 1976 and 1977 he wasa Information SystemsLaboratoryat StanfordUniversity data compression systems, in particular tree and trellis Technology, Cambridge, in 1966, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1969. Since 1969 he has been with the Electrical Engineering Department and the Information Systems Laboratories, Stanford University, Stanford, California, where he is engagedin teaching and research in communicationsand informationtheory. Dr. Grayisa member of Sigma Xi, Eta Kappa Nu, theMathematical Association of America, the Society for Industrial and Applied Mathematics, the Institute of Mathematical Statistics, the American Mathematical Society, and the Soci&?des Ingdnieurset Scientifiquesde France. He hasbeen amember Theory since 1975 and an Associate Editor of the IEEE TRANSACTIONS ON of the Board of Governors oftheIEEE Professional Group on Information INFORMATIONTHEORY since September1977.