Video Codec Design
Iain E. G. Richardson
Copyright © 2002 John Wiley & Sons, Ltd
ISBNs: 0-471-48553-5 (Hardback); 0-470-84783-2 (Electronic)
Entropy Coding
8.1 INTRODUCTION
A video encoder contains two main functions: a source model that attempts to represent a
video scene in a compact form that is easy to compress (usually an approximation of the
original video information) and an entropy encoder that compresses the output of the model
prior to storage and transmission. The source model is matched to the characteristics of the
input data (images or video frames), whereas the entropy coder may use ‘general-purpose’
statistical compression techniques that are not necessarily unique in their application to
image and video coding.
As with the functions described earlier (motion estimation and compensation, transform
coding, quantisation), the design of an entropy CODEC is affected by a number of
constraints including:

1. Compression efficiency: the aim is to represent the source model output using as few bits
as possible.

2. Computational efficiency: the design should be suitable for implementation on the
chosen hardware or software platform.

3. Error robustness: if transmission errors are likely, the entropy CODEC should support
recovery from errors and should (if possible) limit error propagation at the decoder (this
constraint may conflict with (1) above).
In a typical transform-based video CODEC, the data to be encoded by the entropy CODEC
falls into three main categories: transform coefficients (e.g. quantised DCT coefficients),
motion vectors and ‘side’ information (headers, synchronisation markers, etc.). The method
of coding side information depends on the standard. Motion vectors can often be represented
compactly in a differential form due to the high correlation between vectors for neighbouring
blocks or macroblocks. Transform coefficients can be represented efficiently with ‘run-
level’ coding, exploiting the sparse nature of the DCT coefficient array.
An entropy encoder maps input symbols (for example, run-level coded coefficients) to a
compressed data stream. It achieves compression by exploiting redundancy in the set of
input symbols, representing frequently occurring symbols with a small number of bits and
infrequently occurring symbols with a larger number of bits. The two most popular entropy
encoding methods used in video coding standards are Huffman coding and arithmetic
coding. Huffman coding (or ‘modified’ Huffman coding) represents each input symbol
by a variable-length codeword containing an integral number of bits. It is relatively
straightforward to implement, but cannot achieve optimal compression because of the
restriction that each codeword must contain an integral number of bits. Arithmetic coding
maps an input symbol into a fractional number of bits, enabling greater compression
efficiency at the expense of higher complexity (depending on the implementation).
8.2 DATA SYMBOLS

8.2.1 Run-Level Coding
The output of the quantiser stage in a DCT-based video encoder is a block of quantised
transform coefficients. The array of coefficients is likely to be sparse: if the image block has
been efficiently decorrelated by the DCT, most of the quantised coefficients in a typical
block are zero. Figure 8.1 shows a typical block of quantised coefficients from an MPEG-4
‘intra’ block. The structure of the quantised block is fairly typical. A few non-zero
coefficients remain after quantisation, mainly clustered around DCT coefficient (0, 0): this
is the ‘DC’ coefficient and is usually the most important coefficient to the appearance of the
reconstructed image block.
The block of coefficients shown in Figure 8.1 may be efficiently compressed as follows:
1. Reordering. The non-zero values are clustered around the top left of the 2-D array and
this stage groups these non-zero values together.
2. Run-level coding. This stage attempts to find a more efficient representation for the large
number of zeros (48 in this case).
3. Entropy coding. The entropy encoder attempts to reduce the redundancy
of the data symbols.
Reordering
The optimum method of reordering the quantised data depends on the distribution of the
non-zero coefficients. If the original image (or motion-compensated residual) data is evenly
distributed in the horizontal and vertical directions (i.e. there is not a predominance of
‘strong’ image features in either direction), then the significant coefficients will also tend to
be evenly distributed about the top left of the array (Figure 8.2(a)). In this case, a zigzag
reordering pattern such as Figure 8.2(c) should group together the non-zero coefficients
Figure 8.2 Typical data distributions and reordering patterns: (a) even distribution; (b) field
distribution; (c) zigzag; (d) modified zigzag
efficiently. However, in some cases an alternative pattern performs better. For example, a
field of interlaced video tends to vary more rapidly in the vertical than in the horizontal
direction (because it has been vertically subsampled). In this case the non-zero coefficients
are likely to be ‘skewed’ as shown in Figure 8.2(b): they are clustered more to the left of the
array (corresponding to basis functions with a strong vertical variation, see for example
Figure 7.4). A modified reordering pattern such as Figure 8.2(d) should perform better at
grouping the coefficients together.
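
The zigzag scan itself is simply a fixed permutation of the 64 coefficient positions. The following sketch (Python, for illustration; a practical CODEC would normally store the pattern as a constant table) generates the classic zigzag order for an 8 x 8 block by walking the anti-diagonals in alternating directions:

def zigzag_order(n=8):
    """Return the (row, col) scan order for an n x n block, zigzag pattern."""
    order = []
    for d in range(2 * n - 1):                         # one anti-diagonal at a time
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        order.extend(cells if d % 2 else cells[::-1])  # alternate direction
    return order

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]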
Run-level coding
The output of the reordering process is a linear array of quantised coefficients. Non-zero
coefficients are mainly grouped together near the start of the array and the remaining values
in the array are zero. Long sequences of identical values (zeros in this case) can be
represented as a (run, level) code, where (run) indicates the number of zeros preceding a
non-zero value and (level) indicates the sign and magnitude of the non-zero coefficient.
The following example illustrates the reordering and run-level coding process.
Example
Reordered array:
[102, -33, 21, -3, -2, -3, -4, -3, 0, 2, 1, 0, 1, 0, -2, -1, -1, 0, 0, 0, 0, -2, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 ...]
Run-level coded:
(0, 102) (0, -33) (0, 21) (0, -3) (0, -2) (0, -3) (0, -4) (0, -3) (1, 2) (0, 1) (1, 1)
(1, -2) (0, -1) (0, -1) (4, -2) (11, 1)
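
The run-level conversion is straightforward to express in code. A minimal sketch (Python, for illustration; the zigzag reordering is assumed to have already produced the linear array):

def run_level(coeffs):
    """Convert a reordered coefficient array to (run, level) pairs."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1                  # count zeros preceding the next value
        else:
            pairs.append((run, c))    # (zeros before it, signed level)
            run = 0
    return pairs                      # trailing zeros are handled separately

array = ([102, -33, 21, -3, -2, -3, -4, -3, 0, 2, 1, 0, 1, 0, -2, -1, -1]
         + [0, 0, 0, 0, -2] + [0] * 11 + [1])
print(run_level(array))   # ends (4, -2), (11, 1) as in the example above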
Two special cases need to be considered. Coefficient (0, 0) (the ‘DC’ coefficient) is impor-
tant to the appearance of the reconstructed image block and has no preceding zeros. In an
intra-coded block (i.e. coded without motion compensation), the DC coefficient is rarely
zero and so is treated differently from other coefficients. In an H.263 CODEC, intra-DC
coefficients are encoded with a fixed, relatively low quantiser setting (to preserve image
quality) and without (run, level) coding. Baseline JPEG takes advantage of the property that
neighbouring image blocks tend to have similar mean values (and hence similar DC
coefficient values): each DC coefficient is encoded differentially from the previous
DC coefficient.
The second special case is the final run of zeros in a block. Coefficient (7, 7) is usually
zero and so we need a special case to deal with the final run of zeros that has no terminating
non-zero value. In H.261 and baseline JPEG, a special code symbol, ‘end of block’ or EOB,
is inserted after the last (run, level) pair. This approach is known as ‘two-dimensional’ run-
level coding since each code represents just two values (run and level). The method does not
perform well under high compression: in this case, many blocks contain only a DC
coefficient and so the EOB codes make up a significant proportion of the coded bit stream.
H.263 and MPEG-4 avoid this problem by encoding a flag along with each (run, level) pair.
This ‘last’ flag signifies the final (run, level) pair in the block and indicates to the decoder
that the rest of the block should be filled with zeros. Each code now represents three
values (run, level, last) and so this method is known as ‘three-dimensional’ run-level-last
coding.
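
The ‘three-dimensional’ form is a small variant of the run_level sketch above (again illustrative Python; the trailing zeros are absorbed by the ‘last’ flag instead of an EOB code):

def run_level_last(coeffs):
    """Convert a reordered array to (run, level, last) triplets."""
    pairs = run_level(coeffs)     # from the earlier sketch
    return [(run, level, int(i == len(pairs) - 1))
            for i, (run, level) in enumerate(pairs)]

# The final triplet carries last=1; the decoder fills the rest with zeros.
print(run_level_last([4, -1, 0, 2, -3, 0, 0, 0, 0, 0, -1, 0, 0, 0, 1]))
# [(0, 4, 0), (0, -1, 0), (1, 2, 0), (0, -3, 0), (5, -1, 0), (3, 1, 1)]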
In addition to run-level coded coefficient data, a number of other values need to be coded
and transmitted by the video encoder. These include the following.
Motion vectors

The motion vector for the current block or macroblock is usually similar to the vectors of its
neighbours. It is therefore typically predicted from one or more previously transmitted
vectors and only the difference (MVD) is entropy coded.
Quantisation parameter

In order to maintain a target bit rate, it is common for a video encoder to modify the
quantisation parameter (scale factor or step size) during encoding. The change must be
signalled to the decoder. It is not usually desirable to change the quantisation parameter
suddenly by a large amount during encoding of a video frame and so the parameter may be
encoded differentially from the previous quantisation parameter.
Coded block pattern

The coded block pattern (CBP) indicates which blocks contain non-zero coefficients in an
inter-coded macroblock.

Example

Number of non-zero coefficients in each block        CBP
Y0   Y1   Y2   Y3   Cr   Cb
 2    1    0    7    0    0                          110100
 0    6    9    1    1    3                          011111
Synchronisation markers

A video decoder may need to resynchronise in the event of an error or interruption to the
stream of coded data. Synchronisation markers in the bit stream provide a means of doing
this. Typically, the differential predictions mentioned above (DC coefficient, motion vectors
and quantisation parameter) are reset after a synchronisation marker, so that the data after the
marker may be decoded independently of previous (perhaps errored) data. Synchronisation is
supported by restart markers in JPEG, group of block (GOB) headers in baseline H.263 and
MPEG-4 (at fixed intervals within the coded picture) and slice start codes in MPEG-1,
MPEG-2 and the annexes to H.263 and MPEG-4 (at user-definable intervals).
Higher-level headers
8.3 HUFFMAN CODING

8.3.1 ‘True’ Huffman Coding

In order to achieve the maximum compression of a set of data symbols using Huffman
encoding, it is necessary to calculate the probability of occurrence of each symbol. A set of
variable-length codewords is then constructed for this data set. This process will be
illustrated by the following example.
A video sequence, ‘Carphone’, was encoded with MPEG-4 (short header mode). Table 8.1
lists the probabilities of the most commonly occurring motion vectors in the encoded
sequence and their information content, log2(1/P). To achieve optimum compression, each
value should be represented with exactly log2(1/P) bits.

Figure 8.4 Motion vector probability distributions, MVX or MVY (-3 to +3): ‘Carphone’
(solid line) and ‘Claire’ (dotted line)
The vector probabilities are shown graphically in Figure 8.4 (the solid line). ‘0’ is the most
common value and the probability drops sharply for larger motion vectors. (Note that there
are a small number of vectors larger than +/-1.5 and so the probabilities in the table do not
sum to 1.)
To generate a Huffman code table for this set of data, the following iterative procedure is
carried out (we will ignore any vector values that do not appear in Table 8.1):

1. Generating the code tree

The two items with the lowest probabilities are merged to form a new ‘node’ whose
probability is the sum of the two merged probabilities. The procedure is repeated until there
is a single ‘root’ node that contains all other nodes and data items listed ‘beneath’ it. This
procedure is illustrated in Figure 8.5 (generating the Huffman code tree for the ‘Carphone’
motion vectors).
• Original list: The data items are shown as square boxes. Vectors (-1.5) and (1.5) have
the lowest probability and these are the first candidates for merging to form node ‘A’.

• Stage 1: The newly created node ‘A’ (shown as a circle) has a probability of 0.03 (from
the combined probabilities of (-1.5) and (1.5)) and the two lowest-probability items are
vectors (-1) and (1). These will be merged to form node ‘B’.

• Stage 2: A and B are the next candidates for merging (to form ‘C’).

• Stage 3: Node C and vector (0.5) are merged to form ‘D’.

• Stage 4: (-0.5) and D are merged to form ‘E’.

• Stage 5: There are two ‘top-level’ items remaining: node E and the highest-probability
vector (0). These are merged to form ‘F’.

• Final tree: The data items have all been incorporated into a binary ‘tree’ containing seven
data values and six nodes. Each data item is a ‘leaf’ of the tree.
2. Encoding

Each ‘leaf’ of the binary tree is mapped to a VLC. To find this code, the tree is ‘traversed’
from the root node (F in this case) to the leaf (data item). For every branch, a 0 or 1 is
appended to the code: 0 for an upper branch, 1 for a lower branch (shown in the final tree of
Figure 8.5). This gives the set of codes listed in Table 8.2. Encoding is achieved by
transmitting the appropriate code for each data item. Note that once the tree has been
generated, the codes may be stored in a look-up table.
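
The tree-building and code-assignment procedure can be sketched in a few lines of Python (a minimal illustration; the probabilities below are assumed values chosen to reproduce the merging order described above, since Table 8.1 is not reproduced here):

import heapq

def huffman_codes(probs):
    """Build a Huffman code table from a {symbol: probability} map."""
    # Heap entries are (probability, tie-break id, tree); a tree is either
    # a symbol (leaf) or a pair of subtrees (internal node).
    heap = [(p, i, s) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)      # lowest-probability item
        p2, _, t2 = heapq.heappop(heap)      # next-lowest item
        heapq.heappush(heap, (p1 + p2, next_id, (t1, t2)))  # merged node
        next_id += 1
    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):          # internal node: two branches
            walk(tree[0], code + '0')        # 0 for one branch
            walk(tree[1], code + '1')        # 1 for the other
        else:
            codes[tree] = code or '0'        # leaf: assign the codeword
    walk(heap[0][2], '')
    return codes

# Assumed probabilities (for illustration only, cf. Table 8.1)
carphone = {0: 0.78, -0.5: 0.08, 0.5: 0.06, 1: 0.02, -1: 0.02,
            1.5: 0.015, -1.5: 0.015}
print(huffman_codes(carphone))   # 1-bit code for 0; 5-bit codes for +/-1, +/-1.5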
Two features of the code table are worth noting:

1. High probability data items are assigned short codes (e.g. 1 bit for the most common
vector ‘0’). However, the vectors (-1.5, 1.5, -1, 1) are each assigned 5-bit codes
(despite the fact that -1 and 1 have higher probabilities than -1.5 and 1.5). The lengths
of the Huffman codes (each an integral number of bits) do not match the ‘ideal’ lengths
given by log2(1/P).

2. No code contains any other code as a prefix, i.e. reading from the left-hand bit, each code
is uniquely decodable.
For example, the series of vectors (1, 0, 0.5) would be transmitted as the concatenation of
the three corresponding codewords from Table 8.2.
3. Decoding

In order to decode the data, the decoder must have a local copy of the Huffman code tree (or
look-up table). This may be achieved by transmitting the look-up table itself, or by sending
the list of data and probabilities, prior to sending the coded data. Each uniquely decodable
code may then be read and converted back to the original data.
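
Following the example above, decoding can be sketched by matching prefixes against the code table (illustrative Python, reusing huffman_codes and the assumed probabilities from the previous sketch):

def huffman_decode(bits, codes):
    """Decode a bit string using a prefix-free {symbol: codeword} table."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ''
    for b in bits:
        current += b
        if current in inverse:   # prefix-free: the first match is the symbol
            out.append(inverse[current])
            current = ''
    return out

codes = huffman_codes(carphone)
bits = ''.join(codes[v] for v in [1, 0, 0.5])   # encode a short vector series
print(huffman_decode(bits, codes))              # [1, 0, 0.5]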
Repeating the process described above for the video sequence ‘Claire’ gives a different
result. This sequence contains less motion than ‘Carphone’ and so the vectors have a
different distribution (shown in Figure 8.4, dotted line). A much higher proportion of vectors
are zero (Table 8.3).
The corresponding Huffman tree is given in Figure 8.6 (the Huffman tree for the ‘Claire’
motion vectors). Note that the ‘shape’ of the tree has changed (because of the distribution of
probabilities) and this gives a different set of Huffman codes (shown in Table 8.4). There are
still six nodes in the tree, one less than the number of data items (seven): this is always the
case with Huffman coding.
If the probability distributions are accurate, Huffman coding provides a relatively compact
representation of the original data. In these examples, the frequently occurring (0) vector is
represented very efficiently as a single bit. However, to achieve optimum compression, a
separate code table is required for each of the two sequences ‘Carphone’ and ‘Claire’. The
loss of potential compression efficiency due to the requirement for integral length codes is
very obvious for vector ‘0’ in the ‘Claire’ sequence: the optimum number of bits (information
content) is 0.07 but the best that can be achieved with Huffman coding is 1 bit.
The Huffman coding process described above has two disadvantages for a practical video
CODEC. First, the decoder must use the same codeword set as the encoder. This means that
the encoder needs to transmit the information contained in the probability table before the
decoder can decode the bit stream, an extra overhead that reduces compression efficiency.
Second, calculating the probability table for a large video sequence (prior to generating the
Huffman tree) is a significant computational overhead and cannot be done until after the
video data is encoded. For these reasons, the image and video coding standards define sets of
codewords based on the probability distributions of a large range of video material. Because
the tables are ‘generic’, compression efficiency is lower than that obtained by pre-analysing
the data to be encoded, especially if the sequence statistics differ significantly from the
‘generic’ probability distributions. The advantage of not needing to calculate and transmit
individual probability tables usually outweighs this disadvantage. (Note: Annex C of the
original JPEG standard supports individually calculated Huffman tables, but most practical
implementations use the ‘typical’ Huffman tables provided in Annex K of the standard.)
8.3.3 Table Design

The following two examples of VLC table design are taken from the H.263 and MPEG-4
standards. These tables are required for H.263 ‘baseline’ coding and MPEG-4 ‘short video
header’ coding.

H.263 and MPEG-4 use ‘three-dimensional’ coding of quantised coefficients, where each
codeword represents a combination of (run, level, last) as described in Section 8.2.1. A
total of 102 specific combinations of (run, level, last) have VLCs assigned to them. Table 8.5
shows 26 of these codes.

A further 76 VLCs are defined, each up to 13 bits long. Note that the last bit of each
codeword is the sign bit ‘S’, indicating the sign of the decoded coefficient (0 = positive,
1 = negative). Any (run, level, last) combination that is not listed in the table is coded using
an escape sequence: a special ESCAPE code (0000011) followed by a 13-bit fixed-length
code describing the values of run, level and last.
The codes shown in Table 8.5 are represented in ‘tree’ form in Figure 8.7. A codeword
containing a run of more than eight zeros is not valid, so any codeword starting with
000000000... indicates an error in the bit stream (or possibly a start code, which begins
with a long sequence of zeros, occurring at an unexpected position in the sequence). All
other sequences of bits can be decoded as valid codes. Note that the smallest codes are
allocated to short runs and small levels (e.g. code ‘10’ represents a run of 0 and a level of
+/-1), since these occur most frequently.
H.263/MPEG-4 motion vector difference (MVD)

The H.263/MPEG-4 differentially coded motion vectors (MVD) described in Section 8.2.2
are each encoded as a pair of VLCs, one for the x-component and one for the y-component.
Part of the table of VLCs is shown in Table 8.6 and in ‘tree’ form in Figure 8.8. A further
49 codes (8-13 bits long) are not shown here. Note that the shortest codes represent small
motion vector differences (e.g. MVD = 0 is represented by the single-bit code ‘1’).
Figure 8.7 The TCOEF Huffman code tree (partial)

The emerging H.26L standard takes a step away from individually calculated Huffman tables
by using a ‘universal’ set of VLCs for any coded element. Each codeword is generated from
the regular construction:

1
0 x0 1
0 x1 0 x0 1
0 x2 0 x1 0 x0 1
...
where xk is a single bit. Hence there is one 1-bit codeword, two 3-bit codewords, four 5-bit
codewords, eight 7-bit codewords, and so on. Table 8.7 shows the first 12 codes and these are
represented in tree form in Figure 8.9. The highly regular structure of the set of codewords
can be seen in this figure.
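
Because of this regular construction, a UVLC can be generated algorithmically rather than read from a stored table. A sketch (illustrative Python; the index-to-codeword mapping follows the tree of Figure 8.9, i.e. 0 -> ‘1’, 1 -> ‘001’, 2 -> ‘011’, 3 -> ‘00001’, ...):

def uvlc(n):
    """Generate the n-th universal VLC: '1', '0 x0 1', '0 x1 0 x0 1', ..."""
    k = 0
    while n >= (2 << k) - 1:       # find the group: 2^k codes of length 2k+1
        k += 1
    m = n - ((1 << k) - 1)         # index within the group, as k info bits
    bits = format(m, '0%db' % k) if k else ''
    return ''.join('0' + b for b in bits) + '1'

for i in range(5):
    print(i, uvlc(i))   # 0 1; 1 001; 2 011; 3 00001; 4 00011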
Any data element to be coded (transform coefficients, motion vectors, block patterns, etc.)
is assigned a code from the list of UVLCs. The codes are not optimised for a specific data
element (since the same set of codes is used for all elements); however, the uniform, regular
structure considerably simplifies encoder and decoder design, since the same methods can be
used to encode or decode any data element.
Figure 8.8 Huffman tree for the H.263/MPEG-4 MVD codes (partial)

Figure 8.9 Tree representation of the universal VLCs

8.3.4 Entropy Coding Example

This example follows the process of encoding and decoding a block of quantised coefficients
in an MPEG-4 inter-coded picture. Only six non-zero coefficients remain in the block: this
would be characteristic of either a highly compressed block or a block that has been
efficiently predicted by motion estimation.

The quantised and reordered coefficients are:
4, -1, 0, 2, -3, 0, 0, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0 ...

TCOEF variable length codes (from Table 8.5; note that the last bit is the sign):

00101110; 101; 0101000; 0101011; 010111; 0011010
Decoding of this sequence proceeds as follows. The decoder ‘steps’ through the TCOEF tree
(shown in Figure 8.7) until it reaches the ‘leaf’ 0010111. The next bit (0) is decoded as the
sign and the (last, run, level) group (0, 0, 4) is obtained. The steps taken by the decoder for
this first coefficient are highlighted in Figure 8.10. The process is repeated with the ‘leaf’ 10
followed by sign (1) and so on until a ‘last’ coefficient is decoded. The decoder can now fill
the coefficient array and reverse the zigzag scan to restore the array of 8 x 8 quantised
coefficients.
Figure 8.10 Decoding the first coefficient: the path through the TCOEF tree to leaf 0010111
(0, 0, 4)

8.3.5 Variable Length Encoder Design

Software design
for each data symbol
    find the corresponding VLC value and length (in bits)
    pack this VLC into an output register R
    if the contents of R exceed L bytes
        write L (least significant) bytes to the output stream
        shift R by L bytes
Example
Using the entropy encoding example above, with L = 1 byte and R empty at the start of
encoding, the following packed bytes are written to the output stream: 00101110, 01000101,
10101101, 00101111. At the end of the above sequence, the output register R still contains
6 bits (001101). If encoding stops here, it will be necessary to ‘flush’ the contents of R to
the output stream.
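
The packing loop can be sketched as follows (illustrative Python; note that this sketch packs bits MSB-first into a single byte register, so its output ordering differs from the least-significant-byte scheme assumed in the example above):

def pack_vlcs(vlcs):
    """Pack a sequence of binary codeword strings into bytes, MSB-first."""
    out, reg, nbits = bytearray(), 0, 0
    for code in vlcs:
        for bit in code:
            reg = (reg << 1) | (bit == '1')   # append one bit to the register
            nbits += 1
            if nbits == 8:                    # a full byte: write it out
                out.append(reg)
                reg, nbits = 0, 0
    if nbits:                                 # 'flush' any remaining bits
        out.append(reg << (8 - nbits))
    return bytes(out)

codes = ['00101110', '101', '0101000', '0101011', '010111', '0011010']
print(pack_vlcs(codes).hex())   # five bytes: 38 coded bits plus flush padding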
The MVD codes listed in Table 8.6 can be stored in a simple look-up table: only 64 valid
MVD values exist. The TCOEF codes of Table 8.5 are less straightforward to handle because
of the large number of possible (last, run, level) combinations; there are several options:

1. Large look-up table indexed by (last, run, level). The size of this table may be reduced
somewhat because only levels in the range 1-12 and runs in the range 0-40 have
individual VLCs. The look-up procedure is as follows:
if (|level| < 13 and run < 41)
    look up table based on (last, run, level)
    return individual VLC or calculate Escape sequence
else
    calculate Escape sequence
2. Several smaller, partitioned look-up tables, each covering a subset of the valid (last,
run, level) combinations. The look-up procedure becomes:

if (last, run, level) ∈ {set A}
    look up table A
    return VLC or calculate Escape sequence
else if (last, run, level) ∈ {set B}
    look up table B
    return VLC or calculate Escape sequence
....
else
    calculate Escape sequence
For example, earlier versions of the H.263 ‘test model’ software used this approach to
reduce the number of entries in the partitioned look-up tables to 200 (i.e. 102 valid VLCs
and 98 ‘empty’ entries).
3. Conditional expression for every valid combination of (last, run, level). For example:
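
A hypothetical fragment of this approach (illustrative Python; the codes are taken from Table 8.5 and the sign bit is handled separately):

def tcoef_vlc(last, run, level):
    """Hand-coded VLC selection: one conditional per valid combination."""
    if last == 0 and run == 0 and abs(level) == 1:
        return '10'
    if last == 0 and run == 1 and abs(level) == 1:
        return '110'
    if last == 1 and run == 0 and abs(level) == 1:
        return '0111'
    # ... one case for each of the 102 valid combinations ...
    return None   # no individual VLC: build an escape sequence instead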
Comparing the three methods, method 1 lends itself to compact code, is easy to modify (by
changing the look-up table contents) and is likely to be computationally efficient; however, it
requires a large look-up table, most of which is redundant. Method 3, at the other extreme,
requires the most code and is the most difficult to change (since each valid combination is
‘hand-coded’) but requires the least data storage. On some platforms it may be the slowest
method. Method 2 offers a compromise between the other two methods.
Hardware design

Figure 8.11 Hardware VLC encoder: a look-up table supplies the selected VLC, which is
packed into a byte or word stream
8.3.6 Variable Length Decoder Design

Software design

Perhaps the most straightforward way of finding a valid VLC is to step through the relevant
Huffman code tree. For example, an H.263/MPEG-4 TCOEF code may be decoded by
stepping through the tree shown in Figure 8.7, starting from the left:
if (first bit = 1)
    if (second bit = 1)
        if (third bit = 1)
            if (fourth bit = 1)
                return (0,0,2)
            else
                return (0,2,1)
        else
            return (0,1,1)
    else
        return (0,0,1)
else
    ... decode all VLCs starting with 0
This approach requires a large nested if...else statement (or equivalent) that can deal with
104 cases (102 unique TCOEF VLCs, one escape code, plus an error condition). This
method leads to a large code size, may be slow to execute and is difficult to modify (because
the Huffman tree is ‘hand-coded’ into the software); however, no extra look-up tables are
required.
An alternative is to use one or more look-up tables. The maximum length of a TCOEF VLC
(excluding the sign bit and escape sequences) is 13 bits. We can construct a look-up table
whose index is a 13-bit number (the 13 least significant bits of the input stream). Each entry
of the table contains either a (last, run, level) triplet or a flag indicating Escape or Error;
2^13 = 8192 entries are required, most of which will be duplicates of other entries. For
example, every code beginning with ‘10...’ decodes to the triplet (0, 0, 1).

An initial test of the range of the 13-bit number may be used to select one of a number of
smaller look-up tables. For example, the H.263 reference model decoder described earlier
breaks the table into three smaller tables containing around 300 entries (about 200 of which
are duplicate entries).
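
The single-table approach can be sketched as follows (illustrative Python; only three of the 102 TCOEF codes are filled in, and the 13-bit index is taken MSB-first for clarity):

# Each table entry holds (last, run, level, code length) or None (error).
CODES = {'10': (0, 0, 1), '110': (0, 1, 1), '0111': (1, 0, 1)}

TABLE = [None] * (1 << 13)
for code, triplet in CODES.items():
    pad = 13 - len(code)
    base = int(code, 2) << pad
    for i in range(1 << pad):      # every 13-bit word that starts with 'code'
        TABLE[base + i] = triplet + (len(code),)

def decode13(bits13):
    return TABLE[int(bits13, 2)]   # a single look-up per codeword

print(decode13('1011001110101'))   # (0, 0, 1, 2): shift the stream by 2 bits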
Figure 8.12 Basic architecture of a variable length decoder: an input shift register feeds a
‘Find VL code’ unit, and the detected code length L controls the shift
Hardware design

Hardware designs for variable length decoding fall into two categories: (a) those that decode
n bits from the input stream every m cycles (e.g. decoding 1 or 2 bits per cycle) and (b) those
that decode n complete VL codewords every m cycles (e.g. decoding 1 codeword in one or
two cycles). The basic architecture of a decoder is shown in Figure 8.12 (the dotted line
‘code length L’ is only required for category (b) decoders).

Category (a), n bits per m cycles A decoder of this type typically steps through the code
tree, moving to the next node according to each input bit until a complete code (a ‘leaf’) is
reached. Hence the decoder processes 1 bit per cycle (assuming that a state transition occurs
per clock cycle).
Category (b), n codewords per m cycles This is analogous to the ‘large look-up table’
approach in a software decoder. K bits (stored in the input shift register) are examined per
cycle, where K is the largest possible VLC size (13, excluding the sign bit, in the example of
H.263/MPEG-4 TCOEF). The ‘Find VL code’ unit in Figure 8.12 checks all combinations of
K bits and finds a matching valid code, Escape code, or flags an error. The length of the
matching code (L bits) is fed back and the shift register shifts the input data by L bits (i.e.
L bits are removed from the input buffer). Hence a complete L-bit codeword can be
processed in one cycle.

The shift register can be implemented using a barrel shifter (a shift-register circuit that
shifts its contents by L places in one cycle). The ‘Find VL code’ unit may be implemented
using logic (a PLA). The logic array should minimise effectively, since most of the possible
input combinations are ‘don’t cares’. In the TCOEF example, all 13-bit input words
‘10XXXXXXXXXXX’ map to the output (0, 0, 1). It is also possible to implement this
unit as a ROM or RAM look-up table with 2^13 entries.
A decoder that decodes one codeword per cycle is described by Lei and Sun [2]; Chang
and Messerschmitt [4] examine the principles of concurrent VLC decoding. Further
examples of VL decoders can be found elsewhere [5], [6].
Figure 8.13 Stepping through the code tree: the input sequence 1111 decodes to (0, 0, 2)
When a transmission error disrupts the decoded sequence, decoding errors may continue to
occur (propagate) until a resynchronisation point occurs in the bit stream. The synchronisation
markers described in Section 8.2.2 limit the propagation of errors at the decoder. Increasing
the frequency of synchronisation markers in the bit stream can reduce the effect of an error
on the decoded image; however, markers are ‘redundant’ overhead and so this also reduces
compression efficiency. Transmission errors and their effect on coded video are discussed
further in Chapter 11.
Error-resilient alternatives to modified Huffman codes have been proposed. For example,
MPEG-4 (video) includes an option to use reversible variable length codes (RVLCs), a class
of codewords that may be successfully decoded in either a forward or backward direction
from a resynchronisation point. When an error occurs, it is usually detectable by the decoder
(since a serious decoder error is likely to violate the encoding syntax). The decoder can
decode the current section of data in both directions, forward from the previous synchro-
nisation point and backward from the next synchronisation point. Figure 8.14 shows an
example. Region (a) is decoded and then an error is identified. The decoder ‘skips’ to the
next synchronisation point and decodes backwards from there, recovering data that would
otherwise have been discarded.
8.4 ARITHMETIC CODING

Example

Table 8.9 lists five motion vector values (-2, -1, 0, 1, 2). The probability of occurrence of
each vector is listed in the second column. Each vector is assigned a subrange within the
range 0-1.0, proportional to its probability of occurrence.
Encoding procedure

The following steps encode the sequence of vectors (0, -1, 0, 2):

1. Set the initial range: 0 -> 1.0.

2. For the first data symbol (0), find the corresponding subrange (low to high): 0.3 -> 0.7.

3. Set the new range (1) to this subrange: 0.3 -> 0.7.

4. For the next data symbol (-1), find the subrange L to H: 0.1 -> 0.3 (this is the subrange
within the interval 0-1).

5. Set the new range (2) to this subrange within the previous range: 0.34 -> 0.42 (0.34 is
10% of the range; 0.42 is 30% of the range).

6. Find the next subrange (0): 0.3 -> 0.7.

7. Set the new range (3) within the previous range: 0.364 -> 0.396 (0.364 is 30% of the
range; 0.396 is 70% of the range).

8. Find the next subrange (2): 0.9 -> 1.0.

9. Set the new range (4) within the previous range: 0.3928 -> 0.396 (0.3928 is 90% of the
range; 0.396 is 100% of the range).
Each time a symbol is encoded, the range (L to H) becomes progressively smaller. At the end
of the encoding process (four steps in this example), we are left with a final range (L to H).
The entire sequence of data symbols can be fully represented by transmitting a fractional
number that lies within this final range. In the example above, we could send any number in
the range 0.3928-0.396: for example, 0.394. Figure 8.16 shows how the initial range (0-1) is
progressively partitioned into smaller ranges as each data symbol is processed. After
encoding the first symbol (vector 0), the new range is (0.3, 0.7). The next symbol (vector -1)
selects the subrange (0.34, 0.42), which becomes the new range, and so on. The final symbol
(vector +2) selects the subrange (0.3928, 0.396) and the number 0.394 (falling within this
range) is transmitted. 0.394 can be represented as a fixed-point fractional number using 9
bits, i.e. our data sequence (0, -1, 0, 2) is compressed to a 9-bit quantity.
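
The encoding procedure maps directly onto code. A floating-point sketch (illustrative Python; the subranges are those assumed from Table 8.9, and a practical CODEC would use fixed-point arithmetic and incremental output, as discussed below):

# vector -> (low, high) subrange within [0, 1), in proportion to probability
SUBRANGE = {-2: (0.0, 0.1), -1: (0.1, 0.3), 0: (0.3, 0.7),
            1: (0.7, 0.9), 2: (0.9, 1.0)}

def arith_encode(symbols):
    low, high = 0.0, 1.0
    for s in symbols:
        s_low, s_high = SUBRANGE[s]
        span = high - low              # narrow the range to the subrange
        low, high = low + span * s_low, low + span * s_high
    return low, high                   # any number in this range will do

print(arith_encode([0, -1, 0, 2]))     # (0.3928, 0.396): transmit e.g. 0.394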
Decoding procedure
The sequence of subranges (and hence the sequence of data symbols) can be decoded from
this number as follows.
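
The procedure can be sketched as follows (illustrative Python, reusing SUBRANGE from the encoder sketch; note that the decoder must be told how many symbols to decode, as discussed under ‘Termination’ below):

def arith_decode(value, count):
    out = []
    low, high = 0.0, 1.0
    for _ in range(count):
        scaled = (value - low) / (high - low)   # position within current range
        for s, (s_low, s_high) in SUBRANGE.items():
            if s_low <= scaled < s_high:        # the subrange containing it
                out.append(s)
                span = high - low               # narrow the range, as encoder
                low, high = low + span * s_low, low + span * s_high
                break
    return out

print(arith_decode(0.394, 4))   # [0, -1, 0, 2]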
The principal advantage of arithmetic coding is that the transmitted number (0.394 in this
case, which can be represented as a fixed-point number with sufficient accuracy using 9 bits)
is not constrained to an integral number of bits for each transmitted data symbol. To achieve
optimal compression, the sequence of data symbols should be represented with a total of
log2(1/P) bits for each symbol: here, log2(1/0.4) + log2(1/0.2) + log2(1/0.4) + log2(1/0.1),
or approximately 8.29 bits. In this example, arithmetic coding achieves 9 bits, which is close
to optimum. A scheme using an integral number of bits for each data symbol (such as
Huffman coding) would not come so close to the optimum number of bits and, in general,
arithmetic coding can outperform Huffman coding.
Figure 8.16 Arithmetic coding example
A number of practical issues need to be taken into account when implementing arithmetic
coding in software or hardware.

Probability distributions

As with Huffman coding, it is not always practical to calculate symbol probabilities prior to
coding. In several video coding standards (e.g. H.263, MPEG-4, H.26L), arithmetic coding is
provided as an optional alternative to Huffman coding and pre-calculated subranges
are defined by the standard (based on ‘typical’ probability distributions). This has the
advantage of avoiding the need to calculate and transmit probability distributions, but the
disadvantage that compression will be suboptimal for a video sequence that does not exactly
follow the standard probability distributions.
Termination

In our example, we stopped decoding after four steps. However, there is nothing contained in
the transmitted number (0.394) to indicate the number of symbols that must be decoded: it
could equally be decoded as three symbols or five. The decoder must determine when to stop
decoding by some other means. In the arithmetic coding option specified in H.263, for
example, the decoder can determine the number of symbols to decode according to the
syntax of the coded data. Decoding of transform coefficients in a block halts when an end-of-
block code is detected. Fixed-length codes (such as the picture start code) are included in the
bit stream and these will ‘force’ the decoder to stop decoding (for example, if a transmission
error has occurred).
Fixed-point arithmetic

Floating-point binary arithmetic is generally less efficient than fixed-point arithmetic and
some processors do not support floating-point arithmetic at all. An efficient implementation
with fixed-point arithmetic can be achieved by specifying the subranges as fixed-precision
binary numbers. For example, in H.263, each subrange is specified as an unsigned 14-bit
integer (i.e. a total range of 0-16383); the subranges for the differential quantisation
parameter DQUANT, among others, are specified in this form.
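
As an illustration of the idea (using the example subranges from Section 8.4 rather than the actual values defined in H.263):

TOTAL = 16383   # unsigned 14-bit range, as in H.263

# Scale the fractional subranges of the earlier sketch to 14-bit integers
FIXED = {s: (round(lo * TOTAL), round(hi * TOTAL))
         for s, (lo, hi) in SUBRANGE.items()}

print(FIXED[0])   # (4915, 11468): the subrange 0.3-0.7 in fixed precision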
Incremental encoding

In the example above, no output can be produced until the entire sequence has been encoded.
A practical arithmetic encoder operates incrementally: as soon as the most significant bits of
L and H match, they can no longer change and may be output immediately, allowing
encoding and transmission to proceed symbol by symbol.
Patent issues

A number of patents have been filed that cover aspects of arithmetic encoding (such as
IBM’s ‘Q-coder’ arithmetic coding algorithm [11]). It is not entirely clear whether the
arithmetic coding algorithms specified in the image and video coding standards are covered
by patents. Some developers of commercial video coding systems have avoided the use of
arithmetic coding because of concerns about potential patent infringements, despite its
potential compression advantages.
8.5 SUMMARY
An entropy coder maps a sequence of data elements to a compressed bit stream, removing
statistical redundancy in the process. In a block transform-based video CODEC, the main
data elements are transform coefficients (run-level coded to efficiently represent sequences
of zero coefficients), motion vectors (which may be differentially coded) and header
information. Optimum compression requires the probability distributions of the data to be
analysed prior to coding; for practical reasons, video CODECs use standard pre-calculated
look-up tables for entropy coding.

The two most popular entropy coding methods for video CODECs are ‘modified’
Huffman coding (in which each element is mapped to a separate VLC) and arithmetic
coding (in which a series of elements is coded to form a fractional number). Huffman
encoding may be carried out using a series of table look-up operations; a Huffman decoder
identifies each VLC, which is possible because the codes are designed such that no code
forms the prefix of any other. Arithmetic coding is carried out by generating and encoding a
fractional number to represent a series of data elements.
This concludes the discussion of the main internal functions of a video CODEC (motion
estimation and compensation, transform coding and entropy coding). The performance of a
CODEC in a practical video communication system can often be dramatically improved by
filtering the source video (‘pre-filtering’) and/or the decoded video frames (‘post-filtering’).
REFERENCES

1. D. A. Huffman, ‘A method for the construction of minimum-redundancy codes’, Proceedings of
the IRE, 40(9), September 1952.
2. S. M. Lei and M.-T. Sun, ‘An entropy coding system for digital HDTV applications’, IEEE Trans.
CSVT, 1(1), March 1991.
3. Hao-Chieh Chang, Liang-Gee Chen, Yung-Chi Chang and Sheng-Chieh Huang, ‘A VLSI archi-
tecture design of VLC encoder for high data rate video/image coding’, 1999 IEEE International
Symposium on Circuits and Systems (ISCAS ’99).
4. S. F. Chang and D. Messerschmitt, ‘Designing high-throughput VLC decoder, Part I: concurrent
VLSI architectures’, IEEE Trans. CSVT, 2(2), June 1992.
5. J. Jeon, S. Park and H. Park, ‘A fast variable-length decoder using plane separation’, IEEE Trans.
CSVT, 10(5), August 2000.
6. B.-J. Shieh, Y.-S. Lee and C.-Y. Lee, ‘A high throughput memory-based VLC decoder with
codeword boundary prediction’, IEEE Trans. CSVT, 10(8), December 2000.
7. A. Kopansky and M. Bystrom, ‘Sequential decoding of MPEG-4 coded bit streams for error
resilience’, Proc. Conf. on Information Sciences and Systems, Baltimore, 1999.
8. J. Wen and J. Villasenor, ‘Utilizing soft information in decoding of variable length codes’, Proc.
IEEE Data Compression Conference, Utah, 1999.
9. S. Kaiser and M. Bystrom, ‘Soft decoding of variable-length codes’, Proc. IEEE International
Communications Conference, New Orleans, 2000.
10. I. Witten, R. Neal and J. Cleary, ‘Arithmetic coding for data compression’, Communications of the
ACM, 30(6), June 1987.
11. J. Mitchell and W. Pennebaker, ‘Optimal hardware and software arithmetic coding procedures for
the Q-coder’, IBM Journal of Research and Development, 32(6), November 1988.