Arithmetic, Run Length, Compression
Outline
* Huffman Coding
* Arithmetic Coding
* Lempel-Ziv Coding
* Run Length Coding
* Examples

Huffman Coding
* Each time we combine two symbols we reduce the total number of symbols by one.
* Whenever we tie together two probabilities (nodes), we label the two branches with a '1' and a '0'.
* Continue the procedure until only one probability is left (and it should be 1 if your addition is right!). This completes the construction of the Huffman tree.

Working Blockwise
* In the examples above, encoding is done symbol by symbol. A more efficient procedure is to encode blocks of B symbols at a time.
* In this case the bounds of the source coding theorem become $H(X) \le \bar{R} < H(X) + \frac{1}{B}$ bits per symbol, so the overhead above the entropy shrinks as the block length B grows.

Example
* Suppose we wish to code the string 101011011010101011. (A short code sketch of this parsing appears below, after the discussion of the dictionary length.)
* We begin by parsing it into comma-separated phrases, each of which is a previously seen phrase (its prefix) plus one extra bit.
* The first bit, a 1, has no predecessor, so it has a null prefix string and the one extra bit is itself: 1, 01011011010101011
* The same goes for the 0 that follows, since it cannot be expressed in terms of the only existing prefix: 1, 0, 1011011010101011

Example
* So far our dictionary contains the strings '1' and '0'. Next we encounter a 1, but it already exists in our dictionary, so we proceed further. The following 10 is a combination of the prefix 1 and a 0, so we now have: 1, 0, 10, 11011010101011
* Continuing in this way we eventually parse the whole string as follows: 1, 0, 10, 11, 01, 101, 010, 1011
* Since we found 8 phrases, we will use a three-bit code to label the null phrase and the first seven phrases. The eighth and last phrase never serves as the prefix of a later phrase, so it does not need a label of its own.

Example
* 1, 0, 10, 11, 01, 101, 010, 1011
* Next, we write the string in terms of the number of the prefix phrase plus the new bit needed to create each phrase. We will use parentheses and commas to separate these at first, in order to aid our visualization of the process.
* The eight phrases can be described by: (000,1), (000,0), (001,0), (001,1), (010,1), (011,1), (101,0), (110,1)
* It can be read out as: (codeword at location 0, 1), (codeword at location 0, 0), (codeword at location 1, 0), (codeword at location 1, 1), (codeword at location 2, 1), (codeword at location 3, 1), and so on.
* Thus the coded version of the above string is: 0001 0000 0010 0011 0101 0111 1010 1101

Example

| Dictionary location | Dictionary content | Fixed-length codeword |
| 1 (001) | 1    | 0001 |
| 2 (010) | 0    | 0000 |
| 3 (011) | 10   | 0010 |
| 4 (100) | 11   | 0011 |
| 5 (101) | 01   | 0101 |
| 6 (110) | 101  | 0111 |
| 7 (111) | 010  | 1010 |
| -       | 1011 | 1101 |

Length of the Table
* What should be the length of the table? In practical applications, regardless of the length chosen, the table will eventually overflow.
* This problem can be solved by pre-deciding a large enough size for the dictionary. The encoder and decoder can then update their dictionaries by periodically substituting the less used phrases with more frequently used ones.
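The parsing and labeling steps illustrated in the example above can be sketched in a few lines of Python. This is a minimal illustration, not an optimized Lempel-Ziv implementation; the function name lz_parse is just a placeholder, and it assumes, as in the example, that the string ends exactly on a phrase boundary.

```python
def lz_parse(bits):
    """Parse a bit string into phrases, each a previously seen phrase plus one new bit."""
    phrases = [""]              # location 0 holds the null phrase
    pairs = []                  # (prefix location, innovation bit) for each new phrase
    current = ""
    for b in bits:
        if current + b in phrases:
            current += b        # still matches a dictionary phrase, keep extending
        else:
            pairs.append((phrases.index(current), b))
            phrases.append(current + b)
            current = ""
    return phrases[1:], pairs

phrases, pairs = lz_parse("101011011010101011")
print(phrases)     # ['1', '0', '10', '11', '01', '101', '010', '1011']

# Eight phrases, so 3 bits label the null phrase and the first seven phrases;
# each phrase is then sent as its prefix's 3-bit location plus the one new bit.
codewords = ["{:03b}{}".format(loc, bit) for loc, bit in pairs]
print(codewords)   # ['0001', '0000', '0010', '0011', '0101', '0111', '1010', '1101']
```

The printed codewords match the fixed-length codewords in the dictionary table above.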
Run-Length Encoding
* Run-length encoding (RLE) is a technique used to reduce the size of a repeating string of characters. Such a repeating string is called a run.
* RLE can compress any type of data regardless of its information content, but the content of the data affects the compression ratio that is achieved.
* Run-length encoding is supported by most bitmap file formats, such as TIFF, JPG, BMP and PCX, and is used by fax machines.

Example
* Consider the following bit stream: 111111111111111 0000000000000000000 1111 (spaces added only to mark the runs).
* This can be represented as fifteen 1's, nineteen 0's and four 1's, i.e. (15,1), (19,0), (4,1).
* Since the maximum run length is 19, which can be represented with 5 bits, we can encode the bit stream as (01111,1), (10011,0), (00100,1).
* The encoded stream uses 3 x 6 = 18 bits against the original 38, a compression ratio of 18:38, or about 1:2.11.
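The run counting and bit packing used in this example can be sketched as follows. This is a minimal sketch assuming Python's standard itertools module, with the 5-bit run-length field taken from the example above rather than from any particular file format.

```python
from itertools import groupby

def run_length_encode(bits):
    """Collapse a bit string into (run length, repeated bit) pairs."""
    return [(len(list(group)), bit) for bit, group in groupby(bits)]

stream = "1" * 15 + "0" * 19 + "1" * 4            # the 38-bit stream from the example
runs = run_length_encode(stream)
print(runs)                                        # [(15, '1'), (19, '0'), (4, '1')]

# The longest run is 19, which fits in 5 bits, so each run becomes a 5-bit
# count followed by the repeated bit: 3 runs * 6 bits = 18 bits in total.
encoded = "".join("{:05b}{}".format(count, bit) for count, bit in runs)
print(encoded, len(encoded))                       # 011111100110001001 18
```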
Optimum Quantizer
* Consider a continuous-amplitude signal whose amplitude is not uniformly distributed but varies according to a certain probability density function $p(x)$.
* We wish to design the optimum scalar quantizer that minimizes some function of the quantization error $q = \tilde{x} - x$, where $\tilde{x}$ is the quantized value of $x$.
* The distortion resulting from the quantization can be expressed as $D = \int_{-\infty}^{\infty} f(\tilde{x} - x)\, p(x)\, dx$, where $f(\tilde{x} - x)$ is the desired function of the error.

Optimum Quantizer
* An optimum quantizer is one that minimizes $D$ by optimally selecting the output levels and the corresponding input range of each output level.
* The resulting optimum quantizer is called the Lloyd-Max quantizer.
* For an $L$-level quantizer the distortion is given by $D = \sum_{k=1}^{L} \int_{x_{k-1}}^{x_k} f(\tilde{x}_k - x)\, p(x)\, dx$.

Optimum Quantizer
* The necessary conditions for minimum distortion are obtained by differentiating $D$ with respect to the decision levels $\{x_k\}$ and the output levels $\{\tilde{x}_k\}$.
* The differentiation yields the following system of equations:
  $f(\tilde{x}_k - x_k) = f(\tilde{x}_{k+1} - x_k), \quad k = 1, 2, \ldots, L-1$
  $\int_{x_{k-1}}^{x_k} f'(\tilde{x}_k - x)\, p(x)\, dx = 0, \quad k = 1, 2, \ldots, L$

Optimum Quantizer
* Such non-uniform quantizers are optimized with respect to the distortion, but each quantized sample is still represented by an equal number of bits (say, 2 bits/sample).
* It is possible to do better with variable-length coding. The discrete source outputs that result from quantization can be characterized by a set of probabilities $p_k$, and these probabilities can be used to design efficient variable-length codes (source coding).
* In order to compare the performance of different non-uniform quantizers, we first fix the distortion $D$ and then compare the average number of bits required per sample.

Example
* Consider an eight-level quantizer for a Gaussian random variable with zero mean and unit variance, designed for mean square error minimization:

| Level k | Boundary x_k | Output x̃_k | P(x̃_k) | Huffman code |
| 1 | -1.748 | -2.152 | 0.040 | 00000 |
| 2 | -1.050 | -1.344 | 0.107 | 011 |
| 3 | -0.500 | -0.756 | 0.162 | 010 |
| 4 | 0      | -0.245 | 0.191 | 10 |
| 5 | 0.500  | 0.245  | 0.191 | 11 |
| 6 | 1.050  | 0.756  | 0.162 | 001 |
| 7 | 1.748  | 1.344  | 0.107 | 0001 |
| 8 | -      | 2.152  | 0.040 | 00001 |

* The codeword lengths follow from applying the Huffman procedure to the probabilities P(x̃_k): the two most likely levels receive 2-bit labels and the two least likely receive 5-bit labels.

Example
* For these values, D = 0.0345, which equals -14.62 dB.
* The number of bits/sample for this optimum 8-level quantizer with fixed-length coding is R = 3.
* On performing Huffman coding, the average number of bits per sample required is $\bar{R}$ = 2.88 bits/sample.
* The theoretical limit is H(X) = 2.82 bits/sample.

Entropy Rate
* A simple extension of the source coding theorem tells us that nH(X) bits are sufficient, on average, to describe n independent and identically distributed random variables, each with entropy H(X).
* But in the real world we do encounter random variables that are dependent. What if the random variables form a stationary process?

Entropy Rate
* If we have a sequence of n random variables, it is interesting to explore how the entropy of the sequence grows with n.
* The entropy rate is used to define this rate of growth.
* The entropy rate of a stochastic process $\{X_i\}$ is given by $H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)$, provided the limit exists.
* For a stationary Markov chain, the entropy rate is given by $H(\mathcal{X}) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \ldots, X_1) = \lim_{n \to \infty} H(X_n \mid X_{n-1}) = H(X_2 \mid X_1)$.

Example
* For the stationary distribution, the net probability flow across any cut set in the state transition graph must be zero.
* Let $\mu_1$ and $\mu_2$ be the stationary probabilities of the two states. Balancing the flow across the cut between them gives $\mu_1 P_{12} = \mu_2 P_{21}$, where $P_{12}$ and $P_{21}$ are the transition probabilities between the two states, so $\mu_1 = P_{21}/(P_{12} + P_{21})$ and $\mu_2 = P_{12}/(P_{12} + P_{21})$.
* The entropy of the state $X_n$ at time $n$ is then $H(X_n) = H(\mu_1, \mu_2)$, the entropy of the stationary distribution.

Application of Source Coding
* Let us consider the lossless compression option of the Joint Photographic Experts Group (JPEG) image compression standard.
* The JPEG image compression standard is actually a description of 29 distinct coding systems for the compression of images.
* Why are there so many approaches? Because the needs of users vary so much with respect to quality versus compression, and with respect to the computation time available for compression.
* We shall briefly discuss here two methods that use entropy coding.

JPEG Compression
* The two lossless JPEG compression options differ only in the form of the entropy code that is applied to the innovations data.
* The user can choose either a Huffman code or an arithmetic code.
* Some compression can be achieved if we can predict the next pixel from the previous pixels.
* In this way we only have to transmit the prediction coefficients (or the differences between predicted and actual values) instead of the entire pixel.

JPEG Compression: DCT
* The two-dimensional discrete cosine transform of an N by M block X(i, j) is
  $Y(k, l) = C(k)\, C(l) \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} X(i, j) \cos\!\left(\frac{(2i+1) k \pi}{2N}\right) \cos\!\left(\frac{(2j+1) l \pi}{2M}\right)$
  where $C(k)$ and $C(l)$ are the usual DCT normalization constants.
* All DCT multiplications are real. This lowers the number of required multiplications, as compared to the discrete Fourier transform.
* For most images, much of the signal energy lies at low frequencies, which appear in the upper-left corner of the DCT.
* The lower-right values represent higher frequencies, and are often small (usually small enough to be neglected with little visible distortion).

JPEG Compression
* The DCT is applied to 8 by 8 pixel blocks of the image.
* Hence, if the image is 256 by 256 pixels in size, we break it into 32 by 32 square blocks of 8 by 8 pixels and treat each one independently.
* The 64 pixel values in each block are transformed by the DCT into a new set of 64 values, known as the DCT coefficients, which form a whole new way of representing the image.
* The DCT coefficients represent the spatial frequency content of the image sub-block.

JPEG Compression
* Differential coding is applied between neighboring blocks, and the result is then Huffman coded.
* [Figure: example 8 by 8 blocks of DCT coefficients, with the larger values concentrated toward the upper-left, low-frequency corner.]
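To illustrate the energy compaction described above, the following sketch (assuming NumPy; the block contents and the helper name dct2 are made up for the illustration) applies an orthonormal 2-D DCT to a smooth 8 by 8 block and measures how much of the energy falls in the low-frequency corner.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x M block (the transform applied to JPEG's 8x8 blocks)."""
    N, M = block.shape
    i = np.arange(N)
    j = np.arange(M)
    # 1-D DCT basis matrices: C[k, i] = c(k) * cos((2i + 1) * k * pi / (2N))
    CN = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(np.arange(N), 2 * i + 1) / (2 * N))
    CN[0, :] /= np.sqrt(2.0)
    CM = np.sqrt(2.0 / M) * np.cos(np.pi * np.outer(np.arange(M), 2 * j + 1) / (2 * M))
    CM[0, :] /= np.sqrt(2.0)
    return CN @ block @ CM.T

# A smooth 8x8 block (a gentle intensity gradient, typical of natural-image content).
x = np.fromfunction(lambda i, j: 100 + 2 * i + 3 * j, (8, 8))
Y = dct2(x)

energy = Y ** 2
low = energy[:2, :2].sum()      # the four lowest-frequency coefficients
print(low / energy.sum())       # close to 1: nearly all the energy is low-frequency
```

For such a block the upper-left coefficients carry almost all the energy, which is exactly why the lower-right coefficients can be coarsely quantized or dropped with little visible distortion.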

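Finally, the Huffman procedure described at the start of these notes (repeatedly tie together the two least probable nodes and label the two branches '0' and '1') is the same code used for the quantizer example and offered as one of JPEG's entropy-coding options. Below is a minimal sketch, assuming only Python's standard heapq module; applied to the quantizer probabilities from the Gaussian example it reproduces the figures quoted there, about 2.88 bits/sample against an entropy of about 2.82 bits/sample.

```python
import heapq
from math import log2

def huffman(probs):
    """Build prefix codewords by repeatedly merging the two least probable nodes."""
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in probs}
    tie_break = len(heap)                    # keeps heap entries comparable on ties
    while len(heap) > 1:
        p0, _, syms0 = heapq.heappop(heap)   # least probable node, branch label '0'
        p1, _, syms1 = heapq.heappop(heap)   # next least probable, branch label '1'
        for s in syms0:
            codes[s] = "0" + codes[s]
        for s in syms1:
            codes[s] = "1" + codes[s]
        tie_break += 1
        heapq.heappush(heap, (p0 + p1, tie_break, syms0 + syms1))
    return codes

# Output probabilities of the 8-level quantizer for the unit-variance Gaussian example.
p = {1: 0.040, 2: 0.107, 3: 0.162, 4: 0.191, 5: 0.191, 6: 0.162, 7: 0.107, 8: 0.040}
codes = huffman(p)
avg_bits = sum(p[k] * len(codes[k]) for k in p)
entropy = -sum(pk * log2(pk) for pk in p.values())
print(avg_bits, entropy)   # roughly 2.88 and 2.82 bits/sample
```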