Information Theory Module 4
Dr. Markkandan S
Lempel-Ziv Algorithm
• Huffman coding requires the symbol probabilities, but most real-life scenarios do not provide them in advance. Huffman coding is optimal for a DMS (discrete memoryless source), where the occurrence of one symbol does not alter the probabilities of the subsequent symbols.
• It may be more efficient to exploit the statistical inter-dependence of the letters in the alphabet along with their individual probabilities of occurrence.
• The Lempel-Ziv algorithm does not need the source statistics.
• It is a variable-to-fixed-length source coding algorithm and belongs to the class of universal source coding algorithms.
• Step-6: Construct the dictionary from the parsed strings as follows. Since we have 8 strings, 3-bit binary position numbers are used.
String   Position Number   Position Number in Binary
1        1                 001
0        2                 010
10       3                 011
11       4                 100
01       5                 101
101      6                 110
010      7                 111
1011     8                 -
Step-9: Write the codeword as the binary position number of the prefix followed by the last bit of the string under consideration.
String   Position No.   Position No. in Binary   Prefix   Position No. of Prefix   Codeword
1        1              001                      No       000                      0001
0        2              010                      No       000                      0000
10       3              011                      1        001                      0010
11       4              100                      1        001                      0011
01       5              101                      0        010                      0101
101      6              110                      10       011                      0111
010      7              111                      01       101                      1010
1011     8              -                        101      110                      1101
Applying the same procedure to text, the coded string for THIS IS HIS HIT is 0T 0H 0I 0S 0␣ 3S 5H 6␣ 2I 1T, where ␣ denotes the space character.
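A minimal Python sketch of this parsing-and-encoding procedure applied to the binary example above; the function name lz78_encode is an illustrative choice, and the input bit string is reconstructed by concatenating the parsed strings from the table:

def lz78_encode(bits):
    """Parse the input into phrases and emit one codeword per phrase:
    the 3-bit position number of the prefix followed by the last bit.
    Position 0 (binary 000) means the phrase has no prefix."""
    dictionary = {}                  # phrase -> position number
    phrase = ""
    codewords = []
    for bit in bits:
        candidate = phrase + bit
        if candidate in dictionary:
            phrase = candidate       # keep extending the match
        else:
            prefix_pos = dictionary.get(phrase, 0)
            codewords.append(f"{prefix_pos:03b}{bit}")
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    return codewords

print(lz78_encode("101011011010101011"))
# ['0001', '0000', '0010', '0011', '0101', '0111', '1010', '1101']

The output matches the codeword column of the table above.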
• The Lempel-Ziv algorithm is widely used in practice. The compress and uncompress utilities of the UNIX operating system use a modified version of this algorithm.
• The standard algorithms for compressing binary files use codewords of 12 bits and
transmit 1 extra bit to indicate a new sequence.
• Using such a code, the Lempel-Ziv algorithm can compress transmissions of English text by about 55 percent.
Run-Length Encoding (RLE)
• RLE is used to reduce the size of a repeating string of characters. Such a repeating string is called a run.
• RLE encodes a run of symbols into two bytes: a count and a symbol.
• RLE can compress any type of data regardless of its information content, but the content of the data to be compressed affects the compression ratio it achieves.
• RLE cannot achieve the high compression ratios of other compression methods, but it is easy to implement and quick to execute. It is supported by most bitmap file formats, such as TIFF, JPG, BMP and PCX, and by fax machines.
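A minimal Python sketch of the two-byte (count, symbol) scheme described above; the 255 cap, which keeps the count in a single byte, and the function names are illustrative assumptions:

def rle_encode(data: bytes) -> bytes:
    """Encode each run as two bytes: a count (1-255) followed by the symbol."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Expand each (count, symbol) pair back into a run."""
    out = bytearray()
    for count, symbol in zip(data[::2], data[1::2]):
        out += bytes([symbol]) * count
    return bytes(out)

encoded = rle_encode(b"AAAABBBCCDAA")
print(encoded)                               # b'\x04A\x03B\x02C\x01D\x02A'
assert rle_decode(encoded) == b"AAAABBBCCDAA"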
Quantization
• Although we live in an analog world, most communication takes place in digital form. Since most natural sources are analog, they are first sampled, then quantized and processed.
• However, an exact representation of an arbitrary real number requires an infinite number of bits. Thus a finite representation of a continuous random variable can never be perfect.
• Consider an analog message waveform x(t), which is a sample waveform of a stochastic process X(t). Assuming X(t) is a band-limited, stationary process, it can be represented by a sequence of uniform samples taken at the Nyquist rate.
• These samples are then quantized in amplitude and encoded as a sequence of binary digits.
A simple encoding strategy is to define L levels and encode every sample using

R = log2 L bits, if L is a power of 2, or
R = ⌊log2 L⌋ + 1 bits, if L is not a power of 2.
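As a quick check of this rule in Python (bits_per_sample is an illustrative helper, not a library function):

import math

def bits_per_sample(L: int) -> int:
    """R = log2(L) if L is a power of 2, else floor(log2(L)) + 1."""
    r = math.log2(L)
    return int(r) if r.is_integer() else math.floor(r) + 1

print(bits_per_sample(8))    # 3, since 8 = 2^3
print(bits_per_sample(10))   # 4, since 3 < log2(10) < 4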
• Thus, by trading off some of the quality of the image we may obtain lossy compression, as opposed to lossless compression.
• Lossy compression can only be applied to data such as images and audio for which
human beings will tolerate some loss of fidelity.
• The JPEG compression standard is actually a description of 29 distinct coding systems for compressing images.
There are eight prediction methods available in the JPEG coding standards. One of the eight (the no-prediction option) is not used for the lossless coding option we are examining here. The other seven may be divided into the following categories (a sketch in code follows the list):
• Predict the next pixel on the line as having the same value as the last one.
• Predict the next pixel on the line as having the same value as the pixel in this
position on the previous line (that is, above it).
• Predict the next pixel on the line as having a value related to a combination of the previous pixel, the pixel above, and the pixel above-left. One such combination is simply the average of the previous and above pixels.
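A minimal NumPy sketch of these predictor categories; the mode numbers loosely follow the usual JPEG predictor numbering (1: previous pixel, 2: pixel above, 7: average of the two), but the function itself is only an illustrative assumption:

import numpy as np

def residuals(y: np.ndarray, mode: int) -> np.ndarray:
    """Return y(i,j) minus its prediction; in lossless JPEG it is these
    residuals, not the raw pixels, that get entropy-coded."""
    y = y.astype(np.int32)
    r = y.copy()                 # border pixels keep their raw values here
    a = y[:, :-1]                # previous pixel on the line
    b = y[:-1, :]                # pixel in the same position, previous line
    if mode == 1:                # predict from the previous pixel
        r[:, 1:] = y[:, 1:] - a
    elif mode == 2:              # predict from the pixel above
        r[1:, :] = y[1:, :] - b
    elif mode == 7:              # average of previous and above pixels
        r[1:, 1:] = y[1:, 1:] - (a[1:, :] + b[:, 1:]) // 2
    return r

img = np.array([[52, 55, 61], [63, 59, 55], [62, 59, 68]])
print(residuals(img, 1))         # smooth rows give small residuals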
Discrete Cosine Transform (DCT)
The 2-D DCT of the image is

F(u,v) = (2/√(NM)) Λ(u) Λ(v) Σ(i=0..N-1) Σ(j=0..M-1) y(i,j) cos[(2i+1)uπ / 2N] cos[(2j+1)vπ / 2M]

where the input image is N×M pixels, y(i,j) is the intensity of the pixel in row i and column j, and Λ(k) = 1/√2 for k = 0 and Λ(k) = 1 otherwise.
• For most images, much of the signal energy lies at lower frequencies, which appear in the upper-left corner of the DCT.
• The lower-right values represent higher frequencies and are often small.
• The DCT is computationally intensive, with complexity O(N²). Hence images are divided into blocks.
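A sketch of the blockwise 2-D DCT using SciPy's scipy.fft.dctn; the 8×8 block size and the level shift by 128 follow common JPEG practice, while block_dct is an illustrative helper:

import numpy as np
from scipy.fft import dctn

def block_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Apply the 2-D DCT to each block x block tile independently.
    Low-frequency energy lands in the upper-left corner of each tile."""
    h, w = image.shape
    coeffs = np.zeros((h, w))
    for i in range(0, h, block):
        for j in range(0, w, block):
            tile = image[i:i + block, j:j + block].astype(float) - 128
            coeffs[i:i + block, j:j + block] = dctn(tile, norm="ortho")
    return coeffs

img = np.random.randint(0, 256, (16, 16))
c = block_dct(img)
print(c[0, 0])    # DC coefficient (average intensity) of the first tile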
JPEG Standard for Lossy Compression - Image Reduction
• Due to the nature of most natural images, maximum energy lies in the low frequencies as opposed to the high frequencies.
• For lossy compression, the following steps are followed (see the sketch after this list):
1. First the lowest weights are trimmed by setting them to zero.
2. The remaining weights are quantized (that is, rounded off to the nearest of some number of discrete code-represented values), some more coarsely than others according to observed levels of sensitivity of viewers to these degradations.
3. Then several lossless compression methods are applied. DC coefficients vary slowly from one block to the next, hence prediction is performed.
4. Only one DC coefficient is sent in full; for the remaining blocks, the difference between DC coefficients of neighbouring blocks is sent.
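A minimal Python sketch of steps 2 and 4; the uniform quantization matrix is a placeholder assumption, not the standard JPEG luminance table:

import numpy as np

def quantize(coeffs: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Step 2: round each DCT coefficient to the nearest multiple of its
    quantization step; small high-frequency weights collapse to zero."""
    return np.round(coeffs / q).astype(int)

def dc_differences(dc_values):
    """Step 4: send the first DC coefficient in full, then only the
    block-to-block differences, since DC varies slowly between blocks."""
    return [dc_values[0]] + [b - a for a, b in zip(dc_values, dc_values[1:])]

print(quantize(np.array([[100.0, 7.0], [-6.0, 3.0]]), np.full((2, 2), 16.0)))
# [[6 0]
#  [0 0]]   small coefficients collapse to zero
print(dc_differences([1024, 1040, 1036, 1050]))   # [1024, 16, -4, 14]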
• The purpose of zig-zag coding is to move gradually from the low-frequency to the high-frequency coefficients, avoiding abrupt jumps in the values.
• Zig-zag coding leads to long runs of 0's, which are ideal for RLE followed by Huffman or arithmetic coding.
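A compact Python sketch of generating the zig-zag scan order by walking the anti-diagonals i + j = s and alternating direction (the function names are illustrative):

def zigzag_order(n: int = 8):
    """Return (row, col) pairs in zig-zag scan order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                   # anti-diagonal: i + j = s
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate direction
    return order

def zigzag_scan(block):
    """Flatten a quantized block so low frequencies come first and the
    trailing high-frequency zeros form long runs, ready for RLE."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

print(zigzag_order(3))
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]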