Chapter 4: Multimedia Data Compression
Outline
Introduction
Basics of Information Theory
Run-Length Coding
Variable-Length Coding (VLC)
Introduction
DATA COMPRESSION
Definition: Data compression is the process of encoding information using fewer bits, so that it
takes less storage space or less bandwidth during transmission.
Two types of compression:
Lossy data compression
Lossless data compression
Lossless Data Compression: In lossless data compression, the original content of the data is not
lost/changed when it is compressed (encoded).
Examples:
RLE (Run Length Encoding)
Dictionary Based Coding
Variable-Length Coding (VLC)
Arithmetic Coding
Cont…
Lossy data compression: The original content of the data is lost to a certain degree
when compressed. The parts of the data that are less important are discarded.
The loss factor determines how much quality is lost between the original image and the
image after it has been compressed and played back (decompressed).
The more compression is applied, the more likely it is that quality will be affected.
Even if the quality difference is not noticeable, such methods are still considered lossy
compression.
Examples:
JPEG (Joint Photographic Experts Group)
MPEG (Moving Pictures Expert Group)
ADPCM
Cont. …
Doing some simple calculations (see the sketch below), it can be shown that 24-bit color
video at 640 × 480 resolution and 30 fps requires an astonishing 26 megabytes of
data per second. Not only does this surpass the capabilities of many home
computer systems, it also overburdens existing storage systems.
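A quick check of that figure, as a minimal Python sketch using the numbers quoted above (3 bytes per pixel for 24-bit color):

    bytes_per_frame = 640 * 480 * 3           # 3 bytes per pixel for 24-bit color
    bytes_per_second = bytes_per_frame * 30   # 30 frames per second
    print(bytes_per_second / 2**20)           # about 26.4 MB of raw video data per second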
Run-Length Coding
• Run-length encoding (RLE) is a simple technique to compress digital data by
  representing successive runs of the same value in the data as the count followed by
  the value, rather than the original run of values.
• This encoding method is frequently applied to images. Consider a bitmap whose pixel
  values, read in scan order, are 0 0 0 0 0 2 2 2 2 2 4 6 6 6 0 0 0 6 6 6.
  Its run-length encoding is:
  RLE: 5*0, 5*2, 1*4, 3*6, 3*0, 3*6
Example
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
is encoded as
12W1B12W3B24W1B14W
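A minimal Python sketch of the same idea for a string of symbols (it reproduces the W/B example above):

    from itertools import groupby

    def rle_encode(data):
        # Encode successive runs as count + value, e.g. "WWWB" -> "3W1B".
        return "".join(f"{len(list(run))}{value}" for value, run in groupby(data))

    print(rle_encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"))
    # -> 12W1B12W3B24W1B14W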
Cont. …
JPEG Zigzag RLE example: a binary matrix, scanned in zigzag order and run-length coded.

1 0 1 0 0 1
1 1 0 0 0 0
0 0 0 0 0 1
0 0 1 0 0 0
1 0 1 0 0 0
0 1 0 0 0 0

RLE output:
(1,1)(1,0)(4,1)(4,0)(1,1)(4,0)(1,1)(2,0)(1,1)(2,0)(2,1)(13,0)
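A minimal, generic Python sketch of the technique: a zigzag scan of an n × n matrix followed by run-length coding into (run length, value) pairs.

    def zigzag_order(n):
        # Visit order for an n x n zigzag scan: walk the anti-diagonals,
        # reversing direction on every other one.
        order = []
        for s in range(2 * n - 1):
            diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
            if s % 2 == 0:
                diag.reverse()          # even diagonals run bottom-left to top-right
            order.extend(diag)
        return order

    def zigzag_rle(matrix):
        # Flatten the matrix in zigzag order, then encode runs as (run length, value).
        seq = [matrix[r][c] for r, c in zigzag_order(len(matrix))]
        pairs = []
        for v in seq:
            if pairs and pairs[-1][1] == v:
                pairs[-1] = (pairs[-1][0] + 1, v)
            else:
                pairs.append((1, v))
        return pairs

    # Example usage with a small 2 x 2 matrix:
    print(zigzag_rle([[1, 0], [0, 0]]))   # -> [(1, 1), (3, 0)]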
Variable-Length Coding (VLC)
Shannon-Fano Algorithm — a top-down approach
1. Sort the symbols according to the frequency count of their occurrences.
2. Recursively divide the symbols into two parts, each with approximately the same
number of counts, until all parts contain only one symbol.
An Example: coding of “HELLO”
Symbol H E L O
Count 1 1 2 1
Cont. …
Coding Tree for HELLO by Shannon-Fano.
Cont. …
Result of Performing Shannon-Fano on HELLO

Symbol   Count   log2(1/pi)   Code   # of bits used
L        2       1.32         0      2
H        1       2.32         10     2
E        1       2.32         110    3
O        1       2.32         111    3

TOTAL number of bits: 10
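A minimal Python sketch of the top-down Shannon-Fano procedure described above; it reproduces the HELLO codes in the table:

    def shannon_fano(symbols):
        # symbols: list of (symbol, count) sorted by decreasing count.
        # Split the list into two parts with (approximately) equal total counts,
        # prepend 0 to the codes of the first part and 1 to the second, and recurse.
        if len(symbols) <= 1:
            return {symbols[0][0]: ""} if symbols else {}
        total = sum(count for _, count in symbols)
        running, split, best = 0, 1, total
        for i in range(1, len(symbols)):
            running += symbols[i - 1][1]
            diff = abs(total - 2 * running)      # |left total - right total|
            if diff < best:
                best, split = diff, i
        left = shannon_fano(symbols[:split])
        right = shannon_fano(symbols[split:])
        return {s: "0" + c for s, c in left.items()} | \
               {s: "1" + c for s, c in right.items()}

    freqs = sorted({"H": 1, "E": 1, "L": 2, "O": 1}.items(),
                   key=lambda kv: kv[1], reverse=True)
    print(shannon_fano(freqs))   # {'L': '0', 'H': '10', 'E': '110', 'O': '111'}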
Cont. …
The probabilities are already arranged in non-increasing order. First we divide the symbols
into {A, B} and {C, D, E}. Why? Because this gives the smallest difference between the total
probabilities of the two groups.
S1 = {A, B},    P = 0.35 + 0.17 = 0.52
S2 = {C, D, E}, P = 0.17 + 0.16 + 0.15 = 0.48
Cont. …
The difference is only 0.52 − 0.48 = 0.04, the smallest possible difference for any way of
dividing the symbols into two groups.
Attach 0 to S1 and 1 to S2.
Subdivide S1 into subgroups:
S11 = {A}: attach 0
S12 = {B}: attach 1
Again subdivide S2 into subgroups, considering the probabilities:
S21 = {C},    P = 0.17
S22 = {D, E}, P = 0.16 + 0.15 = 0.31
Attach 0 to S21 and 1 to S22. Since S22 contains more than one symbol, it must be
subdivided further:
S221 = {D}: attach 0
S222 = {E}: attach 1
The resulting codewords are A = 00, B = 01, C = 10, D = 110, E = 111.
Cont. …
Huffman Coding
Huffman Coding Algorithm — a bottom-up approach
1. Initialization: put all symbols on a list sorted according to their frequency counts.
2. Repeat until the list has only one symbol left:
   (1) From the list, pick the two symbols with the lowest frequency counts. Form a
       Huffman subtree that has these two symbols as child nodes and create a parent node.
   (2) Assign the sum of the children's frequency counts to the parent and insert it into
       the list such that the order is maintained.
   (3) Delete the children from the list.
3. Assign a codeword to each leaf based on the path from the root (see the sketch below).
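A minimal Python sketch of this bottom-up construction, using a priority queue for the sorted list. Run on the HELLO counts it produces an optimal prefix code; ties between equal counts may be broken differently from the tree on the following slides, but the total of 10 bits is the same:

    import heapq
    from itertools import count

    def huffman_codes(freqs):
        # Build the Huffman tree bottom-up: repeatedly merge the two
        # lowest-count nodes, then read codewords off the root-to-leaf paths.
        tiebreak = count()     # keeps heap entries comparable when counts are equal
        heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):       # internal node: recurse into both children
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                             # leaf: record the accumulated codeword
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    print(huffman_codes({"H": 1, "E": 1, "L": 2, "O": 1}))
    # -> {'H': '00', 'E': '01', 'O': '10', 'L': '11'}  (10 bits in total for "HELLO")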
Cont. …
New symbols P1, P2, P3 are created to refer to the parent nodes in the Huffman coding
tree. The contents of the list at each step are illustrated below:

After initialization:  L  H  E  O
After iteration (a):   L  P1 H
After iteration (b):   L  P2
After iteration (c):   P3
Cont. …
2. Optimality: Huffman coding produces a minimum-redundancy code, proved optimal for a
given data model (i.e., a given, accurate probability distribution):
• The two least frequent symbols have codewords of the same length, differing only in
  the last bit.
• Symbols that occur more frequently have shorter Huffman codes than symbols that
  occur less frequently.
• The average code length l̄ for an information source S is strictly less than η + 1,
  i.e., l̄ < η + 1.
Cont. …
Extended Huffman Coding
• Motivation: all codewords in Huffman coding have integer bit lengths. This is wasteful
  when the probability p_i of a symbol is very large and hence log2(1/p_i) is close to 0.
• The remedy is to group the original symbols into blocks of k symbols and assign one
  codeword per block. The extended alphabet is
  S^(k) = { s1 s1 … s1, s1 s1 … s2, …, s1 s1 … sn, s1 s1 … s2 s1, …, sn sn … sn },
  i.e., all n^k sequences of k symbols from the original alphabet S.
• The average number of bits per original symbol then satisfies η ≤ l̄ < η + 1/k.
• An improvement over the original Huffman coding, but not much.
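A minimal Python sketch illustrating the effect on a hypothetical two-symbol source (not one from the slides): Huffman codes are built over blocks of k symbols and the average number of bits per original symbol is compared with the entropy.

    import heapq
    from itertools import count, product
    from math import log2

    def huffman_lengths(freqs):
        # Return {symbol: codeword length} for a Huffman code over `freqs`.
        tie = count()
        heap = [(p, next(tie), [sym]) for sym, p in freqs.items()]
        lengths = {sym: 0 for sym in freqs}
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, group1 = heapq.heappop(heap)
            p2, _, group2 = heapq.heappop(heap)
            for s in group1 + group2:          # each merge adds one bit to these symbols
                lengths[s] += 1
            heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
        return lengths

    p = {"a": 0.9, "b": 0.1}                   # hypothetical skewed source
    entropy = sum(q * log2(1 / q) for q in p.values())
    print("entropy:", round(entropy, 3))       # about 0.469 bits per symbol

    for k in (1, 2, 3):
        blocks = {}
        for block in product(p, repeat=k):     # the extended alphabet S^(k)
            prob = 1.0
            for ch in block:
                prob *= p[ch]
            blocks["".join(block)] = prob
        lengths = huffman_lengths(blocks)
        avg = sum(blocks[s] * lengths[s] for s in blocks) / k
        print("k =", k, "average bits per symbol:", round(avg, 3))   # moves toward the entropy as k grows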
Cont. …
Adaptive Huffman Coding
• Adaptive Huffman Coding: statistics are gathered and updated
dynamically as the data stream arrives.
ENCODER                          DECODER
-------                          -------
Initial_code();                  Initial_code();
while not EOF {                  while not EOF {
    get(c); encode(c);               decode(c); output(c);
    update_tree(c); }                update_tree(c); }
Cont. …
Adaptive Huffman Coding (Cont’d)
• Initial_code assigns symbols some initially agreed-upon codes, without any prior
  knowledge of the frequency counts.
• update_tree is a procedure that (a) increments the frequency counts for the symbols
  (including any new ones) and (b) updates the configuration of the tree.
• The encoder and decoder must use exactly the same Initial_code and update_tree
  routines.
Cont. …
Notes on Adaptive Huffman Tree Updating
• Nodes are numbered in order from left to right, bottom to top. The numbers in
  parentheses indicate the counts.
• The tree must always maintain its sibling property, i.e., all nodes (internal and leaf)
  are arranged in order of increasing counts.
• If the sibling property is about to be violated, a swap procedure is invoked to update
  the tree by rearranging the nodes.
Cont. …
Table: Initial code assignment for AADCCDD using adaptive
Huffman coding.
Initial Code
------------
NEW: 0
A:   00001
B:   00010
C:   00011
D:   00100
...
• It is important to emphasize that the code for a particular symbol changes during the
adaptive Huffman coding process.
• For example, after AADCCDD, when the character D overtakes A as the most
frequent symbol, its code changes from 101 to 0.
• The “Squeeze Page” on this book’s web site provides a Java applet for adaptive
Huffman coding.
Cont. …
Exercise-1
(a) What is the entropy η of the image below, where the numbers (0, 20, 50, 99) denote the
    gray-level intensities?
(b) Show how to construct the Huffman tree to encode the above four intensity values in
    this image. Show the resulting code for each intensity value.
(c) What is the average number of bits needed for each pixel using your Huffman code?
    How does it compare to η?
Cont. …
Answer: η = 1.75 bits per pixel.
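A minimal Python check of this kind of calculation. The exercise image is not reproduced here, so the pixel-value distribution below (half the pixels 0, a quarter 20, an eighth each 50 and 99) is only an assumed example that happens to give η = 1.75; the same code works for any distribution:

    from math import log2

    # Assumed relative frequencies of the four gray levels (hypothetical, chosen so that η = 1.75).
    probs = {0: 1/2, 20: 1/4, 50: 1/8, 99: 1/8}

    entropy = sum(p * log2(1 / p) for p in probs.values())
    print(entropy)   # 1.75 bits per pixel

    # For this distribution a Huffman code has lengths 1, 2, 3, 3, so the average
    # number of bits per pixel is 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75, equal to η.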
Cont. …
Exercise
What are the advantages of Adaptive Huffman Coding compared
to the original Huffman Coding algorithm?
Like other adaptive compression algorithms:
1. It is more dynamic and therefore can offer better compression.
2. It works even when prior statistics of the data distribution are unavailable, as is the
   case in most multimedia applications.
3. It also saves overhead, since no symbol table needs to be transmitted.
Dictionary Encoding
Dictionary coding uses groups of symbols, words, and phrases with corresponding
abbreviations.
It transmits the index of the symbol/word instead of the word itself.
LZW (Lempel-Ziv-Welch)
LZW Compression
ALGORITHM – LZW Compression
BEGIN
  s = next input character;
  while not EOF {
    c = next input character;
    if s + c exists in the dictionary
      s = s + c;
    else {
      output the code for s;
      add string s + c to the dictionary with a new code;
      s = c;
    }
  }
  output the code for s;
END
Cont. …
Example LZW compression for string “ABABBABCABABBA”
Let’s start with a very simple dictionary (also referred to as a “string table”), initially
containing only 3 characters, with codes as follows:
Code String
1 A
2 B
3 C
Now, if the input string is "ABABBABCABABBA", the LZW compression algorithm works
as follows (a sketch of the trace is shown below):
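A minimal Python translation of the LZW pseudocode above; running it on the example string gives the output code sequence and the strings added to the dictionary:

    def lzw_compress(text, dictionary):
        # `dictionary` maps strings to integer codes and is extended as we go,
        # exactly as in the pseudocode above.
        next_code = max(dictionary.values()) + 1
        output = []
        s = text[0]
        for c in text[1:]:
            if s + c in dictionary:
                s = s + c                       # keep extending the current match
            else:
                output.append(dictionary[s])    # emit the code for the longest match
                dictionary[s + c] = next_code   # add the new string to the dictionary
                next_code += 1
                s = c
        output.append(dictionary[s])            # emit the code for the final match
        return output

    table = {"A": 1, "B": 2, "C": 3}
    print(lzw_compress("ABABBABCABABBA", table))
    # -> [1, 2, 4, 5, 2, 3, 4, 6, 1]
    # New dictionary entries created along the way:
    # 4: AB, 5: BA, 6: ABB, 7: BAB, 8: BC, 9: CA, 10: ABA, 11: ABBA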
The table below summarizes the difference between lossless and lossy data compression:

                  Lossless compression                          Lossy compression
Original data     Fully preserved after decoding                Partly discarded; not exactly recoverable
Quality           No loss of quality                            Quality may degrade as compression increases
Examples          RLE, LZW, Huffman, Shannon-Fano, arithmetic   JPEG, MPEG, ADPCM
Summary
Lossless data compression algorithms include Run-Length Encoding, Huffman coding,
Shannon-Fano coding, arithmetic coding, Lempel-Ziv-Welch (LZW) coding, etc. Lossy
data compression algorithms include transform coding (e.g., the Discrete Cosine
Transform and the Discrete Wavelet Transform), fractal compression, etc.
End of Chapter 4
Next Chapter 5
Lossy Compression Algorithms