Prefix Codes

Neena Raj N. R.

Department of Computer Science and Engineering


Mar Baselios College of Engineering and Technology, Nalanchira

January 2024
Syllabus

Module 2
Run-length encoding, RLE text compression, Statistical methods: Prefix
Codes, Binary Huffman coding, Illustration of Binary Huffman coding,
Non-binary Huffman Algorithms, Arithmetic Coding algorithm, Illustration of
Arithmetic Coding algorithm.

Neena Raj N. R. CS1U43D DCT January 2024 2 / 44


Course Outcomes

CO1 Describe the fundamental principles of data compression. (Understand)
CO2 Make use of statistical and dictionary-based compression techniques for various applications. (Apply)
CO3 Illustrate various image compression standards. (Apply)
CO4 Summarize video compression mechanisms to reduce the redundancy in video. (Understand)
CO5 Use the fundamental properties of digital audio to compress audio data. (Understand)

Coding

Coding: Assignment of binary sequences to elements of an alphabet.
Code: The set of binary sequences.
Codewords: The individual members of the set.
For example, the alphabet used in writing most books consists of the
26 lowercase letters, 26 uppercase letters, and a variety of
punctuation marks. The 7-bit ASCII code for the letter "a" is 1100001, the
letter "A" is coded as 1000001, and the character "," is coded as
0101100. Notice that the ASCII code uses the same number of bits
to represent each symbol. Such a code is called a fixed-length code.
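
The fixed-length property can be checked directly. The following is a minimal sketch, assuming 7-bit ASCII; the helper name ascii7 is illustrative and not part of the slides.

```python
def ascii7(ch):
    """Return the 7-bit ASCII codeword of a single character as a bit string."""
    code_point = ord(ch)
    if code_point > 127:
        raise ValueError("not a 7-bit ASCII character")
    # "07b" pads the binary form with leading zeros to exactly 7 bits.
    return format(code_point, "07b")


for ch in ["a", "A", ","]:
    print(repr(ch), ascii7(ch))
```

Every codeword produced this way has the same length, which is what makes the code fixed-length.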

If we want to reduce the number of bits required to represent
different messages, we need to use a different number of bits to
represent different symbols.
If we use fewer bits to represent symbols that occur more often, on
the average we would use fewer bits per symbol.
The average number of bits per symbol is often called the rate of the
code.

The average length of the code is not the only important point in
designing a "good" code.
The selection of codewords for different messages or symbols $x_i$ is
done according to the following two considerations:
1 The average code length L must be as small as possible:
$L = \sum_{i=1}^{n} l_i \, p(x_i)$ bits/message or bits/symbol,
where $l_i$ is the length of the codeword for message or symbol $x_i$ ($l_i$ is in
bits for binary coding, or in digits for nonbinary coding) and $p(x_i)$ is
the probability of $x_i$.
2 The codewords must be uniquely decodable at the receiver: every
encoded sequence must correspond to exactly one sequence of symbols.
A simple sufficient condition is that no codeword is the beginning
(from the left) of any other, longer codeword.
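
As a small illustration of the average code length, the sketch below evaluates L as the sum of codeword length times probability. The probabilities and codeword lengths are made-up values chosen for the example, not taken from the slides.

```python
def average_length(lengths, probs):
    """Average code length: sum of l_i * p(x_i), in bits per symbol."""
    if len(lengths) != len(probs):
        raise ValueError("need one length per probability")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(l * p for l, p in zip(lengths, probs))


# Hypothetical four-symbol source (assumed values for illustration only).
probs = [0.4, 0.3, 0.2, 0.1]
fixed_lengths = [2, 2, 2, 2]       # fixed-length code: 2 bits per symbol
variable_lengths = [1, 2, 3, 3]    # shorter codewords for likelier symbols

print(average_length(fixed_lengths, probs))
print(average_length(variable_lengths, probs))
```

With these assumed numbers the variable-length assignment averages about 1.9 bits per symbol, slightly below the 2 bits of the fixed-length code.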

Statistical Methods

Statistical methods use variable-size codes, with the shorter codes
assigned to symbols or groups of symbols that appear more often in
the data (have a higher probability of occurrence).

Table. Example of Variable-Size Codes

Designers and implementors of variable-size codes have to deal with
the two problems of
1 assigning codes that can be decoded unambiguously, and
2 assigning codes with the minimum average size.
Examples of statistical algorithms: Huffman coding, arithmetic
coding.

Consider the following variable-size codes.

Assume a 20-symbol string
a1 a3 a2 a1 a3 a3 a4 a2 a1 a1 a2 a2 a1 a1 a3 a1 a1 a2 a3 a1
1 What is the average size when encoded using code1?
2 Find the encoded data using code1.
3 Find the decoded data using code1.
4 What is the average size when encoded using code2?
5 Find the encoded data using code2.
6 Find the decoded data using code2.
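
The same three steps (encode, decode, average size) can be sketched in code. The codeword table below is a hypothetical prefix code chosen only for illustration; it is not code1 or code2 from the slide. The decoder shown works only because the assumed code is a prefix code.

```python
MESSAGE = "a1 a3 a2 a1 a3 a3 a4 a2 a1 a1 a2 a2 a1 a1 a3 a1 a1 a2 a3 a1".split()

# Hypothetical prefix code (an assumption, NOT the slide's code1 or code2).
CODE = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}


def encode(symbols, code):
    """Concatenate the codeword of every symbol."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Read bits left to right and emit a symbol whenever a codeword completes.

    This greedy scheme is only correct for prefix codes.
    """
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return decoded


bits = encode(MESSAGE, CODE)
average_size = len(bits) / len(MESSAGE)   # bits per symbol for this message
print(bits, average_size)
```

With the assumed table the 20 symbols take 37 bits, and decoding the bit string returns the original message.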

Exercises

1 Consider the four different codes for a four-letter alphabet shown in the
table.

1 Find the entropy of the source. Ans: 1.75 bits/symbol
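
The entropy calculation can be sketched as follows. The probabilities used here are an assumption: a four-letter source with probabilities 1/2, 1/4, 1/8, 1/8, which is consistent with the stated answer of 1.75 bits/symbol.

```python
from math import log2


def entropy(probs):
    """First-order entropy H = -sum p * log2(p), in bits per symbol."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * log2(p) for p in probs if p > 0)


# Assumed probabilities (not given in the extracted slide text).
ASSUMED_PROBS = [0.5, 0.25, 0.125, 0.125]
print(entropy(ASSUMED_PROBS))  # 1.75
```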

2 Find the average length for each code.


Ans:

3 Check the codes that are uniquely decodable.


Solution:

2 Check whether code5 is uniquely decodable.

Solution:
Prefix and Dangling suffix:
Suppose we have two binary codewords a and b, where a is k bits long,
b is n bits long, and k < n. If the first k bits of b are identical to a,
then a is called a prefix of b. The last n - k bits of b are called the
dangling suffix.
For example, if a = 010 and b = 01011, then a is a prefix of b and the
dangling suffix is 11.
Test for Uniquely Decodable Codes
Construct a list of all the codewords.
Examine all pairs of codewords to see if any codeword is a prefix of
another codeword.

Whenever you find such a pair, add the dangling suffix to the list unless
you have added the same dangling suffix to the list in a previous
iteration.
Now repeat the procedure using this larger list. Continue in this
fashion until one of the following two things happens:
1 You get a dangling suffix that is a codeword.
2 There are no more unique dangling suffixes.
If you get the first outcome, the code is not uniquely decodable.
However, if you get the second outcome, the code is uniquely
decodable.
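
The test above can be sketched in code. This sketch follows the standard Sardinas-Patterson formulation, in which each round produces new dangling suffixes by comparing codewords against the suffixes found in the previous round; the function names are illustrative.

```python
def dangling_suffixes(a_set, b_set):
    """Suffixes left over when a word of a_set is a proper prefix of a word of b_set."""
    found = set()
    for a in a_set:
        for b in b_set:
            if len(a) < len(b) and b.startswith(a):
                found.add(b[len(a):])
    return found


def is_uniquely_decodable(codewords):
    """Return True if the code passes the dangling-suffix test."""
    code = set(codewords)
    if len(code) != len(codewords):
        return False                      # two symbols share one codeword
    current = dangling_suffixes(code, code)
    seen = set()
    while current:
        if current & code:
            return False                  # a dangling suffix is a codeword
        seen |= current
        produced = dangling_suffixes(code, current) | dangling_suffixes(current, code)
        current = produced - seen         # keep only suffixes not seen before
    return True                           # no more new dangling suffixes


print(is_uniquely_decodable(["0", "01", "11"]))   # True
print(is_uniquely_decodable(["0", "01", "10"]))   # False
```

In the second example the bit string 010 can be read either as 0 followed by 10 or as 01 followed by 0, which is why the test rejects the code.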

Check whether code6 is uniquely decodable.

Prefix Codes

A prefix code is a variable-size code that satisfies the prefix property.
This property requires that once a certain bit pattern has been
assigned as the code of a symbol, no other codes should start with
that pattern (the pattern cannot be the prefix of any other code). That is,
a code in which no codeword is a prefix of another codeword is called
a prefix code.
The binary representation of the integers does not satisfy the prefix
property.
Another disadvantage of this representation is that the size n of the
set of integers has to be known in advance, since it determines the
code size, which is $1 + \lfloor \log_2 n \rfloor$ bits.
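
A direct check of the prefix property can be sketched as follows; the function name is illustrative. It relies on the fact that, after sorting, a codeword that is a prefix of another is immediately followed by a word that starts with it.

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(set(codewords))
    if len(words) != len(codewords):
        return False  # duplicate codewords
    # In sorted order, any prefix relation shows up between neighbours.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


# Plain binary representations of 1..4: '1', '10', '11', '100'
binary_integers = [format(i, "b") for i in range(1, 5)]

print(is_prefix_code(["0", "10", "110", "111"]))  # True
print(is_prefix_code(binary_integers))            # False
print((8).bit_length())                           # 4 bits are needed to write 8
```

The plain binary representation fails because, for example, 1 is a prefix of 10.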

In some applications, a prefix code is required to code a set of
integers whose size is not known in advance.

A simple way to check if a code is a prefix code is to draw the rooted
binary tree corresponding to the code.
Draw a tree that starts from a single node (the root node) and has a
maximum of two possible branches at each node.
One of these branches corresponds to a 1 and the other branch
corresponds to a 0.

Note that apart from the root node, the trees have two kinds of
nodes—nodes that give rise to other nodes and nodes that do not.
The first kind of nodes are called internal nodes, and the second kind
are called external nodes or leaves.
In a prefix code, the codewords are only associated with the external
nodes.
A code that is not a prefix code will have codewords associated with
internal nodes.
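
The tree-based check can be sketched as follows. The tree is stored as nested dictionaries, which is a representation chosen here for illustration; the slides only describe the drawing.

```python
def build_tree(codewords):
    """Build the rooted binary tree of a code; each node records whether a codeword ends there."""
    root = {"children": {}, "codeword": False}
    for word in codewords:
        node = root
        for bit in word:
            if bit not in ("0", "1"):
                raise ValueError("codewords must be binary strings")
            node = node["children"].setdefault(bit, {"children": {}, "codeword": False})
        node["codeword"] = True
    return root


def codewords_only_at_leaves(node):
    """True if every codeword sits on an external node (leaf)."""
    if node["codeword"] and node["children"]:
        return False  # a codeword on an internal node: not a prefix code
    return all(codewords_only_at_leaves(child) for child in node["children"].values())


print(codewords_only_at_leaves(build_tree(["0", "10", "110", "111"])))  # True
print(codewords_only_at_leaves(build_tree(["0", "01", "11"])))          # False
```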

Code 1 and Code 4 are not prefix codes.

The code for any symbol can be obtained by traversing the tree from
the root to the external node corresponding to that symbol.
Each branch on the way contributes a bit to the codeword: a 0 for
each left branch and a 1 for each right branch.
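
The traversal can be sketched as a short recursive function. The tree below is a hypothetical example written as nested (left, right) pairs with symbol names at the leaves; it is not one of the trees shown on the slides.

```python
def codewords_from_tree(tree, path=""):
    """Collect the codeword of every leaf: 0 for a left branch, 1 for a right branch."""
    if isinstance(tree, str):          # external node (leaf) holding a symbol
        return {tree: path}
    left, right = tree                 # internal node with two branches
    table = {}
    table.update(codewords_from_tree(left, path + "0"))
    table.update(codewords_from_tree(right, path + "1"))
    return table


# Hypothetical tree: a1 is the left child of the root, a3 and a4 are deepest.
TREE = ("a1", ("a2", ("a3", "a4")))
print(codewords_from_tree(TREE))
# {'a1': '0', 'a2': '10', 'a3': '110', 'a4': '111'}
```

Because every symbol sits on a leaf, the resulting code is a prefix code.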

The Unary Code

The unary code of the positive integer n is defined as n - 1 ones
followed by a single 0 or, alternatively, as n - 1 zeros followed by a
single one.

Table. Some Unary Codes.


The length of the unary code for the integer n is thus n bits.
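
A minimal sketch of the first variant (n - 1 ones followed by a single 0), assuming n is at least 1; the function names are illustrative.

```python
def unary_encode(n):
    """Unary code of a positive integer n: n - 1 ones followed by a single 0."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return "1" * (n - 1) + "0"


def unary_decode(bits):
    """Decode a concatenation of unary codewords into a list of integers."""
    values, ones = [], 0
    for bit in bits:
        if bit == "1":
            ones += 1
        elif bit == "0":
            values.append(ones + 1)   # the 0 terminates the current codeword
            ones = 0
        else:
            raise ValueError("input must be a binary string")
    if ones:
        raise ValueError("last codeword is incomplete")
    return values


print([unary_encode(n) for n in range(1, 6)])
# ['0', '10', '110', '1110', '11110']
print(unary_decode("0" + "110" + "1110"))  # [1, 3, 4]
```

Since each codeword ends at its first 0, no codeword is a prefix of another, and the codeword for n has exactly n bits.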

Stone-age people indicated the integer n by marking n adjacent
vertical bars on a stone, so the unary code is sometimes called a
stone-age binary and each of its n - 1 ones is called a stone-age bit.

It is also possible to define general unary codes, also known as
start-step-stop codes. Such a code depends on a triplet (start, step,
stop) of integer parameters and is defined as follows:
Codewords are created to code symbols used in the data, such that the
nth group of codewords (n = 0, 1, 2, ...) consists of n ones, followed by
one 0, followed by all the combinations of a bits, where
a = start + n × step. If a = stop, then the single 0 preceding the a bits
is dropped.
The number of codes for a given triplet is finite and depends on the
choice of parameters.
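
The construction can be sketched as a generator of all codewords for a triplet. This sketch assumes step is at least 1 and that stop is reached exactly (stop - start is a multiple of step); the function name is illustrative.

```python
from itertools import product


def start_step_stop(start, step, stop):
    """List every codeword of the general unary (start-step-stop) code."""
    if start < 0 or step < 1 or stop < start or (stop - start) % step:
        raise ValueError("need 0 <= start <= stop, step >= 1, stop = start + k*step")
    codes = []
    for n in range((stop - start) // step + 1):
        a = start + n * step
        # n ones, then a 0 separator, which is dropped for the last group (a == stop)
        prefix = "1" * n + ("" if a == stop else "0")
        codes.extend(prefix + "".join(bits) for bits in product("01", repeat=a))
    return codes


codes = start_step_stop(3, 2, 9)
print(len(codes), codes[0], codes[-1])
```

For the triplet (3, 2, 9) this yields groups of 8, 32, 128 and 512 codewords, 680 in total, running from 0000 to a string of twelve ones.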

Table. The General Unary Code (3,2,9).

Table. The General Unary Code (2,1,10).


The number of different general unary codes is
$\frac{2^{\text{stop}+\text{step}} - 2^{\text{start}}}{2^{\text{step}} - 1}$.

Notice that this expression increases exponentially with parameter
"stop," so large sets of these codes can be generated with small
values of the three parameters.
Some Examples:
1 The triplet (n, 1, n) defines the standard n-bit binary codes, as can be
verified by direct construction. The number of such codes is easily seen
to be $\frac{2^{n+1} - 2^{n}}{2^{1} - 1} = 2^{n}$.
2 The triplet (0, 0, ∞) defines the codes 0, 10, 110, 1110, ... which are
the unary codes but assigned to the integers 0, 1, 2, ... instead of 1,
2, 3, ....
3 The triplet (1, 1, 30) produces $\frac{2^{31} - 2^{1}}{2^{1} - 1} = 2^{31} - 2 \approx 2.1$ billion codes.
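
The counts in these examples can be checked numerically. The sketch below assumes the closed form (2^(stop+step) - 2^start) / (2^step - 1), which agrees with both worked examples, and compares it with a direct sum over the group sizes; it applies only when step is at least 1 and stop is reached exactly.

```python
def count_codes(start, step, stop):
    """Closed-form number of codewords for the triplet (start, step, stop)."""
    if step < 1 or stop < start or (stop - start) % step:
        raise ValueError("need step >= 1 and stop = start + k*step")
    return (2 ** (stop + step) - 2 ** start) // (2 ** step - 1)


def count_by_summation(start, step, stop):
    """Same count obtained by adding 2**a over a = start, start+step, ..., stop."""
    return sum(2 ** a for a in range(start, stop + 1, step))


for triplet in [(3, 2, 9), (2, 1, 10), (5, 1, 5), (1, 1, 30)]:
    print(triplet, count_codes(*triplet), count_by_summation(*triplet))
```

The two functions agree on every triplet listed, and (1, 1, 30) gives 2147483646 codes.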

Other Prefix Codes

Self-study: Any four prefix codes other than the unary code. [Reference: D.
Salomon, Data compression: the complete reference. 2007.]

References

[1] K. Sayood, Introduction to data compression. Morgan Kaufmann, 2003.
[2] D. Salomon, Data compression: the complete reference. Springer, 2007.
