
Lecture 4 - Data compression

Basic concepts of data compression.
Data compression algorithms.
Review

● Relative entropy
● Mutual information
● Properties of mutual information
● Relationship between entropy and mutual information
● Chain rules for entropy, relative entropy and mutual information
● Source coding
● Channel coding
Plan

● Data compression
● Basic concepts of data compression
● Uniquely decodable codes
● Prefix codes
● Algorithm to test uniquely decodable codes
● Properties of prefix codes
● Shannon-Fano coding
Why compression?

● Multimedia applications generate a lot of data:
  ○ Need to compress data for efficient storage
  ○ Need to compress data for efficient transmission
● Examples of applications that use compression:
  ○ Video: DVD, video conferencing
  ○ Image: JPEG
  ○ Audio: MP3
  ○ Text: WinZip
Data compression

● Source encoding (data compression): the process of efficiently converting the
  output of a source, whether analog or digital, into a sequence of binary digits.
● Typically, a device that performs data compression is referred to as an encoder,
  and one that performs the reverse of the process (decompression) as a decoder.
Data compression

● Any particular compression scheme is either lossy or lossless.
  ○ Lossless compression reduces bits by identifying and eliminating
    statistical redundancy. No information is lost in lossless compression.
  ○ Lossy compression reduces bits by removing unnecessary or less
    important information.
Codes

● Alphabet: a collection of symbols.
● Letter (symbol): an element of an alphabet.
● Coding: the assignment of binary sequences to elements of an alphabet.
● Code: a set of binary sequences.
● Codewords: individual members of the set of binary sequences.
Uniquely decodable codes

● A uniquely decodable code (UDC) is a code in which every encoded
  sequence can be decoded in only one way.
Prefix code

● A prefix of a codeword is any sequence that forms the initial part
  of the codeword.
● A code is called a prefix code (or an instantaneous code) if no
  codeword is a prefix of any other codeword.
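Checking this condition is a direct pairwise comparison; a minimal Python sketch of the definition (the function name is illustrative):

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of any other codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False  # a is a prefix of b -> not a prefix code
    return True

print(is_prefix_code(["0", "10", "110", "111"]))  # True
print(is_prefix_code(["0", "01", "11"]))          # False: "0" is a prefix of "01"
```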
Prefix code example
Prefix and dangling suffix

● Consider two codewords: 011 and 011101
● Prefix: 011
● Dangling suffix: 101 (what remains of 011101 after removing the prefix 011)
Algorithm to test uniquely decodable codes

1. Compare all pairs of codewords.

2. If no codeword is a prefix of another -> the code is a prefix code,
   hence uniquely decodable.

   Else, for each prefix found -> add the dangling suffix to the list
   and repeat the comparison.

3. If a dangling suffix equals some codeword -> the code is not
   uniquely decodable.

   Else, if no new dangling suffixes appear -> the code is uniquely
   decodable.

Example

Consider {0, 01, 11}

● Dangling suffix is 1 (from 0 and 01)
● New list: {0, 01, 11, 1}
● The next dangling suffix is again 1 (from 0 and 01, and also from 1 and 11),
  and was already added in the previous iteration.
● Since no dangling suffix is a codeword, {0, 01, 11} is
  uniquely decodable.
Example

Consider {0, 01, 10}

● Dangling suffix is 1 (from 0 and 01)
● New list: {0, 01, 10, 1}
● The new dangling suffix is 0 (from 1 and 10).
● Since the dangling suffix 0 is a codeword, {0, 01, 10} is not uniquely
  decodable.
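The procedure applied in the two examples above is the Sardinas-Patterson test; a minimal Python sketch, with the function name being illustrative:

```python
def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: iterate dangling suffixes until one
    equals a codeword (not UD) or no new suffixes appear (UD)."""
    codes = set(codewords)

    def dangling_suffixes(a_set, b_set):
        # Suffixes b[len(a):] for every pair where a is a proper prefix of b
        out = set()
        for a in a_set:
            for b in b_set:
                if a != b and b.startswith(a):
                    out.add(b[len(a):])
        return out

    # First round: dangling suffixes among the codewords themselves
    dangling = dangling_suffixes(codes, codes)
    seen = set()
    while dangling:
        if dangling & codes:
            return False  # a dangling suffix equals a codeword
        seen |= dangling
        new = dangling_suffixes(dangling, codes) | dangling_suffixes(codes, dangling)
        dangling = new - seen  # stop when no new suffixes appear
    return True

print(is_uniquely_decodable(["0", "01", "11"]))  # True
print(is_uniquely_decodable(["0", "01", "10"]))  # False
```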
Properties of prefix code

1. A prefix code satisfies the Kraft-McMillan inequality:

   Σ_{k=1}^{K} 2^{-l_k} ≤ 1

   Here l_k is the codeword length of the k-th symbol s_k, and K is the total
   number of symbols in the alphabet.
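The inequality can be checked numerically from the codeword lengths alone; a small Python sketch:

```python
def kraft_sum(lengths):
    """Sum of 2**(-l_k) over all codeword lengths l_k."""
    return sum(2.0 ** -l for l in lengths)

# Lengths of the prefix code {0, 10, 110, 111}
print(kraft_sum([1, 2, 3, 3]))  # 1.0 -> inequality holds (with equality)
# Lengths 1, 1, 2 violate the inequality: no prefix code has these lengths
print(kraft_sum([1, 1, 2]))     # 1.25 > 1
```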
Properties of prefix code

2. Conversely, given a set of codeword lengths l_1, ..., l_K that satisfies the
   Kraft-McMillan inequality, a prefix code with these codeword lengths can
   always be constructed.
Shannon-Fano coding

An efficient code can be obtained by the following simple procedure,
known as the Shannon-Fano algorithm:

1. List the source symbols in order of decreasing probability.

2. Partition the set into two subsets that are as close to equiprobable
   as possible, and assign 0 to the upper subset and 1 to the lower subset.

3. Continue this process, each time partitioning the subsets with as nearly
   equal probabilities as possible, until further partitioning is not
   possible.
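The three steps above can be sketched recursively in Python; this is a minimal illustration (the function name is ours, and ties in the split point are resolved toward the first minimum):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs.
    Returns a dict mapping each symbol to its Shannon-Fano codeword."""
    codes = {}

    def split(items, prefix):
        if len(items) == 1:
            codes[items[0][0]] = prefix or "0"
            return
        # Find the cut that makes the two halves as equiprobable as possible
        total = sum(p for _, p in items)
        acc, cut, best = 0.0, 1, float("inf")
        for i in range(1, len(items)):
            acc += items[i - 1][1]
            diff = abs(2 * acc - total)  # |upper sum - lower sum|
            if diff < best:
                best, cut = diff, i
        split(items[:cut], prefix + "0")  # 0 to the upper subset
        split(items[cut:], prefix + "1")  # 1 to the lower subset

    # Step 1: sort symbols by decreasing probability
    split(sorted(symbols, key=lambda sp: -sp[1]), "")
    return codes

print(shannon_fano([("A", 0.5), ("B", 0.25), ("C", 0.125), ("D", 0.125)]))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

Note that the resulting code is a prefix code by construction: the two subsets at each split receive distinct leading bits.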
Shannon-Fano coding example
Code efficiency
