
Data Compression Techniques

By Sukanta Behera
Reg. No. 07SBSCA048
Data Compression
 Lossless data compression: Store/transmit big files using few bytes so that the original files can be perfectly retrieved. Example: zip.

 Lossy data compression: Store/transmit big files using few bytes so that the original files can be approximately retrieved. Example: mp3.

 Motivation: Save storage space and/or bandwidth.
Definition of Codec
 Let Σ be an alphabet and let S ⊆ Σ* be a set of possible messages.

 A lossless codec (c, d) consists of
   a coder c: S → {0,1}*
   a decoder d: {0,1}* → Σ*
 so that ∀x ∈ S: d(c(x)) = x.
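In code, the definition is just a pair of mutually inverse maps. A minimal Python sketch with a made-up table of code words (the table is ours, not the slides'):

```python
# A lossless codec for S = {0,1}^2, messages written as bit strings.
# The code words are chosen by hand purely for illustration.
code = {"00": "0", "01": "10", "10": "110", "11": "111"}
inverse = {w: x for x, w in code.items()}   # c is injective, so this inverts it

def c(x):    # coder c: S -> {0,1}*
    return code[x]

def d(w):    # decoder d: {0,1}* -> S, defined on the code words c(S)
    return inverse[w]

# The defining property of a lossless codec: d(c(x)) = x for all x in S.
assert all(d(c(x)) == x for x in code)
```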
Remarks
 It is necessary for c to be an injective map.

 If we do not worry about efficiency, we don't have to specify d if we have specified c.

 Terminology: Sometimes we just say "code" rather than "codec".

 Terminology: The set c(S) is called the set of code words of the codec. In examples to follow, we often just state the set of code words.
Proposition
 Let S = {0,1}^n. Then, for any codec (c, d), there is some x ∈ S so that |c(x)| ≥ n.

 Why: there are 2^n messages but only 2^n − 1 binary strings of length less than n, so an injective coder cannot give every message a shorter code word.

 "Compression is impossible."
Proposition
 For any message x, there is a codec (c, d) so that |c(x)| = 1.

 "The Encyclopedia Britannica can be compressed to 1 bit."
Remarks
 We cannot compress all data. Thus, we must concentrate on compressing "relevant" data.

 It is trivial to compress data known in advance. We should concentrate on compressing data about which there is uncertainty.

 We will use probability theory as a tool to model uncertainty about relevant data.
Can random data be compressed?
 Suppose Σ = {0,1} and S = {0,1}^2.

 We know we cannot compress all data, but can we do well on the average?

 Let us assume the uniform distribution on S and look at the expected length of the code words, as the sketch below does.
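One way to make the question concrete, sketched in Python: take a hypothetical prefix code on S = {0,1}^2 and compute the expected code-word length when each message has probability 1/4.

```python
# A hypothetical prefix code on S = {0,1}^2.
code = {"00": "0", "01": "10", "10": "110", "11": "111"}

# Uniform distribution: every message has probability 1/4.
expected_length = sum(0.25 * len(w) for w in code.values())
print(expected_length)   # 2.25 -- no better than the trivial 2-bit code
```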
Definition of prefix codes
 A prefix code c is a code with the property that for all distinct messages x and y, c(x) is not a prefix of c(y). (A mechanical check is sketched below.)

 Example: fixed-length codes (such as ASCII).

 Example: {0, 11, 10}.

 All codes in this course will be prefix codes.
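The prefix property is easy to test mechanically. A short Python sketch (the function name is ours):

```python
def is_prefix_code(codewords):
    """True if no code word is a prefix of a different code word."""
    return not any(u != v and v.startswith(u)
                   for u in codewords for v in codewords)

print(is_prefix_code({"0", "11", "10"}))  # True  (the example above)
print(is_prefix_code({"0", "01", "11"}))  # False ("0" is a prefix of "01")
```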
Proposition
 If c is a prefix code for S = Σ^1, then c^n is a prefix code for S = Σ^n, where

   c^n(x₁ x₂ … xₙ) = c(x₁) · c(x₂) · … · c(xₙ)
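A Python sketch of the extended code: encoding is plain concatenation, and the prefix property is exactly what lets the decoder split the stream greedily. The alphabet and code below are made up:

```python
code = {"a": "0", "b": "10", "c": "11"}   # a prefix code on Σ = {a, b, c}
inverse = {w: x for x, w in code.items()}

def encode(msg):
    # c^n: concatenate the code words of the individual symbols
    return "".join(code[x] for x in msg)

def decode(bits):
    # Since c is a prefix code, exactly one code word can match at each
    # position, so a greedy left-to-right scan recovers the message.
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "input was not a concatenation of code words"
    return "".join(out)

assert decode(encode("abcab")) == "abcab"
```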
Prefix codes and trees
 Set of code words of a prefix code: {0, 11, 10}.

 [Figure: a rooted binary tree with edges labeled 0 and 1; the leaves, read from the root, give the code words 0, 10, and 11.]
Alternative view of prefix codes
 A prefix code is an assignment of the messages of S to the leaves of a rooted binary tree.

 The code word of a message x is found by reading the labels on the edges on the path from the root of the tree to the leaf corresponding to x. (See the sketch below.)
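One concrete rendering of this view, as a Python sketch: an internal node is a pair (left, right) of subtrees, a leaf is a message, and the code word is the string of edge labels on the way down. The three-message tree is made up:

```python
def codewords(tree, path=""):
    """Map each leaf (message) of a nested-pair binary tree to the
    code word spelled by the edge labels from the root: 0 left, 1 right."""
    if isinstance(tree, tuple):
        left, right = tree
        return {**codewords(left, path + "0"), **codewords(right, path + "1")}
    return {tree: path}   # a leaf: one message, one code word

# Leaf "x" under edge 0; leaves "y" and "z" under 10 and 11.
print(codewords(("x", ("y", "z"))))   # {'x': '0', 'y': '10', 'z': '11'}
```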
Binary trees and the interval [0,1)
 [Figure: the tree for {0, 10, 11} drawn over the number line 0, 1/4, 1/2, 3/4, 1; the leaf for 0 covers [0, 1/2), the leaf for 10 covers [1/2, 3/4), and the leaf for 11 covers [3/4, 1).]
Alternative view of prefix codes
 A prefix code is an assignment of the messages of S to disjoint dyadic intervals.

 A dyadic interval is a real interval of the form [k·2^−m, (k+1)·2^−m) with k+1 ≤ 2^m. The corresponding code word is the m-bit binary representation of k.
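This correspondence is directly computable: an m-bit code word is the m-bit binary expansion of some k, and it owns [k·2^−m, (k+1)·2^−m). A Python sketch with exact fractions:

```python
from fractions import Fraction

def dyadic_interval(codeword):
    """Map an m-bit code word, read as the binary expansion of k,
    to the endpoints of the dyadic interval [k/2^m, (k+1)/2^m)."""
    m = len(codeword)
    k = int(codeword, 2)
    return Fraction(k, 2**m), Fraction(k + 1, 2**m)

# The prefix code {0, 10, 11} from the figure: pairwise disjoint intervals
for w in ["0", "10", "11"]:
    print(w, dyadic_interval(w))   # [0,1/2), [1/2,3/4), [3/4,1) as fractions
```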
Kraft-McMillan Inequality
 Let m₁, m₂, … be the lengths of the code words of a prefix code. Then ∑ᵢ 2^−mᵢ ≤ 1.

 Let m₁, m₂, … be integers with ∑ᵢ 2^−mᵢ ≤ 1. Then there is a prefix code c so that the mᵢ are the lengths of the code words of c. (A construction is sketched below.)
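The constructive direction can be sketched in a few lines: sort the lengths and hand out dyadic intervals from left to right. A Python sketch (the function names are ours):

```python
from fractions import Fraction

def kraft_sum(lengths):
    return sum(Fraction(1, 2**m) for m in lengths)

def prefix_code_from_lengths(lengths):
    """Build a prefix code with the given word lengths, assuming
    the Kraft sum is at most 1."""
    assert kraft_sum(lengths) <= 1
    words, k = [], Fraction(0)
    for m in sorted(lengths):
        # k is a multiple of 2^-m at this point, so k*2^m is an integer
        # whose m-bit binary expansion becomes the next code word
        words.append(format(int(k * 2**m), "b").zfill(m))
        k += Fraction(1, 2**m)
    return words

print(kraft_sum([1, 2, 2]))                 # 1, so a prefix code exists
print(prefix_code_from_lengths([1, 2, 2]))  # ['0', '10', '11']
```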
Probability
 A probability distribution p on S is a map p: S → [0,1] so that ∑_{x∈S} p(x) = 1.

 A U-valued stochastic variable is a map Y: S → U.

 If Y: S → ℝ is a stochastic variable, its expected value E[Y] is ∑_{x∈S} p(x)·Y(x).
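A tiny numerical illustration of these definitions; the distribution and the stochastic variable are made up:

```python
# A distribution p on S = {a, b, c} and a real-valued stochastic variable Y.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
Y = {"a": 1.0, "b": 2.0, "c": 4.0}

# E[Y] = sum over x in S of p(x) * Y(x)
E = sum(p[x] * Y[x] for x in p)
print(E)   # 0.5*1 + 0.25*2 + 0.25*4 = 2.5
```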
Self-entropy
 Given a probability distribution p on S, the self-entropy of x ∈ S is defined as H(x) = −log₂ p(x).

 The self-entropy of a message with probability 1 is 0 bits.

 The self-entropy of a message with probability 0 is +∞.

 The self-entropy of a message with probability ½ is 1 bit.

 We often measure entropy in the unit "bits".
Entropy
 Given a probability distribution p on S, its entropy H[p] is defined as E[H], i.e.

   H[p] = −∑_{x∈S} p(x) log₂ p(x).

 For a stochastic variable X, its entropy H[X] is the entropy of its underlying distribution:

   H[X] = −∑ᵢ Pr[X=i] log₂ Pr[X=i]
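The formula translates directly into code. A Python sketch; the second distribution is made up to show a non-uniform case:

```python
from math import log2

def entropy(p):
    """H[p] = -sum of p(x) * log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

print(entropy({"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}))   # 2.0 bits
print(entropy({"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}))  # 1.75 bits
```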
Facts
 The entropy of the uniform distribution on {0,1}^n is n bits. Any other distribution on {0,1}^n has strictly smaller entropy.

 If X₁ and X₂ are independent stochastic variables, then H(X₁, X₂) = H(X₁) + H(X₂). (Checked numerically below.)

 For any function f, H(f(X)) ≤ H(X).
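The additivity fact can be checked numerically: for independent variables, the joint probability is the product of the marginals. A sketch with made-up marginals (the two printed values agree up to floating-point rounding):

```python
from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

p1 = {"a": 0.5, "b": 0.5}          # marginal distribution of X1
p2 = {"0": 0.75, "1": 0.25}        # marginal distribution of X2
joint = {(x, y): p1[x] * p2[y] for x in p1 for y in p2}

print(entropy(p1) + entropy(p2))   # H(X1) + H(X2)
print(entropy(joint))              # H(X1, X2): the same number
```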
Shannon's theorem
 Let S be a set of messages and let X be an S-valued stochastic variable.

 For all prefix codes c on S,
   E[|c(X)|] ≥ H[X].

 There is a prefix code c on S so that
   E[|c(X)|] < H[X] + 1.
 In fact, for all x in S, |c(x)| < H(x) + 1.
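The standard construction behind the upper bound is the Shannon code: give each message the length ⌈−log₂ p(x)⌉, i.e. its self-entropy rounded up. A Python sketch with a made-up distribution:

```python
from math import log2, ceil
from fractions import Fraction

p = {"a": 0.6, "b": 0.3, "c": 0.1}                   # made-up distribution
lengths = {x: ceil(-log2(q)) for x, q in p.items()}  # {'a': 1, 'b': 2, 'c': 4}

# These lengths satisfy Kraft-McMillan, so a prefix code with them exists.
assert sum(Fraction(1, 2**m) for m in lengths.values()) <= 1

H = -sum(q * log2(q) for q in p.values())
E = sum(p[x] * lengths[x] for x in p)
print(H, E)   # H = 1.295..., E = 1.6: indeed H <= E < H + 1
```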
