
Data Compression

Lecture 2
Exercise Encode/Decode
[Figure: a binary prefix code tree over the symbols a, b, c, d]

• Player 1: Encode a symbol string
• Player 2: Decode the string
• Check for equality
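A minimal sketch of the encode/decode game in Python, assuming for illustration the codebook a = 0, b = 100, c = 101, d = 11 (one possible reading of the tree; the slide's exact codewords may differ). The function names and sample message are illustrative, not from the lecture.

CODE = {"a": "0", "b": "100", "c": "101", "d": "11"}   # assumed codebook
INVERSE = {v: k for k, v in CODE.items()}

def encode(symbols):
    # Player 1: concatenate the codeword of each symbol.
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    # Player 2: because no codeword is a prefix of another, a symbol can be
    # emitted as soon as the buffered bits match a codeword.
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in INVERSE:
            decoded.append(INVERSE[buffer])
            buffer = ""
    return "".join(decoded)

message = "abcdadab"
assert decode(encode(message)) == message   # check for equality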

How Good is the Code
[Figure: code tree giving codewords a = 00 (P(a) = 1/8), b = 01 (P(b) = 1/4), c = 1 (P(c) = 5/8)]

bit rate = (1/8)(2) + (1/4)(2) + (5/8)(1) = 11/8 = 1.375 bps
Entropy ≈ 1.3 bps
Standard (fixed-length) code = 2 bps

(bps = bits per symbol)
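These numbers can be checked with a short Python sketch (not part of the slides); the codeword lengths 2, 2, 1 and the probabilities 1/8, 1/4, 5/8 are taken from the bit-rate expression above.

from math import log2

probs   = {"a": 1/8, "b": 1/4, "c": 5/8}
lengths = {"a": 2, "b": 2, "c": 1}

bit_rate = sum(probs[s] * lengths[s] for s in probs)   # expected codeword length
entropy  = -sum(p * log2(p) for p in probs.values())   # H = -sum p log2 p

print(f"bit rate = {bit_rate:.3f} bps")   # 1.375 bps
print(f"entropy  = {entropy:.3f} bps")    # about 1.299 bps, i.e. roughly 1.3 bps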

Modeling and Coding
• The development of data compression algorithms for a given type of data can be divided into two phases.
  – Modeling: try to extract information about any redundancy that exists in the data, and describe the redundancy in the form of a model.
  – Coding: a description of how the data (or the residual) is represented, typically in binary.
• The difference between the data and the model is often referred to as the residual.
Example 1

• Consider the following sequence of numbers x1, x2, x3, …: [sequence shown on the slide]
• If we were to transmit or store the binary representations of these numbers, we would need to use 5 bits per sample.
• Model: [model equation shown on the slide]
• The residual sequence consists of only three values: −1, 0, and 1. If we assign the code 00 to −1, 01 to 0, and 10 to 1, we need to use 2 bits to represent each element of the residual sequence.
• Therefore, we can obtain compression by transmitting or storing the parameters of the model and the residual sequence.
Example 2

• The decoder adds each received value to the
previous decoded value to obtain the
reconstruction corresponding to the received
value.
• Techniques that use the past values of a
sequence to predict the current value and
then encode the error in prediction, or
residual, are called predictive coding schemes.
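A minimal sketch of such a predictive (differential) scheme, assuming the simplest predictor, the previous value, with the first sample predicted from zero; the function names are illustrative, not from the lecture.

def encode_differences(samples):
    # Encoder: send the difference between each sample and the previous one.
    residual, previous = [], 0
    for x in samples:
        residual.append(x - previous)
        previous = x
    return residual

def decode_differences(residual):
    # Decoder: add each received value to the previously decoded value.
    samples, previous = [], 0
    for e in residual:
        previous += e
        samples.append(previous)
    return samples

data = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
assert decode_differences(encode_differences(data)) == data
print(encode_differences(data))   # the residual uses far fewer distinct values than the data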

Example 3

• In order to represent eight symbols, we need to use 3 bits per symbol.

Self Information

• Shannon defined a quantity called self-information.
• Suppose we have an event A, which is a set of outcomes of some random experiment. If P(A) is the probability that the event A will occur, then the self-information associated with A is given by

  i(A) = −log2 P(A)

• log(1) = 0, and −log(x) increases as x decreases from one to zero. Therefore, if the probability of an event is low, the amount of self-information associated with it is high; if the probability of an event is high, the information associated with it is low.
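As a quick numerical illustration (not from the slides), the definition can be evaluated directly:

from math import log2

def self_information(p):
    # i(A) = -log2 P(A), measured in bits
    return -log2(p)

print(self_information(1/2))    # 1 bit: a fair coin flip
print(self_information(1/8))    # 3 bits: a less likely event carries more information
print(self_information(0.99))   # about 0.0145 bits: a near-certain event carries almost none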

Definitions
• Identically distributed means that there are no overall trends: the distribution does not fluctuate, and all items in the sample are taken from the same probability distribution.
• Independent means that the sample items are all independent events. In other words, they are not connected to each other in any way; knowledge of the value of one variable gives no information about the value of the other, and vice versa.
• A sequence that satisfies both conditions is called independent and identically distributed (iid).

Example
• In practice it is not possible to know the entropy of a physical source exactly, so we have to estimate it.
• Consider the following sequence:
  1 2 3 2 3 4 5 4 5 6 7 8 9 8 9 10
• Assuming that the frequency of occurrence of each number is reflected accurately in the number of times it appears in the sequence, we can estimate the probabilities from the relative frequencies: P(1) = P(6) = P(7) = P(10) = 1/16 and P(2) = P(3) = P(4) = P(5) = P(8) = P(9) = 2/16.

• Assuming the sequence is iid, the entropy can then be calculated as

  H = −Σ P(Xi) log2 P(Xi)

• With our stated assumptions, the entropy for this source is 3.25 bits. This means that the best scheme we could find for coding this sequence could do no better than 3.25 bits/sample.
• However, if we assume that there is sample-to-sample correlation between the samples, and we remove the correlation by taking differences of neighboring sample values, we arrive at the residual sequence
  1 1 1 −1 1 1 1 −1 1 1 1 1 1 −1 1 1
• This sequence is constructed using only two values, with probabilities P(1) = 13/16 and P(−1) = 3/16. The entropy in this case is 0.70 bits per symbol.
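The two estimates above can be reproduced with a short sketch (a first-order estimate from relative frequencies; the helper name is illustrative, not from the lecture).

from math import log2
from collections import Counter

def estimated_entropy(sequence):
    # First-order estimate: -sum p_i log2 p_i, with p_i taken from relative frequencies.
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * log2(c / n) for c in counts.values())

original = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
# first sample kept as-is, then neighbor differences (16 values, as on the slide)
residual = [original[0]] + [b - a for a, b in zip(original, original[1:])]

print(estimated_entropy(original))   # 3.25 bits/sample
print(estimated_entropy(residual))   # about 0.70 bits/sample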
Contd…
• Of course, knowing only this sequence would not be enough for
the receiver to reconstruct the original sequence.
• The receiver must also know the process by which this sequence
was generated from the original sequence.
• The process depends on our assumptions about the structure of
the sequence.
• These assumptions are called the model for the sequence. In this case, the model for the sequence is

  xn = xn−1 + en

  where en is the residual.
• This model is called a static model because its parameters do not change with n.
• A model whose parameters change or adapt with n to the changing characteristics of the data is called an adaptive model.

• The entropy of the source is a measure of the amount of information
generated by the source.
• Basically, we see that knowing something about the structure of the
data can help to “reduce the entropy.”
• As long as the information generated by the source is preserved (in
whatever representation), the entropy remains the same.
• What we are reducing is our estimate of the entropy.
• The “actual” structure of the data in practice is generally
unknowable, but anything we can learn about the data can help us to
estimate the actual source entropy.
• We accomplish this in our definition of the entropy by picking larger
and larger blocks of data to calculate the probability over, letting the
size of the block go to infinity.
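Written out (this is the standard block-entropy definition; the slide states the idea in words), the estimate over blocks of length n is

  H(S) = lim (n→∞) (1/n) Gn,  where  Gn = −Σ P(X1 = i1, …, Xn = in) log2 P(X1 = i1, …, Xn = in),

with the sum running over all possible length-n blocks (i1, …, in).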

Example
• Consider the following sequence:
  1 2 1 2 3 3 3 3 1 2 3 3 3 3 1 2 3 3 1 2
• If we look at it one symbol at a time, the structure is difficult to extract. Consider the probabilities: P(1) = P(2) = 1/4 and P(3) = 1/2.
• The entropy is 1.5 bits/symbol.
• This particular sequence consists of 20 symbols; therefore, the total number of bits required to represent this sequence is 30.

• Now let's take the same sequence and look at it in blocks of two.
• Obviously, there are only two block symbols, (1 2) and (3 3).
• The probabilities are P(1 2) = 1/2 and P(3 3) = 1/2, and the entropy is 1 bit/symbol.
• As there are 10 such symbols in the sequence, we need a total of 10 bits to represent the entire sequence, a reduction by a factor of three.
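A minimal sketch that reproduces both estimates by reading the sequence in non-overlapping blocks (the function name is illustrative, not from the lecture).

from math import log2
from collections import Counter

def bits_per_symbol(seq, block):
    # Entropy estimate in bits per original symbol when the sequence is read
    # in non-overlapping blocks of the given size.
    blocks = [seq[i:i + block] for i in range(0, len(seq), block)]
    counts = Counter(blocks)
    n = len(blocks)
    h_per_block = -sum((c / n) * log2(c / n) for c in counts.values())
    return h_per_block / block

sequence = "12123333123333123312"
print(bits_per_symbol(sequence, 1))   # 1.5 bits/symbol -> 30 bits for 20 symbols
print(bits_per_symbol(sequence, 2))   # 0.5 bits/symbol -> 10 bits total, i.e. 1 bit per two-symbol block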

Design a Prefix Code 1
• abracadabra
• Design a prefix code for the 5 symbols
{a,b,r,c,d} which compresses this string the
most.
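One way to approach this exercise is Huffman's algorithm, which builds an optimal prefix code from the symbol frequencies. The sketch below (heapq-based, with illustrative names) is not necessarily the construction intended in the lecture, but it yields one valid answer.

import heapq
from collections import Counter
from itertools import count

def huffman_code(freqs):
    # Build an optimal prefix code for a frequency table with at least two symbols.
    tie = count()   # tie-breaker so the heap never has to compare the symbol lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    codes = {s: "" for s in freqs}
    while len(heap) > 1:
        f0, _, syms0 = heapq.heappop(heap)
        f1, _, syms1 = heapq.heappop(heap)
        for s in syms0:
            codes[s] = "0" + codes[s]   # every symbol under the first subtree gets a leading 0
        for s in syms1:
            codes[s] = "1" + codes[s]   # every symbol under the second subtree gets a leading 1
        heapq.heappush(heap, (f0 + f1, next(tie), syms0 + syms1))
    return codes

text = "abracadabra"
codes = huffman_code(Counter(text))
print(codes)
print(sum(len(codes[s]) for s in text))   # 23 bits for the 11-symbol string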

Design a Prefix Code 2
• Suppose we have n symbols each with
probability 1/n. Design a prefix code with
minimum average bit rate.
• Consider n = 2,3,4,5,6 first.
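A quick way to check answers for small n (not the lecture's intended derivation): the average length of an optimal prefix code equals the sum of the weights created by the Huffman merges, since each merge adds one bit to every symbol beneath it.

import heapq

def optimal_average_bits(probs):
    # Average codeword length of an optimal prefix code for the given probabilities.
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b          # this merge adds one bit to each symbol below it
        heapq.heappush(heap, a + b)
    return total

for n in range(2, 7):
    print(n, optimal_average_bits([1.0 / n] * n))
    # n = 2: 1.0, n = 3: ~1.667, n = 4: 2.0, n = 5: 2.4, n = 6: ~2.667 bits/symbol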
