
Data Compression

Lecture 2
Exercise Encode/Decode
[Figure: a binary prefix code tree over the symbols a, b, c, d]

• Player 1: Encode a symbol string
• Player 2: Decode the string
• Check for equality
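A minimal sketch of the encode/decode game in Python, assuming for illustration the codebook a = 0, b = 100, c = 101, d = 11 (one possible reading of the tree; the slide's exact codewords may differ). The function names and sample message are illustrative, not from the lecture.

CODE = {"a": "0", "b": "100", "c": "101", "d": "11"}   # assumed codebook
INVERSE = {v: k for k, v in CODE.items()}

def encode(symbols):
    # Player 1: concatenate the codeword of each symbol.
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    # Player 2: because no codeword is a prefix of another, a symbol can be
    # emitted as soon as the buffered bits match a codeword.
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in INVERSE:
            decoded.append(INVERSE[buffer])
            buffer = ""
    return "".join(decoded)

message = "abcdadab"
assert decode(encode(message)) == message   # check for equality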

How Good is the Code
[Figure: code tree giving codewords a = 00 (P(a) = 1/8), b = 01 (P(b) = 1/4), c = 1 (P(c) = 5/8)]

bit rate = (1/8)(2) + (1/4)(2) + (5/8)(1) = 11/8 = 1.375 bps
Entropy ≈ 1.3 bps
Standard (fixed-length) code = 2 bps

(bps = bits per symbol)
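These numbers can be checked with a short Python sketch (not part of the slides); the codeword lengths 2, 2, 1 and the probabilities 1/8, 1/4, 5/8 are taken from the bit-rate expression above.

from math import log2

probs   = {"a": 1/8, "b": 1/4, "c": 5/8}
lengths = {"a": 2, "b": 2, "c": 1}

bit_rate = sum(probs[s] * lengths[s] for s in probs)   # expected codeword length
entropy  = -sum(p * log2(p) for p in probs.values())   # H = -sum p log2 p

print(f"bit rate = {bit_rate:.3f} bps")   # 1.375 bps
print(f"entropy  = {entropy:.3f} bps")    # about 1.299 bps, i.e. roughly 1.3 bps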

Modeling and Coding
• The development of data compression algorithms for a given type of data can be divided into two phases.
  – Modeling: try to extract information about any redundancy that exists in the data, and describe the redundancy in the form of a model.
  – Coding: a description of how the data (or the residual) is represented, typically in binary.
• The difference between the data and the model is often referred to as the residual.
Example 1

• Consider the following sequence of numbers x1, x2, x3, …: [sequence shown on the slide]
• If we were to transmit or store the binary representations of these numbers, we would need to use 5 bits per sample.
• Model: [model equation shown on the slide]
• The residual sequence consists of only three values: −1, 0, and 1. If we assign the code 00 to −1, 01 to 0, and 10 to 1, we need to use 2 bits to represent each element of the residual sequence.
• Therefore, we can obtain compression by transmitting or storing the parameters of the model and the residual sequence.
Example 2

• The decoder adds each received value to the
previous decoded value to obtain the
reconstruction corresponding to the received
value.
• Techniques that use the past values of a
sequence to predict the current value and
then encode the error in prediction, or
residual, are called predictive coding schemes.
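A minimal sketch of such a predictive (differential) scheme, assuming the simplest predictor, the previous value, with the first sample predicted from zero; the function names are illustrative, not from the lecture.

def encode_differences(samples):
    # Encoder: send the difference between each sample and the previous one.
    residual, previous = [], 0
    for x in samples:
        residual.append(x - previous)
        previous = x
    return residual

def decode_differences(residual):
    # Decoder: add each received value to the previously decoded value.
    samples, previous = [], 0
    for e in residual:
        previous += e
        samples.append(previous)
    return samples

data = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
assert decode_differences(encode_differences(data)) == data
print(encode_differences(data))   # the residual uses far fewer distinct values than the data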

Example 3

• In order to represent eight symbols, we need to use 3 bits per symbol.

Self Information

• Shannon defined a quantity called self-information.
• Suppose we have an event A, which is a set of outcomes of some random experiment. If P(A) is the probability that the event A will occur, then the self-information associated with A is given by

  i(A) = −log2 P(A)

• log(1) = 0, and −log(x) increases as x decreases from one to zero. Therefore, if the probability of an event is low, the amount of self-information associated with it is high; if the probability of an event is high, the information associated with it is low.
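As a quick numerical illustration (not from the slides), the definition can be evaluated directly:

from math import log2

def self_information(p):
    # i(A) = -log2 P(A), measured in bits
    return -log2(p)

print(self_information(1/2))    # 1 bit: a fair coin flip
print(self_information(1/8))    # 3 bits: a less likely event carries more information
print(self_information(0.99))   # about 0.0145 bits: a near-certain event carries almost none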

Definitions
• Identically distributed means that there are no overall trends: the distribution does not fluctuate, and all items in the sample are taken from the same probability distribution.
• Independent means that the sample items are all independent events. In other words, they are not connected to each other in any way; knowledge of the value of one variable gives no information about the value of the other, and vice versa.
• A sequence that satisfies both conditions is called independent and identically distributed (iid).

Example
• In practice it is not possible to know the entropy of a physical source exactly, so we have to estimate it.
• Consider the following sequence:
  1 2 3 2 3 4 5 4 5 6 7 8 9 8 9 10
• Assuming that the frequency of occurrence of each number is reflected accurately in the number of times it appears in the sequence, we can estimate the probabilities from the relative frequencies: P(1) = P(6) = P(7) = P(10) = 1/16 and P(2) = P(3) = P(4) = P(5) = P(8) = P(9) = 2/16.

• Assuming the sequence is iid, the entropy can then be calculated as

  H = −Σ P(Xi) log2 P(Xi)

• With our stated assumptions, the entropy for this source is 3.25 bits. This means that the best scheme we could find for coding this sequence could do no better than 3.25 bits/sample.
• However, if we assume that there is sample-to-sample correlation between the samples, and we remove the correlation by taking differences of neighboring sample values, we arrive at the residual sequence
  1 1 1 −1 1 1 1 −1 1 1 1 1 1 −1 1 1
• This sequence is constructed using only two values, with probabilities P(1) = 13/16 and P(−1) = 3/16. The entropy in this case is 0.70 bits per symbol.
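The two estimates above can be reproduced with a short sketch (a first-order estimate from relative frequencies; the helper name is illustrative, not from the lecture).

from math import log2
from collections import Counter

def estimated_entropy(sequence):
    # First-order estimate: -sum p_i log2 p_i, with p_i taken from relative frequencies.
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * log2(c / n) for c in counts.values())

original = [1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 8, 9, 10]
# first sample kept as-is, then neighbor differences (16 values, as on the slide)
residual = [original[0]] + [b - a for a, b in zip(original, original[1:])]

print(estimated_entropy(original))   # 3.25 bits/sample
print(estimated_entropy(residual))   # about 0.70 bits/sample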
Contd…
• Of course, knowing only this sequence would not be enough for
the receiver to reconstruct the original sequence.
• The receiver must also know the process by which this sequence
was generated from the original sequence.
• The process depends on our assumptions about the structure of
the sequence.
• These assumptions are called the model for the sequence. In this case, the model for the sequence is

  xn = xn−1 + en

  where en is the residual.
• This model is called a static model because its parameters do not change with n.
• A model whose parameters change or adapt with n to the changing characteristics of the data is called an adaptive model.

• The entropy of the source is a measure of the amount of information
generated by the source.
• Basically, we see that knowing something about the structure of the
data can help to “reduce the entropy.”
• As long as the information generated by the source is preserved (in
whatever representation), the entropy remains the same.
• What we are reducing is our estimate of the entropy.
• The “actual” structure of the data in practice is generally
unknowable, but anything we can learn about the data can help us to
estimate the actual source entropy.
• We accomplish this in our definition of the entropy by picking larger
and larger blocks of data to calculate the probability over, letting the
size of the block go to infinity.
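Written out (this is the standard block-entropy definition; the slide states the idea in words), the estimate over blocks of length n is

  H(S) = lim (n→∞) (1/n) Gn,  where  Gn = −Σ P(X1 = i1, …, Xn = in) log2 P(X1 = i1, …, Xn = in),

with the sum running over all possible length-n blocks (i1, …, in).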

Example
• Consider the following sequence:
  1 2 1 2 3 3 3 3 1 2 3 3 3 3 1 2 3 3 1 2
• If we look at it one symbol at a time, the structure is difficult to extract. Consider the probabilities: P(1) = P(2) = 1/4 and P(3) = 1/2.
• The entropy is 1.5 bits/symbol.
• This particular sequence consists of 20 symbols; therefore, the total number of bits required to represent this sequence is 30.

• Now let's take the same sequence and look at it in blocks of two.
• Obviously, there are only two block symbols, (1 2) and (3 3).
• The probabilities are P(1 2) = 1/2 and P(3 3) = 1/2, and the entropy is 1 bit/symbol.
• As there are 10 such symbols in the sequence, we need a total of 10 bits to represent the entire sequence, a reduction by a factor of three.
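A minimal sketch that reproduces both estimates by reading the sequence in non-overlapping blocks (the function name is illustrative, not from the lecture).

from math import log2
from collections import Counter

def bits_per_symbol(seq, block):
    # Entropy estimate in bits per original symbol when the sequence is read
    # in non-overlapping blocks of the given size.
    blocks = [seq[i:i + block] for i in range(0, len(seq), block)]
    counts = Counter(blocks)
    n = len(blocks)
    h_per_block = -sum((c / n) * log2(c / n) for c in counts.values())
    return h_per_block / block

sequence = "12123333123333123312"
print(bits_per_symbol(sequence, 1))   # 1.5 bits/symbol -> 30 bits for 20 symbols
print(bits_per_symbol(sequence, 2))   # 0.5 bits/symbol -> 10 bits total, i.e. 1 bit per two-symbol block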

Design a Prefix Code 1
• abracadabra
• Design a prefix code for the 5 symbols
{a,b,r,c,d} which compresses this string the
most.
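One way to approach this exercise is Huffman's algorithm, which builds an optimal prefix code from the symbol frequencies. The sketch below (heapq-based, with illustrative names) is not necessarily the construction intended in the lecture, but it yields one valid answer.

import heapq
from collections import Counter
from itertools import count

def huffman_code(freqs):
    # Build an optimal prefix code for a frequency table with at least two symbols.
    tie = count()   # tie-breaker so the heap never has to compare the symbol lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    codes = {s: "" for s in freqs}
    while len(heap) > 1:
        f0, _, syms0 = heapq.heappop(heap)
        f1, _, syms1 = heapq.heappop(heap)
        for s in syms0:
            codes[s] = "0" + codes[s]   # every symbol under the first subtree gets a leading 0
        for s in syms1:
            codes[s] = "1" + codes[s]   # every symbol under the second subtree gets a leading 1
        heapq.heappush(heap, (f0 + f1, next(tie), syms0 + syms1))
    return codes

text = "abracadabra"
codes = huffman_code(Counter(text))
print(codes)
print(sum(len(codes[s]) for s in text))   # 23 bits for the 11-symbol string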

Design a Prefix Code 2
• Suppose we have n symbols each with
probability 1/n. Design a prefix code with
minimum average bit rate.
• Consider n = 2,3,4,5,6 first.
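A quick way to check answers for small n (not the lecture's intended derivation): the average length of an optimal prefix code equals the sum of the weights created by the Huffman merges, since each merge adds one bit to every symbol beneath it.

import heapq

def optimal_average_bits(probs):
    # Average codeword length of an optimal prefix code for the given probabilities.
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b          # this merge adds one bit to each symbol below it
        heapq.heappush(heap, a + b)
    return total

for n in range(2, 7):
    print(n, optimal_average_bits([1.0 / n] * n))
    # n = 2: 1.0, n = 3: ~1.667, n = 4: 2.0, n = 5: 2.4, n = 6: ~2.667 bits/symbol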
