CPS 296.3: Algorithms in the Real World: Data Compression: Lecture 2.5
Summary so far
Model generates probabilities, Coder uses them
Probabilities are related to information. The more
you know, the less info a message will give.
More skew in probabilities gives lower Entropy H
and therefore better compression
Context can help skew probabilities (lower H)
Average length l_a for an optimal prefix code is bounded by:
H ≤ l_a ≤ H + 1
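The skew-vs-entropy claim and the prefix-code bound are easy to check numerically. A minimal sketch (the helper name is mine, not from the slides):

```python
import math

def entropy(probs):
    # Shannon entropy in bits: H = -sum(p * log2(p)) over nonzero probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For the skewed distribution [0.9, 0.1], H ≈ 0.47 bits, so an optimal prefix code averages under 1.47 bits/symbol; for the uniform [0.5, 0.5], H = 1 bit exactly.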
Compress
[Figure: the model (static part + dynamic part) supplies probabilities {p(s) | s ∈ S} to the coder; each message s ∈ S is coded as a codeword w with |w| = i_M(s) = -log p(s)]
The static part of the model is fixed.
The dynamic part is based on previous messages, which allows the
probability distribution to change.
The optimality of the code is relative to the probabilities:
if they are not accurate, the code will not be efficient.
Decompress
[Figure: the decoder uses the same model (static part + dynamic part) to recover the message s ∈ S]
Compression Outline
Introduction: Lossy vs. Lossless, Benchmarks,
Information Theory: Entropy, etc.
Probability Coding: Huffman + Arithmetic Coding
Applications of Probability Coding: PPM + others
Transform coding: move to front, run-length,
Context coding: fixed context, partial matching
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK
[Figure: table of example codewords for White and Black runs: 000111, 010, 0111, 11, 00111, 0000100]
BZIP
Transform 0: Embedded run length: AAAAAAA → AAAA\3
Transform 1: (Burrows-Wheeler) covered later
  input: character string (block), 100KB to 900KB
  output: reordered character string
Transform 2: (move to front)
  input: character string
  output: MTF numbering
Transform 3: (run length)
  input: MTF numbering
  output: sequence of run lengths
Probabilities: (on run lengths) dynamic, based on counts for each block
Coding: originally arithmetic, but changed to Huffman in bzip2
due to patent concerns
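Transform 2 (move to front) can be sketched as follows; this is a minimal illustration with a hypothetical helper name, assuming the alphabet is passed in explicitly:

```python
def mtf_encode(data, alphabet):
    # Move-to-front: emit each symbol's current position in the table,
    # then move that symbol to the front of the table.
    table = list(alphabet)
    out = []
    for c in data:
        i = table.index(c)
        out.append(i)
        table.pop(i)
        table.insert(0, c)
    return out
```

Runs of a repeated character become runs of zeros, which the following run-length stage then compresses well.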
Residual Coding
Typically used for message values that represent
some sort of amplitude:
e.g., gray-level in an image, or amplitude in audio.
Basic idea: guess the next value based on the current
context, output the difference between the guess and the
actual value, and use a probability code on the output.
Consider compressing a stock value over time.
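A minimal sketch of the idea for something like a stock-price series, using the simplest possible guess (the previous value); the function names are mine, not from the slides:

```python
def residual_encode(values):
    # Guess each value as the previous one; output the differences
    # (residuals), which are typically small and highly skewed toward 0.
    prev = 0
    out = []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def residual_decode(residuals):
    # Invert: accumulate the residuals to recover the original values.
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out
```

The skewed residual distribution (mostly small values) is what the downstream probability coder exploits.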
JPEG-LS
JPEG Lossless (not to be confused with lossless JPEG)
Recently completed standardization process.
Codes in Raster Order. Uses 4 pixels as context:
NW N NE
W *
Tries to guess the value of * based on W, NW, N, and NE.
Works in two stages.
296.3
Initial prediction P of * from N, W, and NW (the median predictor):

P = min(N, W)     if NW ≥ max(N, W)
    max(N, W)     if NW ≤ min(N, W)
    N + W - NW    otherwise
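The predictor above, transcribed directly (the function name is mine; sample values are assumed to be small non-negative integers such as 8-bit gray levels):

```python
def jpeg_ls_predict(N, W, NW):
    # Median predictor: pick min/max of N and W when NW suggests an
    # edge, otherwise use the planar estimate N + W - NW.
    if NW >= max(N, W):
        return min(N, W)
    elif NW <= min(N, W):
        return max(N, W)
    else:
        return N + W - NW
```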
JPEG LS
Transform: (residual)
  input: gray-level image (8 bits/pixel)
  output: difference from guess at each pixel
Probabilities: (on the differences)
  Static probabilities based on the Golomb code, something like p(n) = c/n².
Coding: Golomb code
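The Golomb code can be sketched as a minimal encoder (the function name is mine; assumes parameter m ≥ 2 and non-negative n):

```python
import math

def golomb_encode(n, m):
    # Golomb code with parameter m: quotient q = n // m in unary
    # (q ones terminated by a zero), remainder r = n % m in
    # truncated binary.
    q, r = divmod(n, m)
    bits = "1" * q + "0"
    b = math.ceil(math.log2(m))
    cutoff = (1 << b) - m          # count of short (b-1 bit) remainder codes
    if r < cutoff:
        bits += format(r, "b").zfill(b - 1)
    else:
        bits += format(r + cutoff, "b").zfill(b)
    return bits
```

When m is a power of 2 this reduces to a Rice code, with all remainders taking exactly log2(m) bits.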
Counts by context order (three flattened tables on the slide, rebuilt; $ is the escape symbol, with a count equal to the number of distinct symbols seen in that context):

Order 0 (empty context):  A = 4, B = 2, C = 5, $ = 3

Order 1:
  Context A:  C = 3, $ = 1
  Context B:  A = 2, $ = 1
  Context C:  A = 1, B = 2, C = 2, $ = 3

Order 2:
  Context AC:  B = 1, C = 2, $ = 2
  Context BA:  C = 1, $ = 1
  Context CA:  C = 1, $ = 1
  Context CB:  A = 2, $ = 1
  Context CC:  A = 1, B = 1, $ = 2
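The context tables can be reproduced by a small counting sketch (my own helper, not from the slides; the escape count for "$" is taken as the number of distinct symbols in each context, and the input is assumed not to contain "$" itself). For example, the string "ACCBACCACBA" yields exactly the counts tabulated above:

```python
from collections import defaultdict

def ppm_counts(text, max_order=2):
    # Build per-context symbol counts for orders 0..max_order,
    # then attach an escape count "$" = number of distinct symbols.
    tables = {k: defaultdict(lambda: defaultdict(int))
              for k in range(max_order + 1)}
    for i, c in enumerate(text):
        for k in range(max_order + 1):
            if i >= k:                      # need a full k-symbol context
                ctx = text[i - k:i]
                tables[k][ctx][c] += 1
    return {k: {ctx: dict(cnts, **{"$": len(cnts)})
                for ctx, cnts in tbl.items()}
            for k, tbl in tables.items()}
```

Running `ppm_counts("ACCBACCACBA")` gives, e.g., order-0 counts A = 4, B = 2, C = 5, $ = 3 and order-2 context "AC" counts B = 1, C = 2, $ = 2, matching the tables.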