CPS 296.3: Algorithms in the Real World: Data Compression: Lecture 2.5
Summary so far
Model generates probabilities, Coder uses them
Probabilities are related to information. The more
you know, the less info a message will give.
More skew in probabilities gives lower Entropy H
and therefore better compression
Context can help skew probabilities (lower H)
Average length l_a for an optimal prefix code is bounded by:
H ≤ l_a ≤ H + 1
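The skew-vs-entropy claim and the prefix-code bound are easy to check numerically. A minimal sketch (the helper name is mine, not from the slides):

```python
import math

def entropy(probs):
    # Shannon entropy in bits: H = -sum(p * log2(p)) over nonzero probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For the skewed distribution [0.9, 0.1], H ≈ 0.47 bits, so an optimal prefix code averages under 1.47 bits/symbol; for the uniform [0.5, 0.5], H = 1 bit exactly.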
Compress
[Figure: the model (static part + dynamic part) supplies probabilities {p(s) | s ∈ S} to the coder; each message s ∈ S is coded as a codeword w with |w| = i_M(s) = -log p(s)]
The static part of the model is fixed.
The dynamic part is based on previous messages, which allows the
probability distribution to change.
The optimality of the code is relative to the probabilities:
if they are not accurate, the code will not be efficient.
Decompress
[Figure: the decoder uses the same model (static part + dynamic part) to recover the message s ∈ S]
Compression Outline
Introduction: Lossy vs. Lossless, Benchmarks,
Information Theory: Entropy, etc.
Probability Coding: Huffman + Arithmetic Coding
Applications of Probability Coding: PPM + others
Transform coding: move to front, run-length,
Context coding: fixed context, partial matching
Lempel-Ziv Algorithms: LZ77, gzip, compress, ...
Other Lossless Algorithms: Burrows-Wheeler
Lossy algorithms for images: JPEG, MPEG, ...
Compressing graphs and meshes: BBK
[Figure: table of example codewords for White and Black runs: 000111, 010, 0111, 11, 00111, 0000100]
BZIP
Transform 0: Embedded run length: AAAAAAA → AAAA\3
Transform 1: (Burrows-Wheeler) covered later
  input: character string (block), 100KB to 900KB
  output: reordered character string
Transform 2: (move to front)
  input: character string
  output: MTF numbering
Transform 3: (run length)
  input: MTF numbering
  output: sequence of run lengths
Probabilities: (on run lengths) dynamic, based on counts for each block
Coding: originally arithmetic, but changed to Huffman in bzip2
due to patent concerns
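Transform 2 (move to front) can be sketched as follows; this is a minimal illustration with a hypothetical helper name, assuming the alphabet is passed in explicitly:

```python
def mtf_encode(data, alphabet):
    # Move-to-front: emit each symbol's current position in the table,
    # then move that symbol to the front of the table.
    table = list(alphabet)
    out = []
    for c in data:
        i = table.index(c)
        out.append(i)
        table.pop(i)
        table.insert(0, c)
    return out
```

Runs of a repeated character become runs of zeros, which the following run-length stage then compresses well.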
Residual Coding
Typically used for message values that represent
some sort of amplitude:
e.g., gray-level in an image, or amplitude in audio.
Basic idea: guess the next value based on the current
context, output the difference between the guess and the
actual value, and use a probability code on the output.
Consider compressing a stock value over time.
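A minimal sketch of the idea for something like a stock-price series, using the simplest possible guess (the previous value); the function names are mine, not from the slides:

```python
def residual_encode(values):
    # Guess each value as the previous one; output the differences
    # (residuals), which are typically small and highly skewed toward 0.
    prev = 0
    out = []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def residual_decode(residuals):
    # Invert: accumulate the residuals to recover the original values.
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out
```

The skewed residual distribution (mostly small values) is what the downstream probability coder exploits.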
JPEG-LS
JPEG Lossless (not to be confused with lossless JPEG)
Recently completed standardization process.
Codes in Raster Order. Uses 4 pixels as context:
NW N NE
W *
Tries to guess the value of * based on W, NW, N, and NE.
Works in two stages.
296.3
Initial prediction P of * from N, W, and NW (the median predictor):

P = min(N, W)     if NW ≥ max(N, W)
    max(N, W)     if NW ≤ min(N, W)
    N + W - NW    otherwise
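The predictor above, transcribed directly (the function name is mine; sample values are assumed to be small non-negative integers such as 8-bit gray levels):

```python
def jpeg_ls_predict(N, W, NW):
    # Median predictor: pick min/max of N and W when NW suggests an
    # edge, otherwise use the planar estimate N + W - NW.
    if NW >= max(N, W):
        return min(N, W)
    elif NW <= min(N, W):
        return max(N, W)
    else:
        return N + W - NW
```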
JPEG LS
Transform: (residual)
  input: gray-level image (8 bits/pixel)
  output: difference from guess at each pixel
Probabilities: (on the differences)
  Static probabilities based on the Golomb code, something like p(n) = c/n².
Coding: Golomb code
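The Golomb code can be sketched as a minimal encoder (the function name is mine; assumes parameter m ≥ 2 and non-negative n):

```python
import math

def golomb_encode(n, m):
    # Golomb code with parameter m: quotient q = n // m in unary
    # (q ones terminated by a zero), remainder r = n % m in
    # truncated binary.
    q, r = divmod(n, m)
    bits = "1" * q + "0"
    b = math.ceil(math.log2(m))
    cutoff = (1 << b) - m          # count of short (b-1 bit) remainder codes
    if r < cutoff:
        bits += format(r, "b").zfill(b - 1)
    else:
        bits += format(r + cutoff, "b").zfill(b)
    return bits
```

When m is a power of 2 this reduces to a Rice code, with all remainders taking exactly log2(m) bits.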
Counts by context order (three flattened tables on the slide, rebuilt; $ is the escape symbol, with a count equal to the number of distinct symbols seen in that context):

Order 0 (empty context):  A = 4, B = 2, C = 5, $ = 3

Order 1:
  Context A:  C = 3, $ = 1
  Context B:  A = 2, $ = 1
  Context C:  A = 1, B = 2, C = 2, $ = 3

Order 2:
  Context AC:  B = 1, C = 2, $ = 2
  Context BA:  C = 1, $ = 1
  Context CA:  C = 1, $ = 1
  Context CB:  A = 2, $ = 1
  Context CC:  A = 1, B = 1, $ = 2
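The context tables can be reproduced by a small counting sketch (my own helper, not from the slides; the escape count for "$" is taken as the number of distinct symbols in each context, and the input is assumed not to contain "$" itself). For example, the string "ACCBACCACBA" yields exactly the counts tabulated above:

```python
from collections import defaultdict

def ppm_counts(text, max_order=2):
    # Build per-context symbol counts for orders 0..max_order,
    # then attach an escape count "$" = number of distinct symbols.
    tables = {k: defaultdict(lambda: defaultdict(int))
              for k in range(max_order + 1)}
    for i, c in enumerate(text):
        for k in range(max_order + 1):
            if i >= k:                      # need a full k-symbol context
                ctx = text[i - k:i]
                tables[k][ctx][c] += 1
    return {k: {ctx: dict(cnts, **{"$": len(cnts)})
                for ctx, cnts in tbl.items()}
            for k, tbl in tables.items()}
```

Running `ppm_counts("ACCBACCACBA")` gives, e.g., order-0 counts A = 4, B = 2, C = 5, $ = 3 and order-2 context "AC" counts B = 1, C = 2, $ = 2, matching the tables.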