Language      Huffman   Arithmetic   Improvement (%)
English        4.1854      4.1603          0.6
Finnish        4.0448      4.0134          0.8
French         4.0003      4.0376          0.9
German         4.1475      4.1129          0.8
Hebrew         4.2851      4.2490          0.8
Italian        4.0000      3.9725          0.7
Portuguese     4.0100      3.9864          0.6
Spanish        4.0469      4.0230          0.6
Russian        4.4704      4.4425          0.6
English-2      7.4446      7.4158          0.4
Hebrew-2       8.0370      8.0085          0.4
Small Alphabet
What happens if the set of items to be compressed is small?
Example: 77.49 MB of bitmaps:

              n   Huffman (MB)   Arithmetic (MB)   Huffman excess (%)
each bit      2       77.49            9.64               703
k-blocks    256       13.70            7.70                78
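Huffman coding must spend at least one whole bit per symbol, so on a two-symbol alphabet it cannot compress at all, while arithmetic coding approaches the entropy. A minimal Python sketch of this gap; the 1-bit density p is a hypothetical value chosen so the result roughly matches the 9.64/77.49 ratio in the table:

```python
import math

def entropy(p):
    """Bits per input bit achievable by an entropy coder on a binary source."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.017  # hypothetical density of 1-bits in a sparse bitmap
print("Huffman, bit by bit:", 1.0)                   # every codeword is >= 1 bit
print("Arithmetic coding  :", round(entropy(p), 4))  # ~0.124 bits per input bit,
# roughly the 9.64 / 77.49 ratio seen in the table
```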
Time considerations
Encoding time in seconds:
[Bar chart: encoding times of Arithmetic, Static Huffman, and Adaptive Huffman coding on Gadsby, Tora, gnuplot.exe, and chess.bmp; y-axis from 0 to 200 seconds.]
Time considerations
Decoding time in seconds:
[Bar chart: decoding times of Arithmetic, Static Huffman, and Adaptive Huffman coding on Gadsby, Tora, gnuplot.exe, and chess.bmp; y-axis from 0 to 800 seconds.]
An example
The items are:
  a - 10%
  i - 20%
  r - 30%
  y - 40%
An example (Cont.)
A Huffman code for these items is y → 0, r → 10, a → 110, i → 111.
Expected number of 1s per symbol: 0.3·1 + 0.1·2 + 0.2·3 = 1.1
Expected number of 0s per symbol: 0.4·1 + 0.3·1 + 0.1·1 = 0.8
The ratio of 0s to 1s is therefore 8:11, and the arithmetic code for the bits is:
  0 → [0, 8/19)
  1 → [8/19, 1)
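These counts, and the resulting split of [0, 1), can be verified with a short sketch in exact fractional arithmetic; the codeword table is the one inferred above:

```python
from fractions import Fraction as F

code = {'y': '0', 'r': '10', 'a': '110', 'i': '111'}   # Huffman code above
prob = {'y': F(4, 10), 'r': F(3, 10), 'a': F(1, 10), 'i': F(2, 10)}

ones  = sum(prob[s] * code[s].count('1') for s in code)  # expected 1s per symbol
zeros = sum(prob[s] * code[s].count('0') for s in code)  # expected 0s per symbol
print(ones, zeros, zeros / (zeros + ones))               # 11/10, 4/5, P(0) = 8/19
```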
An example (Cont.)
The bit string for "ii" is 111111. Encoding it bit by bit narrows the interval as follows:
[0, 1) → [8/19, 1) → [240/361, 1) → [5528/6859, 1) → [115680/130321, 1) → [2315048/2476099, 1) → [45274320/47045881, 1)
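The narrowing can be reproduced exactly with fractions; each incoming 1 keeps the upper 11/19 of the current interval (a sketch of the standard interval update):

```python
from fractions import Fraction as F

p0 = F(8, 19)                    # probability of a 0-bit, from the previous slide
low, high = F(0), F(1)
for bit in '111111':             # "ii" encoded by the Huffman code above
    width = high - low
    if bit == '0':
        high = low + width * p0  # 0 takes the lower part of the interval
    else:
        low = low + width * p0   # 1 takes the upper part
    print(low, high)             # ends at [45274320/47045881, 1)
```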
An example (Cont.)
The shortest binary fraction in this interval is 0.11111, which is just 5 bits.
Huffman represents "i" by 3 bits, which is too much, since 3 = -log2(0.125) while the probability of "i" is 0.2.
The probability of "ii" is 0.2 · 0.2 = 0.04, so the information content of this string is -log2(0.04), circa 4.643856 bits; hence 5 bits are the optimal compression of this string.
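Finding that shortest fraction can be done by brute force: for k = 1, 2, ... take the smallest multiple of 2^-k that is at least the lower endpoint, and stop once it falls inside the interval (a sketch):

```python
import math
from fractions import Fraction as F

def shortest_binary_fraction(low, high):
    """Smallest k for which some m / 2**k lies in [low, high)."""
    k = 1
    while True:
        m = math.ceil(low * 2**k)          # smallest m with m/2**k >= low
        if F(m, 2**k) < high:
            return k, F(m, 2**k)
        k += 1

low = F(45274320, 47045881)                # final interval for "ii"
print(shortest_binary_fraction(low, F(1)))  # (5, Fraction(31, 32)), 0.11111 in binary
```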
Theorem
The second step of Quasi-Arithmetic Coding gives the worst improvement when the 1s and the 0s have exactly the same probability, 0.5.
In other words: the entropy of a binary string is maximal when the 1s and the 0s have exactly the same probability, 0.5.
As the ratio between the 1s and the 0s becomes more lopsided, the entropy diminishes.
Theorem - Proof
The entropy of a binary string in which the probability of a 1 is P is
$$H(P) = -P\log_2 P - (1-P)\log_2(1-P) = -\frac{P\ln P + (1-P)\ln(1-P)}{\ln 2}.$$
The derivative of the entropy is
$$H'(P) = \left[-\frac{P\ln P + (1-P)\ln(1-P)}{\ln 2}\right]'
        = -\frac{(\ln P + 1) - \ln(1-P) - 1}{\ln 2}
        = \frac{\ln(1-P) - \ln P}{\ln 2}.$$
Setting H'(P) = 0 gives ln(1-P) = ln P, hence 1 - P = P, so P = 1/2.
The second derivative,
$$H''(P) = \left[\frac{\ln(1-P) - \ln P}{\ln 2}\right]'
         = -\frac{1}{(1-P)\ln 2} - \frac{1}{P\ln 2},$$
is negative at P = 1/2:
$$-\frac{2}{\ln 2} - \frac{2}{\ln 2} < 0,$$
so P = 1/2 is indeed a maximum.
Theorem - conclusion
When P = 1 or P = 0 (writing x for the probability that tends to 0):
$$\lim_{x\to 0} H = \lim_{x\to 0}\bigl(-x\log_2 x - 1\cdot\log_2 1\bigr) = \lim_{x\to 0} -x\log_2 x = 0.$$
When P = 1/2:
$$H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1.$$
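A quick numeric check of the theorem and its conclusion (a sketch using the standard binary entropy function):

```python
import math

def H(p):
    """Binary entropy in bits; H(0) = H(1) = 0 by continuity."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"P = {p:.1f}  H = {H(p):.4f}")   # maximal (H = 1) at P = 0.5
```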
Conclusion in a graph
[Graph: the entropy H as a function of P, rising from 0 at P = 0 to its maximum of 1 at P = 1/2 and falling back to 0 at P = 1.]
Wrong probabilities
Excess of Huffman over arithmetic coding, in percent, when the statistics of the row text are used to compress the column text:

           English   Gadsby   German   Finnish   French
English       0.6      1.4     -2.1     -5.1       0.4
Gadsby       -2.4      1.0     -2.9     -0.8      -3.3
German       -1.8     -3.6      0.9     -5.2      -0.1
Finnish       2.4      1.2      2.7      0.8       2.9
French        0.8      2.2     -4.6    -11.3       0.9
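The penalty for coding with a mismatched model is that the average cost becomes the cross-entropy rather than the entropy. A minimal sketch of that effect, using hypothetical two-symbol distributions rather than the actual language statistics above:

```python
import math

def cross_entropy(q, p):
    """Average bits/symbol when a source with true distribution q
    is coded using a model built for distribution p."""
    return sum(q[s] * -math.log2(p[s]) for s in q)

# Hypothetical distributions, not the language data from the table:
q = {'0': 0.3, '1': 0.7}    # true statistics of the text
p = {'0': 0.5, '1': 0.5}    # statistics the coder assumes
print(cross_entropy(q, q))  # ~0.8813: the entropy, best achievable
print(cross_entropy(q, p))  # 1.0: the price of the wrong model
```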