4th stage    Subject: Data Compression
Lecture No.: 8
Lecture time: 10:30 AM-2:30 PM    Classroom No.:
Instructor: Dr. Ali Kadhum AL-Quraby    Department of Computer Science
Dictionary Methods
Statistical compression methods use a statistical model of the data, which is why the quality of
compression they achieve depends on how good that model is.
Dictionary-based compression methods do not use a statistical model, nor do they use variable-size
codes. Instead, they select strings of symbols and encode each string as a token using a dictionary.
String Compression
In general, compression methods based on strings of symbols can be more efficient than methods that
compress individual symbols. In principle, better compression is possible if the symbols of the
alphabet have very different probabilities of occurrence. We use a simple example to show that the
probabilities of strings of symbols vary more than the probabilities of the individual symbols
constituting the strings.
We start with a 2-symbol alphabet a1 and a2, with probabilities P1 = 0.8 and P2 = 0.2, respectively.
The average probability is 0.5, and we can get an idea of the variance (how much the individual
probabilities deviate from the average) by calculating the sum of absolute differences:
variance = |0.8 - 0.5| + |0.2 - 0.5| = 0.6.
Any variable-size code would assign 1-bit codes to the two symbols, so the average size of the code
is one bit per symbol.
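As a quick check, the deviation sum used above can be computed directly. The short Python sketch below is not part of the lecture; it simply evaluates the same arithmetic for the single-symbol alphabet:

    # Deviation-sum "variance": sum of |p - average probability| over all symbols.
    probs = [0.8, 0.2]
    avg = sum(probs) / len(probs)                  # 0.5
    variance = sum(abs(p - avg) for p in probs)    # |0.8-0.5| + |0.2-0.5| = 0.6
    print(avg, variance)                           # prints 0.5 0.6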
We now generate all the strings of two symbols. There are four of them, shown in Table 3.1a,
String    Probability           Code
a1a1      0.8 × 0.8 = 0.64      0
a1a2      0.8 × 0.2 = 0.16      11
a2a1      0.2 × 0.8 = 0.16      100
a2a2      0.2 × 0.2 = 0.04      101
together with their probabilities and a set of Huffman codes. The average probability is 0.25, so a
sum of absolute differences similar to the one above yields
variance = |0.64 - 0.25| + |0.16 - 0.25| + |0.16 - 0.25| + |0.04 - 0.25| = 0.78.
The average size of the Huffman code is
1 × 0.64 + 2 × 0.16 + 3 × 0.16 + 3 × 0.04 = 1.56 bits per string, which is 0.78 (1.56/2 = 0.78) bits per
symbol.
In the next step we similarly create all eight strings of three symbols. They are shown in Table 3.1b,

String     Probability                 Code
a1a1a1     0.8 × 0.8 × 0.8 = 0.512     0
a1a1a2     0.8 × 0.8 × 0.2 = 0.128     100
a1a2a1     0.8 × 0.2 × 0.8 = 0.128     101
a1a2a2     0.8 × 0.2 × 0.2 = 0.032     11100
a2a1a1     0.2 × 0.8 × 0.8 = 0.128     110
a2a1a2     0.2 × 0.8 × 0.2 = 0.032     11101
a2a2a1     0.2 × 0.2 × 0.8 = 0.032     11110
a2a2a2     0.2 × 0.2 × 0.2 = 0.008     11111
together with their probabilities and a set of Huffman codes. The average probability is 0.125, so a
sum of absolute differences similar to the ones above yields
variance = |0.512 - 0.125| + 3|0.128 - 0.125| + 3|0.032 - 0.125| + |0.008 - 0.125| = 0.792.
The average size of the Huffman code in this case is
1 × 0.512 + 3 × 3 × 0.128 + 5 × 3 × 0.032 + 5 × 0.008 = 2.184 bits per string, which equals 0.728
(2.184/3) bits per symbol.
As we keep generating longer and longer strings, the probabilities of the strings differ more and more
from their average, and the average code size gets better (Table 3.1c).
Str.    Variance    Avg. size
size    of prob.    of code
1       0.6         1
2       0.78        0.78
3       0.792       0.728
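The same experiment can be automated. The Python sketch below is not from the lecture; it reproduces Table 3.1c by enumerating all strings of length 1, 2, and 3 over {a1, a2}, building a Huffman code with a standard heap-based construction, and printing the deviation-sum variance and the average bits per symbol (the function and variable names are my own):

    import heapq
    from itertools import count, product

    P = {"a1": 0.8, "a2": 0.2}

    def huffman_lengths(probs):
        """Return the Huffman code length (in bits) for each probability."""
        tie = count(len(probs))                     # unique tie-breakers for the heap
        heap = [(p, i, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        while len(heap) > 1:
            p1, _, leaves1 = heapq.heappop(heap)    # two least-probable subtrees
            p2, _, leaves2 = heapq.heappop(heap)
            for leaf in leaves1 + leaves2:
                lengths[leaf] += 1                  # each merge adds one bit to these codes
            heapq.heappush(heap, (p1 + p2, next(tie), leaves1 + leaves2))
        return lengths

    for n in (1, 2, 3):
        strings = list(product(P, repeat=n))
        probs = []
        for s in strings:
            p = 1.0
            for sym in s:
                p *= P[sym]                         # symbols are independent
            probs.append(p)
        avg = 1 / len(probs)
        variance = sum(abs(p - avg) for p in probs)
        lengths = huffman_lengths(probs)
        bits_per_symbol = sum(p * l for p, l in zip(probs, lengths)) / n
        print(n, round(variance, 3), round(bits_per_symbol, 3))
    # Expected output:
    # 1 0.6 1.0
    # 2 0.78 0.78
    # 3 0.792 0.728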
LZ77 (Sliding Window)
The principle of this method (which is sometimes referred to as LZ1) [Ziv and Lempel 77] is to use
part of the previously-seen input stream as the dictionary. The encoder maintains a window to the
input stream and shifts the input in that window from right to left as strings of symbols are being
encoded. Thus, the method is based on a sliding window. The window below is divided into two
parts. The part on the left is the search buffer. This is the current dictionary, and it includes symbols
that have recently been input and encoded. The part on the right is the look-ahead buffer, containing
text yet to be encoded. In practical implementations the search buffer is some thousands of bytes long,
while the look-ahead buffer is only tens of bytes long. The vertical bar between the t and the e below
represents the current dividing line between the two buffers. We assume that the text
sir_sid_eastman_easily_t has already been compressed, while the text eases_sea_sick_seals still
needs to be compressed.
coded text . . . sir_sid_eastman_easily_t|eases_sea_sick_seals . . . text to be read
The encoder scans the search buffer backwards (from right to left) looking for a match for the first
symbol e in the look-ahead buffer. It finds one at the e of the word easily. This e is at a distance
(offset) of 8 from the end of the search buffer. The encoder then matches as many symbols following
the two e's as possible. Three symbols eas match in this case, so the length of the match is 3. The
encoder then continues the backward scan, trying to find longer matches. In our case, there is one
more match at the word eastman, with offset 16, and it has the same length. The encoder selects the
longest match or, if they are all the same length, the last one found, and prepares the token (16, 3, "e").
It is possible to follow LZ77 with Huffman, or some other statistical coding of the tokens, where
small offsets are assigned shorter codes. This method, proposed by Bernd Herd, is called LZH.
Having many small offsets implies better compression in LZH.
In general, an LZ77 token has three parts: offset, length, and next symbol in the look-ahead buffer
(which, in our case, is the second e of the word teases). This token is written on the output stream,
and the window is shifted to the right (or, alternatively, the input stream is moved to the left) four
positions: three positions for the matched string and one position for the next symbol.
sir_sid_eastman_easily_tease|s_sea_sick_seals . . .
If the backward search yields no match, an LZ77 token with zero offset and length and with the
unmatched symbol is written. This is also the reason a token has a third component. Tokens with
zero offset and length are common at the beginning of a compression job, when the search buffer is
empty or almost empty. The first five steps in encoding our example are the following:
|sir_sid_eastman_        ⇒ (0, 0, "s")
s|ir_sid_eastman_e       ⇒ (0, 0, "i")
si|r_sid_eastman_ea      ⇒ (0, 0, "r")
sir|_sid_eastman_eas     ⇒ (0, 0, "_")
sir_|sid_eastman_easi    ⇒ (4, 2, "d")
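To make the procedure concrete, here is a minimal LZ77 encoder sketch in Python. It is not the lecture's reference implementation: the buffer sizes are toy values, matches are restricted to the search buffer, and ties between equal-length matches keep the one found last in the backward scan (the largest offset), as described above.

    def lz77_encode(text, search_size=32, lookahead_size=8):
        """Encode text as a list of (offset, length, next symbol) tokens."""
        tokens = []
        pos = 0
        while pos < len(text):
            search = text[max(0, pos - search_size):pos]    # current dictionary
            lookahead = text[pos:pos + lookahead_size]
            best_offset, best_len = 0, 0
            for i in range(len(search)):                    # leftmost match = largest offset
                length = 0
                while (length < len(lookahead) - 1 and
                       i + length < len(search) and
                       search[i + length] == lookahead[length]):
                    length += 1
                if length > best_len:                       # strict > keeps largest offset on ties
                    best_len = length
                    best_offset = len(search) - i           # distance from end of search buffer
            next_symbol = lookahead[best_len]
            tokens.append((best_offset, best_len, next_symbol))
            pos += best_len + 1                             # slide window past match + next symbol
        return tokens

    print(lz77_encode("sir_sid_eastman_easily_teases_sea_sick_seals"))
    # The first tokens are (0, 0, 's'), (0, 0, 'i'), (0, 0, 'r'), (0, 0, '_'), (4, 2, 'd'), ...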
LZ78
The LZ78 method (which is sometimes referred to as LZ2) [Ziv and Lempel 78] does not use any
search buffer, look-ahead buffer, or sliding window. Instead, there is a dictionary of previously
encountered strings. This dictionary starts empty (or almost empty).
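A minimal LZ78 encoder sketch (my own illustration, not code from the lecture) shows how such a dictionary grows: it starts with only the empty string, each output token is (index of the longest dictionary phrase that matches, next symbol), and that phrase extended by the symbol becomes a new dictionary entry.

    def lz78_encode(text):
        """Encode text as a list of (dictionary index, next symbol) tokens."""
        dictionary = {"": 0}                  # the "almost empty" starting dictionary
        tokens = []
        phrase = ""
        for ch in text:
            if phrase + ch in dictionary:
                phrase += ch                  # keep extending the known phrase
            else:
                tokens.append((dictionary[phrase], ch))
                dictionary[phrase + ch] = len(dictionary)   # learn the new phrase
                phrase = ""
        if phrase:                            # flush a leftover phrase at end of input
            tokens.append((dictionary[phrase[:-1]], phrase[-1]))
        return tokens

    print(lz78_encode("sir_sid_eastman_easily_teases"))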