Priority Queue
Lecture 19
Priority Queues; Huffman Encoding
slides adapted from Marty Stepp, Hélène Martin, and Daniel Otero
http://www.cs.washington.edu/143/
Prioritization problems
print jobs: The CSE lab printers constantly accept and
complete jobs from all over the building. Suppose we want
them to print faculty jobs before staff before student jobs,
and grad students before undergraduate students, etc.?
ER scheduling: You are in charge of scheduling patients
for treatment in the ER. A gunshot victim should probably
get treatment sooner than that one guy with a sore neck,
regardless of arrival time. How do we always choose the
most urgent case when new patients continue to arrive?
Why can't we solve these problems efficiently with the data
structures we have (list, sorted list, map, set, BST, etc.)?
Java's PriorityQueue class
public class PriorityQueue<E> implements Queue<E>
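Method/Constructor    Description                               Runtime
PriorityQueue<E>()    constructs a new empty queue              O(1)
add(E value)          adds the value to the queue               O(log N)
clear()               removes all elements                      O(1)
iterator()            returns an iterator over the elements     O(1)
peek()                returns the minimum element               O(1)
remove()              removes and returns the minimum element   O(log N)
size()                returns the number of elements            O(1)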
Queue<String> pq = new PriorityQueue<String>();
pq.add("Helene");
pq.add("Marty");
...
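A minimal runnable sketch of the calls above, showing that elements come out in sorted order no matter what order they were added in. The class name PriorityQueueDemo, the helper drain, and the extra name "Daniel" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class PriorityQueueDemo {
    // Removes every element from the queue and returns them in removal order.
    static List<String> drain(Queue<String> pq) {
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.remove());   // always removes the current minimum
        }
        return out;
    }

    public static void main(String[] args) {
        Queue<String> pq = new PriorityQueue<String>();
        pq.add("Helene");
        pq.add("Marty");
        pq.add("Daniel");                // extra name, added for illustration
        System.out.println(pq.peek());   // Daniel
        System.out.println(drain(pq));   // [Daniel, Helene, Marty]
    }
}
```

Strings are ordered by their natural (alphabetical) ordering, so "Daniel" is the minimum even though it was added last.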
[Figure: diagram with the values 10, 20, 40, 50, 80, 60, 85, 90, 99, 65]
[Figure: entries Zack 0, Sara 4, Tyler 2]
Reminder:
public class Foo implements Comparable<Foo> { ... }
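A sketch of how a Comparable class might be used with a PriorityQueue, applied to the ER scheduling problem from earlier. The Patient class, its urgency field, and the rule "lower number is treated sooner" are all hypothetical, chosen only for illustration.

```java
import java.util.PriorityQueue;

// Hypothetical class for illustration; a lower urgency number is treated sooner.
class Patient implements Comparable<Patient> {
    final String name;
    final int urgency;

    Patient(String name, int urgency) {
        this.name = name;
        this.urgency = urgency;
    }

    // Returns a negative number if this patient should come before other.
    @Override
    public int compareTo(Patient other) {
        return Integer.compare(this.urgency, other.urgency);
    }
}

class ErDemo {
    public static void main(String[] args) {
        PriorityQueue<Patient> er = new PriorityQueue<>();
        er.add(new Patient("sore neck", 5));
        er.add(new Patient("gunshot wound", 1));
        er.add(new Patient("broken arm", 3));

        // Removal order depends on compareTo, not on arrival order.
        while (!er.isEmpty()) {
            System.out.println(er.remove().name);
        }
        // prints gunshot wound, broken arm, sore neck (one per line)
    }
}
```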
Homework 7
(Huffman Coding)
File compression
compression: Process of encoding information in fewer bits.
But isn't disk space cheap?
ASCII encoding
ASCII: Mapping from characters to integers (binary
bits).
Maps every possible character to a number ('A' → 65)
uses one byte (8 bits) for each character
most text files on your computer are in ASCII format
Char   ASCII value   ASCII (binary)
' '    32            00100000
'a'    97            01100001
'b'    98            01100010
'c'    99            01100011
'e'    101           01100101
'z'    122           01111010
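The table values can be reproduced in Java, since a char converts directly to its integer code. This is a small sketch; the class name AsciiDemo and the helper toBinary8 are made up for illustration.

```java
class AsciiDemo {
    // Returns the 8-bit binary string for a character whose code is in 0..255.
    static String toBinary8(char c) {
        String bits = Integer.toBinaryString(c);
        while (bits.length() < 8) {
            bits = "0" + bits;   // pad with leading zeros up to one byte
        }
        return bits;
    }

    public static void main(String[] args) {
        char[] chars = {' ', 'a', 'b', 'c', 'e', 'z'};
        for (char c : chars) {
            // (int) c is the ASCII value of the character
            System.out.println("'" + c + "' " + (int) c + " " + toBinary8(c));
        }
        // first line printed: ' ' 32 00100000
    }
}
```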
Huffman encoding
Huffman encoding: Uses variable lengths for different
characters to take advantage of their relative
frequencies.
Some characters occur more often than others.
If those characters use < 8 bits each, the file will be
smaller.
Other characters need > 8, but that's OK; they're rare.

Char   ASCII value   ASCII (binary)   Hypothetical Huffman
' '    32            00100000         10
'a'    97            01100001         0001
'b'    98            01100010         01110100
'c'    99            01100011         001100
'e'    101           01100101         1100
'z'    122           01111010         00100011110
Huffman's algorithm
The idea: Create a "Huffman Tree"
that will tell us a good binary
representation for each character.
Left means 0, right means 1.
example: 'b' is 10
Huffman compression
1. Count the occurrences of each character in file
{' '=2, 'a'=3, 'b'=3, 'c'=1, EOF=1}
2. Place characters and counts into priority queue
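A minimal sketch of how a priority queue can turn these counts into a Huffman tree, assuming the standard construction: repeatedly remove the two lowest-count nodes and join them under a new parent until one node remains. The HuffNode and HuffmanBuild names are hypothetical, and symbols are stored as strings so that EOF can be treated as a pseudo-character.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical node type for this sketch; not taken from the slides.
class HuffNode implements Comparable<HuffNode> {
    final String symbol;        // null for internal (non-leaf) nodes
    final int count;            // occurrences of this symbol or subtree
    final HuffNode left, right;

    HuffNode(String symbol, int count, HuffNode left, HuffNode right) {
        this.symbol = symbol;
        this.count = count;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() {
        return left == null && right == null;
    }

    // Lower counts come out of the priority queue first.
    @Override
    public int compareTo(HuffNode other) {
        return Integer.compare(this.count, other.count);
    }
}

class HuffmanBuild {
    // Builds a tree from symbol counts; assumes the map is not empty.
    static HuffNode buildTree(Map<String, Integer> counts) {
        PriorityQueue<HuffNode> pq = new PriorityQueue<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            pq.add(new HuffNode(e.getKey(), e.getValue(), null, null));
        }
        while (pq.size() > 1) {
            HuffNode first = pq.remove();    // lowest count
            HuffNode second = pq.remove();   // next lowest count
            pq.add(new HuffNode(null, first.count + second.count, first, second));
        }
        return pq.remove();
    }

    // Records each leaf's depth, which equals the length of its bit code.
    static void codeLengths(HuffNode node, int depth, Map<String, Integer> out) {
        if (node.isLeaf()) {
            out.put(node.symbol, depth);
        } else {
            codeLengths(node.left, depth + 1, out);
            codeLengths(node.right, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put(" ", 2);
        counts.put("a", 3);
        counts.put("b", 3);
        counts.put("c", 1);
        counts.put("EOF", 1);

        HuffNode root = buildTree(counts);
        Map<String, Integer> lengths = new TreeMap<>();
        codeLengths(root, 0, lengths);
        System.out.println(root.count);   // 10
        System.out.println(lengths);      // { =2, EOF=3, a=2, b=2, c=3}
    }
}
```

Which child ends up on the left or right depends on how ties between equal counts are broken, so the exact 0/1 patterns may differ from a hand-drawn tree. The code lengths for these counts come out the same either way.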
1) Count characters
step 1: count occurrences of characters into a map
example input file contents:
ab ab cab
byte   char   ASCII   binary
1      'a'    97      01100001
2      'b'    98      01100010
3      ' '    32      00100000
4      'a'    97      01100001
5      'b'    98      01100010
6      ' '    32      00100000
7      'c'    99      01100011
8      'a'    97      01100001
9      'b'    98      01100010
10     EOF    256     N/A
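The counting step might look like the following sketch, which counts the characters of a string rather than reading a real file. The class name CharCounter is hypothetical; the value 256 for the EOF pseudo-character follows the table above.

```java
import java.util.Map;
import java.util.TreeMap;

class CharCounter {
    static final int EOF = 256;   // pseudo-character marking end of file

    // Counts how often each character code occurs, plus one EOF marker.
    static Map<Integer, Integer> countChars(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < text.length(); i++) {
            int code = text.charAt(i);
            counts.merge(code, 1, Integer::sum);   // add 1, or start at 1
        }
        counts.put(EOF, 1);   // EOF always occurs exactly once
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countChars("ab ab cab"));
        // {32=2, 97=3, 98=3, 99=1, 256=1}
    }
}
```

The keys are character codes, so 32=2 is the two spaces and 97=3 is the three occurrences of 'a'.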
4) Tree to binary encodings
The Huffman tree tells you the binary encodings to use.
left means 0, right means 1
example: 'b' is 10
What are the binary encodings of EOF, ' ', 'c', 'a'?

char     'a'  'b'  ' '  'a'  'b'  ' '  'c'  'a'  'b'  EOF
binary   11   10   00   11   10   00   010  11   10   011

byte     1             2             3
char     a b _ a       b _ c a       a b EOF
binary   11 10 00 11   10 00 010 1   1 10 011 00
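A sketch of the tree traversal that produces the encodings, followed by encoding the example file contents. The tree is hard-coded to match the encodings in this example ('b' is 10, 'a' is 11, and so on); the names CodeNode and TreeToCodes are hypothetical, and the code assumes every character in the text appears in the tree.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical node type: leaves hold a symbol, internal nodes hold two children.
class CodeNode {
    final String symbol;
    final CodeNode left, right;

    CodeNode(String symbol) {                 // leaf
        this.symbol = symbol;
        this.left = null;
        this.right = null;
    }

    CodeNode(CodeNode left, CodeNode right) { // internal node
        this.symbol = null;
        this.left = left;
        this.right = right;
    }
}

class TreeToCodes {
    // Tree implied by the example encodings: ' '=00, 'c'=010, EOF=011, 'b'=10, 'a'=11.
    static CodeNode exampleTree() {
        CodeNode zeroSide = new CodeNode(new CodeNode(" "),
                new CodeNode(new CodeNode("c"), new CodeNode("EOF")));
        CodeNode oneSide = new CodeNode(new CodeNode("b"), new CodeNode("a"));
        return new CodeNode(zeroSide, oneSide);
    }

    // Walks the tree: going left appends 0, going right appends 1.
    static void collect(CodeNode node, String path, Map<String, String> codes) {
        if (node.left == null && node.right == null) {
            codes.put(node.symbol, path);
            return;
        }
        collect(node.left, path + "0", codes);
        collect(node.right, path + "1", codes);
    }

    // Encodes each character, then appends the EOF code.
    static String encode(String text, Map<String, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(codes.get(String.valueOf(c)));
        }
        bits.append(codes.get("EOF"));
        return bits.toString();
    }

    public static void main(String[] args) {
        Map<String, String> codes = new TreeMap<>();
        collect(exampleTree(), "", codes);
        System.out.println(codes);                       // { =00, EOF=011, a=11, b=10, c=010}
        System.out.println(encode("ab ab cab", codes));  // 1110001110000101110011
    }
}
```

The result is 22 bits, which fits in 3 bytes once padded, compared with 9 bytes for the same text in ASCII.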
Decompressing
How do we decompress a file of Huffman-compressed bits?
useful "prefix property"
No encoding A is the prefix of another encoding B
I.e., we will never have x → 011 and y → 011100110
the algorithm:
Read each bit one at a time from the input.
If the bit is 0, go left in the tree; if it is 1, go right.
If you reach a leaf node, output the character at that leaf
and go back to the tree root.
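The algorithm above can be sketched as follows, with bits given as a string of '0' and '1' characters. The tree is hard-coded under the assumption that ' ' is 00, 'c' is 010, EOF is 011, 'b' is 10, and 'a' is 11; the names DecodeNode and HuffmanDecode are hypothetical, and stopping at the EOF leaf is an assumption about how the end of the data is detected.

```java
// Hypothetical node type: leaves hold a symbol, internal nodes hold two children.
class DecodeNode {
    final String symbol;
    final DecodeNode left, right;

    DecodeNode(String symbol) {                   // leaf
        this.symbol = symbol;
        this.left = null;
        this.right = null;
    }

    DecodeNode(DecodeNode left, DecodeNode right) { // internal node
        this.symbol = null;
        this.left = left;
        this.right = right;
    }
}

class HuffmanDecode {
    // Assumed tree: ' '=00, 'c'=010, EOF=011, 'b'=10, 'a'=11.
    static DecodeNode exampleTree() {
        DecodeNode zeroSide = new DecodeNode(new DecodeNode(" "),
                new DecodeNode(new DecodeNode("c"), new DecodeNode("EOF")));
        DecodeNode oneSide = new DecodeNode(new DecodeNode("b"), new DecodeNode("a"));
        return new DecodeNode(zeroSide, oneSide);
    }

    static String decode(String bits, DecodeNode root) {
        StringBuilder out = new StringBuilder();
        DecodeNode node = root;
        for (int i = 0; i < bits.length(); i++) {
            // 0 means go left, 1 means go right
            node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.left == null && node.right == null) {
                if (node.symbol.equals("EOF")) {
                    break;            // assumption: ignore any padding after EOF
                }
                out.append(node.symbol);
                node = root;          // start over at the root for the next character
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String bits = "1011010001101011011";
        System.out.println(decode(bits, exampleTree()));   // bac aca
    }
}
```

The prefix property is what makes this work without separators: because no code is the start of another, reaching a leaf always identifies exactly one character.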
Decompressing
Use the tree to decompress a compressed file with these bits:
1011010001101011011
b a c _ a c a
Output:
bac aca
Use Huffman tree to unzip the input into the given output