
CSE 304

Design & Analysis of Algorithms

Greedy Algorithms (Part 1)


Greedy Algorithm
• Greedy algorithms make the choice that looks
best at the moment.
• This locally optimal choice may lead to a globally
optimal solution (i.e. an optimal solution to the
entire problem).

2
When can we use Greedy algorithms?

We can use a greedy algorithm when the following are true:

1) The greedy-choice property: a globally optimal solution
can be arrived at by making a locally optimal (greedy) choice.

2) The optimal-substructure property: an optimal solution
contains within it optimal solutions to subproblems.

3
Designing Greedy Algorithms
1. Cast the optimization problem as one for which:
• we make a choice and are left with only one subproblem
to solve
2. Prove the GREEDY CHOICE:
• that there is always an optimal solution to the original
problem that makes the greedy choice
3. Prove the OPTIMAL SUBSTRUCTURE:
• the greedy choice + an optimal solution to the resulting
subproblem leads to an optimal solution

4
Example: Making Change
• Instance: amount (in cents) to return to customer
• Problem: do this using fewest number of coins
• Example:
– Assume that we have an unlimited number of coins of
various denominations:
– 1c (pennies), 5c (nickels), 10c (dimes), 25c (quarters),
$1 (loonies), $2 (toonies)
– Objective: Pay out a given sum $5.64 with the
smallest number of coins possible.

5
The Coin Changing Problem
• Assume that we have an unlimited number of coins of various
denominations:
• 1c (pennies), 5c (nickels), 10c (dimes), 25c (quarters), $1 (loonies), $2 (toonies)
• Objective: Pay out a given sum S with the smallest number of
coins possible.

• The greedy coin changing algorithm:

while S > 0 do
    c := value of the largest coin no larger than S;
    num := S div c;    {integer division}
    pay out num coins of value c;
    S := S - num*c;

• This is a Θ(m) algorithm, where m = number of denominations.
6
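To make this concrete, here is a minimal Python sketch of the greedy loop above (the function and variable names are our own, not from the slides); it works in cents to avoid floating-point round-off.

def greedy_change(amount, denominations):
    """Pay out `amount` (in cents) greedily, largest coin first.
    Returns a list of (coin_value, count) pairs."""
    payout = []
    for c in sorted(denominations, reverse=True):  # largest coin first
        if amount == 0:
            break
        num = amount // c            # integer division: coins of this value
        if num > 0:
            payout.append((c, num))
            amount -= num * c
    return payout

# $5.64 with the denominations above, in cents:
print(greedy_change(564, [1, 5, 10, 25, 100, 200]))
# [(200, 2), (100, 1), (25, 2), (10, 1), (1, 4)]  -> 10 coins, as on the next slide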
Example: Making Change
• E.g.:
$5.64 = $2 + $2 + $1 +
$0.25 + $0.25 + $0.10 +
$0.01 + $0.01 + $0.01 + $0.01   (10 coins)

7
Making Change – A big problem
• Example 2: Coins are valued $.30, $.20, $.05, $.01
– This system does not have the greedy-choice property: $.40 is
best made with two $.20s, but the greedy solution picks
three coins ($.30 + $.05 + $.05); see the check below.

8
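A quick check with the greedy sketch from the coin changing slide (restated here so the snippet runs on its own):

def greedy_change(amount, denominations):
    # Same greedy loop as in the earlier sketch; amounts in cents.
    payout = []
    for c in sorted(denominations, reverse=True):
        num = amount // c
        if num > 0:
            payout.append((c, num))
            amount -= num * c
    return payout

print(greedy_change(40, [30, 20, 5, 1]))
# [(30, 1), (5, 2)] -> 3 coins, but the optimal answer 20 + 20 uses only 2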
The Fractional Knapsack Problem
• Given: A set S of n items, with each item i having
– b_i, a positive benefit
– w_i, a positive weight
• Goal: Choose items with maximum total benefit but with weight at
most W.
• If we are allowed to take fractional amounts, then this is the fractional
knapsack problem.
– In this case, we let x_i denote the amount we take of item i,
with 0 ≤ x_i ≤ w_i

– Objective: maximize  Σ_{i∈S} b_i (x_i / w_i)

– Constraint:  Σ_{i∈S} x_i ≤ W
9
Example
• Given: A set S of n items, with each item i having
– b_i, a positive benefit
– w_i, a positive weight
• Goal: Choose items with maximum total benefit but with total weight at
most W.

Items:        1      2      3      4      5
Weight:      4 ml   8 ml   2 ml   6 ml   1 ml
Benefit:     $12    $32    $40    $30    $50
Value:         3      4     20      5     50
($ per ml)

“Knapsack” capacity: 10 ml

Solution:
• 1 ml of item 5 → $50
• 2 ml of item 3 → $40
• 6 ml of item 4 → $30
• 1 ml of item 2 → $4
• Total profit: $124

10
The Fractional Knapsack Algorithm
• Greedy choice: Keep taking the item with the highest value v_i
(benefit-to-weight ratio, v_i = b_i / w_i)
– Since the total benefit is Σ_{i∈S} b_i (x_i / w_i) = Σ_{i∈S} v_i x_i,
each unit of weight is most profitably spent on the
highest-ratio item

Algorithm fractionalKnapsack(S, W)
Input: set S of items with benefit b_i and weight w_i; max. weight W
Output: amount x_i of each item i to maximize benefit with weight at most W

for each item i in S
    x_i ← 0
    v_i ← b_i / w_i        {value}
w ← 0                      {total weight}
while w < W and S ≠ ∅
    remove from S the item i with highest v_i
    x_i ← min{w_i, W - w}
    w ← w + x_i
11
The Fractional Knapsack Algorithm
• Running time: Given a collection S of n items, such that each item i
has a benefit b_i and weight w_i, we can construct a maximum-benefit
subset of S, allowing for fractional amounts, with total weight at most W,
in O(n log n) time.
– Use a heap-based priority queue to store S
– Removing the item with the highest value takes O(log n) time
– In the worst case, we need to remove all n items

12
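Below is a short Python sketch of the algorithm, using heapq as the heap-based priority queue suggested above (the names and tuple layout are our own); on the example from the earlier slide it returns the $124 solution.

import heapq

def fractional_knapsack(items, W):
    """items: list of (benefit, weight) pairs; W: knapsack capacity.
    Returns (total_benefit, amounts), where amounts[i] is how much of
    item i is taken (between 0 and the weight of item i)."""
    # Max-heap on value ratio b_i / w_i, simulated by negating the key.
    heap = [(-b / w, i, w) for i, (b, w) in enumerate(items)]
    heapq.heapify(heap)                        # O(n)
    amounts = [0.0] * len(items)
    total = 0.0
    room = W
    while room > 0 and heap:
        neg_ratio, i, w = heapq.heappop(heap)  # highest ratio first, O(log n)
        take = min(w, room)                    # whole item, or whatever fits
        amounts[i] = take
        total += -neg_ratio * take             # ratio * amount = benefit gained
        room -= take
    return total, amounts

# The example from the earlier slide: capacity 10 ml
items = [(12, 4), (32, 8), (40, 2), (30, 6), (50, 1)]  # (benefit $, weight ml)
total, amounts = fractional_knapsack(items, 10)
print(total, amounts)  # 124.0 [0.0, 1.0, 2.0, 6.0, 1.0]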
Huffman Codes
• Widely used technique for data compression

• Assume the data to be a sequence of characters

• Looking for an effective way of storing the data

• Binary character code

– Uniquely represents a character by a binary string

13
Fixed-Length Codes
E.g.: Data file containing 100,000 characters

a b c d e f
Frequency (thousands) 45 13 12 16 9 5

• 3 bits per character are needed to distinguish 6 characters
• a = 000, b = 001, c = 010, d = 011, e = 100, f = 101

• Requires: 100,000 ⋅ 3 = 300,000 bits

14
Huffman Codes
• Idea:
– Use the frequencies of occurrence of characters to
build an optimal way of representing each character

a b c d e f
Frequency (thousands) 45 13 12 16 9 5

15
Variable-Length Codes
E.g.: Data file containing 100,000 characters

a b c d e f
Frequency (thousands) 45 13 12 16 9 5

• Assign short codewords to frequent characters and
long codewords to infrequent characters
• a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
• Requires: (45 ⋅ 1 + 13 ⋅ 3 + 12 ⋅ 3 + 16 ⋅ 3 + 9 ⋅ 4 + 5 ⋅ 4) ⋅ 1,000
= 224,000 bits
16
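As a quick sanity check, the bit counts from this slide and the fixed-length slide can be verified in a few lines of Python (a minimal sketch; the dictionaries simply restate the table):

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}     # thousands
code = {'a': '0', 'b': '101', 'c': '100', 'd': '111',
        'e': '1101', 'f': '1100'}

fixed = sum(freq.values()) * 3 * 1000       # every character costs 3 bits
variable = sum(freq[ch] * len(code[ch]) for ch in freq) * 1000
print(fixed, variable)                      # 300000 224000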
Prefix Codes
• Prefix codes:
– Codes for which no codeword is also a prefix of some
other codeword

– A better name would be “prefix-free codes”

• We can achieve optimal data compression using prefix codes
– We will restrict our attention to prefix codes

17
Encoding with Binary Character Codes
• Encoding
– Concatenate the codewords representing each
character in the file

• E.g.:
– a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
– abc = 0 ⋅ 101 ⋅ 100 = 0101100

18
Decoding with Binary Character Codes
• Prefix codes simplify decoding
– No codeword is a prefix of another ⇒ the codeword
that begins an encoded file is unambiguous
• Approach
– Identify the initial codeword
– Translate it back to the original character
– Repeat the process on the remainder of the file
• E.g.:
– a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100
– 001011101 = 0 ⋅ 0 ⋅ 101 ⋅ 1101 = aabe
19
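The encoding and decoding procedures above fit in a few lines of Python. This is a minimal sketch (the names are our own); the decoder relies on the prefix property, so the first codeword that matches the buffer is always the right one:

code = {'a': '0', 'b': '101', 'c': '100', 'd': '111',
        'e': '1101', 'f': '1100'}
reverse = {cw: ch for ch, cw in code.items()}

def encode(text):
    # Concatenate the codeword of each character.
    return ''.join(code[ch] for ch in text)

def decode(bits):
    # Prefix property: no codeword is a prefix of another, so the first
    # time the buffer matches a codeword, that match is unambiguous.
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in reverse:
            out.append(reverse[buf])
            buf = ''
    return ''.join(out)

print(encode('abc'))        # 0101100
print(decode('001011101'))  # aabe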
Prefix Code Representation
• Binary tree whose leaves are the given characters
• Binary codeword
– the path from the root to the character, where 0 means “go to the
left child” and 1 means “go to the right child”
• Length of the codeword
– Length of the path from root to the character leaf (depth of node)
[Figure: two prefix-code trees over {a, b, c, d, e, f}, with each leaf
labeled character:frequency (in thousands). Left: the tree for the
fixed-length code (a = 000, …, f = 101); every leaf is at depth 3.
Right: the tree for the variable-length code: a:45 is a leaf at depth 1;
c:12, b:13, and d:16 are at depth 3; f:5 and e:9 are at depth 4.]
20
Optimal Codes
• An optimal code is always represented by a full
binary tree
– Every non-leaf node has two children
– The fixed-length code tree above is not full, so it is not
optimal; the variable-length code tree is full
• How many bits are required to encode a file?
– Let C be the alphabet of characters
– Let f(c) be the frequency of character c
– Let d_T(c) be the depth of c’s leaf in the tree T
corresponding to a prefix code
– Then the cost of tree T (the number of bits to encode the file) is

B(T) = Σ_{c∈C} f(c) ⋅ d_T(c)
21
Constructing a Huffman Code
• A greedy algorithm that constructs an optimal prefix code
called a Huffman code
• Assume that:
– C is a set of n characters
– Each character has a frequency f(c)
– The tree T is built in a bottom-up manner
• Idea:

[Figure: the initial forest of |C| leaves: f:5, e:9, c:12, b:13, d:16, a:45]
– Start with a set of |C| leaves
– At each step, merge the two least frequent objects: the frequency of
the new node = sum of two frequencies
– Use a min-priority queue Q, keyed on f to identify the two least
frequent objects
22
Example
[Figure: the construction, shown here as a merge trace]
1. Merge f:5 and e:9 → node 14.   Forest: c:12, b:13, 14, d:16, a:45
2. Merge c:12 and b:13 → node 25. Forest: 14, d:16, 25, a:45
3. Merge 14 and d:16 → node 30.   Forest: 25, 30, a:45
4. Merge 25 and 30 → node 55.     Forest: a:45, 55
5. Merge a:45 and 55 → node 100, the root of the final tree
(In each merge, the left child gets edge label 0 and the right child
gets edge label 1.)
23
Building a Huffman Code
Alg.: HUFFMAN(C)
1. n ← |C|
2. Q ← C                                  {O(n): build the min-priority queue}
3. for i ← 1 to n – 1
4.     do allocate a new node z
5.        left[z] ← x ← EXTRACT-MIN(Q)
6.        right[z] ← y ← EXTRACT-MIN(Q)   {lines 3–8: n – 1 iterations,
7.        f[z] ← f[x] + f[y]               O(lg n) per queue operation}
8.        INSERT(Q, z)
9. return EXTRACT-MIN(Q)

Running time: O(n lg n)
24
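Finally, a compact runnable version of HUFFMAN in Python, using heapq as the min-priority queue (a sketch with our own names and tree representation, not the textbook's exact data structures); on the slide's frequencies it reproduces the codeword lengths 1, 3, 3, 3, 4, 4 and the 224,000-bit total, and with this tie-breaking even the exact codewords shown earlier.

import heapq

def huffman(freq):
    """freq: dict mapping character -> frequency (e.g. in thousands).
    Returns a dict mapping character -> binary codeword."""
    # Queue entries are (frequency, tiebreaker, tree); a tree is either
    # a character or a (left, right) pair. The tiebreaker keeps Python
    # from ever comparing two trees when frequencies are equal.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    for _ in range(len(freq) - 1):          # n - 1 merges, as in the loop above
        f1, _, t1 = heapq.heappop(heap)     # two least-frequent trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    _, _, root = heap[0]
    codes = {}
    def walk(tree, path):                   # 0 = left edge, 1 = right edge
        if isinstance(tree, tuple):
            walk(tree[0], path + '0')
            walk(tree[1], path + '1')
        else:
            codes[tree] = path or '0'       # lone-character edge case
    walk(root, '')
    return codes

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
codes = huffman(freq)
print(codes)  # {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
print(sum(freq[ch] * len(codes[ch]) for ch in freq))  # 224 (thousand bits)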
