Greedy Algorithms
List of Algorithms’ Categories (Brief)
Types of algorithms we will consider include:
Simple Recursive Algorithms
Backtracking Algorithms
Divide and Conquer Algorithms
Dynamic Programming Algorithms
Greedy Algorithms
Branch and Bound Algorithms
Brute Force Algorithms
Randomized Algorithms
Optimization Problems
The Greedy Algorithms
Greedy Algorithms:
Many real-world problems are optimization problems, for which we need to find an optimal solution among many possible candidate solutions. A familiar scenario is the change-making problem that we often encounter at a cash register: receiving the fewest number of coins as change after paying the bill for a purchase. For example, if a purchase is worth $5.27, how many coins, and which coins, does a cash register return as change after the customer pays with a $6 bill?
The Make-Change Algorithm:
For a given amount (e.g., $0.73), use as many quarters ($0.25) as possible without exceeding the amount. Use as many dimes ($0.10) as possible for the remainder, then as many nickels ($0.05) as possible. Finally, use pennies ($0.01) for the rest.
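As a concrete illustration, here is a minimal Python sketch of this strategy; the function name, signature, and default denominations are our own choices, not part of the original notes.

# A minimal sketch of greedy change-making, assuming US denominations.
def make_change(amount_cents, denominations=(25, 10, 5, 1)):
    # Greedily use the largest coins first; returns {denomination: count}.
    coins = {}
    for d in denominations:                  # denominations from high to low
        coins[d], amount_cents = divmod(amount_cents, d)
    return coins

# The worked example below: 67 cents.
print(make_change(67))   # {25: 2, 10: 1, 5: 1, 1: 2} -- 6 coins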
Example: To make change for the amount x = 67 (cents):
Use q = ⌊x/25⌋ = 2 quarters. The remainder is x − 25q = 17, of which we use d = ⌊17/10⌋ = 1 dime. The remainder is then 17 − 10d = 7, so we use n = ⌊7/5⌋ = 1 nickel. The remainder is 7 − 5n = 2, which requires p = 2 pennies. The total number of coins used is q + d + n + p = 6.
Note: The above algorithm is optimal, i.e., it uses the fewest number of coins among all possible ways to make change for a given amount. (This fact can be proven formally.) However, optimality depends on the denominations of the US currency system. For example, try a system that uses denominations of 1-cent, 6-cent, and 7-cent coins, and try to make change for x = 18 cents. The greedy strategy uses two 7-cent coins and four 1-cent coins, for a total of 6 coins. However, the optimal solution is to use three 6-cent coins.
A Generic Greedy Algorithm:
(1) Initialize C to be the set of candidates
(2) Initialize a set S = the empty set ∅ (S will hold the solution we are constructing)
(3) While C ≠ ∅ and S is (still) not a solution do
    (3.1) select x from set C using a greedy strategy
    (3.2) delete x from C
    (3.3) if {x} ∪ S is a feasible solution, then
          S = S ∪ {x} (i.e., add x to set S)
(4) if S is a solution then return S
(5) else return failure
In general, a greedy algorithm is efficient because it makes a sequence of (local) decisions and never backtracks. However, the solution it finds is not always optimal. The sketch below makes this schema concrete.
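The schema above translates almost line for line into Python. The three callbacks (select_best, is_feasible, is_solution) are hypothetical placeholders for the problem-specific pieces.

# A direct rendering of the generic greedy schema; the callbacks are
# placeholders for the problem-specific greedy strategy and tests.
def greedy(candidates, select_best, is_feasible, is_solution):
    C = set(candidates)                      # (1) candidate set
    S = set()                                # (2) solution under construction
    while C and not is_solution(S):          # (3)
        x = select_best(C)                   # (3.1) greedy choice
        C.remove(x)                          # (3.2)
        if is_feasible(S | {x}):             # (3.3) keep x only if feasible
            S.add(x)
    return S if is_solution(S) else None     # (4) solution, or (5) failure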
Example: Counting Money
Suppose we want to count out a certain amount of money, using the fewest possible bills and coins.
A greedy algorithm would do the following: at each step, take the largest possible bill or coin that does not overshoot.
Example: To make $6.39, we choose:
a $5 bill
a $1 bill, to make $6
a 25¢ coin, to make $6.25
a 10¢ coin, to make $6.35
four 1¢ coins, to make $6.39
For US money, the greedy algorithm always gives the optimum solution.
A Failure of the Greedy Algorithm
In some monetary systems the greedy algorithm fails. Suppose, for example, that coins come in Rs. 1, Rs. 7, and Rs. 10 denominations.
Using a greedy algorithm to count out Rs. 15, we would get one Rs. 10 coin and five Rs. 1 coins, for a total of Rs. 15; this requires six coins.
A better solution is two Rs. 7 coins and one Rs. 1 coin, for a total of Rs. 15.
This requires three coins only.
The greedy algorithm finds a solution, but not an optimal solution.
Example: Text Compression
Given a string of text characters X, efficiently encode X into a smaller string of characters Y.
This saves memory and/or bandwidth.
A good approach: Huffman encoding.
Compute the frequency f(c) for each character c.
Encode high-frequency characters with short code words.
No code word is a prefix of another code word.
Use an optimal encoding tree to determine the code words.
Huffman Codes:
Suppose we wish to save a text (ASCII) file on disk or to transmit it through a network using an encoding scheme that minimizes the number of bits required. Without compression, characters are typically encoded by their ASCII codes, with 8 bits per character. We can do better if we have the freedom to design our own encoding.
Example. Given a text file that uses only 5 different letters (a, e, i, s, t), the space character, and the newline character. Since there are 7 different characters, we could use 3 bits per character, because that allows 8 bit patterns ranging from 000 through 111 (so we still have one pattern to spare). The following table shows the encoding of the characters, their frequencies, and the size of the encoded (compressed) file.
Character   Frequency   Fixed Code   Total Bits   Huffman Code   Total Bits
a           10          000          30           001            30
e           15          001          45           01             30
i           12          010          36           10             24
s           3           011          9            00000          15
t           4           100          12           0001           16
space       13          101          39           11             26
newline     1           110          3            00001          5
Total       58                       174                         146
[Figure: the encoding tree built step by step (0 for a left branch, 1 for a right branch), yielding the codes 001, 01, 10, 00000, 0001, 11, and 00001. Note: each code terminates at a leaf node, by the prefix property.]
We note that the encoded file size equals the total weighted external path length if we assign each character's frequency to its leaf node. For example, with the leaves 's' (3) and '\n' (1) at depth 5, 't' (4) at depth 4, 'a' (10) at depth 3, and 'e' (15), 'i' (12), and ' ' (13) at depth 2:

Total file size = 3*5 + 1*5 + 4*4 + 10*3 + 15*2 + 12*2 + 13*2 = 146,

which is exactly the total weighted external path length.

We also note that in an optimal prefix code, each node in the tree has either no children or two: if some node x had only one child y, we could merge x and y, reducing the total size. Thus, the optimal binary merge tree algorithm finds the optimal code (the Huffman code).
[Figure: the encoding tree with frequencies at the leaves, and the merge of a one-child node x with its child y.]
Encoding Tree Example
A code is a mapping of each character of an alphabet to a binary code word.
A prefix code is a binary code such that no code word is the prefix of another code word.
An encoding tree represents a prefix code:
Each external node stores a character.
The code word of a character is given by the path from the root to the external node storing the character (0 for a left child and 1 for a right child).
[Figure: an encoding tree for the prefix code a = 00, b = 010, c = 011, d = 10, e = 11.]
Encoding Tree Optimization
Given a text string X, we want to find a prefix code for the characters of X that yields a small encoding for X:
Frequent characters should have short code words.
Rare characters should have long code words.
Example:
X = abracadabra
T1 encodes X into 29 bits.
T2 encodes X into 24 bits.
[Figure: two encoding trees T1 and T2 over the characters a, b, c, d, r.]
Huffman’s Algorithm
Given a string X, Huffman’s algorithm constructs a prefix code that minimizes the size of the encoding of X. It runs in time O(n + d log d), where n is the size of X and d is the number of distinct characters of X. A heap-based priority queue is used as an auxiliary structure.

Algorithm HuffmanEncoding(X)
  Input: string X of size n
  Output: optimal encoding trie for X
  C ← distinctCharacters(X)
  computeFrequencies(C, X)
  Q ← new empty heap
  for all c ∈ C
    T ← new single-node tree storing c
    Q.insert(getFrequency(c), T)
  while Q.size() > 1
    f1 ← Q.minKey()
    T1 ← Q.removeMin()
    f2 ← Q.minKey()
    T2 ← Q.removeMin()
    T ← join(T1, T2)
    Q.insert(f1 + f2, T)
  return Q.removeMin()
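As an illustration, here is a small Python sketch of the same construction, using the standard heapq module as the heap-based priority queue; the tuple-based node representation and the function name are our own choices.

import heapq
from itertools import count
from collections import Counter

# Huffman's algorithm: repeatedly join the two trees of least frequency.
def huffman_code(text):
    freq = Counter(text)                     # computeFrequencies
    tie = count()                            # breaks ties without comparing trees
    heap = [(f, next(tie), (c, None, None)) for c, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two minimum-frequency trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (None, t1, t2)))  # join
    codes = {}
    def walk(node, prefix):                  # 0 = left child, 1 = right child
        c, left, right = node
        if left is None:
            codes[c] = prefix or "0"         # leaf: record the code word
        else:
            walk(left, prefix + "0")
            walk(right, prefix + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_code("abracadabra"))   # 'a', the most frequent, gets code '0'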
Example for Huffman Encoding
X = abracadabra
Frequencies: a = 5, b = 2, c = 1, d = 1, r = 2
[Figure: step-by-step construction. Merge c and d into a subtree of weight 2; merge b and r into a subtree of weight 4; merge those two subtrees into one of weight 6; finally merge with a (weight 5) to form the root of weight 11.]
Extended Huffman Tree Example
[Figure: an extended Huffman tree example.]
Example for Huffman Encoding
The Huffman encoding algorithm is a greedy algorithm: we always pick the two smallest numbers (frequencies) to combine.
Frequencies (per 100 characters): A = 22, B = 12, C = 24, D = 6, E = 27, F = 9
Resulting codes: A = 00, B = 100, C = 01, D = 1010, E = 11, F = 1011
Average bits/char = 0.22*2 + 0.12*3 + 0.24*2 + 0.06*4 + 0.27*2 + 0.09*4 = 2.42
The Huffman algorithm finds an optimal solution.
[Figure: the Huffman tree, with internal node weights 100, 54, 46, 27, and 15.]
The Knapsack Problem:
Given n objects, each having a weight wi and a value vi, and a knapsack of total capacity W, the problem is to pack the knapsack with these objects so as to maximize the total value of the packed objects without exceeding the knapsack’s capacity. More formally, let xi denote the fraction of object i to be included in the knapsack, 0 ≤ xi ≤ 1, for 1 ≤ i ≤ n. The problem is to find values for the xi such that

∑(i=1 to n) xi wi ≤ W and ∑(i=1 to n) xi vi is maximized.

Note that we may assume ∑(i=1 to n) wi > W; otherwise, we would choose xi = 1 for each i, which would be an obvious optimal solution.
There seem to be three obvious greedy strategies:
(Max value) Sort the objects from the highest value to the lowest, then pick them in that order.
(Min weight) Sort the objects from the lowest weight to the highest, then pick them in that order.
(Max value/weight ratio) Sort the objects by value-to-weight ratio, from the highest to the lowest, then pick them in that order.
Example: Given n = 5 objects and a knapsack capacity W = 100, as shown in Table I. The three solutions are given in Table II.
[Table I: the weights and values of the five objects. Table II: the solutions of the three strategies; the max vi/wi strategy selects x = (1, 1, 1, 0, 0.8), for a total value of 164.]
The Optimal Knapsack Algorithm:
Input: An integer n, positive values wi and vi, for 1 ≤ i ≤ n, and another positive value W.
Output: n values xi such that 0 ≤ xi ≤ 1,

∑(i=1 to n) xi wi ≤ W, and ∑(i=1 to n) xi vi is maximized.

Algorithm (of time complexity O(n log n)):
(1) Sort the n objects from large to small based on the ratios vi/wi. Assume the arrays w[1..n] and v[1..n] store the weights and values after sorting.
(2) Initialize array x[1..n] to zeros.
(3) weight = 0; i = 1
(4) while (i ≤ n and weight < W) do
    (4.1) if weight + w[i] ≤ W then x[i] = 1
    (4.2) else x[i] = (W − weight) / w[i]
    (4.3) weight = weight + x[i] * w[i]
    (4.4) i++
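Here is a Python sketch of this algorithm; the argument names and the index sort are our own choices. It reproduces the fractional-knapsack example given later in these notes.

# Fractional knapsack: sort by value/weight ratio, take greedily.
def fractional_knapsack(weights, values, W):
    n = len(weights)
    # (1) indices sorted by value-to-weight ratio, largest first
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    x = [0.0] * n                            # (2)
    weight = 0.0                             # (3)
    for i in order:                          # (4)
        if weight >= W:
            break
        if weight + weights[i] <= W:         # (4.1) take the whole object
            x[i] = 1.0
        else:                                # (4.2) take the fitting fraction
            x[i] = (W - weight) / weights[i]
        weight += x[i] * weights[i]          # (4.3)
    return x

# The 5-item, W = 10 ml example from later in these notes:
print(fractional_knapsack([4, 8, 2, 6, 1], [12, 32, 40, 30, 50], 10))
# [0.0, 0.125, 1.0, 1.0, 1.0]: items 3, 4, 5 in full and 1 ml of item 2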
Fractional Knapsack Problem
Given: A set S of n items, with each item i having
  bi - a positive benefit
  wi - a positive weight
Goal: Choose items with maximum total benefit but with weight at most W.
If we are allowed to take fractional amounts (broken items), then this is known as the Fractional Knapsack Problem.
In this case, we let xi denote the amount we take of item i.
Objective: maximize ∑(i∈S) bi (xi / wi)
Constraint: ∑(i∈S) xi ≤ W
Example
Given: A set S of n items, with each item i having
  bi - a positive benefit
  wi - a positive weight
Goal: Choose items with maximum total benefit but with weight at most W = 10 ml (the “knapsack” capacity).

Items:            1      2      3      4      5
Weight:           4 ml   8 ml   2 ml   6 ml   1 ml
Benefit:          $12    $32    $40    $30    $50
Value ($ per ml): 3      4      20     5      50

Solution:
• 1 ml of item 5
• 2 ml of item 3
• 6 ml of item 4
• 1 ml of item 2
(10 ml total)
Fractional Knapsack Algorithm
Greedy choice: keep taking the item with the highest value (benefit-to-weight ratio), which works since ∑(i∈S) bi (xi / wi) = ∑(i∈S) (bi / wi) xi.
Run time: O(n log n). Why? (The heap operations, or an initial sort by value, dominate.)
Correctness: Suppose there is a better solution. Then there is an item i with higher value than some chosen item j (vi > vj), but xi < wi and xj > 0. If we substitute some of j with i (min{wi − xi, xj} units), we get a better solution. Thus, there is no better solution than the greedy one.

Algorithm fractionalKnapsack(S, W)
  Input: set S of items with benefit bi and weight wi; maximum total weight W
  Output: amount xi of each item i to maximize total benefit with total weight at most W
  for each item i in S
    xi ← 0
    vi ← bi / wi   {value}
  w ← 0   {total weight}
  while w < W
    remove item i with highest vi
    xi ← min{wi, W − w}
    w ← w + min{wi, W − w}
Task Scheduling
Given: a set T of n tasks, each having:
A start time, si
A finish time, fi (where si < fi)
Goal: Perform all the tasks using a minimum number of
“machines.”
[Figure: a schedule of tasks on three machines over the time range 1 to 9.]
Task Scheduling Algorithm
Greedy choice: consider tasks by their start time and use as few machines as possible with this order.
Run time: O(n log n). Why? (Sorting the tasks by start time dominates.)
Correctness: Suppose there is a better schedule using k − 1 machines, while the algorithm uses k. Let i be the first task scheduled on machine k. Task i must conflict with k − 1 other tasks (one on each of the other machines). But that means there is no non-conflicting schedule using k − 1 machines.

Algorithm taskSchedule(T)
  Input: set T of tasks with start time si and finish time fi
  Output: non-conflicting schedule with minimum number of machines
  m ← 0   {number of machines}
  while T is not empty
    remove task i with smallest si
    if there is a machine j with no conflict for i then
      schedule i on machine j
    else
      m ← m + 1
      schedule i on machine m
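A possible Python rendering of this algorithm, using a min-heap of machine finish times to find a free machine; the (start, finish) tuple representation is our own choice.

import heapq

# Greedy task scheduling: consider tasks by start time; reuse a machine
# whose earliest finish time is at most the new task's start time.
def task_schedule(tasks):
    machines = []                            # heap of (finish_time, machine_id)
    assignment = {}
    for s, f in sorted(tasks):               # tasks by start time
        if machines and machines[0][0] <= s:
            _, j = heapq.heappop(machines)   # machine j is free for this task
        else:
            j = len(machines)                # m <- m + 1: open a new machine
        assignment[(s, f)] = j
        heapq.heappush(machines, (f, j))
    return len(machines), assignment

m, _ = task_schedule([(1, 4), (1, 3), (2, 5), (3, 7), (4, 7), (6, 9), (7, 8)])
print(m)   # 3 machines, as in the example that follows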
Example for Task Scheduling
Given: a set T of n tasks, each having:
A start time, si
A finish time, fi (where si < fi)
[1,4], [1,3], [2,5], [3,7], [4,7], [6,9], [7,8] (ordered by start)
Goal: Perform all tasks on min. number of machines
[Figure: the seven tasks scheduled on three machines over the time range 1 to 9.]
Job Scheduling Problem
We have to run nine jobs, with running times of 3, 5, 6, 10, 11, 14, 15, 18, and 20 minutes. We have three processors on which we can run these jobs. We decide to do the longest-running jobs first, on whatever processor is available:

P1: 20, 10, 3
P2: 18, 11, 6
P3: 15, 14, 5

Time to completion: 18 + 11 + 6 = 35 minutes. What if we instead ran the shortest-running jobs first?

P1: 3, 10, 15
P2: 5, 11, 18
P3: 6, 14, 20

That wasn’t such a good idea; time to completion is now 6 + 14 + 20 = 40 minutes.
Note, however, that the greedy algorithm itself is fast:
All we had to do at each stage was pick the minimum or maximum
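For concreteness, here is a Python sketch of the longest-job-first heuristic above, using a min-heap of processor loads; the function name and the heap-based bookkeeping are our own.

import heapq

# Assign each job, longest first, to the processor that is free soonest.
def longest_first(jobs, processors=3):
    loads = [(0, p) for p in range(processors)]   # (current load, processor)
    heapq.heapify(loads)
    schedule = [[] for _ in range(processors)]
    for job in sorted(jobs, reverse=True):        # longest-running jobs first
        load, p = heapq.heappop(loads)            # least-loaded processor
        schedule[p].append(job)
        heapq.heappush(loads, (load + job, p))
    return schedule, max(load for load, _ in loads)

schedule, makespan = longest_first([3, 5, 6, 10, 11, 14, 15, 18, 20])
print(schedule)   # [[20, 10, 3], [18, 11, 6], [15, 14, 5]], as in the diagram
print(makespan)   # 35 minutes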
An Optimum Solution
Better solutions do exist:

P1: 20, 14
P2: 18, 11, 5
P3: 15, 10, 6, 3

This solution is clearly optimal (why?). (Hint: the nine jobs total 102 minutes of work, so on three processors no schedule can finish before 102/3 = 34 minutes.)
Clearly, there are other optimal solutions (why?).
How do we find such a solution?
One way: try all possible assignments of jobs to processors.
Unfortunately, this approach can take exponential time.
Interval Scheduling
[Figures: interval scheduling examples.]

Interval Partitioning
[Figures: interval partitioning examples.]
Greedy Strategies Applied to Graph Problems:
We first review some notations and terms about graphs.
A graph consists of vertices (nodes) and edges (arcs, links),
in which each edge “connects” two vertices (not necessarily
distinct). More formally, a graph G = (V, E), where V and
E denote the sets of vertices and edges, respectively.
Directed Graphs vs. Un-directed Graphs:
If every edge has an orientation, e.g., an edge starting from
node x terminating at node y, the graph is called a directed
graph, or digraph for short. If all edges have no orientation,
the graph is called an undirected graph, or simply, a graph.
When there are no parallel edges (two edges that have
identical end points), we could identify an edge with its two
end points, such as edge (1,2), or edge (3,3). In an undirected
graph, edge (1,2) is the same as edge (2,1). We will assume
no parallel edges unless otherwise stated.
[Figure: a directed graph on vertices 1, 2, 3, 4 with edges a, b, c, d, e. Edges c and d are parallel (directed) edges. Some directed paths are ad, ebac.]
Both directed and undirected graphs appear often and naturally in many scientific (call graphs in program analysis), business (query trees, entity-relation diagrams in databases), and engineering (CAD design) applications. The simplest data structure for representing graphs and digraphs is a 2-dimensional array. Suppose G = (V, E) and |V| = n. Declare an array T[1..n][1..n] so that T[i][j] = 1 if there is an edge (i, j) ∈ E, and 0 otherwise. (Note that in an undirected graph, edges (i, j) and (j, i) refer to the same edge.)
        j = 1  2  3  4
i = 1:      0  1  0  0
i = 2:      0  0  0  1
i = 3:      1  0  1  0
i = 4:      0  1  0  0

A 2-dimensional array for the digraph, called the adjacency matrix.
Sometimes, the edges of a graph or digraph are given a positive weight or cost value. In that case, the adjacency matrix can easily be modified so that T[i][j] = the weight of edge (i, j), and 0 if there is no edge (i, j). Since the adjacency matrix may contain many zeros (when the graph has few edges, known as sparse), a space-efficient representation uses linked lists to represent the edges, known as the adjacency list representation.
1: 2
2: 4
3: 3 → 1
4: 2

The adjacency lists for the digraph, which can store edge weights by adding another field in the list nodes.
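Both representations are straightforward to build in code. Here is a small Python sketch for the digraph above; the 0-based internal indexing is a choice of this sketch.

# The 4-node digraph above: edges 1->2, 2->4, 3->1, 3->3, 4->2.
n = 4
edges = [(1, 2), (2, 4), (3, 1), (3, 3), (4, 2)]

# Adjacency matrix: T[i][j] = 1 if there is an edge (i+1, j+1).
T = [[0] * n for _ in range(n)]
for u, v in edges:
    T[u - 1][v - 1] = 1

# Adjacency lists: one list of out-neighbors per vertex.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u - 1].append(v)

print(T)     # [[0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0]]
print(adj)   # [[2], [4], [1, 3], [2]]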
Graph (and Digraph) Traversal Techniques:
Given a (directed) graph G = (V, E), determine all nodes that are connected from a given node v via a (directed) path.
There are essentially two graph traversal algorithms, known as breadth-first search (BFS) and depth-first search (DFS), both of which can be implemented efficiently.
BFS: From node v, visit each of its neighboring nodes in sequence, then visit their neighbors, etc., while avoiding repeated visits.
DFS: From node v, visit its first neighboring node and all of its neighbors using recursion, then visit node v’s second neighbor, applying the same procedure, until all of v’s neighbors are visited, while avoiding repeated visits.
Breadth-First Search (BFS):
BFS(v) // visit all nodes reachable from node v
(1) Create an empty FIFO queue Q, add node v to Q
(2) Create a Boolean array visited[1..n]; initialize all values to false, except set visited[v] to true
(3) while Q is not empty
(3.1) delete a node w from Q
(3.2) for each node z adjacent from node w
if visited[z] is false then
add node z to Q and set visited[z] to true
The time complexity is O(n + e), with n nodes and e edges, if the adjacency lists are used. This is because, in the worst case, each node is added once to the queue (the O(n) part), and each of its neighbors gets considered once (the O(e) part).
[Figure: a 6-node digraph showing the node search order starting with node 1, including two nodes not reached.]
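A Python sketch of this BFS procedure over adjacency lists; numbering the nodes 0..n−1 is a convention of this sketch.

from collections import deque

# BFS from v: visit neighbors level by level, avoiding repeated visits.
def bfs(adj, v):
    visited = [False] * len(adj)
    visited[v] = True                        # (2)
    Q = deque([v])                           # (1) FIFO queue holding v
    order = []
    while Q:                                 # (3)
        w = Q.popleft()                      # (3.1)
        order.append(w)
        for z in adj[w]:                     # (3.2) nodes adjacent from w
            if not visited[z]:
                visited[z] = True
                Q.append(z)
    return order                             # reachable nodes, in BFS order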
Depth-First Search (DFS):
(1) Create a Boolean array visited[1..n]; initialize all values to false, except set visited[v] to true
(2) Call DFS(v) to visit all nodes reachable via a path
DFS(v)
for each neighboring node w of v do
if visited[w] is false then
set visited[w] to true; call DFS(w) // recursive call
The algorithm’s time complexity is also O(n + e), by the same reasoning as in the BFS algorithm.
[Figure: a 6-node digraph showing the node search order starting with node 1, including two nodes not reached.]
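A matching Python sketch of DFS over the same adjacency-list representation:

# DFS from v: recursively visit each unvisited neighbor.
def dfs(adj, v, visited=None, order=None):
    if visited is None:                      # (1) first call: set up the array
        visited = [False] * len(adj)
        visited[v] = True
        order = []
    order.append(v)
    for w in adj[v]:                         # each neighboring node w of v
        if not visited[w]:
            visited[w] = True
            dfs(adj, w, visited, order)      # (2) recursive call
    return order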
The Minimum Spanning Tree (MST) Problem:
Given a weighted (undirected) graph G = (V, E), where each edge e has a positive weight w(e), a spanning tree of G is a tree (a connected graph without cycles, or circuits) which has V as its vertex set, i.e., the tree connects all vertices of the graph G. If |V| = n, then the tree has n − 1 edges (a fact which can be proved by induction). A minimum spanning tree of G is a spanning tree that has the minimum total edge weight.
[Figure: a weighted graph with no parallel edges or self-loops, and a minimum spanning tree of 4 edges with total weight 3 + 2 + 4 + 6 = 15.]
Minimum Spanning Tree (MST)
A minimum spanning tree is a least-cost subset of the edges of a graph that connects all the nodes.
Start by picking any node and adding it to the tree.
Repeatedly: pick any least-cost edge from a node in the tree to a node not in the tree, and add the edge and the new node to the tree.
Stop when all nodes have been added to the tree.
At each new node, we must include the new candidate edges and keep them sorted (e.g., in a heap), which is O(n log n) overall.
Therefore, building the MST is O(n log n) + O(n) = O(n log n). A sketch of this procedure follows.
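Here is a Python sketch of this procedure (Prim's algorithm) with a heap of candidate edges; the adjacency-list encoding and the small example graph are our own.

import heapq

# Prim's MST: grow the tree from a start node, always taking the
# least-cost edge from a tree node to a non-tree node.
def prim_mst(graph, start):
    in_tree = {start}
    candidates = list(graph[start])          # candidate edges, as (weight, v)
    heapq.heapify(candidates)
    total = 0
    while candidates and len(in_tree) < len(graph):
        w, v = heapq.heappop(candidates)     # least-cost edge leaving the tree
        if v in in_tree:
            continue                         # endpoint already in the tree
        in_tree.add(v)
        total += w
        for edge in graph[v]:                # add the new node's edges
            heapq.heappush(candidates, edge)
    return total

g = {1: [(3, 2), (6, 3)], 2: [(3, 1), (2, 3), (4, 4)],
     3: [(6, 1), (2, 2), (7, 4)], 4: [(4, 2), (7, 3)]}
print(prim_mst(g, 1))   # 3 + 2 + 4 = 9 for this small example graph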
Other Greedy Algorithms
Dijkstra’s algorithm for finding the shortest paths in a graph:
Always takes the shortest edge (path) connecting a known node to an unknown node.
Kruskal’s algorithm for finding a minimum-cost spanning tree:
Always tries the lowest-cost remaining edge.
Prim’s algorithm for finding a minimum-cost spanning tree:
Always takes the lowest-cost edge between nodes in the spanning tree and nodes not yet in the spanning tree.
Dijkstra’s Shortest-path Algorithm
Dijkstra’s algorithm finds the shortest paths from a given node to all other nodes in a graph.
Initially:
Mark the given node as known (its path length is zero).
For each out-edge, set the distance in the neighboring node equal to the cost (length) of the out-edge, and set its predecessor to the initially given node.
Repeatedly (until all nodes are known):
Find an unknown node with the smallest distance, and mark it as known.
For each node adjacent to the newly known node, see whether its estimated distance can be reduced (distance to the known node + cost of the out-edge).
If so, reduce it and also reset that node’s predecessor.
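A Python sketch of this procedure, using heapq (with lazy deletion) to find the unknown node with the smallest distance; the dictionary-based graph encoding is our own.

import heapq

# Dijkstra: repeatedly mark the closest unknown node as known and
# relax the estimated distances of its neighbors.
def dijkstra(graph, source):
    dist = {source: 0}
    pred = {source: None}
    known = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)           # unknown node, smallest distance
        if u in known:
            continue                         # stale heap entry: skip it
        known.add(u)                         # mark the new node as known
        for cost, v in graph[u]:             # each out-edge (u, v)
            if v not in dist or d + cost < dist[v]:
                dist[v] = d + cost           # reduce the estimated distance
                pred[v] = u                  # and reset v's predecessor
                heapq.heappush(heap, (dist[v], v))
    return dist, pred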
Analysis of Dijkstra’s Algorithm I
Assume that the average out-degree of a node is some constant k.
Initially:
Mark the given node as known (path length is zero).
This takes O(1) (constant) time.
For each out-edge, set the distance in each neighboring node equal to the cost (length) of the out-edge, and set its predecessor to the initially given node.
If each node refers to a list of k adjacent node/edge pairs, this takes O(k) (constant) time.
Analysis of Dijkstra’s Algorithm II
Repeatedly (until all nodes are known), i.e., n times:
Find an unknown node with the smallest distance.
Probably the best way to do this is to keep the unknown nodes in a priority queue; this takes k * O(log n) time each time a new node is marked “known” (and this happens n times).
Mark the new node as known: O(1) time.
For each node adjacent to the new node, see whether its estimated distance can be reduced (distance to the known node plus the cost of the out-edge); if so, also reset that node’s predecessor.
There are k adjacent nodes (on average), and the operation requires constant time at each, therefore O(k) (constant) time.
Combining all the parts, we get:
O(1) + n*(k*O(log n) + O(k)), that is, O(nk log n) time; for constant k this is O(n log n).