
Lecture 15

Minimum Spanning Trees


Announcements
• HW5 due Friday
• HW6 released Friday
Last time
• Greedy algorithms
• Make a series of choices.
• Choose this activity, then that one, ..
• Never backtrack.
• Show that, at each step, your choice does not rule out
success.
• At every step, there exists an optimal solution consistent with
the choices we’ve made so far.
• At the end of the day:
• you’ve built only one solution,
• never having ruled out success,
• so your solution must be correct.
Today
• Greedy algorithms for Minimum Spanning Tree.

• Agenda:
1. What is a Minimum Spanning Tree?
2. Short break to introduce some graph theory tools
3. Prim’s algorithm
4. Kruskal’s algorithm
Minimum Spanning Tree
Say we have an undirected weighted graph.

[Figure: an example undirected weighted graph on vertices A–I, with edge weights ranging from 1 to 14.]

A tree is a connected graph with no cycles!
A spanning tree is a tree that connects all of the vertices.
The cost of a spanning tree is the sum of the weights on the edges.
• One spanning tree of this graph has cost 67.
• Another spanning tree of the same graph has cost 37.
A minimum spanning tree is a spanning tree of minimal cost.
• The cost-37 tree is in fact a minimum spanning tree of this graph.
Why MSTs?
• Network design
• Connecting cities with roads/electricity/telephone/…
• Cluster analysis
• e.g., genetic distance
• Image processing
• e.g., image segmentation
• Useful primitive
• for other graph algorithms

Figure 2: Fully parsimonious minimal spanning tree of 933 SNPs for 282 isolates of Y. pestis colored by location.
Morelli et al. Nature genetics 2010
How to find an MST?
• Today we’ll see two greedy algorithms.
• In order to prove that these greedy algorithms work, we’ll
need to show something like:

Suppose that our choices so far
haven’t ruled out success.
Then the next greedy choice that we make
also won’t rule out success.

• Here, success means finding an MST.


Let’s brainstorm
• How would we design a greedy algorithm?

[Figure: the example weighted graph again.]
Brief aside
for a discussion of cuts in graphs!
Cuts in graphs
• A cut is a partition of the vertices into two parts:

[Figure: the example graph with a cut drawn through it. This is the cut “{A,B,D,E} and {C,I,H,G,F}”.]


Let A be a set of edges in G
• We say a cut respects A if no edges in A cross the cut.
• An edge crossing a cut is called light if it has the smallest weight of any edge crossing the cut.

[Figure: the example graph with A drawn as thick orange edges, a cut that respects A, and the lightest edge crossing the cut labeled “This edge is light”.]

Lemma
• Let A be a set of edges, and consider a cut that respects A.
• Suppose there is an MST containing A.
• Let (u,v) be a light edge.
• Then there is an MST containing A ∪ {(u,v)}.

This is precisely the sort of statement we need for a greedy algorithm: if we haven’t ruled out the possibility of success so far, then adding a light edge still won’t rule it out. That is, we can safely add the light edge to the tree.

[Figure: the example graph, with A as the thick orange edges and the light edge crossing the cut highlighted.]

Proof of Lemma
• Assume that we have:
• a cut that respects A
• A is part of some MST T.
• Say that (u,v) is light.
• lowest cost crossing the cut
• But (u,v) is not in T.
• So adding (u,v) to T will make a cycle.
• So there is at least one other edge in this cycle crossing the cut.
• call it (x,y)

Claim: Adding any additional edge to a spanning tree will create a cycle.
Proof: Both endpoints are already in the tree and connected to each other.

[Figure: schematic of the cut; (u,v) is the light edge crossing it, and (x,y) is another edge of the resulting cycle that also crosses the cut.]
Proof of Lemma ctd.
• Consider swapping (u,v) for (x,y) in T.
• Call the resulting tree T’.
• Claim: T’ is still an MST.
• It is still a tree:
• we deleted (x,y), which breaks the cycle created by adding (u,v).
• It has cost at most that of T
• because (u,v) was light: w(u,v) ≤ w(x,y).
• T had minimal cost.
• So T’ does too.
• So T’ is an MST containing (u,v).
• This is what we wanted.
End aside
Back to MSTs!
Back to MSTs
• How do we find one?
• Today we’ll see two greedy algorithms.

• The strategy:
• Make a series of choices, adding edges to the tree.
• Show that each edge we add is safe to add:
• we do not rule out the possibility of success
• we will choose light edges crossing cuts and use the Lemma.
• Keep going until we have an MST.
Idea 1
Start growing a tree: greedily add the shortest edge we can that grows the tree.

[Figure: a sequence of snapshots of the example graph as the tree grows, one lightest available edge at a time.]
We’ve discovered Prim’s algorithm!
• slowPrim( G = (V,E), starting vertex s ):
• Let (s,u) be the lightest edge coming out of s.
• MST = { (s,u) }
• verticesVisited = { s, u }
• while |verticesVisited| < |V|:        ← n iterations of this while loop
• find the lightest edge (x,v) in E so that:   ← maybe take time m to go through all the edges and find the lightest
• x is in verticesVisited
• v is not in verticesVisited
• add (x,v) to MST
• add v to verticesVisited
• return MST

Naively, the running time is O(nm):
• For each of n-1 iterations of the while loop:
• Maybe go through all the edges.
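As a sanity check, the pseudocode above can be written directly in Python. This is a slow O(nm) sketch; the edge-list representation (u, v, w) triples is an assumption made here for illustration, not something the slides specify.

```python
def slow_prim(vertices, edges, s):
    # Lightest edge coming out of s.
    first = min((e for e in edges if s in e[:2]), key=lambda e: e[2])
    u = first[1] if first[0] == s else first[0]
    mst = [first]
    visited = {s, u}
    while len(visited) < len(vertices):
        # Scan all m edges for the lightest edge leaving the visited set.
        x, v, w = min((e for e in edges
                       if (e[0] in visited) != (e[1] in visited)),
                      key=lambda e: e[2])
        mst.append((x, v, w))
        visited.add(v if v not in visited else x)
    return mst
```

The inner `min` over all edges is exactly the “maybe take time m” step called out above.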
Two questions
1. Does it work?
• That is, does it actually return an MST?

2. How do we actually implement this?


• the pseudocode above says “slowPrim”…
Does it work?
• We need to show that our greedy choices don’t
rule out success.
• That is, at every step:
• There exists an MST that contains all of the edges we
have added so far.

• Now it is time to use our lemma!


Suppose we are partway through Prim
• Assume that our choices A so far are safe.
• they don’t rule out success
• Consider the cut {visited, unvisited}.
• A respects this cut.
• The edge we add next is a light edge.
• Least weight of any edge crossing the cut.
• By the Lemma, this edge is safe.
• it also doesn’t rule out success.

[Figure: the example graph partway through Prim; A is the set of edges selected so far, the cut separates visited from unvisited vertices, and the light edge crossing the cut is the one added next.]
Hooray!
• Our greedy choices don’t rule out success.

• This is enough (along with an argument by


induction) to guarantee correctness of Prim’s
algorithm.
This is what we needed
• Inductive hypothesis:
• After adding the t’th edge, there exists an MST with the
edges added so far.
• Base case:
• After adding the 0’th edge, there exists an MST with the
edges added so far. YEP.
• Inductive step:
• If the inductive hypothesis holds for t (aka, the choices so far
are safe), then it holds for t+1 (aka, the next edge we add is
safe).
• That’s what we just showed.
• Conclusion:
• After adding the n-1’st edge, there exists an MST with the
edges added so far.
• At this point we have a spanning tree, so it better be minimal.
Two questions
1. Does it work?
• That is, does it actually return an MST?
• Yes!

2. How do we actually implement this?


• the pseudocode above says “slowPrim”…
How do we actually implement this?
• Each vertex keeps:
• the distance from itself to the growing spanning tree, if you can get there in one edge.
• how to get there.
• Choose the closest vertex, add it.
• Update the stored info.

[Figure: the example graph during this process; e.g. one vertex says “I’m 7 away; C is the closest,” another says “I can’t get to the tree in one edge,” and after an update, “I’m 10 away; F is the closest.”]
Efficient implementation
Every vertex x has a key and a parent, and is in one of three states: unreached, “active,” or reached.
• k[x] is the distance of x from the growing tree.
• p[b] = a means that a was the vertex that k[b] comes from.

Until all the vertices are reached:
• Activate the unreached vertex u with the smallest key.
• for each of u’s neighbors v:
• k[v] = min( k[v], weight(u,v) )
• if k[v] updated, p[v] = u
• Mark u as reached, and add (p[u],u) to MST.

[Figure: a sequence of snapshots of the example graph running this procedure from A. Keys start at 0 for A and ∞ everywhere else; each activation shrinks the keys of the active vertex’s unreached neighbors, and the tree grows one edge per activation. Etc.]
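The key/parent scheme above can be sketched with a binary heap as the priority queue. This is a sketch, not the lecture’s reference implementation; the adjacency-dict representation `graph[u] = [(v, weight), ...]` is an assumption.

```python
import heapq

def prim(graph, s):
    key = {v: float('inf') for v in graph}
    parent = {v: None for v in graph}
    key[s] = 0
    reached = set()
    heap = [(0, s)]
    mst = []
    while heap:
        _, u = heapq.heappop(heap)
        if u in reached:
            continue  # stale heap entry; u was already activated
        reached.add(u)
        if parent[u] is not None:
            mst.append((parent[u], u))
        for v, w in graph[u]:
            if v not in reached and w < key[v]:
                key[v] = w          # k[v] = min(k[v], weight(u,v))
                parent[v] = u       # remember how to get to the tree
                heapq.heappush(heap, (w, v))
    return mst
```

Rather than decreasing keys in place, this version pushes a fresh heap entry on every update and skips stale entries on pop, a common simplification that keeps the O(m log n) bound.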
This should look pretty familiar
• Very similar to Dijkstra’s algorithm!
• Differences:
1. Keep track of p[v] in order to return a tree at the end
• But Dijkstra’s can do that too, that’s not a big difference.

2. Instead of d[v], which we update by
• d[v] = min( d[v], d[u] + w(u,v) ),
we keep k[v], which we update by
• k[v] = min( k[v], w(u,v) ).
• To see the difference, consider a triangle on S, T, U with w(S,U) = 2, w(U,T) = 2, and w(S,T) = 3. Dijkstra from S keeps the direct edge (S,T), since d[T] = 3 is less than 2 + 2 = 4 going through U; but Prim’s MST uses the two weight-2 edges, since cost 4 beats cost 5.
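A small sketch making that concrete (the helper below is illustrative, not from the lecture): the only change between the two algorithms is how a candidate edge is scored.

```python
import heapq

# The S-T-U triangle, with w(S,U) = 2, w(U,T) = 2, w(S,T) = 3.
graph = {'S': [('U', 2), ('T', 3)],
         'U': [('S', 2), ('T', 2)],
         'T': [('S', 3), ('U', 2)]}

def grow_tree(graph, s, score):
    # Generic best-first growth; `score(best_u, w)` ranks edge (u, v).
    best = {v: float('inf') for v in graph}
    parent = {v: None for v in graph}
    best[s] = 0
    heap = [(0, s)]
    done = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u]:
            if v not in done and score(best[u], w) < best[v]:
                best[v] = score(best[u], w)
                parent[v] = u
                heapq.heappush(heap, (best[v], v))
    return {(parent[v], v) for v in graph if parent[v] is not None}

prim_tree = grow_tree(graph, 'S', lambda d, w: w)          # {('S','U'), ('U','T')}
dijkstra_tree = grow_tree(graph, 'S', lambda d, w: d + w)  # {('S','U'), ('S','T')}
```

Prim scores an edge by its own weight; Dijkstra scores it by the full path length, and on this triangle they return different trees.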
One thing that is similar:
Running time
• Exactly the same as Dijkstra:
• O(mlog(n)) using a Red-Black tree as a priority queue.
• O(m + nlog(n)) if we use a Fibonacci Heap*.

*See CS166
Two questions
1. Does it work?
• That is, does it actually return an MST?
• Yes!

2. How do we actually implement this?


• the pseudocode above says “slowPrim”…
• Implement it basically the same way
we’d implement Dijkstra!
What have we learned?
• Prim’s algorithm greedily grows a tree
• smells a lot like Dijkstra’s algorithm
• It finds a Minimum Spanning Tree in time O(mlog(n))
• if we implement it with a Red-Black Tree

• To prove it worked, we followed the same recipe for


greedy algorithms we saw last time.
• Show that, at every step, we don’t rule out success.
That’s not the only greedy algorithm
What if we just always take the cheapest edge, whether or not it’s connected to what we have so far?

[Figure: a sequence of snapshots of the example graph as edges are taken in order of increasing weight; each new edge is added only if, as the slides note, “that won’t cause a cycle.”]
We’ve discovered Kruskal’s algorithm!
• slowKruskal(G = (V,E)):
• Sort the edges in E by non-decreasing weight.
• MST = {}
• for e in E (in sorted order):        ← m iterations through this loop
• if adding e to MST won’t cause a cycle:   ← How do we check this?
• add e to MST.
• return MST

Naively, the running time is ???:
• For each of m iterations of the for loop:
• Check if adding e would cause a cycle…
• How would you figure out if adding e would make a cycle in this algorithm?
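One naive answer, sketched in Python: rebuild the adjacency lists of the partial MST and run a DFS to test whether the endpoints are already connected. Edges are assumed to be (u, v, w) triples; this check alone can cost O(n) per edge, hence “slow.”

```python
def slow_kruskal(edges):
    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Build adjacency lists of the edges chosen so far.
        adj = {}
        for a, b, _ in mst:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
        # DFS from u: if u already reaches v, (u, v) would close a cycle.
        stack, seen = [u], {u}
        while stack:
            x = stack.pop()
            for y in adj.get(x, []):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if v not in seen:
            mst.append((u, v, w))
    return mst
```

The union-find data structure introduced below replaces this whole reachability search with near-constant-time operations.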
Two questions
1. Does it work?
• That is, does it actually return an MST?

2. How do we actually implement this?   ← Let’s do this one first
• the pseudocode above says “slowKruskal”…
At each step of Kruskal’s, we are maintaining a forest.
• A forest is a collection of disjoint trees.
• When we add an edge, we merge two trees.
• We never add an edge within a tree, since that would create a cycle.

[Figure: snapshots of the example graph as Kruskal’s grows and merges disjoint trees.]
Keep the trees in a special data structure

“treehouse”?
Union-find data structure
also called disjoint-set data structure
• Used for storing collections of sets
• Supports:
• makeSet(u): create a set {u}
• find(u): return the set that u is in
• union(u,v): merge the set that u is in with the set that v is in.

Example: makeSet(x), makeSet(y), makeSet(z) create three singleton sets {x}, {y}, {z}; union(x,y) merges the first two into {x, y}; find(x) then returns that merged set.
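A minimal sketch of one standard implementation (union by size with path compression); the lecture does not commit to a particular implementation, so take this as illustrative.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def make_set(self, u):
        self.parent[u] = u
        self.size[u] = 1

    def find(self, u):
        # Walk up to the root, then compress the path behind us.
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru            # attach smaller tree under larger
        self.size[ru] += self.size[rv]
```

These two optimizations together give the inverse-Ackermann amortized bound quoted in the running-time analysis below.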
Kruskal pseudo-code
• kruskal(G = (V,E)):
• Sort E by weight in non-decreasing order
• MST = {} // initialize an empty tree
• for v in V:
• makeSet(v) // put each vertex in its own tree in the forest
• for (u,v) in E: // go through the edges in sorted order
• if find(u) != find(v): // if u and v are not in the same tree
• add (u,v) to MST
• union(u,v) // merge u’s tree with v’s tree
• return MST
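A runnable sketch of this pseudocode, assuming edges are (u, v, w) triples and inlining a simple union-find (with path compression) in place of makeSet/find/union:

```python
def kruskal(vertices, edges):
    parent = {v: v for v in vertices}       # makeSet(v) for each vertex

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path compression
            u = parent[u]
        return u

    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                        # u and v are not in the same tree
            mst.append((u, v, w))
            parent[ru] = rv                 # union(u, v): merge the two trees
    return mst
```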
Once more…
• To start, every vertex is in its own tree.
• Then start merging.
• Stop when we have one big tree!

[Figure: snapshots of the example graph as Kruskal’s merges trees, one edge at a time, until a single spanning tree remains.]
Running time
• Sorting the edges takes O(m log(n))
• In practice, if the weights are integers we can use
radixSort and take time O(m)
• For the rest:
• n calls to makeSet
• put each vertex in its own set
• 2m calls to find
• for each edge, find its endpoints
• n calls to union
• we will never add more than n-1 edges to the tree,
• so we will never call union more than n-1 times.
• In practice, each of makeSet, find, and union runs in constant time*.
• Total running time:
• Worst-case O(mlog(n)), just like Prim.
• Closer to O(m) if you can do radixSort

*Technically, they run in amortized time O(𝛼(𝑛)), where 𝛼(𝑛) is the inverse Ackermann function. 𝛼(𝑛) ≤ 4 provided that n is smaller than the number of atoms in the universe.
Two questions
1. Does it work?
• That is, does it actually return an MST?
• Now that we understand this “tree-merging” view, let’s do this one.

2. How do we actually implement this?
• the pseudocode above says “slowKruskal”…
• Worst-case running time O(mlog(n)) using a union-find data structure.
Does it work?
• We need to show that our greedy choices don’t
rule out success.
• That is, at every step:
• There exists an MST that contains all of the edges we
have added so far.

• Now it is time to use our lemma!


again!
Suppose we are partway through Kruskal
• Assume that our choices A so far are safe.
• they don’t rule out success
• The next edge we add will merge two trees, T1, T2.
• Consider the cut {T1, V – T1}.
• A respects this cut.
• Our new edge is light for the cut.
• By the Lemma, this edge is safe.
• it also doesn’t rule out success.

[Figure: the example graph partway through Kruskal; A is the set of edges selected so far, the cut separates T1 from everything else, and the next edge is the light edge crossing that cut.]
Hooray!
• Our greedy choices don’t rule out success.

• This is enough (along with an argument by


induction) to guarantee correctness of Kruskal’s
algorithm.
This is what we needed
(This is exactly the same slide that we had for Prim’s algorithm.)
• Inductive hypothesis:
• After adding the t’th edge, there exists an MST with the
edges added so far.
• Base case:
• After adding the 0’th edge, there exists an MST with the
edges added so far. YEP.
• Inductive step:
• If the inductive hypothesis holds for t (aka, the choices so far
are safe), then it holds for t+1 (aka, the next edge we add is
safe).
• That’s what we just showed.
• Conclusion:
• After adding the n-1’st edge, there exists an MST with the
edges added so far.
• At this point we have a spanning tree, so it better be minimal.
Two questions
1. Does it work?
• That is, does it actually return an MST?
• Yes

2. How do we actually implement this?


• the pseudocode above says “slowKruskal”…
• Using a union-find data structure!
What have we learned?
• Kruskal’s algorithm greedily grows a forest
• It finds a Minimum Spanning Tree in time O(mlog(n))
• if we implement it with a Union-Find data structure
• if the edge weights are reasonably-sized integers and we ignore the inverse Ackermann function, basically O(m) in practice.

• To prove it worked, we followed the same recipe for


greedy algorithms we saw last time.
• Show that, at every step, we don’t rule out success.
Compare and contrast
• Prim:
• Grows a tree.
• Time O(mlog(n)) with a red-black tree
• Time O(m + nlog(n)) with a Fibonacci heap
• Kruskal:
• Grows a forest.
• Time O(mlog(n)) with a union-find data structure
• If you can do radixSort on the edge weights, morally O(m)
Both Prim and Kruskal
• Greedy algorithms for MST.
• Similar reasoning:
• Optimal substructure: subgraphs generated by cuts.
• The way to make safe choices is to choose light edges
crossing the cut.

[Figure: the example graph once more, with A as the thick orange edges and the light edge crossing the cut highlighted.]

Can we do better?
State-of-the-art MST on connected undirected graphs

• Karger-Klein-Tarjan 1995:
• O(m) time randomized algorithm
• Chazelle 2000:
• O(m⋅ 𝛼(𝑛)) time deterministic algorithm
• Pettie-Ramachandran 2002:
• O(N*(n,m)) time deterministic algorithm, where N*(n,m) is the optimal number of comparisons you need to solve the problem, whatever that is…
Recap
• Two algorithms for Minimum Spanning Tree
• Prim’s algorithm
• Kruskal’s algorithm

• Both are (more) examples of greedy algorithms!


• Make a series of choices.
• Show that at each step, your choice does not rule out
success.
• At the end of the day, you haven’t ruled out success, so
you must be successful.
Next time
• Cuts and flows!
• In the meantime,
