Unit 5 & 7 - Graph Theory and Greedy Approach
Topics: shortest paths; Prim's and Kruskal's algorithms for minimum spanning trees
A graph G = (V, E) is composed of
V = a set of vertices
E = a set of edges between vertices
[Figure: example graph with V = {a, b, c, d, e}]
To represent electric circuits
To represent networks (cities, flights, communications)
Directed Graph: each edge also indicates a direction between vertices. Edges are ordered pairs; edge (u,v) means "u" is the source and "v" is the destination.
E.g. a water-supply network
Un-weighted Graph: the weight/cost of edges is not specified; each edge is taken to have a default weight of 1.
Weighted Graph: each edge has an associated cost/weight. Travelling along an edge adds its cost to the total cost.
[Figure: weighted graph on vertices a, b, c, d, e with edge weights]
Adjacent vertices: vertices connected by an edge
Eg. Adjacent(a) = {b,c,d}
Path: a sequence of vertices v1, v2, …, vk such that there is an edge between every two consecutive vertices.
E.g. a-d-c-e and a-b-e-c-d-e are paths in the example graph.
Simple Path: a path with no repeated vertices.
E.g. a-d-c-e is a simple path (YES); a-b-e-c-d-e is not (NO), since e is repeated.
Cycle: a simple path whose start and end vertices are the same.
E.g. a-d-c-e is not a cycle (NO); a-b-e-d-c-a is a cycle (YES).
Connected Graph: any two vertices are connected through
some path
[Figure: a connected graph G and a subgraph G' of G]
Connected Components: maximal connected subgraphs
E.g. [Figure: a graph with 3 connected components]
(Free) Tree: a connected graph without cycles
Forest: a collection of trees
If m = #edges and n = #vertices, then:
Minimum number of edges possible: m = 0
Maximum number of edges possible: m = n(n-1)/2, i.e. every vertex is adjacent to every other vertex
Complete graph: each pair of vertices is adjacent
Number of edges for a tree: m = n-1
If m < n-1, then the graph is not connected
If m <= n-k, then the graph has ≥ k connected component(s)
[Figure: example graphs on vertices a, b, c, d, e]
A spanning tree of graph G is a subgraph which is a tree and
has all vertices of G
[Figure: a graph and one of its spanning trees]
Consider the given graph. We will represent this graph using
various techniques.
[Figure: directed example graph on vertices a, b, c, d, e with edges labelled A12, A21, A23, A25, A35, A43, A45, A51, A54]
Edge List
Adjacency List
Adjacency Matrix
Edge List: a list of all the edges, where each node in the list stores some information about the edge.
Each node also stores links to the edge's two vertices and a link to the next edge in the list.
Space required: Θ(m+n)
O(1): inserting an edge/vertex, accessing both vertices of an edge, its source, its destination
O(m): finding the edges incident on a vertex, counting the edges, removing a vertex
Adjacency List: for each vertex, store the list of adjacent vertices.
For a directed graph (digraph), store two lists, for the "in" and "out" edges separately.
Space complexity: Θ(n + Σ degree(v)) = Θ(n + m)

For the directed example graph:

  vertex | IN      | OUT
  a      | b, e    | b
  b      | a       | a, c, e
  c      | b, d    | e
  d      | e       | c, e
  e      | b, c, d | a, d
Some operations, like finding the adjacent vertices or the degree of a vertex, can be done in constant time or in time proportional to the degree of the vertex.
But, as we do not store information about the edges themselves, operations such as finding the end points of a given edge are not supported.
E.g. the end points of edge A12.
Extend Edge List to store Adjacency list at each vertex
[Figure: edge list extended with an adjacency list at each vertex a, b, c, d, e]

Adjacency Matrix: rows and columns correspond to vertices; entry (u,v) is 1 if there is an edge from u to v, else 0.

      a  b  c  d  e
  a   0  1  0  0  0
  b   1  0  1  0  1
  c   0  0  0  0  1
  d   0  0  1  0  1
  e   1  0  0  1  0
Modify the adjacency matrix representation so that each row and column corresponds to a vertex and the matrix stores edge information:

         0      1      2      3      4
  0      0     A12     0      0      0
  1     A21     0     A23     0     A25
  2      0      0      0      0     A35
  3      0      0     A43     0     A45
  4     A51     0      0     A54     0

  (0 = a, 1 = b, 2 = c, 3 = d, 4 = e)
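A minimal Python sketch (not part of the original slides; the variable names are illustrative) of the edge list, adjacency list and adjacency matrix for the directed example graph above:

    # Directed example graph: a->b, b->a, b->c, b->e, c->e, d->c, d->e, e->a, e->d.
    vertices = ["a", "b", "c", "d", "e"]

    # Edge list: one entry per edge, storing its two end points.
    edge_list = [("a", "b"), ("b", "a"), ("b", "c"), ("b", "e"),
                 ("c", "e"), ("d", "c"), ("d", "e"), ("e", "a"), ("e", "d")]

    # Adjacency list: for each vertex, the list of "out" neighbours.
    adj_list = {v: [] for v in vertices}
    for u, v in edge_list:
        adj_list[u].append(v)

    # Adjacency matrix: entry [i][j] = 1 if there is an edge from vertex i to vertex j.
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    adj_matrix = [[0] * n for _ in range(n)]
    for u, v in edge_list:
        adj_matrix[index[u]][index[v]] = 1

    print(adj_list["b"])           # ['a', 'c', 'e']
    print(adj_matrix[index["a"]])  # [0, 1, 0, 0, 0]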
Breadth First Search (BFS): starting from a vertex (the source), visit all the vertices that are directly connected to it. Then repeat the process, visiting the remaining unvisited vertices from the most recently visited vertices.
BFS produces a spanning tree.
Uses of BFS:
Determines whether the graph is connected or not
Can determine the shortest distance from the source to every vertex (in an un-weighted graph)
BFS algorithm:
Initialize every vertex with label ∞ and predecessor NULL
Enqueue the source vertex in queue Q and set its label to 0
In each step:
  Dequeue a vertex from Q
  For each adjacent vertex of the dequeued vertex:
    if its label = ∞ then
      label = label of dequeued vertex + 1
      predecessor = dequeued vertex
      enqueue it in Q
Repeat until Q is empty
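A runnable Python sketch of this BFS (my helper names; the adjacency used below is one assumed reading of the un-weighted example graph):

    from collections import deque

    def bfs(adj, source):
        label = {v: float("inf") for v in adj}   # distance labels, initially infinity
        pred = {v: None for v in adj}            # predecessors, initially NULL
        label[source] = 0
        q = deque([source])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if label[v] == float("inf"):     # not yet visited
                    label[v] = label[u] + 1
                    pred[v] = u
                    q.append(v)
        return label, pred

    # Assumed adjacency for the example graph (a adjacent to b, c, d; etc.).
    adj = {"a": ["b", "c", "d"], "b": ["a", "e"], "c": ["a", "d", "e"],
           "d": ["a", "c"], "e": ["b", "c"]}
    labels, preds = bfs(adj, "a")
    print(labels)   # {'a': 0, 'b': 1, 'c': 1, 'd': 1, 'e': 2}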
Example run of BFS from source a:
1. Initialize all vertices with label ∞
2. Set the label of a to 0 and enqueue a
3. Dequeue a from Q, find the vertices adjacent to a
4. Set label 1 for b, c, d and enqueue them
5. Dequeue b from Q, find the vertices adjacent to b
6. Ignore a; set label 2 for e and enqueue it
7. Dequeue c from Q, find the vertices adjacent to c
8. Ignore all, as all are already visited
9. Dequeue d from Q, find the vertices adjacent to d; ignore all
10. Dequeue e from Q, find the vertices adjacent to e; ignore all
11. Stop, as Q is empty

[Figure: the example graph after BFS, with final labels a=0, b=1, c=1, d=1, e=2 and the queue contents at each step]
The resulting tree is a spanning tree of the graph.
Every vertex has its shortest distance from the source 'a' stored in its label.
Initializing the vertices takes O(V), i.e. O(n).
All the vertices are examined once only, hence O(V) again.
Assuming an adjacency list representation, each time we dequeue a vertex we scan its adjacency list. In total, accessing the adjacent vertices of all the vertices takes O(E), i.e. O(m).
Therefore, examining the vertices and their edges takes O(V+E).
Total time: O(V) + O(V+E) = O(V+E).
Depth First Search (DFS) is similar to the depth-first traversal of a tree.
The only difference is that a graph might have cycles.
Therefore, a vertex is marked visited once DFS is applied to it.
DFS also produces a spanning tree as output.
DFS traverses each edge and vertex (BFS does too).
DFS applications:
Finding cycles in the graph
All-pairs shortest paths
DFS algorithm:
Initialize the label of every vertex to 0 (unvisited)
Start from any vertex s
If the label of s is 0, set it to 1; else ignore s
For each vertex adjacent to s, recursively perform DFS
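A minimal recursive DFS sketch in Python (my helper names; the adjacency is one assumed reading of the example graph used in the trace below):

    def dfs(adj, s, label):
        if label[s] == 1:            # already visited: ignore
            return
        label[s] = 1                 # mark s visited
        for v in adj[s]:             # recursively run DFS on the neighbours of s
            dfs(adj, v, label)

    # Assumed adjacency consistent with the trace below.
    adj = {"a": ["b", "c", "d"], "b": ["a", "e"], "c": ["a", "d", "e"],
           "d": ["a", "c", "e"], "e": ["b", "c", "d", "f"], "f": ["e"]}
    label = {v: 0 for v in adj}      # initialize every label to 0 (unvisited)
    dfs(adj, "a", label)
    print(label)                     # every vertex reachable from a ends up with label 1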
Example run of DFS from vertex a:
Initialize the label of every vertex to 0
Start from vertex a, mark a visited
Run DFS on the adjacent vertex b
As Label(b) = 0, mark b visited
Run DFS on its adjacent vertex a; as a is already visited, run DFS(e)
As Label(e) = 0, mark e visited
Run DFS(c)
As Label(c) = 0, mark c visited
Run DFS(d)
As Label(d) = 0, mark d visited
As a and e are already visited, backtrack to c
As a and e are already visited, backtrack to e
d is visited, so run DFS(f)
As Label(f) = 0, mark f visited

[Figure: the example graph on vertices a, b, c, d, e, f with all labels set to 1 after the traversal]
DFS also produces a spanning tree.
The DFS algorithm can be altered to find cycles in a graph and to fit other applications.
Initializing the vertices: O(V)
In total, we access the adjacency list of every vertex exactly once. Hence, the traversal takes O(V+E)
Total time required: O(V) + O(V+E) = O(V+E)
If the graph is connected, then E ≥ V-1, so O(V+E) = O(E)
Single-Source Shortest Path (SSSP) algorithms find the shortest path from a single source vertex to every other vertex.
For an un-weighted graph, we can use BFS to find the shortest path from a given source to each vertex.
For a weighted graph, multiple algorithms exist to find the shortest paths.
Dijkstra's SSSP algorithm is one of them; it is widely used in routing protocols and in GPS navigation systems.
The algorithm is generally used for weighted graphs, though it can also be used for un-weighted graphs.
Precondition for Dijkstra's algorithm: the graph should not have any negative-weight edge.
Input: weighted graph G = (V, E) and source vertex v ∈ V, such that all edge weights are nonnegative.
Edsger W. Dijkstra contributed to:
• the shortest-path algorithm, also known as Dijkstra's algorithm;
• Reverse Polish Notation and the related shunting-yard algorithm;
• the THE multiprogramming system;
• the Banker's algorithm;
• self-stabilization, an alternative way to ensure the reliability of a system.
1. Initialize the distance label of every vertex to ∞
2. Set the distance of the source vertex to 0
3. Until the shortest distance of every vertex is known:
   1. Pick the unknown vertex with minimum distance label, say v
   2. Mark v as known
   3. For every unknown vertex n adjacent to v:
      1. dist = D[v] + weight of edge (v, n)
      2. If dist < D[n], then D[n] = dist
d[s] ← 0
for each v ∈ V – {s}
    do d[v] ← ∞
S ← ∅
Q ← V                      ⊳ Q is a priority queue maintaining V – S
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       S ← S ∪ {u}
       for each v ∈ Adj[u]
           do if d[v] > d[u] + w(u, v)          ⊳ relaxation
                 then d[v] ← d[u] + w(u, v)     ⊳ step
                      p[v] ← u
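A compact Python sketch of the same algorithm (not the slides' exact pseudocode), using heapq as the priority queue; graph[u] is a list of (neighbour, weight) pairs, and the example graph below is hypothetical:

    import heapq

    def dijkstra(graph, s):
        d = {v: float("inf") for v in graph}   # distance labels
        p = {v: None for v in graph}           # predecessors
        d[s] = 0
        pq = [(0, s)]                          # (distance, vertex) pairs
        known = set()                          # the set S of finished vertices
        while pq:
            du, u = heapq.heappop(pq)          # EXTRACT-MIN
            if u in known:
                continue                       # skip stale queue entries
            known.add(u)
            for v, w in graph[u]:
                if d[v] > d[u] + w:            # relaxation step
                    d[v] = d[u] + w
                    p[v] = u
                    heapq.heappush(pq, (d[v], v))
        return d, p

    # Hypothetical weighted digraph used only to exercise the function.
    graph = {"A": [("B", 2), ("C", 1)], "B": [("D", 2), ("F", 2)],
             "C": [("D", 3)], "D": [("E", 7)], "E": [], "F": [("H", 3)],
             "H": [("G", 1)], "G": [("E", 3)]}
    print(dijkstra(graph, "A")[0])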
EXERCISE
[Figure: weighted graph on vertices A-H; find the shortest paths from source A using Dijkstra's algorithm]
SOLUTION
[Figure: the exercise graph annotated with the final shortest distances]

  vertex | known? | cost | path
  A      | Y      |  0   |
  B      | Y      |  2   | A
  C      | Y      |  1   | A
  D      | Y      |  4   | A
  E      | Y      | 11   | G
  F      | Y      |  4   | B
  G      | Y      |  8   | H
  H      | Y      |  7   | F

Order added to the known set: A, C, B, D, F, H, G, E
When a linked list or an array is used to store the vertices and their distance labels, each EXTRACT-MIN scans all remaining vertices, so the total running time is O(V²).
Use case: BSNL wants to lay underground cables across the city to connect every household. One solution is to connect every household directly to the centre. Another is to build a minimum spanning tree (MST).
[Figure: a weighted graph of connections and its minimum spanning tree]
Kruskal's Algorithm: in each step, from the remaining edges include the minimum-weight edge in the MST, provided it does not form a cycle. Repeat this process until all vertices are covered.
Prim's Algorithm: start from any vertex and add it to a set S. From all the edges having exactly one endpoint in S, pick the minimum-weight edge (it cannot form a cycle) and add it to the MST. Repeat the same process until all vertices are covered.
Kruskal's Algorithm may maintain a forest at intermediate stages even for a connected graph, but it eventually produces the MST.
Algorithm:
1. Start with an empty (NULL) MST
2. From the set E, remove the edge e with minimum weight
3. If adding e to the MST creates a cycle, reject e; else add e to the MST
4. Repeat from step 2 for the remaining edges in E, or until n-1 edges have been added to the MST
[Figure: Kruskal's algorithm applied step by step to the example weighted graph on vertices a, b, c, d, e; edges are added in increasing order of weight, and edges that would create a cycle are rejected]
Algorithm:
1. Start with an empty (NULL) MST
2. Add an arbitrary vertex v to the set S
3. Among the edges joining a vertex u ∈ S to a vertex in V-S, pick the one with minimum weight; add that edge to the MST and its new endpoint to S
4. Repeat step 3 until all vertices are included in S
[Figure: Prim's algorithm applied step by step to the same example weighted graph, growing the MST one vertex at a time from the start vertex]
PrimAlgo() {
    Initialize array M of vertices to 0, i.e. M[i] = 0      // 0 = vertex not yet in the MST
    Start from a vertex v and set M[v] = 1
    For all edges e adjacent to v, H.insert(e)              // H is a min-heap of edges keyed on weight
    While (!H.empty()) {
        do {
            f = H.deletemin()
            let f be the edge (x, y), named so that M[x] == 1
        } while (M[y] == 1 && !H.empty())                   // discard edges with both endpoints already in the MST
        if (M[y] == 1) continue                             // nothing usable left in H
        add f to the MST
        M[y] = 1
        for all edges e = (y, z) adjacent to y
            if (M[z] == 0) then
                H.insert(e)
    }
}
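A runnable Python sketch of the same lazy, heap-based Prim's algorithm (my variable names; the example adjacency below is my reading of the slide figure's weights and may differ from the original drawing):

    import heapq

    def prim(graph, start):
        in_mst = {v: False for v in graph}       # the array M from the pseudocode
        in_mst[start] = True
        heap = [(w, start, v) for w, v in graph[start]]
        heapq.heapify(heap)
        mst = []                                  # edges chosen for the MST
        while heap:
            w, x, y = heapq.heappop(heap)         # H.deletemin()
            if in_mst[y]:
                continue                          # both endpoints already in the MST: skip
            in_mst[y] = True
            mst.append((x, y, w))
            for wz, z in graph[y]:
                if not in_mst[z]:
                    heapq.heappush(heap, (wz, y, z))
        return mst

    # Assumed reading of the example graph's weights.
    graph = {"a": [(10, "b"), (30, "e"), (40, "d")],
             "b": [(10, "a"), (20, "e"), (80, "d")],
             "c": [(50, "e"), (70, "d")],
             "d": [(40, "a"), (60, "e"), (70, "c"), (80, "b")],
             "e": [(20, "b"), (30, "a"), (50, "c"), (60, "d")]}
    print(prim(graph, "a"))   # [('a','b',10), ('b','e',20), ('a','d',40), ('e','c',50)]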
We insert every edge into the heap at most once. As an insertion into a heap of m elements takes O(log m) time, the total time required to insert (and delete) m edges is O(m log m).
Cycle detection (Union-Find): to decide whether adding an edge creates a cycle, maintain disjoint sets of vertices, each with a representative.
Initially every vertex is in its own set: {a}, {b}, {c}, {d}, {e}, {f}.
Let a and b now be connected. Then {a} and {b} are merged; say the merged set is represented by 'b'.
Further, 'b' and 'c' are connected, and also 'd' and 'e'.
Now, connecting 'b' and 'e' merges their sets.
If we then want to connect vertices 'b' and 'd', it would create a cycle, because representative(b) = representative(d).

The sets can be stored as an array REP of representatives:

  vertex: a b c d e f
  index:  0 1 2 3 4 5
  REP:    1 2 2 4 2 5
  (a points to b, b to c, c is a root, d points to e, e to c, f is a root)
Worst case: start with singleton sets 1, 2, 3, …, n and perform Union(1,2), Union(2,3), …, Union(n-1,n), each time pointing the old representative to the new one. The result is a chain of height n-1, so Find(1) takes n steps!
The cost of n Union operations followed by n Find operations is n², i.e. Θ(n) per operation.
A Find operation takes O(n) time in the worst case.
A Union requires two Find operations, but once the representatives of the two elements are known, the union itself can be done in O(1).
1. Union-by-size/rank
• Improve Union so that Find only takes worst case time of Θ(log n).
2. Path compression
• Improve Find so that, with Union-by-size/rank,
Find takes amortized time of almost Θ(1).
At each union operation, we have the option to make either representative(u) or representative(v) the new representative of the merged set. The choice determines the shape (and height) of the resulting up-tree.

[Figure: the two different trees that result from the two possible choices when merging the sets containing b and e]

Therefore, if we do the union cleverly, we can keep the height of the tree low, reducing the time to find the representative of any vertex to O(log n).
Union-by-size
Always point the smaller tree to the root of the larger tree
[Figure: S-Union(7,1) points the smaller tree to the root of the larger tree]
With union-by-size the earlier worst case disappears: performing S-Union(1,2), S-Union(2,3), …, S-Union(n-1,n) on singletons 1, 2, …, n always attaches the single-node tree to the root of the growing tree, so the tree stays shallow and Find(1) takes constant time.
Theorem: with union-by-size, an up-tree of height h has size at least 2^h.
Proof by induction on h.
Base case: h = 0. The up-tree has one node, and 2^0 = 1.
Inductive hypothesis: assume the claim holds for height h-1.
Observation: a tree gets taller only as a result of a union. Suppose T = S-Union(T1, T2) has height h; the height can only reach h when a tree T1 of height h-1 is pointed to the root of T2. By the inductive hypothesis, size(T1) ≥ 2^(h-1); by union-by-size, the attached tree is the smaller one, so size(T2) ≥ size(T1) ≥ 2^(h-1). Hence size(T) = size(T1) + size(T2) ≥ 2^h.
What is the worst-case complexity of Find(x) in an up-tree forest of n nodes?
Worst case: first n/2 unions-by-size (pairing up the singletons), then n/4 unions-by-size (pairing up the resulting trees), and so on. After n-1 = n/2 + n/4 + … + 1 unions-by-size, the tree has height log₂ n, so a Find can take log₂ n steps.
If there are n = 2^k nodes, then the longest path from a leaf to the root has length k.
In the same example (vertices a, b, c, d, e, f), with union-by-size the resulting tree is slightly different:

[Figure: the up-trees produced for the example when unions are done by size]
Loop over each edge e = (u, v) of the m edges:
    a = find(u)
    b = find(v)
    if a != b                     // e does not create a cycle: add e to the MST
        if rank(a) > rank(b)
            unionByRank(a, b)
        else
            unionByRank(b, a)
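A runnable Python sketch of Kruskal's algorithm with union-by-rank (my names, not the slides'); edges is a list of (weight, u, v) triples, and the weights below are my reading of the example graph:

    def kruskal(vertices, edges):
        parent = {v: v for v in vertices}
        rank = {v: 0 for v in vertices}

        def find(x):                       # follow parent pointers up to the representative
            while parent[x] != x:
                x = parent[x]
            return x

        def union_by_rank(a, b):           # attach the lower-rank root under the higher-rank one
            if rank[a] < rank[b]:
                a, b = b, a
            parent[b] = a
            if rank[a] == rank[b]:
                rank[a] += 1

        mst = []
        for w, u, v in sorted(edges):      # consider edges in increasing order of weight
            a, b = find(u), find(v)
            if a != b:                     # the edge does not create a cycle
                union_by_rank(a, b)
                mst.append((u, v, w))
        return mst

    edges = [(10, "a", "b"), (20, "b", "e"), (30, "a", "e"), (40, "a", "d"),
             (50, "c", "e"), (60, "d", "e"), (70, "c", "d"), (80, "b", "d")]
    print(kruskal("abcde", edges))   # [('a','b',10), ('b','e',20), ('a','d',40), ('c','e',50)]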
[Figure: PC-Find(3) on an example up-tree; after the find, every node on the path from 3 to the root points directly to the root]
PC-Find(x): find the root of x's up-tree, then make every node on the path from x to the root point directly to the root (path compression).
PC-Find(i) {
    // find root
    j = i;
    while (up[j] >= 0) {
        j = up[j];
    }
    root = j;
    // compress path: make every node on the path from i point directly to root
    if (i != root) {
        parent = up[i];
        while (parent != root) {
            up[i] = root;
            i = parent;
            parent = up[parent];
        }
    }
    return(root);
}
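A small array-based Python sketch combining union-by-size with a path-compressing find. It mirrors the up[] convention of the pseudocode above (non-negative value = parent index); storing the negated tree size at each root is my added assumption:

    def pc_find(up, i):
        root = i
        while up[root] >= 0:              # find the root
            root = up[root]
        while up[i] >= 0:                 # compress: point every node on the path to the root
            up[i], i = root, up[i]
        return root

    def union_by_size(up, a, b):          # a and b must be roots; -up[x] is the tree size
        if a == b:
            return
        if -up[a] < -up[b]:               # make a the root of the larger tree
            a, b = b, a
        up[a] += up[b]                    # combined size
        up[b] = a                         # smaller root now points to the larger root

    # Usage: six elements 0..5, each initially its own root of size 1.
    up = [-1] * 6
    union_by_size(up, pc_find(up, 0), pc_find(up, 1))
    union_by_size(up, pc_find(up, 1), pc_find(up, 2))
    print(pc_find(up, 0) == pc_find(up, 2))   # True: 0 and 2 are in the same set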
Worst-case time complexity:
…of a single Union-by-size (given the two roots): Θ(1)
…of a single PC-Find: Θ(log n)
LOG* N
log* n = the number of times you need to apply log to bring the value down to at most 1
log* 2 = 1
log* 4 = log* 2^2 = 2
log* 16 = log* 2^(2^2) = 3            (log log log 16 = 1)
log* 65536 = log* 2^(2^(2^2)) = 4     (log log log log 65536 = 1)
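A tiny Python helper (mine, not from the slides) that computes log* n by repeatedly applying log2 until the value drops to at most 1:

    from math import log2

    def log_star(n):
        count = 0
        while n > 1:
            n = log2(n)
            count += 1
        return count

    print(log_star(2), log_star(4), log_star(16), log_star(65536))   # 1 2 3 4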
In fact, Tarjan showed that the time complexity of m ≥ n operations on n elements is Θ(m α(m, n)).
The amortized complexity is then Θ(α(m, n)).
What is α(m, n)? It is the inverse of Ackermann's function.
For reasonable values of m and n, it grows even more slowly than log* n, so it is even "more constant."