Unit 5&7 - GraphTheoryAndGreedyApproach

The document discusses different ways to represent graphs including adjacency lists, edge lists, and adjacency matrices. It also covers graph terminology like vertices, edges, paths, trees, and connectivity. Graph representations are useful for modeling networks and finding shortest paths.


Graph representations, Graph Traversals, Dijkstra's algorithm for shortest path
Prim's and Kruskal's Algorithm for Minimal Spanning Tree
 Graph G=(V,E) is composed of
 V = set of vertices
 E = set of edges between vertices

 A vertex (vi) is a node in the graph
 An edge e = (vi, vj) is a pair of vertices

Example (figure: graph on five vertices):
V = {a, b, c, d, e}
E = {(a,b), (b,e), (e,d), (d,a), (a,c), (c,d), (c,e)}
 To represent electric circuits
 To represent networks (cities, flights, communications)

 Once a network or circuit is modelled as a graph, with nodes representing the entities and edges representing the relations between them, known properties and algorithms can be applied
 Modelling flights between cities as a graph, one can find the shortest path between any two cities using known shortest-path algorithms
 Undirected Graph: when an edge between vertices is bidirectional. Edge (u,v) is the same as (v,u)
 Eg. Network of roads between cities
 Directed Graph: an edge also shows the direction between vertices. Edges are ordered pairs. Edge (u,v) means "u" is the source and "v" is the destination
 Eg. Network of water supply
 Un-weighted Graph: when the weight/cost of edges is not specified. The default weight is considered to be 1.
 Weighted Graph: when each edge has an associated cost/weight. Travelling an edge adds its cost to the total cost.

(figure: weighted graph on vertices a–e with edge weights 5, 8, 12, 3, 6, 1, 5)
 Adjacent vertices: vertices connected by an edge
 Eg. Adjacent(a) = {b, c, d}
 Degree of a vertex: the number of adjacent vertices
 Eg. Degree(a) = 3
 Sum of degrees of all vertices = 2 * number of edges = 2e, where e = |E|
 Path: a sequence of vertices v1, v2, …, vk such that there is an edge between every two consecutive vertices
 Eg. a d c e and a b e c d e are both paths in the example graph
 Simple Path: a path with no repeated vertices
 Eg. a d c e is a simple path; a b e c d e is not
 Cycle: a path whose first and last vertices are the same and which repeats no other vertex
 Eg. a d c e is not a cycle; a b e d c a is
 Connected Graph: any two vertices are connected through some path

(figure: a connected graph and a not-connected graph)

 Subgraph: a subset of vertices and edges forming a graph

(figure: a graph G and a subgraph G' of G)
 Connected Components: maximal connected subgraphs
 Eg. the graph below has 3 connected components
 (free) Tree: connected graph without cycles
 Forest: collection of trees
 If m = #edges and n = #vertices, then
 Minimum number of edges possible: m = 0
 Maximum number of edges possible: m = n(n-1)/2, i.e. every vertex is adjacent to every other vertex
 Complete graph: each pair of vertices is adjacent
 Number of edges in a tree: m = n-1
 If m < n-1, the graph is not connected
 If m ≤ n-k, the graph has ≥ k connected component(s)
 A spanning tree of graph G is a subgraph which is a tree and
has all vertices of G

(figure: a graph on vertices a–e and one of its spanning trees)
 Consider the given graph. We will represent this graph using
various techniques.

(figure: directed graph on vertices a–e with edges A12, A21, A23, A25, A35, A43, A45, A51, A54)
 Edge List
 Adjacency List
 Adjacency Matrix
 A list of all the edges, where each node in the list stores some info about the edge
 Each node also stores links to the two vertices and a link to the next edge in the list
 Space required: Θ(m+n)

Edge list: A12 A21 A23 A25 A35 A43 A45 A51 A54

 O(1): insert edge/vertex, both vertices, source, destination
 O(m): edges incident on a vertex, number of edges, removing a vertex
 For each vertex, store a list of adjacent vertices
 If it is a directed graph (digraph), store two lists, for "in" and "out" edges separately
 Space complexity: Θ(n + Σdegree(v)) = Θ(n+m)

vertex  IN       OUT
a       b, e     b
b       a        a, c, e
c       b, d     e
d       e        c, e
e       b, c, d  a, d
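As a quick illustration (not part of the slides), the IN/OUT lists above can be built with plain dictionaries; the vertex names and edges follow the running a–e example:

```python
# Sketch of an adjacency-list representation for the directed example graph.
# Edges: a->b, b->a, b->c, b->e, c->e, d->c, d->e, e->a, e->d.
edges = [("a","b"), ("b","a"), ("b","c"), ("b","e"),
         ("c","e"), ("d","c"), ("d","e"), ("e","a"), ("e","d")]

out_adj = {v: [] for v in "abcde"}   # OUT list per vertex
in_adj  = {v: [] for v in "abcde"}   # IN list per vertex
for u, v in edges:
    out_adj[u].append(v)
    in_adj[v].append(u)

print(sorted(out_adj["b"]))   # out-neighbours of b
print(sorted(in_adj["e"]))    # vertices with an edge into e
```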
 Some operations, like finding adjacent vertices or the degree of a vertex, can be done in constant time or time proportional to the degree of the vertex
 But, as we do not store information about edges, operations like finding the end points of an edge are not possible
 Eg. end points of edge A12
 Extend Edge List to store Adjacency list at each vertex

(figure: edge list A12 … A54, with per-vertex IN/OUT adjacency lists pointing into it)
 Space Complexity: O(n+m)
 Finding an adjacent vertex is O(1), but operations like finding the in/out edges of a vertex are expensive
 If we store a list of incident edges in the adjacency list, we can reduce the time complexity of finding the in/out edges of a vertex

 Therefore, all such decisions depend on which operations we want to optimize
 An "n x n" matrix, where n is #vertices
 M[i,j] = 1 if there is an edge from vertex i to j, else 0
 Space complexity = Θ(n²)

    a  b  c  d  e
a   0  1  0  0  0
b   1  0  1  0  1
c   0  0  0  0  1
d   0  0  1  0  1
e   1  0  0  1  0
 Modify the Adjacency-matrix representation so that each row and column corresponds to a vertex and the matrix stores the edge information

     0    1    2    3    4
0    0   A12   0    0    0
1   A21   0   A23   0   A25
2    0    0    0    0   A35
3    0    0   A43   0   A45
4   A51   0    0   A54   0

(vertex indices: 0=a, 1=b, 2=c, 3=d, 4=e)
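As an illustration (an addition to the slides), the 0/1 adjacency matrix can be sketched with nested lists:

```python
# Adjacency matrix for the directed example graph (rows/cols in order a..e).
verts = "abcde"
idx = {v: i for i, v in enumerate(verts)}
edges = [("a","b"), ("b","a"), ("b","c"), ("b","e"),
         ("c","e"), ("d","c"), ("d","e"), ("e","a"), ("e","d")]

n = len(verts)
M = [[0] * n for _ in range(n)]      # Theta(n^2) space, as noted above
for u, v in edges:
    M[idx[u]][idx[v]] = 1            # edge u -> v

print(M[idx["b"]])                   # row for b
```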
 Starting from a vertex (source), visit all the nodes which are directly connected to it. Repeat the process, visiting remaining vertices from the recently visited vertices
 BFS produces a spanning tree
 Uses of BFS:
 Determines whether the graph is connected or not
 Can determine the shortest distance from the source for every vertex
 Initialize all the vertices with label ∞ and predecessor NULL
 Enqueue the source vertex in queue "Q" and set its label to 0
 In each step:
 Dequeue from Q
 For each adjacent vertex of the dequeued vertex:
if label = ∞ then
label = label of dequeued vertex + 1
predecessor = dequeued vertex
enqueue the vertex
 Repeat until Q is empty
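The steps above can be sketched in runnable form (an illustration added here; the adjacency lists mirror the undirected example graph used in the trace):

```python
from collections import deque

def bfs(adj, source):
    """Label every vertex with its hop distance from source (BFS)."""
    label = {v: float("inf") for v in adj}
    pred = {v: None for v in adj}
    label[source] = 0
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if label[v] == float("inf"):   # not yet visited
                label[v] = label[u] + 1
                pred[v] = u
                q.append(v)
    return label, pred

# Undirected example graph: a-b, a-c, a-d, b-e, c-d, c-e, d-e
adj = {"a": ["b","c","d"], "b": ["a","e"], "c": ["a","d","e"],
       "d": ["a","c","e"], "e": ["b","c","d"]}
label, pred = bfs(adj, "a")
print(label)   # a:0, b/c/d:1, e:2 -- matching the trace below
```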
1. Initialize vertices with label ∞
2. Set label 0 and enqueue a
3. Dequeue Q, find adjacent of a
4. Set label 1 for b, c, d and enqueue them
5. Dequeue Q, find adjacent of b
6. Ignore a; set label 2 for e and enqueue it
7. Dequeue Q, find adjacent of c
8. Ignore all, as all are visited
9. Dequeue Q, find adjacent of d. Ignore all.
10. Dequeue Q, find adjacent of e. Ignore all.
11. Stop, as Q is empty

(figure: the trace on the example graph; final labels a=0, b=1, c=1, d=1, e=2)
 The resultant graph is a spanning tree
 Every node has its shortest distance from source 'a' stored in its label

(figure: final labels a=0, b=1, c=1, d=1, e=2)
 Initializing vertices takes O(V) or O(n)
 All the vertices are examined once only, hence O(V) again
 Assuming we use an adjacency-list representation, each time we dequeue a vertex we fetch its adjacent vertices. In total, accessing the adjacent vertices of all the vertices takes O(E) or O(m)
 Therefore, examining vertices takes O(V+E)
 Total time: O(V) + O(V+E) = O(V+E)
 Similar to depth-first traversal of a tree
 The only difference is that a graph might have cycles
 Therefore, mark a vertex visited when DFS is applied to it
 DFS also produces a spanning tree as output
 DFS traverses each edge and vertex (BFS too)
 DFS applications:
 Finding cycles in the graph
 Finding connected components
 Initialize the label for every vertex as 0 (unvisited)
 Start from any vertex s
 If the label of s is 0, set it to 1; else ignore s
 For all adjacent vertices of s, recursively perform DFS
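The recursion above can be sketched as follows (an added illustration; the edge set is an assumption reconstructed from the a–f trace):

```python
def dfs(adj, s, visited=None):
    """Recursive DFS; marks vertices visited (label 1) as in the slides."""
    if visited is None:
        visited = {v: 0 for v in adj}
    visited[s] = 1
    for v in adj[s]:
        if visited[v] == 0:        # recurse only into unvisited neighbours
            dfs(adj, v, visited)
    return visited

# Example graph a-f (assumed edges): a-b, a-c, a-d, b-e, c-d, c-e, d-e, e-f
adj = {"a": ["b","c","d"], "b": ["a","e"], "c": ["a","d","e"],
       "d": ["a","c","e"], "e": ["b","c","d","f"], "f": ["e"]}
visited = dfs(adj, "a")
print(visited)   # every vertex ends up marked 1, as in the trace
```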
 Initialize the label for each vertex to 0
 Start from vertex a, mark a visited
 Run DFS for adjacent vertex b
 As Label(b) = 0, mark b visited
 Run DFS for adjacent vertex a
 As a is already visited, run DFS(e)
 As Label(e) = 0, mark e visited
 Run DFS(c)
 As Label(c) = 0, mark c visited
 Run DFS(d)
 As Label(d) = 0, mark d visited
 As a and e are already visited, backtrack to c
 As a and e are already visited, backtrack to e
 d is visited. Run DFS(f)
 As Label(f) = 0, mark f visited

 DFS also produces a spanning tree
 The DFS algorithm can be altered to find cycles in a graph and to fit other applications

(figure: the trace on the example graph a–f; all vertices end up labelled 1)
 Initializing vertices: O(V)
 In total, we access the adjacency list of every node exactly once. Hence, the time required is O(V+E)
 Total time required: O(V) + O(V+E) = O(V+E)
 If the graph is connected, then E ≥ V-1, so O(V+E) = O(E)
 SSSP algorithms find the shortest path from a single source (vertex) to every other vertex
 For an un-weighted graph, we can use BFS to find the shortest path to each vertex from a given source
 For a weighted graph, multiple algorithms exist to find the shortest path
 Dijkstra's SSSP algorithm is one
 widely used in routing protocol systems
 used in GPS navigation systems
 This algorithm is generally used for weighted graphs, though it can also be used for un-weighted graphs
 Precondition for Dijkstra's algorithm:
 The graph should not have any negative-weight edge

Input: Weighted graph G=(V,E) and source vertex v ∈ V, such that all edge weights are nonnegative

Output: Lengths of the shortest paths (or the shortest paths themselves) from the given source vertex v ∈ V to all other vertices
• Dutch computer scientist
• Received the Turing Award for contributions to developing programming languages

Contributed to:
• Shortest-path algorithm, also known as Dijkstra's algorithm
• Reverse Polish Notation and the related shunting-yard algorithm
• THE multiprogramming system
• Banker's algorithm
• Self-stabilization – an alternative way to ensure the reliability of a system
1. Initialize the distance label of every vertex to ∞
2. Set the distance of the source vertex to 0
3. Until the shortest distances of all vertices are known:
   1. Pick the unknown vertex with minimum distance label, say v
   2. Mark v as known
   3. For each unknown vertex n adjacent to v:
      1. dist = D[v] + weight of edge (v,n)
      2. If dist < D[n], then D[n] = dist
d[s] ← 0
for each v ∈ V – {s}
    do d[v] ← ∞
S ← ∅
Q ← V        ⊳ Q is a priority queue maintaining V – S
while Q ≠ ∅
    do u ← EXTRACT-MIN(Q)
       S ← S ∪ {u}
       for each v ∈ Adj[u]
           do if d[v] > d[u] + w(u, v)        ⎫ relaxation
                 then d[v] ← d[u] + w(u, v)   ⎬ step
                      p[v] ← u                ⎭
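As an added illustration, the array-based O(V²) scheme described in the slides can be sketched like this (the edge weights reuse the values from the earlier weighted-graph figure, but their assignment to vertex pairs is an assumption):

```python
def dijkstra(adj, source):
    """Array-based Dijkstra matching the slides' O(V^2) scheme.
    adj maps each vertex to a list of (neighbour, weight) pairs."""
    INF = float("inf")
    dist = {v: INF for v in adj}
    dist[source] = 0
    known = set()
    while len(known) < len(adj):
        # pick the unknown vertex with minimum distance label
        u = min((v for v in adj if v not in known), key=lambda v: dist[v])
        known.add(u)
        for v, w in adj[u]:                    # relaxation step
            if v not in known and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist

# Assumed weighted version of the a-e example (all weights nonnegative)
adj = {"a": [("b", 5), ("c", 12)], "b": [("a", 5), ("c", 3), ("e", 8)],
       "c": [("a", 12), ("b", 3), ("d", 1), ("e", 6)],
       "d": [("c", 1), ("e", 5)], "e": [("b", 8), ("c", 6), ("d", 5)]}
print(dijkstra(adj, "a"))
```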

EXERCISE

(figure: a weighted graph on vertices A–H, all labels initially ∞ except the source A at 0; run Dijkstra's algorithm from A)

SOLUTION

(figure: the exercise graph with final distance labels A=0, B=2, C=1, D=4, F=4, H=7, G=8, E=11)

vertex  known?  cost  path
A       Y       0
B       Y       2     A
C       Y       1     A
D       Y       4     A
E       Y       11    G
F       Y       4     B
G       Y       8     H
H       Y       7     F

Order added to known set: A, C, B, D, F, H, G, E
When a linked list or array is used to store the vertices and their distance labels:

 Initialization of all vertices: O(V)
 Finding and deleting the minimum takes O(V) time
 The while loop runs O(V) times, and each iteration finds and deletes the minimum
 Hence, the total is O(V²)
 Optionally, we might keep a previous pointer at each vertex to redraw the shortest-path tree, which adds O(E)
 Total time = O(V² + E)
 For an un-directed connected weighted graph, an MST is a spanning tree with minimum total cost
 Each edge has some cost associated with it in a weighted graph
 There can be more than one MST for a graph, though the cost of every possible MST is the same

 Use case: BSNL wants to lay underground cables across the city to connect every household. One solution is to connect every household directly with the centre. Another is building an MST

(figure: a weighted graph on vertices a–e and a cheaper spanning tree of it)
 Kruskal's Algo: in each step, include the minimum-weight edge among the remaining edges in the MST, provided it does not form a cycle. Repeat until all vertices are covered

 Prim's Algo: start from any vertex and add it to set S. From all the edges having one endpoint in S, pick the minimum-weight edge not forming a cycle and add it to the MST. Repeat until all vertices are covered
 Kruskal's Algorithm might produce a forest for a connected graph in intermediate steps, but will eventually produce an MST

Algo:
1. Start with NULL MST
2. From set E, delete the edge with minimum weight, e
3. If adding e to the MST creates a cycle, reject e. Else add e to the MST
4. Repeat from step 2 for the remaining edges in E, until n-1 edges have been added to the MST
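The steps above can be sketched with a simple union-find for cycle detection (an added illustration; the endpoint pairings for the 30/60/70/80 edges are assumptions based on the figure):

```python
def kruskal(n_vertices, edges):
    """Kruskal's algorithm; edges are (weight, u, v) with vertices 0..n-1."""
    parent = list(range(n_vertices))

    def find(x):                      # walk up to the representative
        while parent[x] != x:
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):     # consider edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # accepting the edge creates no cycle
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return mst, total

# Example graph from the slides with a=0, b=1, c=2, d=3, e=4
edges = [(10, 0, 1), (20, 1, 4), (30, 0, 4), (40, 0, 3),
         (50, 4, 2), (60, 3, 4), (70, 3, 2), (80, 1, 3)]
mst, total = kruskal(5, edges)
print(total)   # 120, matching the slide trace
```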
(figure: Kruskal's algorithm run step by step on the example graph; edges are considered in increasing weight order — 10 and 20 accepted, 30 rejected as it forms a cycle, then 40 and 50 accepted)

Total cost = 120


 Kruskal's algo requires deleting the edge with minimum weight at each step. Therefore, if we sort the edges in ascending order: O(m log m); deleting the smallest element is then O(1)
 For each edge, checking whether adding it would create a cycle: O(log n) [using a union-find data structure]
 If not, adding that edge to the MST: O(1) [using union-find]

 Time required to run the loop: O(m log n + n)
   (the loop runs m times, but only n-1 edges are added)

 Total running time = O(m log m + m log n + n) = O(m log n)   [since m ≤ n²]
 Start with an arbitrary vertex in the tree. In each step, add one more vertex to the tree via the minimum-weight edge, until all vertices are covered.

Algo
1. Start with NULL MST
2. Add any arbitrary vertex v to set S
3. Add a new vertex from V-S to S which is adjacent to some vertex u ∈ S via the minimum-weight edge
4. Repeat step 3 until all vertices are included in S
(figure: Prim's algorithm run step by step starting from vertex a; the edges with weights 10, 20, 40 and 50 are added in turn, giving the same MST of total cost 120)
PrimAlgo() {
    Initialize array M of vertices to 0, i.e. M[i] = 0
    Start from a vertex v and set M[v] = 1
    For all edges e adjacent to v: H.insert(e)
    While (!H.empty()) {
        // pop edges until one reaches an unmarked vertex
        while (1) {
            f = H.deletemin()
            let f be the edge (x, y), with y the endpoint outside the tree
            if (M[x] == 1 && M[y] == 1)
                continue        // both endpoints already in tree: skip edge
            else
                break
        }
        for all edges e = (y, z) adjacent to y:
            if (M[z] == 0) H.insert(e)
        M[y] = 1
    }
}
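A runnable sketch of the same idea using Python's binary heap (an added illustration; the adjacency lists reuse the Kruskal example, with some endpoint pairings assumed from the figure):

```python
import heapq

def prim(adj, start):
    """Heap-based Prim, following the PrimAlgo sketch above.
    adj maps each vertex to a list of (weight, neighbour) pairs."""
    marked = {start}
    heap = list(adj[start])            # edges leaving the start vertex
    heapq.heapify(heap)
    total = 0
    while heap and len(marked) < len(adj):
        w, y = heapq.heappop(heap)     # minimum-weight candidate edge
        if y in marked:                # both endpoints in tree: skip
            continue
        marked.add(y)
        total += w
        for wz, z in adj[y]:
            if z not in marked:
                heapq.heappush(heap, (wz, z))
    return total

adj = {"a": [(10,"b"), (30,"e"), (40,"d")],
       "b": [(10,"a"), (20,"e"), (80,"d")],
       "c": [(50,"e"), (70,"d")],
       "d": [(40,"a"), (60,"e"), (70,"c"), (80,"b")],
       "e": [(20,"b"), (30,"a"), (50,"c"), (60,"d")]}
print(prim(adj, "a"))   # 120, the same MST cost as Kruskal
```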
 We insert every edge at most once into the heap. As insertion into a heap with m elements takes O(log m) time, the total time required to insert m elements is O(m log m)

 Similarly, each edge is deleted at most once from the heap. Deletion from a heap with m elements takes O(log m). The total time required for m elements = O(m log m)
 The union of all the sets equals the universal set
 The intersection of any two sets is empty
 Each set has a representative
 The representative is an element from the set
 Two elements belong to the same set if their representatives are the same
 Merging two sets creates a new set with the elements from both sets and deletes the previously existing sets
 The representative of the new set is either of the representatives of the previously existing sets
 Initially, all vertices are disjoint singleton sets, each represented by itself: {a} {b} {c} {d} {e} {f}
 Let a and b now be connected. Then {a} and {b} are merged, say represented by 'b'
 Further, 'b' and 'c' are connected (merged set represented by 'c'); also 'd' and 'e' (represented by 'e')
 Now, connecting 'b' and 'e' merges the two sets into {a, b, c, d, e}, represented by 'c'; {f} remains separate
 If we now want to connect vertices 'b' and 'd', it would create a cycle, as representative(b) = representative(d)

 We perform two operations here:
 find(u): returns a reference to the representative of the set containing u
 union(find(u), find(v)): if u and v are in two different sets, merge the two sets and make one of the representatives the representative of the merged set
 Let's take an array REP of size V (#vertices)
 Initialize REP[i] = i
 A vertex i is the representative of a set if REP[i] = i

For the forest above (indices 0–5 for vertices a–f):

index: 0  1  2  3  4  5
REP:   1  2  2  4  2  5

i.e. a points to b, b and e point to c, d points to e; c and f are the representatives of their sets.
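A minimal sketch of the REP-array scheme (an added illustration; indices 0–5 stand for vertices a–f):

```python
REP = list(range(6))         # initially every vertex represents itself

def find(u):
    while REP[u] != u:       # follow pointers up to the representative
        u = REP[u]
    return u

def union(u, v):
    ru, rv = find(u), find(v)
    if ru != rv:
        REP[ru] = rv         # either representative may become the new one

union(0, 1)                  # connect a and b
union(1, 2)                  # connect b and c
union(3, 4)                  # connect d and e
union(1, 4)                  # connect b and e
print(find(0) == find(3))    # a and d are now in the same set
print(find(0) == find(5))    # f is still on its own
```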
(figure: naive Union(1,2), Union(2,3), …, Union(n-1,n) build a chain of height n-1, so Find(1) takes n steps!)
 The cost of n Union operations followed by n Find operations is n²
 Θ(n) per operation
 A Find operation takes O(n) time in the worst case
 Union requires running two find operations. But if we already know the representatives of the two elements, the union itself can be done in O(1).

For each edge e in m:
    let e = (u, v), then
    a = find(u)
    b = find(v)
    if a != b then
        union(a, b)

 Total time required = m*O(n) + n*O(1) = O(m*n)

Can we do better? Yes!

1. Union-by-size/rank
 • Improve Union so that Find only takes worst-case time of Θ(log n)
 • Size – number of nodes; Rank – height

2. Path compression
 • Improve Find so that, with union-by-size/rank, Find takes amortized time of almost Θ(1)
 At each union operation, we have the option to make either representative(u) or representative(v) the new representative.

(figure: the same merge done both ways — one choice keeps the tree shallow, the other makes it taller)

 Therefore, if we do union cleverly, we can keep the height of the tree low, reducing the time to find the representative of any vertex to O(log n).
Union-by-size
 Always point the smaller tree to the root of the larger tree

(figure: S-Union(7,1) — up-trees of sizes 2, 1 and 4 rooted at 1, 3 and 7; root 1 is pointed to root 7)
(figure: with union-by-size, S-Union(1,2), S-Union(2,3), …, S-Union(n-1,n) attach each singleton directly under the growing root, so Find(1) takes constant time)
 Theorem: With union-by-size, an up-tree of height h has size at least 2^h
 Proof by induction
 Base case: h = 0. The up-tree has one node, 2^0 = 1
 Inductive hypothesis: assume true for h-1
 Observation: a tree gets taller only as a result of a union
 If T = S-Union(T1, T2) has height h, the attached tree T2 must have height h-1, so size(T2) ≥ 2^(h-1); since T2 was pointed at T1's root, size(T1) ≥ size(T2), hence size(T) ≥ 2^h
 What is the worst-case complexity of Find(x) in an up-tree forest of n nodes?
 With union-by-size it is Θ(log n): after n-1 = n/2 + n/4 + … + 1 unions-by-size, if there are n = 2^k nodes then the longest path from leaf to root has length k
 (Amortized complexity is no better.)
(figure: up-trees of sizes 2, 1 and 4 with roots 1, 3 and 7)

Can store a separate size array:

node:  1  2  3  4  5  6  7
up:   -1  1 -1  7  7  5 -1
size:  2     1           4
(figure: the same up-trees)

Better, store the sizes in the up array:

node: 1  2  3  4  5  6  7
up:  -2  1 -1  7  7  5 -4

Negative up-values correspond to the sizes of roots.
S-Union(i,j){
    // Collect sizes
    si = -up[i];
    sj = -up[j];

    // verify i and j are roots
    assert(si > 0 && sj > 0);

    // point smaller sized tree to
    // root of larger, update size
    if (si < sj) {
        up[i] = j;
        up[j] = -(si + sj);
    } else {
        up[j] = i;
        up[i] = -(si + sj);
    }
}
 When merging two trees (sets) with n1 and n2 nodes (elements), such that n1 < n2, make the root of n1 point to the root of n2 (the representative of n1 points to the representative of n2)
 Initially, the heights of the two trees are at most log(n1) and log(n2) respectively
 After merging, the height of the new tree is ≤ log(n1 + n2)
 As the height is O(log n), "find" takes O(log n) time
 In the same example, with union-by-size the tree will look a little different:

(figure: the same sequence of unions now produces a shallower forest rooted at b, with f still separate)
Loop for each edge e = (u,v) in m:
    a = find(u)
    b = find(v)
    if a != b
        if rank(a) > rank(b)
            unionByRank(a,b)
        else
            unionByRank(b,a)

Time complexity = m*O(log n) + n*O(1) = O(m log n)

 To improve the amortized complexity, we'll borrow an idea from splay trees:
 When going up the tree, improve the nodes on the path!
 On a Find operation, point all the nodes on the search path directly to the root. This is called "path compression."

(figure: PC-Find(3) re-points every node on the path from 3 up to the root 7 directly at 7)
PC-Find(x)

(figure: another up-tree example for tracing PC-Find(x))
PC-Find(i) {
    // find root
    j = i;
    while (up[j] >= 0) {
        j = up[j];
    }
    root = j;

    // compress path
    if (i != root) {
        parent = up[i];
        while (parent != root) {
            up[i] = root;
            i = parent;
            parent = up[parent];
        }
    }
    return(root);
}
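Union-by-size and path compression combine into a very short runnable sketch (an added illustration mirroring S-Union/PC-Find, with negative up-values at roots storing sizes):

```python
n = 8
up = [-1] * n                      # every node starts as a root of size 1

def pc_find(i):
    root = i
    while up[root] >= 0:           # find the root
        root = up[root]
    while i != root:               # path compression: re-point path to root
        nxt = up[i]
        up[i] = root
        i = nxt
    return root

def s_union(i, j):
    i, j = pc_find(i), pc_find(j)
    if i == j:
        return
    if -up[i] < -up[j]:            # make i the root of the larger tree
        i, j = j, i
    up[i] += up[j]                 # update the size stored at the root
    up[j] = i                      # point the smaller root at the larger

for k in range(n - 1):
    s_union(k, k + 1)
print(pc_find(0) == pc_find(7))    # all nodes are in one set
print(-up[pc_find(0)])             # size of that set
```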
 Worst-case time complexity for…
 …a single Union-by-size is: O(1) (given the two roots)
 …a single PC-Find is: O(log n)

 The time complexity for m ≥ n operations on n elements has been shown to be O(m log* n). [See Weiss for proof.]
 Amortized complexity is then O(log* n)
 What is log*?
LOG* N
log* n = the number of times you need to apply log to bring the value down to at most 1

log* 2 = 1
log* 4 = log* 2^2 = 2
log* 16 = log* 2^2^2 = 3          (log log log 16 = 1)
log* 65536 = log* 2^2^2^2 = 4     (log log log log 65536 = 1)
log* 2^65536 ≈ log* (2 × 10^19728) = 5

log* n ≤ 5 for all reasonable n.
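The definition above can be checked directly (an added illustration):

```python
import math

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied
    before the value drops to at most 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

print([log_star(x) for x in (2, 4, 16, 65536)])   # [1, 2, 3, 4]
```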
In fact, Tarjan showed that the time complexity for m ≥ n operations on n elements is:
Θ(m α(m, n))
Amortized complexity is then Θ(α(m, n)).
What is α(m, n)?
 The inverse of Ackermann's function.
 For reasonable values of m and n, it grows even slower than log* n. So it's even "more constant."

The proof is beyond the scope of this class. A simple algorithm can lead to incredibly hardcore analysis!
