
Directed Graphs: Digraph Search Transitive Closure Topological Sort Strong Components

Directed graphs, also known as digraphs, are mathematical structures used to represent one-way relationships between objects. Digraphs consist of vertices connected by edges that have orientations indicating the direction of the relationship. Some applications of digraphs include representing dependencies between software modules, modeling food webs using prey-predator relationships, and showing hyperlinks between web pages.

Directed Graphs

• digraph search
• transitive closure
• topological sort
• strong components

References:
Algorithms in Java, Chapter 19
http://www.cs.princeton.edu/introalgsds/52directed

Directed graphs (digraphs)

Set of objects with oriented pairwise connections.


Examples:
• one-way streets in a map
• hyperlinks connecting web pages
• dependencies in software modules
• prey-predator relationships

[Figure: page ranks with histogram for a larger example]

Digraph applications

digraph                 vertex                         edge

financial               stock, currency                transaction
transportation          street intersection, airport   highway, airway route
scheduling              task                           precedence constraint
WordNet                 synset                         hypernym
Web                     web page                       hyperlink
game                    board position                 legal move
telephone               person                         placed call
food web                species                        predator-prey relation
infectious disease      person                         infection
citation                journal article                citation
object graph            object                         pointer
inheritance hierarchy   class                          inherits from
control flow            code block                     jump

Some digraph problems

Transitive closure.
Is there a directed path from v to w?

Strong connectivity.
Are all vertices mutually reachable?

Topological sort.
Can you draw the digraph so that all edges point
from left to right?

PERT/CPM.
Given a set of tasks with precedence constraints,
how can we best complete them all?

Shortest path.
Find the best route from s to t in a weighted digraph.

PageRank.
What is the importance of a web page?


Digraph representations

Vertices
• this lecture: use integers between 0 and V-1.
• real world: convert between names and integers with symbol table.

Edges: four easy options


• list of vertex pairs
• vertex-indexed adjacency arrays (adjacency matrix)
• vertex-indexed adjacency lists
• vertex-indexed adjacency SETs

Same as undirected graph, BUT orientation of edges is significant.

Adjacency matrix digraph representation

Maintain a two-dimensional V × V boolean array.
For each edge vw in graph: adj[v][w] = true.
[one entry for each edge]

Rows are "from" vertices, columns are "to" vertices.

         0  1  2  3  4  5  6  7  8  9 10 11 12

    0    0  1  1  0  0  1  1  0  0  0  0  0  0
    1    0  0  0  0  0  0  0  0  0  0  0  0  0
    2    0  0  0  0  0  0  0  0  0  0  0  0  0
    3    0  0  0  0  0  0  0  0  0  0  0  0  0
    4    0  0  0  1  0  0  0  0  0  0  0  0  0
    5    0  0  0  1  1  0  0  0  0  0  0  0  0
    6    0  0  0  0  1  0  0  0  0  0  0  0  0
    7    0  0  0  0  0  0  0  0  1  0  0  0  0
    8    0  0  0  0  0  0  0  0  0  0  0  0  0
    9    0  0  0  0  0  0  0  0  0  0  1  1  1
   10    0  0  0  0  0  0  0  0  0  0  0  0  0
   11    0  0  0  0  0  0  0  0  0  0  0  0  1
   12    0  0  0  0  0  0  0  0  0  0  0  0  0

Adjacency-list digraph representation

Maintain vertex-indexed array of lists.

[one entry for each edge]

 0: 5 2 1 6
 1:
 2:
 3:
 4: 3
 5: 4 3
 6: 4
 7: 8
 8:
 9: 10 11 12
10:
11: 12
12:

Adjacency-SET digraph representation

Maintain vertex-indexed array of SETs.

[one entry for each edge]

 0: { 1 2 5 6 }
 1: { }
 2: { }
 3: { }
 4: { 3 }
 5: { 3 4 }
 6: { 4 }
 7: { 8 }
 8: { }
 9: { 10 11 12 }
10: { }
11: { 12 }
12: { }

Adjacency-SET digraph representation: Java implementation

Same as Graph, but only insert one copy of each edge.

public class Digraph
{
    private int V;                    // number of vertices
    private SET<Integer>[] adj;       // adjacency SETs

    // create empty V-vertex digraph
    public Digraph(int V)
    {
        this.V = V;
        adj = (SET<Integer>[]) new SET[V];
        for (int v = 0; v < V; v++)
            adj[v] = new SET<Integer>();
    }

    // number of vertices (clients in this lecture call G.V())
    public int V()
    {
        return V;
    }

    // add edge from v to w (Graph also has adj[w].add(v))
    public void addEdge(int v, int w)
    {
        adj[v].add(w);
    }

    // iterable SET for v's neighbors
    public Iterable<Integer> adj(int v)
    {
        return adj[v];
    }
}

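The class above relies on the SET type used in the lecture. As a minimal sketch for readers who only have the standard library, the version below swaps in java.util.TreeSet. The class name SetDigraph and the hasEdge method are assumptions made for illustration and are not part of the lecture code. The main method builds the 13-vertex example digraph from the adjacency-SET slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the lecture's Digraph: TreeSet replaces SET.
class SetDigraph {
    private final int V;                        // number of vertices
    private final List<TreeSet<Integer>> adj;   // adj.get(v) = vertices that v points to

    SetDigraph(int V) {
        this.V = V;
        adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new TreeSet<>());
    }

    int V() { return V; }

    // Add the directed edge v->w; a set stores a repeated edge only once.
    void addEdge(int v, int w) { adj.get(v).add(w); }

    // Vertices w such that v->w is an edge, in increasing order.
    Iterable<Integer> adj(int v) { return adj.get(v); }

    // Membership test (illustrative helper, not in the lecture code).
    boolean hasEdge(int v, int w) { return adj.get(v).contains(w); }

    public static void main(String[] args) {
        SetDigraph g = new SetDigraph(13);
        int[][] edges = { {0,5}, {0,2}, {0,1}, {0,6}, {4,3}, {5,4}, {5,3}, {6,4},
                          {7,8}, {9,10}, {9,11}, {9,12}, {11,12} };
        for (int[] e : edges) g.addEdge(e[0], e[1]);
        for (int v = 0; v < g.V(); v++)
            System.out.println(v + ": " + g.adj(v));
    }
}
```

Note that addEdge(v, w) touches only adj[v]. That single line is the whole difference from the undirected Graph.
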
Digraph representations

Digraphs are abstract mathematical objects, BUT


• ADT implementation requires specific representation.
• Efficiency depends on matching algorithms to representations.

representation      space    edge between v and w?   iterate over edges incident to v?

list of edges       E        E                       E
adjacency matrix    V^2      1                       V
adjacency list      E + V    degree(v)               degree(v)
adjacency SET       E + V    log(degree(v))          degree(v)

In practice: Use adjacency SET representation
• Take advantage of proven technology.
• Real-world digraphs tend to be "sparse"
  [huge number of vertices, small average vertex degree].
• Algorithms are all based on iterating over edges incident to v.

Typical digraph application: Google's PageRank algorithm

Goal. Determine which web pages on Internet are important.


Solution. Ignore keywords and content, focus on hyperlink structure.

Random surfer model.


• Start at random page.
• With probability 0.85, randomly select a hyperlink to visit next;
with probability 0.15, randomly select any page.
• PageRank = proportion of time random surfer spends on each page.

Solution 1: Simulate random surfer for a long time.
Solution 2: Compute ranks directly until they converge.
Solution 3: Compute eigenvalues of adjacency matrix!

None feasible without sparse digraph representation.

Every square matrix is a weighted digraph.

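Solution 2 can be sketched as a repeated update over a sparse adjacency-list digraph. The code below is a hedged illustration, not the lecture's implementation. The names PageRankSketch, pageRank, follow and tol are hypothetical. It also assumes that a surfer on a page with no outgoing links jumps to a random page, a case the slide does not cover.

```java
import java.util.Arrays;

class PageRankSketch {
    // links[v] = pages that page v links to.
    // follow   = probability of following a hyperlink (0.85 on the slide).
    // tol      = stop when the total change in one round drops below this.
    static double[] pageRank(int[][] links, double follow, double tol) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);                 // start at a random page
        while (true) {
            double[] next = new double[n];
            double dangling = 0.0;                  // rank on pages with no links
            for (int v = 0; v < n; v++) {
                if (links[v].length == 0) { dangling += rank[v]; continue; }
                double share = follow * rank[v] / links[v].length;
                for (int w : links[v]) next[w] += share;
            }
            // Random jump with probability 1 - follow, plus the assumed
            // jump from pages that have no outgoing links.
            double base = ((1.0 - follow) * (1.0 - dangling) + dangling) / n;
            double change = 0.0;
            for (int v = 0; v < n; v++) {
                next[v] += base;
                change += Math.abs(next[v] - rank[v]);
            }
            rank = next;
            if (change < tol) return rank;
        }
    }

    public static void main(String[] args) {
        int[][] links = { {1}, {0}, {0}, {0} };     // pages 1, 2, 3 all link to page 0
        System.out.println(Arrays.toString(pageRank(links, 0.85, 1e-12)));
    }
}
```

Each round costs time proportional to V + E, which is why a sparse representation matters here.
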
Digraph application: program control-flow analysis

Every program is a digraph (instructions connected to possible successors)

Dead code elimination.
Find (and remove) unreachable code.
[can arise from compiler optimization (or bad code)]

Infinite loop detection.
Determine whether exit is unreachable.
[can't detect all possible infinite loops (halting problem)]

Digraph application: mark-sweep garbage collector

Every data structure is a digraph (objects connected by references)

Roots. Objects known to be directly accessible by program (e.g., stack).

Reachable objects. Objects indirectly accessible by program
(starting at a root and following a chain of pointers).
[easy to identify pointers in type-safe language]

Mark-sweep algorithm. [McCarthy, 1960]
• Mark: mark all reachable objects.
• Sweep: if object is unmarked, it is garbage, so add to free list.

Memory cost: Uses 1 extra mark bit per object, plus DFS stack.
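
The two phases can be sketched on a toy heap in which objects are numbered 0 to n-1 and refs[o] lists the objects that o points to. This is an illustration under those assumptions, not how a real collector is written. The names MarkSweepSketch, collect and refs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MarkSweepSketch {
    // refs[o] = objects that object o points to; roots = directly accessible objects.
    // Returns the free list: every object not reachable from any root.
    static List<Integer> collect(int[][] refs, int[] roots) {
        boolean[] marked = new boolean[refs.length];    // 1 mark bit per object
        for (int r : roots)                             // Mark phase: DFS from each root
            if (!marked[r]) mark(refs, r, marked);
        List<Integer> free = new ArrayList<>();
        for (int o = 0; o < refs.length; o++)           // Sweep phase
            if (!marked[o]) free.add(o);
        return free;
    }

    private static void mark(int[][] refs, int o, boolean[] marked) {
        marked[o] = true;
        for (int p : refs[o])
            if (!marked[p]) mark(refs, p, marked);
    }

    public static void main(String[] args) {
        // Objects 3 and 4 point at each other but are unreachable from root 0.
        int[][] refs = { {1}, {2}, {}, {4}, {3} };
        System.out.println(collect(refs, new int[] { 0 }));
    }
}
```

Objects 3 and 4 in the example form a cycle with no path from a root, so both end up on the free list. Reachability, not reference counts, decides what is garbage.
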
Reachability

Goal. Find all vertices reachable from s along a directed path.

Digraph-processing challenge 1:

Problem: Mark all vertices reachable from a given vertex.

How difficult?
1) any COS126 student could do it
2) need to be a typical diligent COS226 student
3) hire an expert
4) intractable
5) no one knows

Example digraph (edges):
0-1  0-6  0-2  3-4  3-2  5-4  5-0  3-5  2-1  6-4  3-1

Depth-first search in digraphs

Same method as for undirected graphs

Every undirected graph is a digraph


• happens to have edges in both directions
• DFS is a digraph algorithm

DFS (to visit a vertex v)
• Mark v as visited.
• Recursively visit all unmarked vertices w adjacent to v.

Depth-first search (single-source reachability)

Identical to undirected version (substitute Digraph for Graph).

public class DFSearcher
{
    private boolean[] marked;         // marked[v] is true if v is reachable from s

    // constructor marks vertices reachable from s
    public DFSearcher(Digraph G, int s)
    {
        marked = new boolean[G.V()];
        dfs(G, s);
    }

    // recursive DFS does the work
    private void dfs(Digraph G, int v)
    {
        marked[v] = true;
        for (int w : G.adj(v))
            if (!marked[w]) dfs(G, w);
    }

    // client can ask whether any vertex is reachable from s
    public boolean isReachable(int v)
    {
        return marked[v];
    }
}

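The recursive version keeps its DFS stack on the Java call stack, which can overflow on very long paths. The sketch below is one possible alternative that uses an explicit stack. It is not the lecture's code, and the names IterativeReachability and reachableFrom are hypothetical. The order in which vertices are visited can differ from the recursive version, but the set of marked vertices is the same. The main method uses the digraph from challenge 1.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class IterativeReachability {
    // adj[v] = vertices that v points to.
    // Returns marked[], where marked[v] is true if v is reachable from s.
    static boolean[] reachableFrom(int[][] adj, int s) {
        boolean[] marked = new boolean[adj.length];
        Deque<Integer> stack = new ArrayDeque<>();
        marked[s] = true;
        stack.push(s);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            for (int w : adj[v]) {
                if (!marked[w]) {          // mark when pushed, so each vertex is pushed once
                    marked[w] = true;
                    stack.push(w);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        // Edges: 0-1 0-6 0-2 3-4 3-2 5-4 5-0 3-5 2-1 6-4 3-1
        int[][] adj = { {1, 6, 2}, {}, {1}, {4, 2, 5, 1}, {}, {4, 0}, {4} };
        boolean[] marked = reachableFrom(adj, 0);
        for (int v = 0; v < adj.length; v++)
            if (marked[v]) System.out.print(v + " ");
        System.out.println();
    }
}
```
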
Depth-first search (DFS)

DFS enables direct solution of simple digraph problems.
• Reachability.
• Cycle detection.
• Topological sort.
• Transitive closure.
• Is there a path from s to t?
[stay tuned]

Basis for solving difficult digraph problems.
• Directed Euler path.
• Strongly connected components.

Breadth-first search in digraphs

Same method as for undirected graphs

Every undirected graph is a digraph


• happens to have edges in both directions
• BFS is a digraph algorithm

BFS (from source vertex s)

Put s onto a FIFO queue and mark it as visited.
Repeat until the queue is empty:
• remove the least recently added vertex v
• add each of v's unvisited neighbors to the queue and mark them as visited.

Visits vertices in increasing distance from s.
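
The queue-based loop above can be sketched as follows. The class name BfsSketch, the method name distances and the convention of using -1 for unreached vertices are choices made for this illustration only.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

class BfsSketch {
    // adj[v] = vertices that v points to.
    // Returns dist[v] = number of edges on a shortest directed path from s to v,
    // or -1 if v is not reachable from s.
    static int[] distances(int[][] adj, int s) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);                  // -1 doubles as "not yet visited"
        Queue<Integer> queue = new ArrayDeque<>();
        dist[s] = 0;
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.remove();             // least recently added vertex
            for (int w : adj[v]) {
                if (dist[w] == -1) {
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Edges: 0-1 0-6 0-2 3-4 3-2 5-4 5-0 3-5 2-1 6-4 3-1
        int[][] adj = { {1, 6, 2}, {}, {1}, {4, 2, 5, 1}, {}, {4, 0}, {4} };
        System.out.println(Arrays.toString(distances(adj, 3)));
    }
}
```
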


Digraph BFS application: Web Crawler

The internet is a digraph.

Goal. Crawl Internet, starting from some root website.
Solution. BFS with implicit graph.

BFS.
• Start at some root website (say http://www.princeton.edu).
• Maintain a Queue of websites to explore.
• Maintain a SET of discovered websites.
• Dequeue the next website and enqueue websites to which it links
  (provided you haven't done so before).

Q. Why not use DFS?
A. Internet is not fixed (some pages generate new ones when visited).
[subtle point: think about it!]

Web crawler: BFS-based Java implementation

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Queue<String> q = new Queue<String>();        // queue of sites to crawl
SET<String> visited = new SET<String>();      // set of visited sites

String s = "http://www.princeton.edu";
q.enqueue(s);                                 // start crawling from s
visited.add(s);

while (!q.isEmpty())
{
    String v = q.dequeue();
    System.out.println(v);
    In in = new In(v);                        // read in raw html for next site in queue
    String input = in.readAll();

    // use regular expression to find all URLs in site (form: http://xxx.yyy.zzz)
    String regexp = "http://(\\w+\\.)*(\\w+)";
    Pattern pattern = Pattern.compile(regexp);
    Matcher matcher = pattern.matcher(input);
    while (matcher.find())
    {
        String w = matcher.group();
        if (!visited.contains(w))
        {                                     // if unvisited, mark as visited
            visited.add(w);                   // and put on queue
            q.enqueue(w);
        }
    }
}
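
To see what the crawler's regular expression actually captures, the sketch below runs the same pattern over a hard-coded HTML string instead of a downloaded page. The class UrlExtractor and its extract method are hypothetical helpers written for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlExtractor {
    // Same pattern as the crawler: "http://" followed by dot-separated words.
    private static final Pattern URL = Pattern.compile("http://(\\w+\\.)*(\\w+)");

    // Returns every substring of html that matches the pattern, in order.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL.matcher(html);
        while (matcher.find())
            urls.add(matcher.group());
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.princeton.edu/index.html\">home</a> "
                    + "<a href=\"http://www.cs.princeton.edu\">cs</a>";
        System.out.println(extract(html));
    }
}
```

The pattern stops at the first character that is neither a word character nor a dot. The crawler therefore records host names only, and any path after the host is dropped.
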
Graph-processing challenge (revisited)

Problem: Is there a path from s to t ?


Goals: linear ~(V + E) preprocessing time
       constant query time

How difficult?
1) any COS126 student could do it
2) need to be a typical diligent COS226 student
3) hire an expert
4) intractable
5) no one knows
6) impossible

Example graph (edges):
0-1  0-6  0-2  4-3  5-3  5-4

Digraph-processing challenge 2

Problem: Is there a directed path from s to t ?


Goals: linear ~(V + E) preprocessing time
       constant query time

How difficult?
1) any COS126 student could do it
2) need to be a typical diligent COS226 student
3) hire an expert
4) intractable
5) no one knows
6) impossible

Example digraph (edges):
0-1  0-6  0-2  3-4  3-2  5-4  5-0  3-5  2-1  6-4  1-3

Transitive Closure

The transitive closure of G has a directed edge from v to w
if there is a directed path from v to w in G.

[Figure: transitive closure of G]

• graph is usually sparse
• TC is usually dense, so adjacency matrix representation is OK

Digraph-processing challenge 2 (revised)

Problem: Is there a directed path from s to t ?


Goals: ~V^2 preprocessing time
       constant query time

How difficult?
1) any COS126 student could do it
2) need to be a typical diligent COS226 student
3) hire an expert
4) intractable
5) no one knows
6) impossible

Example digraph (edges):
0-1  0-6  0-2  3-4  3-2  5-4  5-0  3-5  2-1  6-4  1-3

Digraph-processing challenge 2 (revised again)

Problem: Is there a directed path from s to t ?


Goals: ~VE preprocessing time (~V^3 for dense digraphs)
       ~V^2 space
       constant query time

How difficult?
1) any COS126 student could do it
2) need to be a typical diligent COS226 student
3) hire an expert
4) intractable
5) no one knows
6) impossible

Example digraph (edges):
0-1  0-6  0-2  3-4  3-2  5-4  5-0  3-5  2-1  6-4  1-3

Transitive closure: Java implementation

Use an array of DFSearcher objects, one for each row of transitive closure
(DFSearcher is the single-source reachability class shown earlier).

public class TransitiveClosure
{
    private DFSearcher[] tc;

    public TransitiveClosure(Digraph G)
    {
        tc = new DFSearcher[G.V()];
        for (int v = 0; v < G.V(); v++)
            tc[v] = new DFSearcher(G, v);
    }

    // is there a directed path from v to w ?
    public boolean reachable(int v, int w)
    {
        return tc[v].isReachable(w);
    }
}
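
As a self-contained sketch of the same idea, the version below stores the closure directly as a V-by-V boolean matrix and runs one DFS per source vertex. It takes plain int[][] adjacency lists instead of the lecture's Digraph, and the class name TransitiveClosureSketch is hypothetical. The main method uses the digraph from challenge 2.

```java
class TransitiveClosureSketch {
    // reach[v][w] is true if there is a directed path from v to w
    // (every vertex reaches itself, as with DFSearcher).
    private final boolean[][] reach;

    // adj[v] = vertices that v points to.
    TransitiveClosureSketch(int[][] adj) {
        int V = adj.length;
        reach = new boolean[V][V];               // ~V^2 space
        for (int s = 0; s < V; s++)              // one DFS per row: ~VE time in total
            dfs(adj, s, s);
    }

    // DFS from v that fills in row s of the closure.
    private void dfs(int[][] adj, int s, int v) {
        reach[s][v] = true;
        for (int w : adj[v])
            if (!reach[s][w]) dfs(adj, s, w);
    }

    // Constant-time query.
    boolean reachable(int v, int w) { return reach[v][w]; }

    public static void main(String[] args) {
        // Edges: 0-1 0-6 0-2 3-4 3-2 5-4 5-0 3-5 2-1 6-4 1-3
        int[][] adj = { {1, 6, 2}, {3}, {1}, {4, 2, 5}, {}, {4, 0}, {4} };
        TransitiveClosureSketch tc = new TransitiveClosureSketch(adj);
        System.out.println(tc.reachable(0, 5));
        System.out.println(tc.reachable(4, 0));
    }
}
```
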
Digraph application: Scheduling

Scheduling. Given a set of tasks to be completed with precedence
constraints, in what order should we schedule the tasks?

Graph model.
• Create a vertex v for each task.
• Create an edge vw if task v must precede task w.
• Schedule tasks in topological order.

tasks
0. read programming assignment
1. download files
2. write code
3. attend precept
...
12. sleep

[Figures: precedence constraints; feasible schedule]

Topological Sort

DAG. Directed acyclic graph.

Topological sort. Redraw DAG so all edges point left to right.

Observation. Not possible if graph has a directed cycle.

Digraph-processing challenge 3

Problem: Check that the digraph is a DAG.
If it is a DAG, do a topological sort.

Goals: linear ~(V + E) preprocessing time
       provide client with vertex iterator for topological order

How difficult?
1) any COS126 student could do it
2) need to be a typical diligent COS226 student
3) hire an expert
4) intractable
5) no one knows
6) impossible

Example digraph (edges):
0-1  0-6  0-2  0-5  2-3  4-9  6-4  6-9  7-6  8-7  9-10  9-11  9-12  11-12

Topological sort in a DAG: Java implementation

public class TopologicalSorter
{
    // standard DFS with 5 extra lines of code
    private int count;
    private boolean[] marked;
    private int[] ts;

    public TopologicalSorter(Digraph G)
    {
        marked = new boolean[G.V()];
        ts = new int[G.V()];
        count = G.V();
        for (int v = 0; v < G.V(); v++)
            if (!marked[v]) tsort(G, v);
    }

    private void tsort(Digraph G, int v)
    {
        marked[v] = true;
        for (int w : G.adj(v))
            if (!marked[w]) tsort(G, w);
        ts[--count] = v;
    }

    // add iterator that returns ts[0], ts[1], ts[2]...
}

Seems easy? Missed by experts for a few decades.

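The iterator left out of the listing could look like the sketch below. It takes plain int[][] adjacency lists instead of the lecture's Digraph so that it compiles on its own. The class name TopologicalOrderSketch and the method name order are hypothetical. The main method uses the DAG from challenge 3.

```java
import java.util.ArrayList;
import java.util.List;

class TopologicalOrderSketch {
    private final boolean[] marked;
    private final int[] ts;        // ts[i] = i-th vertex in topological order
    private int count;

    // adj[v] = vertices that v points to; the digraph is assumed to be a DAG.
    TopologicalOrderSketch(int[][] adj) {
        int V = adj.length;
        marked = new boolean[V];
        ts = new int[V];
        count = V;
        for (int v = 0; v < V; v++)
            if (!marked[v]) tsort(adj, v);
    }

    private void tsort(int[][] adj, int v) {
        marked[v] = true;
        for (int w : adj[v])
            if (!marked[w]) tsort(adj, w);
        ts[--count] = v;           // v goes in front of everything it can reach
    }

    // Vertices in topological order: ts[0], ts[1], ts[2], ...
    Iterable<Integer> order() {
        List<Integer> order = new ArrayList<>();
        for (int v : ts) order.add(v);
        return order;
    }

    public static void main(String[] args) {
        // Edges: 0-1 0-6 0-2 0-5 2-3 4-9 6-4 6-9 7-6 8-7 9-10 9-11 9-12 11-12
        int[][] adj = { {1, 6, 2, 5}, {}, {3}, {}, {9}, {}, {4, 9}, {6}, {7},
                        {10, 11, 12}, {}, {12}, {} };
        for (int v : new TopologicalOrderSketch(adj).order())
            System.out.print(v + " ");
        System.out.println();
    }
}
```
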
Topological sort of a DAG: trace

"visit" means "call tsort()" and "leave" means "return from tsort()"

adj SETs
0: 1 2 5
1: 4
2:
3: 2 4 5 6
4:
5: 2
6: 0 4

            marked[]        ts[]
visit 0:    1 0 0 0 0 0 0   0 0 0 0 0 0 0
visit 1:    1 1 0 0 0 0 0   0 0 0 0 0 0 0
visit 4:    1 1 0 0 1 0 0   0 0 0 0 0 0 0
leave 4:    1 1 0 0 1 0 0   0 0 0 0 0 0 4
leave 1:    1 1 0 0 1 0 0   0 0 0 0 0 1 4
visit 2:    1 1 1 0 1 0 0   0 0 0 0 0 1 4
leave 2:    1 1 1 0 1 0 0   0 0 0 0 2 1 4
visit 5:    1 1 1 0 1 1 0   0 0 0 0 2 1 4
check 2:    1 1 1 0 1 1 0   0 0 0 0 2 1 4
leave 5:    1 1 1 0 1 1 0   0 0 0 5 2 1 4
leave 0:    1 1 1 0 1 1 0   0 0 0 5 2 1 4
check 1:    1 1 1 0 1 1 0   0 0 0 5 2 1 4
check 2:    1 1 1 0 1 1 0   0 0 0 5 2 1 4
visit 3:    1 1 1 1 1 1 0   0 0 0 5 2 1 4
check 2:    1 1 1 1 1 1 0   0 0 0 5 2 1 4
check 4:    1 1 1 1 1 1 0   0 0 0 5 2 1 4
check 5:    1 1 1 1 1 1 0   0 0 0 5 2 1 4
visit 6:    1 1 1 1 1 1 1   0 0 0 5 2 1 4
leave 6:    1 1 1 1 1 1 1   0 6 0 5 2 1 4
leave 3:    1 1 1 1 1 1 1   3 6 0 5 2 1 4
check 4:    1 1 1 1 1 1 1   3 6 0 5 2 1 4
check 5:    1 1 1 1 1 1 1   3 6 0 5 2 1 4
check 6:    1 1 1 1 1 1 1   3 6 0 5 2 1 4

Topological order: 3 6 0 5 2 1 4

Topological sort in a DAG: correctness proof

Invariant: tsort(G, v) visits all vertices reachable from v with a directed path.

Proof by induction:
• w marked: vertices reachable from w are already visited
• w not marked: call tsort(G, w) to visit the vertices reachable from w

Therefore, algorithm is correct in placing v before all vertices visited
during call to tsort(G, v) just before returning.

Q. How to tell whether the digraph has a cycle (is not a DAG)?
A. Use TopologicalSorter (exercise)

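One possible reading of the exercise, offered as a sketch and not as the intended solution: compute the order exactly as TopologicalSorter does, then check every edge against it. If the digraph has a directed cycle, no ordering can make all edges point forward, so at least one edge fails the check. The names DagCheckSketch and isDag are hypothetical.

```java
class DagCheckSketch {
    // adj[v] = vertices that v points to.
    // Returns true if the digraph has no directed cycle.
    static boolean isDag(int[][] adj) {
        int V = adj.length;
        boolean[] marked = new boolean[V];
        int[] pos = new int[V];        // pos[v] = index of v in the computed order
        int[] count = { V };           // one-element array so tsort can update it
        for (int v = 0; v < V; v++)
            if (!marked[v]) tsort(adj, v, marked, pos, count);
        for (int v = 0; v < V; v++)
            for (int w : adj[v])
                if (pos[v] >= pos[w]) return false;   // edge points backward (or self-loop)
        return true;
    }

    // Same DFS as TopologicalSorter, recording positions instead of ts[].
    private static void tsort(int[][] adj, int v, boolean[] marked, int[] pos, int[] count) {
        marked[v] = true;
        for (int w : adj[v])
            if (!marked[w]) tsort(adj, w, marked, pos, count);
        pos[v] = --count[0];
    }

    public static void main(String[] args) {
        System.out.println(isDag(new int[][] { {1}, {2}, {} }));    // path 0->1->2
        System.out.println(isDag(new int[][] { {1}, {2}, {0} }));   // cycle 0->1->2->0
    }
}
```

The check adds one pass over the edges, so the total time stays linear in V + E.
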
Topological sort applications.

• Causalities.
• Compilation units.
• Class inheritance.
• Course prerequisites.
• Deadlocking detection.
• Temporal dependencies.
• Pipeline of computing jobs.
• Check for symbolic link loop.
• Evaluate formula in spreadsheet.
• Program Evaluation and Review Technique / Critical Path Method

Topological sort application (weighted DAG)

Precedence scheduling
• Task v takes time[v] units of time.
• Can work on jobs in parallel.
• Precedence constraints: must finish task v before beginning task w.
• Goal: finish each task as soon as possible.

index   task          time   prereq
A       begin         0      -
B       framing       4      A
C       roofing       2      B
D       siding        6      B
E       windows       5      D
F       plumbing      3      D
G       electricity   4      C, E
H       paint         6      C, E
I       finish        0      F, H

[Figure: example DAG with vertices labelled A-I in topological order]

Program Evaluation and Review Technique / Critical Path Method

PERT/CPM algorithm.
• compute topological order of vertices.
• initialize fin[v] = 0 for all vertices v.
• consider vertices v in topologically sorted order.
for each edge vw, set fin[w]= max(fin[w], fin[v] + time[w])

[Figure: example DAG with computed fin[] values and the critical path]

Critical path
• remember vertex that set value.
• work backwards from sink
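
The update rule and the backward walk can be sketched as below. Vertices are assumed to be numbered in topological order, as in the example, and the source task is assumed to take time 0 (as "begin" does), since the rule never adds a source's own time. The names PertCpmSketch, succ, pred and criticalPathTo are hypothetical. The main method uses only tasks A to F from the table, with successor lists derived from the prereq column.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class PertCpmSketch {
    final int[] fin;     // fin[v] = earliest finish time of task v
    final int[] pred;    // pred[w] = vertex that set fin[w], or -1 if none

    // time[v] = duration of task v; succ[v] = tasks that must wait for v.
    // Assumes vertex numbers 0..n-1 are already in topological order.
    PertCpmSketch(int[] time, int[][] succ) {
        int n = time.length;
        fin = new int[n];                       // initialized to 0
        pred = new int[n];
        Arrays.fill(pred, -1);
        for (int v = 0; v < n; v++) {
            for (int w : succ[v]) {
                // fin[w] = max(fin[w], fin[v] + time[w]); remember who set it
                if (fin[v] + time[w] >= fin[w]) {
                    fin[w] = fin[v] + time[w];
                    pred[w] = v;
                }
            }
        }
    }

    // Work backwards from the sink along remembered vertices.
    Deque<Integer> criticalPathTo(int sink) {
        Deque<Integer> path = new ArrayDeque<>();
        for (int v = sink; v != -1; v = pred[v])
            path.push(v);
        return path;
    }

    public static void main(String[] args) {
        // Tasks A..F as 0..5: begin, framing, roofing, siding, windows, plumbing
        int[] time = { 0, 4, 2, 6, 5, 3 };
        int[][] succ = { {1}, {2, 3}, {}, {4, 5}, {}, {} };
        PertCpmSketch p = new PertCpmSketch(time, succ);
        System.out.println(Arrays.toString(p.fin));      // finish times for A..F
        System.out.println(p.criticalPathTo(4));         // path ending at E
    }
}
```
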
Strong connectivity in digraphs

Analog to connectivity in undirected graphs.

In a Graph, u and v are connected
when there is a path from u to v.

In a Digraph, u and v are strongly connected
when there is a directed path from u to v
and a directed path from v to u.

[Figures: a graph with 3 connected components (sets of mutually connected vertices);
a digraph with 4 strongly connected components (sets of mutually strongly connected vertices)]

Connectivity table (easy to compute with DFS)
      0  1  2  3  4  5  6  7  8  9 10 11 12
cc    0  0  0  0  0  0  0  1  1  2  2  2  2

public boolean connected(int v, int w)
{  return cc[v] == cc[w];  }

constant-time client connectivity query

Strong connectivity table (how to compute?)
      0  1  2  3  4  5  6  7  8  9 10 11 12
sc    2  1  2  2  2  2  2  3  3  0  0  0  0

public boolean connected(int v, int w)
{  return sc[v] == sc[w];  }

constant-time client strong connectivity query



Digraph-processing challenge 4

Problem: Is there a directed cycle containing s and t ?


Equivalent: Are there directed paths from s to t and from t to s?
Equivalent: Are s and t strongly connected?

Goals: linear ~(V + E) preprocessing time (like for undirected graphs)
       constant query time

How difficult?
1) any COS126 student could do it
2) need to be a typical diligent COS226 student
3) hire an expert
4) intractable
5) no one knows
6) impossible

Typical strong components applications

Ecological food web
Strong component: subset with common energy flow
• source in kernel DAG: needs outside energy?
• sink in kernel DAG: heading for growth?

Software module dependency digraphs
[Figures: Firefox; Internet Explorer]
Strong component: subset of mutually interacting modules
• approach 1: package strong components together
• approach 2: use to improve design!

Strong components algorithms: brief history

1960s: Core OR problem
• widely studied
• some practical algorithms
• complexity not understood

1972: Linear-time DFS algorithm (Tarjan)
• classic algorithm
• level of difficulty: CS226++
• demonstrated broad applicability and importance of DFS

1980s: Easy two-pass linear-time algorithm (Kosaraju)
• forgot notes for teaching algorithms class
• developed algorithm in order to teach it!
• later found in Russian scientific literature (1972)

1990s: More easy linear-time algorithms (Gabow, Mehlhorn)
• Gabow: fixed old OR algorithm
• Mehlhorn: needed one-pass algorithm for LEDA

Kosaraju's algorithm

Simple (but mysterious) algorithm for computing strong components
• Run DFS on G^R (the reverse of G) and compute postorder.
• Run DFS on G, considering vertices in reverse postorder.
• [has to be seen to be believed: follow example in book]

Theorem. Trees in second DFS are strong components. (!)

Proof. [stay tuned in COS 423]

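The two passes can be sketched as follows, using plain int[][] adjacency lists so that the example compiles on its own. This is an illustration of the steps listed above, not the book's code. The names KosarajuSketch, reverse, postorder, id and stronglyConnected are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class KosarajuSketch {
    private final boolean[] marked;
    private final int[] id;     // id[v] = index of the strong component containing v
    private int count;          // number of strong components found so far

    // adj[v] = vertices that v points to.
    KosarajuSketch(int[][] adj) {
        int V = adj.length;
        // Pass 1: DFS on the reverse digraph, recording postorder.
        int[][] rev = reverse(adj);
        boolean[] seen = new boolean[V];
        List<Integer> post = new ArrayList<>();
        for (int v = 0; v < V; v++)
            if (!seen[v]) postorder(rev, v, seen, post);
        // Pass 2: DFS on G in reverse postorder; each DFS tree is one component.
        marked = new boolean[V];
        id = new int[V];
        for (int i = V - 1; i >= 0; i--) {
            int v = post.get(i);
            if (!marked[v]) { dfs(adj, v); count++; }
        }
    }

    // Builds the digraph with every edge v->w replaced by w->v.
    private static int[][] reverse(int[][] adj) {
        int V = adj.length;
        int[] indegree = new int[V];
        for (int[] out : adj) for (int w : out) indegree[w]++;
        int[][] rev = new int[V][];
        for (int v = 0; v < V; v++) rev[v] = new int[indegree[v]];
        int[] next = new int[V];
        for (int v = 0; v < V; v++)
            for (int w : adj[v]) rev[w][next[w]++] = v;
        return rev;
    }

    private static void postorder(int[][] g, int v, boolean[] seen, List<Integer> post) {
        seen[v] = true;
        for (int w : g[v])
            if (!seen[w]) postorder(g, w, seen, post);
        post.add(v);
    }

    private void dfs(int[][] adj, int v) {
        marked[v] = true;
        id[v] = count;
        for (int w : adj[v])
            if (!marked[w]) dfs(adj, w);
    }

    boolean stronglyConnected(int v, int w) { return id[v] == id[w]; }

    int count() { return count; }

    public static void main(String[] args) {
        // Cycle 0->1->2->0, edge 2->3, cycle 3->4->3
        int[][] adj = { {1}, {2}, {0, 3}, {4}, {3} };
        KosarajuSketch scc = new KosarajuSketch(adj);
        System.out.println(scc.count() + " strong components");
    }
}
```

After the constructor runs, stronglyConnected(v, w) is a constant-time query on the id[] table.
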
Digraph-processing summary: Algorithms of the day

problem                       algorithm

single-source reachability    DFS
transitive closure            DFS from each vertex
topological sort (DAG)        DFS
strong components             Kosaraju DFS (twice)