UNIT 1: GRAPH THEORY
Structure:
1.0 Objectives
1.1 Introduction
1.2 Basic Definitions
1.5 Summary
1.6 Keywords
1.7 Questions
1.8 References
1.0 OBJECTIVES
1.1 INTRODUCTION
Graphs are a fundamental data structure used to represent and model various real-world
scenarios and relationships. A graph is a collection of nodes (vertices) connected by edges
(links or arcs). They are widely used in computer science, mathematics, social networks,
transportation systems, computer networks, and more.
Key Terminology:
1. Node (Vertex): A node is a fundamental unit in a graph and represents an entity or an
element. It can be anything, such as a person, location, computer, or abstract concept.
Nodes are usually depicted as circles or points in visual representations.
2. Edge (Link or Arc): An edge represents a connection or a relationship between two
nodes. It can be directed (one-way) or undirected (two-way). Directed edges have an
arrow indicating the direction of the connection.
3. Degree of a Node: The degree of a node is the number of edges connected to it. In
the case of directed graphs, a node has both an in-degree (number of incoming edges)
and an out-degree (number of outgoing edges).
4. Path: A path in a graph is a sequence of nodes where each node is connected to the
next node by an edge. Paths can be open (starting and ending at different nodes) or
closed (forming a loop).
5. Cycle: A cycle is a closed path in the graph, where the first and last node are the
same.
6. Connected Graph: A graph is connected if there is a path between every pair of
nodes. In other words, there are no isolated or disconnected components.
7. Weighted Graph: In a weighted graph, each edge is assigned a numerical value
called a weight. It represents some kind of cost, distance, or measure associated with
the connection.
8. Directed Acyclic Graph (DAG): A directed acyclic graph is a directed graph without
cycles, meaning there is no directed path that starts and ends at the same node.
9. Graph Representation: Graphs can be represented using different data structures,
such as an adjacency matrix or an adjacency list. Each method has its own advantages
and is chosen based on the specific use case and operations needed.
Graphs are an essential tool for solving many problems, and various algorithms have been
developed to work with graphs efficiently, such as breadth-first search (BFS), depth-first
search (DFS), Dijkstra's algorithm, and minimum spanning tree algorithms, among others.
Understanding graphs and their properties is crucial for tackling a wide range of
computational and real-world challenges.
An arc that begins and ends at the same vertex u is called a loop. We usually (but not always)
disallow loops in our digraphs. By being defined as a set, E does not contain duplicate (or
multiple) edges/arcs between the same two vertices. For a given graph (or digraph) G, we
also denote the set of vertices by V(G) and the set of edges (or arcs) by E(G) to lessen any
ambiguity.
Definition 3: The order of a graph (digraph) G = (V, E) is |V|, sometimes denoted by |G|, and
the size of this graph is |E|.
Sometimes we view a graph as a digraph where every unordered edge (u, v) is replaced by
two directed arcs (u, v) and (v, u). In this case, the size of a graph is half the size of the
corresponding digraph.
Definition 4: A walk in a graph (digraph) G is a sequence of vertices v0, v1, …, vn such that for
all 0 ≤ i < n, (vi, vi+1) is an edge (arc) in G. The length of the walk v0, v1, …, vn is the number n.
A path is a walk in which no vertex is repeated. A cycle is a walk (of length at least three for
graphs) in which v0 = vn and no other vertex is repeated; sometimes, when it is understood, we
omit vn from the sequence.
In the next example, we display a graph G1 and a digraph G2, both of order 5. The size of the
graph G1 is 6, where E(G1) = {(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)}, while the size of the
digraph G2 is 7, where E(G2) = {(0, 2), (1, 0), (1, 2), (1, 3), (3, 1), (3, 4), (4, 2)}.
Example 1: For the digraph G2 of Figure 1.1, the following sequences of vertices are classified
as being walks, paths, or cycles.
v0, v1, …, vn      Walk?   Path?   Cycle?
0 1 2 3 4          No      No      No
0 2 4              No      No      No
3 1 2              Yes     Yes     No
1 3 1              Yes     No      Yes
3 1 3 1 0          Yes     No      No
Definition 5: A graph G is connected if there is a path between all pairs of vertices u and v of
V(G). A digraph G is strongly connected if there is a path from vertex u to vertex v for all
pairs u and v in V(G).
In Figure 1.1, the graph G1 is connected, but the digraph G2 is not strongly connected because
there are no arcs leaving vertex 2. However, the underlying graph of G2 is connected.
Definition 6: In a graph, the degree of a vertex v, denoted by deg(v), is the number of edges
incident to v. For digraphs, the out-degree of a vertex v is the number of arcs {(v, x) ∈ E | x ∈
V} incident from v (leaving v), and the in-degree of vertex v is the number of arcs {(x, v) ∈ E |
x ∈ V} incident to v (entering v).
For a graph, the in-degree and out-degree are the same as the degree. For our graph G1, we
have deg(0) = 2, deg(1) = 2, deg(2) = 4, deg(3) = 2 and deg(4) = 2. We may concisely write this
as a degree sequence (2, 2, 4, 2, 2) if there is a natural ordering (e.g., 0, 1, 2, 3, 4) of the
vertices. The in-degree sequence and out-degree sequence of the digraph G2 are (1, 1, 3, 1, 1)
and (1, 3, 0, 2, 1), respectively. The degree of a vertex of a digraph is sometimes defined as the
sum of its in-degree and out-degree. Using this definition, a degree sequence of G2 would be
(2, 4, 3, 3, 2).
Definition 7: A weighted graph is a graph whose edges have weights. These weights can be
thought of as the cost involved in traversing the edge. Figure 1.2 shows a weighted graph.
Figure 1.2 A weighted graph
Definition 8: If the removal of an edge makes a graph disconnected, then that edge is called a
cut edge or bridge.
Definition 9: If the removal of a vertex makes a graph disconnected, then that vertex is called
a cut vertex.
Definition 10: A connected graph without a cycle in it is called a tree. The pendant vertices
of a tree are called leaves.
Definition 11: A graph without self-loops and parallel edges is called a simple graph.
Definition 12: A connected graph that can be traced in a single closed walk without repeating
any edge (an Eulerian circuit) is called an Eulerian graph. Equivalently, a connected graph is
Eulerian if all of its vertices have even degree.
Definition 13: If exactly two vertices of a connected graph have odd degree and all other
vertices have even degree, it is called an open Eulerian graph: it can be traced in a single open
walk without repeating any edge. In an open Eulerian graph, the starting and ending points of
the trace must be the two odd-degree vertices.
Definition 14: A graph that contains a cycle visiting every vertex exactly once (a Hamiltonian
cycle) is called a Hamiltonian graph; the graph itself may have any number of edges.
Definition 15: The total degree of a graph is twice the number of edges, since every edge
contributes to the degrees of exactly two vertices. That is, total degree = 2 × |E|.
We can formally define graph as an abstract data type with data objects and operations on it
as follows:
Data objects: A graph G of vertices and edges. Vertices represent data objects.
Operations:
The sequential (matrix) representation of a graph is as follows. A graph with n nodes can be
represented as an n × n adjacency matrix A such that

A[i][j] = 1, if there is an edge from node i to node j
          0, otherwise
Note that the number of 1s in a row gives the out-degree of a node. In the case of an undirected
graph, the number of 1s in a row gives the degree of the node. The total number of 1s in the
matrix equals the number of edges of a digraph, and twice the number of edges of an undirected
graph. Figure 1.3(a) shows a graph and Figure 1.3(b) shows its adjacency matrix. Figure 1.4(a)
shows a digraph and Figure 1.4(b) shows its adjacency matrix.
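To make the representation concrete, here is a minimal C sketch (the layout and names are
illustrative) that builds the adjacency matrix of the undirected graph G1 from its edge list and
recovers each vertex degree by counting the 1s in its row:

#include <stdio.h>
#define N 5   /* order of the graph G1 */

int main(void) {
    /* Edge list of G1: E(G1) = {(0,1),(0,2),(1,2),(2,3),(2,4),(3,4)} */
    int edges[][2] = {{0,1},{0,2},{1,2},{2,3},{2,4},{3,4}};
    int m = sizeof edges / sizeof edges[0];
    int A[N][N] = {0};

    /* For an undirected graph, each edge sets two symmetric entries */
    for (int e = 0; e < m; e++) {
        int u = edges[e][0], v = edges[e][1];
        A[u][v] = A[v][u] = 1;
    }

    /* The number of 1s in row v is the degree of vertex v */
    for (int v = 0; v < N; v++) {
        int deg = 0;
        for (int j = 0; j < N; j++)
            deg += A[v][j];
        printf("deg(%d) = %d\n", v, deg);
    }
    return 0;
}

Running it prints the degree sequence (2, 2, 4, 2, 2) computed earlier for G1.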
Let G be a graph with n vertices and e edges. Define an n × e matrix M = [mij], whose n rows
correspond to the n vertices and whose e columns correspond to the e edges, as

m[i][j] = 1, if edge ej is incident upon vertex vi
          0, otherwise

Matrix M is known as the incidence matrix representation of the graph G. The incidence matrix
contains only two elements, 0 and 1; such a matrix is called a binary matrix or a (0, 1)-matrix.
Figure 1.5(a) shows a graph and Figure 1.5(b) shows its incidence matrix.
e1 e2 e3 e4 e5 e6 e7
v1 1 0 0 0 1 0 0
v2 1 1 0 0 0 1 1
v3 0 1 1 0 0 0 0
v4 0 0 1 1 0 0 1
v5 0 0 0 1 1 1 0
The following observations about the incidence matrix can readily be made:
1. Since every edge is incident on exactly two vertices, each column of an incidence
matrix has exactly two 1's.
2. The number of 1’s in each row equals the degree of the corresponding vertex.
3. A row with all 0’s, therefore, represents an isolated vertex.
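These observations can be checked mechanically. The short C sketch below (hard-coding the
5 × 7 incidence matrix tabulated above) sums each column and each row:

#include <stdio.h>
#define NV 5
#define NE 7

int main(void) {
    /* Incidence matrix from Figure 1.5(b): rows v1..v5, columns e1..e7 */
    int M[NV][NE] = {
        {1,0,0,0,1,0,0},
        {1,1,0,0,0,1,1},
        {0,1,1,0,0,0,0},
        {0,0,1,1,0,0,1},
        {0,0,0,1,1,1,0}
    };
    for (int j = 0; j < NE; j++) {        /* every column sums to 2 */
        int s = 0;
        for (int i = 0; i < NV; i++) s += M[i][j];
        printf("column e%d has %d ones\n", j + 1, s);
    }
    for (int i = 0; i < NV; i++) {        /* each row sum is a degree */
        int d = 0;
        for (int j = 0; j < NE; j++) d += M[i][j];
        printf("deg(v%d) = %d\n", i + 1, d);
    }
    return 0;
}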
1.5 SUMMARY
Graphs are non-linear data structures. A graph is an important mathematical
representation of a physical problem.
Graphs and directed graphs are important to computer science for many real world
applications from building compilers to modeling physical communication networks.
A graph is an abstract notion of a set of nodes (vertices or points) and connection
relations (edges or arcs) between them.
The representation of graphs in a computer can be categorized as (i) sequential
representation and (ii) linked representation.
The sequential representation makes use of an array data structure, whereas the linked
representation of a graph makes use of a singly linked list as its fundamental data
structure.
1.6 KEYWORDS
Non-linear data structures, Undirected graphs, Directed graphs, Walk, Path, Cycle, Cut edge,
Cut vertex, In-degree, Out-degree, Pendant vertex, Eulerian graph, Hamiltonian graph,
Adjacency matrix, Incidence matrix.
1.7 QUESTIONS
1.8 REFERENCES
1. Sartaj Sahni, 2000, Data Structures, Algorithms and Applications in C++, McGraw Hill
International Edition.
2. Horowitz and Sahni, 1983, Fundamentals of Data Structures, Galgotia Publications.
3. Narsingh Deo, 1990, Graph Theory with Applications to Engineering and Computer
Science, Prentice Hall Publications.
4. Tremblay and Sorenson, 1991, An Introduction to Data Structures with Applications,
McGraw Hill Edition.
5. Ramesh, Anand and Gautham, C and Data Structures by Practice.
6. G. A. V. Pai, Data Structures and Algorithms: Concepts, Techniques and Applications,
Tata McGraw Hill, New Delhi.
*****
UNIT 2: GRAPH TRAVERSAL
STRUCTURE
2.0 Objectives
2.1 Introduction
2.6 Summary
2.7 Keywords
2.8 Questions
2.9 References
2.0 OBJECTIVES
2.1 INTRODUCTION
Graph traversal is a fundamental concept in graph theory and computer science, and it refers
to the process of systematically visiting or exploring all the nodes and edges of a graph. The
goal of graph traversal is to access all the nodes in the graph in a particular order, allowing us
to perform various operations on them or discover specific patterns within the graph. Two
common graph traversal algorithms are Breadth-First Search (BFS) and Depth-First Search
(DFS).
1. Breadth-First Search (BFS): BFS starts from a specified source node and explores the
graph level by level. It visits all the nodes at the current level before moving to the next level.
BFS uses a queue data structure to keep track of the nodes to be visited.
BFS is particularly useful for finding the shortest path between two nodes in an unweighted
graph.
2. Depth-First Search (DFS): DFS explores as far as possible along each branch before
backtracking. It starts from a specified source node and explores as deeply as possible before
backtracking. DFS uses a stack data structure (often implemented using recursion) to keep
track of the nodes to be visited.
DFS is commonly used in topological sorting, cycle detection, and exploring connected
components in a graph.
Choosing Between BFS and DFS: The choice between BFS and DFS depends on the
specific problem and the desired properties of the traversal. Some general guidelines are:
Use BFS to find the shortest path or the minimum number of edges between two
nodes in an unweighted graph.
Use DFS to explore all connected components of a graph, perform a depth-based
search, or find cycles in the graph.
DFS is usually more memory-efficient for graphs with a wide branching factor, since
BFS must hold a whole level of nodes in its queue at once; BFS can be more
memory-efficient on very deep graphs.
Both BFS and DFS are essential techniques for graph theory and have numerous applications
in graph-based algorithms, pathfinding, network analysis, and many other areas of computer
science and engineering.
The depth-first search algorithm starts by visiting an arbitrary node of the graph and marking
it as visited. Soon after visiting any node (the current node), we consider one of its unvisited
adjacent nodes as the next node for traversal, store the current node's address in a stack data
structure, and move to that adjacent node. This is repeated until no node can be processed
further. If there are any nodes which have not been visited, backtracking is used until all the
nodes are visited. In depth-first search, the stack is used as the storage structure for information
about the nodes which will be needed during backtracking.
Before seeing how to search for a node in a graph using depth-first search, we need to
understand how depth-first search can be used for the traversal of a graph. Consider a graph G
as shown in Figure 1(a). The traversal starts with node 1 (Figure 1(b)); mark the node as
traversed (gray shading is used to indicate that the node is traversed) and push node number 1
onto the stack. As it has only one adjacent node, 4, we move to node number 4. Mark node
number 4 (Figure 1(c)) and push 4 onto the stack. For node number 4 there are two adjacent
nodes, i.e., 2 and 5.
Select one node arbitrarily (for implementation purposes we can select the node with the
smallest number) and move to that node; in this case we move to node 2 and push node
number 2 onto the stack. Similarly, we move to node 5 from node 2, pushing 5 onto the stack,
and then move to node 3 from node 5 and push node 3 onto the stack (Figures 1(d)–1(j)).
Figure 1(k) shows the elements present in the stack at the end. From node 3 there is no
possibility to traverse further. From this point onwards we backtrack to check whether there
are any other nodes which have not been traversed. Pop the top node, 3, from the stack. Now
check whether there is any possibility to traverse from the element present at the top of the
stack. The top element is 5, and there is an edge which has not been traversed from node 5
(see Figure 2(b); the line marked in red is the untraversed edge). This edge leads to 4, which
has already been visited, and there is no other possibility for traversing from node 5, so pop
node 5 from the stack. Repeat the same process; at the end there will be no elements in the
stack, indicating that all the vertices of the graph have been traversed.
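The walkthrough above translates directly into code. Below is a minimal C sketch of
stack-based depth-first traversal, assuming the graph of Figure 1(a) has the edges
{1–4, 2–4, 2–5, 3–5, 4–5} suggested by the trace; the graph literal and layout are illustrative:

#include <stdio.h>
#define N 6   /* vertices are numbered 1..5; index 0 is unused */

void dfs(int adj[N][N], int start) {
    int stack[N], top = -1;
    int visited[N] = {0};

    stack[++top] = start;           /* push the starting node */
    visited[start] = 1;
    printf("%d ", start);

    while (top >= 0) {
        int u = stack[top];         /* current node = top of the stack */
        int moved = 0;
        for (int v = 1; v < N; v++) {
            if (adj[u][v] && !visited[v]) {
                visited[v] = 1;     /* mark and visit the adjacent node */
                printf("%d ", v);
                stack[++top] = v;   /* push it; it becomes the current node */
                moved = 1;
                break;              /* smallest-numbered neighbour first */
            }
        }
        if (!moved) top--;          /* dead end: backtrack by popping */
    }
}

int main(void) {
    int adj[N][N] = {
        {0,0,0,0,0,0},
        {0,0,0,0,1,0},   /* 1: {4}       */
        {0,0,0,0,1,1},   /* 2: {4, 5}    */
        {0,0,0,0,0,1},   /* 3: {5}       */
        {0,1,1,0,0,1},   /* 4: {1, 2, 5} */
        {0,0,1,1,1,0}    /* 5: {2, 3, 4} */
    };
    dfs(adj, 1);
    return 0;
}

It prints 1 4 2 5 3, the same order as the walkthrough.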
Figure 2. Backtracking operations for the depth first search algorithm
Figure 1 and Figure 2 demonstrated depth-first search for traversal purposes. The same
technique can be used to search for an element in the graph. Given a graph with n nodes, we
can check whether a given node is present in the graph or not. Each time we visit a node, we
check whether that node is the same as the search node; if it is, we stop the procedure,
declaring that the node is present; otherwise we push that node onto the stack and continue
traversing until the stack becomes empty.
Let us consider a tree example and illustrate the working principle of the depth first search.
Let the search element be F.
Figure 4(a) – 4(d) : Various steps in depth first search algorithm.
Note: Depth first search method uses stack as a data structure.
Analogous to depth-first search, which searches the nodes in a top-to-bottom fashion,
postponing the traversal of adjacent elements, the breadth-first search algorithm first traverses
all the nodes adjacent to a starting node; then all unvisited nodes in a connected graph are
traversed in the same manner.
It is convenient to use a queue to trace the operation of breadth-first search. The queue is
initialized with the traversal's starting node, which is marked as visited. On each iteration, the
algorithm identifies all unvisited nodes that are adjacent to the front node, marks them as
visited, and adds them to the queue; after that, the front node is removed from the queue.
Let us consider the same example of tree traversal Figure 3.
The starting node is A. Insert A into the queue and mark A as traversed. Move to its successor
elements {B, C}, insert them into the queue, and mark them as traversed. Since there is no
other adjacent element to node A, remove A, which is the first element in the queue. The next
element in the queue is B; check for its successor nodes. Since B has no successor elements,
remove B from the queue. The next element in the queue is C; find its successor elements,
i.e., {D, F}. Insert them into the queue and correspondingly mark them as traversed. Since C
has no other elements as its successors, remove C from the queue. The next element in the
queue is D; its successor is E. Insert it into the queue and mark it as traversed. Now D has no
further successor nodes, hence remove D from the queue. The next element in the queue is F;
find out its successors, i.e., {H, I}. Insert them into the queue and mark them as visited. Once
again, the element F has no further successors, so remove it from the queue and check the
next element in the queue. The next element is E, and E has no successors, so remove it; the
next elements are H and I. Traverse them in the same way.
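The same trace can be reproduced in code. Here is a minimal C sketch of queue-based
breadth-first traversal, assuming from the trace that the tree's edges are A→{B, C}, C→{D, F},
D→{E} and F→{H, I}; the array-based queue and labels are illustrative:

#include <stdio.h>
#define N 8

int main(void) {
    /* Nodes of the example tree; the letter G does not appear in the trace */
    char label[N] = {'A','B','C','D','E','F','H','I'};
    int adj[N][N] = {0};          /* adj[u][v] = 1 if v is a successor of u */
    adj[0][1] = adj[0][2] = 1;    /* A -> B, C */
    adj[2][3] = adj[2][5] = 1;    /* C -> D, F */
    adj[3][4] = 1;                /* D -> E    */
    adj[5][6] = adj[5][7] = 1;    /* F -> H, I */

    int queue[N], front = 0, rear = 0;
    int visited[N] = {0};

    queue[rear++] = 0;            /* enqueue the starting node A ... */
    visited[0] = 1;               /* ... and mark it as traversed    */

    while (front < rear) {
        int u = queue[front++];   /* remove the front node */
        printf("%c ", label[u]);
        for (int v = 0; v < N; v++) {
            if (adj[u][v] && !visited[v]) {
                visited[v] = 1;   /* mark unvisited successors ... */
                queue[rear++] = v;/* ... and add them to the queue */
            }
        }
    }
    return 0;
}

It prints A B C D F E H I, matching the order in which nodes are processed above.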
For searching an element using breadth-first search, similar to depth-first search, we traverse
the graph using breadth-first traversal, and if while traversing the graph a node equal to the
search element occurs, we declare that the search element is present in the graph.
Graph traversals (BFS and DFS) have a wide range of applications in various fields due to
their ability to explore and analyze the structure of graphs efficiently. Some common
applications of graph traversals include:
1. Pathfinding and Shortest Path: BFS can be used to find the shortest path between
two nodes in an unweighted graph. In scenarios like GPS navigation, routing
algorithms, or network routing, BFS helps find the optimal path from one location to
another.
2. Connected Components: DFS can identify connected components in an undirected
graph. This is useful in social network analysis, where connected components
represent groups of users with strong connections to each other.
3. Cycle Detection: DFS can be employed to detect cycles in a graph. In applications
like deadlock detection in operating systems or resource allocation in networks, cycle
detection helps avoid potential issues.
4. Topological Sorting: DFS can be used to perform topological sorting of a directed
acyclic graph (DAG). It is useful for tasks that require sequencing or dependency
resolution, such as compiling a set of source files with dependencies.
5. Maze Solving: BFS can help solve mazes by finding the shortest path from the
entrance to the exit. It can be applied in robotics, games, and solving puzzles.
6. Network Broadcast and Propagation: BFS can simulate how information spreads in
a network. It can model the propagation of information, rumors, or viruses in social
networks or computer networks.
7. Web Crawling and Indexing: BFS can be utilized for web crawling to explore and
index web pages efficiently. Search engines use graph traversals to build indexes and
retrieve relevant search results.
8. Tree and Graph Algorithms: BFS and DFS are fundamental for various tree and
graph algorithms. These include finding the height of a tree, computing the diameter
of a graph, finding bridges and articulation points, and more.
9. Puzzle Solving and Game AI: BFS and DFS can be applied to puzzle solving and
game artificial intelligence (AI). They help in solving puzzles like the sliding tile
puzzle and finding the optimal moves in board games.
10. Social Network Analysis: Both BFS and DFS are crucial for analyzing social
networks, identifying influential users, and understanding network structures.
11. Robotics and Path Planning: BFS and DFS are used in robotics for path planning,
collision avoidance, and exploration of unknown environments.
12. Language Processing and Grammar Analysis: BFS and DFS can be used in natural
language processing to analyze sentence structures, grammars, and parse trees.
These are just a few examples of the many applications of graph traversals in diverse fields.
Graph theory and graph algorithms play a vital role in solving complex problems and
understanding relationships and structures in various systems and networks.
2.6 SUMMARY
In this unit we have presented the basics of elementary graph algorithms: the representation
of graphs, breadth-first search, and depth-first search.
2.7 KEYWORDS
Depth first search
Breadth first search
Queue
Traverse
Non-linear data structures
Undirected graphs
2.8 QUESTIONS
4) Mention the difference between the depth first search and breadth first search algorithms.
2.9 REFERENCES
1) Sartaj Sahni, 2000, Data Structures, Algorithms and Applications in C++, McGraw Hill
International Edition.
2) Horowitz and Sahni, 1983, Fundamentals of Data Structures, Galgotia Publications.
3) Narsingh Deo, 1990, Graph Theory with Applications to Engineering and Computer
Science, Prentice Hall Publications.
4) Tremblay and Sorenson, 1991, An Introduction to Data Structures with Applications,
McGraw Hill Edition.
5) Ramesh, Anand and Gautham, C and Data Structures by Practice.
6) G. A. V. Pai, Data Structures and Algorithms: Concepts, Techniques and Applications,
Tata McGraw Hill, New Delhi.
UNIT 3: MINIMUM SPANNING TREES
STRUCTURE
3.0 Objectives
3.1 Introduction
3.3 The Bellman-Ford algorithm and single-source shortest paths in a directed acyclic graph
3.4 Applications of minimum spanning trees
3.5 Summary
3.6 Keywords
3.7 Questions for self-study
3.8 References
3.0 OBJECTIVES
After studying this unit, you should be able to
3.1 INTRODUCTION
A graph is an abstract notation used to represent the connection between pairs of objects. A
graph consists of −
Vertices − Interconnected objects in a graph are called vertices. Vertices are also
known as nodes.
Edges − Edges are the links that connect the vertices.
Graphs may be directed or undirected:
Directed graph − In a directed graph, edges have direction, i.e., edges go from one
vertex to another.
Undirected graph − In an undirected graph, edges have no direction.
Graph Coloring
Graph coloring is a method to assign colors to the vertices of a graph so that no two adjacent
vertices have the same color. Some graph coloring problems are −
Vertex coloring − A way of coloring the vertices of a graph so that no two adjacent
vertices share the same color.
Edge Coloring − It is the method of assigning a color to each edge so that no two
adjacent edges have the same color.
Face coloring − It assigns a color to each face or region of a planar graph so that no
two faces that share a common boundary have the same color.
Chromatic Number
Chromatic number is the minimum number of colors required to color a graph. For example,
the chromatic number of the following graph is 3.
The concept of graph coloring is applied in preparing timetables, mobile radio frequency
assignment, Sudoku, register allocation, and the coloring of maps.
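The following C sketch illustrates vertex coloring on a small hypothetical graph (a triangle
0–1–2 plus the edge 2–3, standing in for the figure, which is not reproduced here): each vertex
greedily receives the smallest color not used by its already-colored neighbours, and three
colors suffice.

#include <stdio.h>
#define N 4

int main(void) {
    /* Hypothetical graph: triangle 0-1-2 plus the edge 2-3 */
    int adj[N][N] = {
        {0,1,1,0},
        {1,0,1,0},
        {1,1,0,1},
        {0,0,1,0}
    };
    int color[N];
    for (int v = 0; v < N; v++) {
        int used[N] = {0};
        for (int u = 0; u < v; u++)     /* colors taken by colored neighbours */
            if (adj[v][u]) used[color[u]] = 1;
        int c = 0;
        while (used[c]) c++;            /* smallest free color */
        color[v] = c;
        printf("vertex %d gets color %d\n", v, c);
    }
    return 0;
}

Greedy coloring does not always achieve the chromatic number, but it never uses more than
Δ + 1 colors, where Δ is the maximum degree of the graph.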
To assign a particular color to a vertex, first determine whether that color is already
assigned to any of the adjacent vertices. If a processor detects the same color on an adjacent
vertex, it sets its value in the status array to 0.
Kruskal’s Algorithm
Kruskal’s algorithm differs from Prim’s in the following manner. It does not insist on
nearness to a vertex already existing in the partial spanning tree. As long as the new
incoming low-cost edge does not form a loop, it is included in the tree. A broad outline of the
algorithm can be listed as follows (a code sketch follows the outline):
Choose an edge with the lowest cost. Add it to the spanning tree. Delete it from the
set of edges.
From the set of edges choose the next low cost edge. Try it on the partial spanning
tree. If no loop is created, add it to the spanning tree, otherwise discard. In either case,
delete it from the set of edges.
Repeat the operation till (n-1) edges are picked up from the set of edges and added to
the spanning tree which spans over the vertex set V.
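Since the graph used for the Prim’s trace is not reproduced here, the following C sketch
demonstrates the outline on a small hypothetical weighted graph: it sorts the edge list by
weight and uses a simple union-find structure to discard loop-forming edges.

#include <stdio.h>
#include <stdlib.h>

#define NV 5   /* vertices */
#define NE 7   /* edges    */

typedef struct { int u, v, w; } Edge;

static int parent[NV];

int find(int x) {                        /* find the set representative */
    while (parent[x] != x) x = parent[x];
    return x;
}

int cmp(const void *a, const void *b) {  /* sort edges by weight */
    return ((const Edge *)a)->w - ((const Edge *)b)->w;
}

int main(void) {
    /* A hypothetical weighted graph; replace with your own edge list */
    Edge e[NE] = {{0,1,2},{0,3,6},{1,2,3},{1,3,8},{1,4,5},{2,4,7},{3,4,9}};
    qsort(e, NE, sizeof(Edge), cmp);

    for (int i = 0; i < NV; i++) parent[i] = i;

    int picked = 0, cost = 0;
    for (int i = 0; i < NE && picked < NV - 1; i++) {
        int ru = find(e[i].u), rv = find(e[i].v);
        if (ru != rv) {                  /* no loop: accept the edge */
            parent[ru] = rv;
            printf("edge (%d,%d) weight %d\n", e[i].u, e[i].v, e[i].w);
            cost += e[i].w;
            picked++;
        }                                /* otherwise discard it */
    }
    printf("total cost %d\n", cost);
    return 0;
}

For this graph the sketch picks the edges (0,1), (1,2), (1,4) and (0,3), for a total cost of 16.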
We now see its effect on the graph we have considered for the Prim’s algorithm
Complexity
Since Kruskal’s method works on the basis of sorting the edges based on their weights, the
complexity in the worst case is O( |E| log |E| ). Once the edges are sorted, the remaining work
with an efficient union-find structure takes nearly O( |V| + |E| ) time.
Prim’s algorithm
Prim’s algorithm starts with the least-cost edge. Then it chooses another edge that is
adjacent to this edge and is of least cost, and attaches it to the first edge. The process
continues as follows (a code sketch follows these steps):
1) At each stage, choose the least-weighted edge that is adjacent to any of the nodes of
the partially constructed spanning tree.
2) If the edge selected above forms a loop/circuit, then reject it and select the next edge
that satisfies the criteria.
3) Repeat the process (n-1) times for the graph with n vertices.
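Below is a minimal C sketch of these steps, run on the same hypothetical graph as the
Kruskal’s sketch above; for simplicity it grows the tree from vertex 0, which is an endpoint of
the least-cost edge, so the resulting tree is the same.

#include <stdio.h>
#define NV 5
#define INF 99999

int main(void) {
    /* Cost adjacency matrix of the same hypothetical graph (0 = no edge) */
    int g[NV][NV] = {
        {0, 2, 0, 6, 0},
        {2, 0, 3, 8, 5},
        {0, 3, 0, 0, 7},
        {6, 8, 0, 0, 9},
        {0, 5, 7, 9, 0}
    };
    int inTree[NV] = {1, 0, 0, 0, 0};   /* start from vertex 0 */
    int cost = 0;

    for (int step = 0; step < NV - 1; step++) {   /* pick n-1 edges */
        int bu = -1, bv = -1, bw = INF;
        for (int u = 0; u < NV; u++)
            if (inTree[u])
                for (int v = 0; v < NV; v++)
                    /* adjacent to the partial tree, least weight, and v
                       outside the tree, so no circuit is formed */
                    if (!inTree[v] && g[u][v] && g[u][v] < bw) {
                        bu = u; bv = v; bw = g[u][v];
                    }
        inTree[bv] = 1;
        printf("edge (%d,%d) weight %d\n", bu, bv, bw);
        cost += bw;
    }
    printf("total cost %d\n", cost);
    return 0;
}

It reports the same total cost, 16, as the Kruskal’s sketch, as expected for a minimum
spanning tree.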
To further clarify the situation let us trace the application of Prim’s algorithm to the
following graph.
This gives another application for greedy algorithms on graphs. Often, graphs are
used to indicate paths - roadmaps, pipelines etc. Graphs can be used to represent the highway
structure of a state or country with vertices representing cities and edges representing sections
of highway. The edges can then be assigned weights which may be either the distance
between the two cities connected by the edge or the average time to drive along that section
of highway. A motorist wishing to drive from city A to B would be interested in answers to
the following questions:
1) Is there a path from A to B?
2) If there is more than one path from A to B, which is the shortest path?
The problems defined by these questions are special cases of the path problem we
study in this section. The length of a path is now defined to be the sum of the weights of the
edges on that path. The starting vertex of the path is referred to as the source and the last
vertex, destination. The graphs are digraphs representing streets. Consider a digraph G = (V,
E), with the distance to be traveled as weights on the edges. The problem is to determine the
shortest path from v0 to all the remaining vertices of G. It is assumed that all the weights
associated with the edges are positive. The shortest path between v0 and some other node v is
an ordering among a subset of the edges. Hence this problem fits the ordering paradigm.
Example:
Consider the digraph of Figure 8.1. Let the numbers on the edges be the costs of traveling along
that route. If a person is interested in traveling from v1 to v2, then he encounters many paths.
Some of them are
v1 → v2 = 50 units
v1 → v3 → v4 → v2 = 10 + 15 + 20 = 45 units.
Figure 8.1
The cheapest path among these is the path along v1→ v3→ v4→ v2. The cost of the
path is 10 + 15 + 20 = 45 units. Even though there are three edges on this path, it is cheaper
than traveling along the path connecting v1 and v2 directly i.e., the path v1 → v2 that costs 50
units. One can also notice that, it is not possible to travel to v6 from any other node.
A much simpler method would be to solve it using the matrix representation. The steps to be
followed are as follows:
1. Find the adjacency matrix for the given graph. The adjacency matrix for figure 8.1
is given below
2. Consider v1 to be the source and choose the minimum entry in the row v1. In the
above table the minimum in row v1 is 10.
3. Find out the column in which the minimum is present, for the above example it is
column v3. Hence, this is the node that has to be next visited.
The adjacency matrix for figure 8.1
4. Compute a matrix by eliminating v1 and v3 columns. Initially retain only row v1.
The second row is computed by adding 10 to all values of row v3.
5. Find the minimum in each column. Now select the minimum from the resulting
row. In the above example the minimum is 25. Repeat step 3 followed by step 4
till all vertices are covered or single column is left.
The time taken by this algorithm on a graph with n vertices is O(n²). Any shortest path
algorithm must examine each edge in the graph at least once, since any of the edges could be
in a shortest path. Hence the minimum time taken is Ω(|E|). However, since the costs are
represented in a cost matrix, this representation must take Ω(n²) time. The worst-case
complexity can be reduced to O((n + |E|) log n), which is left as an assignment for you.
3.3 THE BELLMAN-FORD ALGORITHM AND SINGLE-SOURCE SHORTEST PATHS IN A DIRECTED ACYCLIC GRAPH
A weighted directed acyclic graph is given, together with a source vertex. We have to find the
shortest distance from the starting node to all other vertices in the graph. For graphs with
negative edge weights we can use an algorithm like Bellman-Ford; for positive weights,
Dijkstra’s algorithm is also helpful. Here, for a directed acyclic graph, we will use the
topological sorting technique to reduce the complexity.
Input and Output
Input: the cost matrix of the graph (∞ denotes a missing edge).
0 5 3 ∞ ∞ ∞
∞ 0 2 6 ∞ ∞
∞ ∞ 0 7 4 2
∞ ∞ ∞ 0 -1 1
∞ ∞ ∞ ∞ 0 -2
∞ ∞ ∞ ∞ ∞ 0
Output:
Infinity 0 2 6 5 3
Algorithm
topoSort(u, visited, stack)
Input: starting node u, the visited list to keep track, the stack.
Output: Sort the nodes in a topological way.
Begin
   mark u as visited
   for all vertices v adjacent to u, do
      if v is not visited, then
         topoSort(v, visited, stack)
   done
   push u into the stack
End

shortestPath(start)
Begin
   initially mark all nodes as unvisited
   for all nodes i in the graph, do
      if i is not visited, then
         topoSort(i, visited, stack)
   done
   for all vertices i in the graph, do
      dist[i] := ∞
   done
   dist[start] := 0
   while the stack is not empty, do
      pop the stack item into nextVert
      if dist[nextVert] ≠ ∞, then
         for all vertices v adjacent to nextVert, do
            if dist[nextVert] + cost[nextVert][v] < dist[v], then
               dist[v] := dist[nextVert] + cost[nextVert][v]
         done
   done
   for all vertices i in the graph, do
      if dist[i] = ∞, then
         display Infinity
      else
         display dist[i]
   done
End
Example
#include<iostream>
#include<stack>
#define NODE 6
int cost[NODE][NODE] = {
if(!visited[v])
stack<int> stk;
int dist[NODE];
bool vis[NODE];
for(int i = 0; i<NODE;i++)
if(!vis[i])
if(dist[nextVert] != INF) {
main() {
int start = 1;
shortestPath(start);
Output
Infinity 0 2 6 5 3
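Since the C++ listing above is fragmentary, here is a self-contained C sketch of the same
technique — topological ordering by depth-first search, followed by relaxing edges in that
order — using the cost matrix from the Input section; it reproduces the Output shown above.

#include <stdio.h>
#define NODE 6
#define INF 99999   /* stands in for the ∞ entries of the cost matrix */

int cost[NODE][NODE] = {
    {0,   5,   3,   INF, INF, INF},
    {INF, 0,   2,   6,   INF, INF},
    {INF, INF, 0,   7,   4,   2},
    {INF, INF, INF, 0,   -1,  1},
    {INF, INF, INF, INF, 0,   -2},
    {INF, INF, INF, INF, INF, 0}
};

int visited[NODE], order[NODE], top = NODE;

void topoSort(int u) {              /* DFS; record u after its successors */
    visited[u] = 1;
    for (int v = 0; v < NODE; v++)
        if (u != v && cost[u][v] != INF && !visited[v])
            topoSort(v);
    order[--top] = u;               /* equivalent to pushing u on a stack */
}

int main(void) {
    int start = 1, dist[NODE];
    for (int i = 0; i < NODE; i++)
        if (!visited[i]) topoSort(i);
    for (int i = 0; i < NODE; i++) dist[i] = INF;
    dist[start] = 0;
    for (int k = 0; k < NODE; k++) {        /* relax in topological order */
        int u = order[k];
        if (dist[u] == INF) continue;       /* skip unreachable vertices */
        for (int v = 0; v < NODE; v++)
            if (u != v && cost[u][v] != INF && dist[u] + cost[u][v] < dist[v])
                dist[v] = dist[u] + cost[u][v];
    }
    for (int i = 0; i < NODE; i++)
        if (dist[i] == INF) printf("Infinity ");
        else printf("%d ", dist[i]);
    printf("\n");
    return 0;
}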
3.4 APPLICATIONS OF MINIMUM SPANNING TREES IN GRAPH THEORY
Minimum Spanning Trees (MSTs) have numerous practical applications in graph theory and
various real-world scenarios. A minimum spanning tree is a tree that connects all the vertices
of a connected, undirected graph while minimizing the total sum of edge weights. Key
applications include designing low-cost communication, electrical, and transportation
networks, clustering points by similarity, and approximating harder problems such as the
travelling salesman problem.
Overall, MSTs play a crucial role in optimizing and simplifying various real-world problems
that involve connecting points or nodes while minimizing costs or distances. Their ability to
provide efficient and near-optimal solutions makes them a valuable tool in graph theory and
practical applications.
3.5 SUMMARY
In this unit, we have described two problems where greedy strategy is used to provide
optimal solution. In single source shortest path problem, we described how greedy strategy is
used to determine the shortest path from a single source to all the remaining vertices of G. In
the case of minimum cost spanning tree problem, we intend to build a least cost spanning
tree, stage by stage, using the greedy method. Obviously at each stage, we choose the edge
with the least weight from amongst the available edges. With this in mind, we described two
algorithms, Prim’s and Kruskal’s, which work on the greedy principle.
3.6 KEYWORDS
Graph
Kruskal’s algorithm
Greedy strategy
Complexity
3.7 QUESTIONS FOR SELF STUDY
3.8 REFERENCES
1) Fundamentals of Algorithmics: Gilles Brassard and Paul Bratley, Prentice Hall
Englewood Cliffs, New Jersey 07632.
2) Sartaj Sahni, 2000, Data structures, Algorithms and Applications in C++, McGraw
Hill International Edition.
UNIT 4: SHORTEST PATH ALGORITHMS
Structure:
4.0 Objectives
4.1 Introduction
4.2 Dijkstra's algorithm
4.3 Bellman-Ford algorithm
4.4 Floyd-Warshall algorithm
4.5 Johnson’s Algorithm
4.6 Summary
4.7 Keywords
4.8 Questions for self-study
4.9 References
4.0 OBJECTIVES
4.1 INTRODUCTION
• Dijkstra’s Algorithm
Dijkstra’s Algorithm stands out from the rest due to its ability to find the shortest path from
one node to every other node within the same graph data structure. This means that, rather than
just finding the shortest path from the starting node to another specific node, the algorithm
works to find the shortest path to every single reachable node – provided the graph doesn’t
change. The algorithm runs until all of the reachable nodes have been visited. Therefore, you
would only need to run Dijkstra’s algorithm once and save the results to be used again and
again without re-running the algorithm – again, unless the graph data structure changed in any
way. In the case of a change in the graph, you would need to rerun the algorithm to ensure you
have the most updated shortest paths for your data structure. Take a routing example: if you
want to go from A to B in the shortest way possible, but you know that some roads are heavily
congested, blocked, undergoing works, and so on, then Dijkstra’s algorithm will find the
shortest path while avoiding any edges with larger weights, thereby finding you the shortest
route.
• Bellman-Ford Algorithm
Similar to Dijkstra’s algorithm, the Bellman-Ford algorithm works to find the shortest path
between a given node and all other nodes in the graph. Though it is slower than the former,
Bellman-Ford makes up for this disadvantage with its versatility. Unlike Dijkstra’s algorithm,
Bellman-Ford is capable of handling graphs in which some of the edge weights are negative.
It’s important to note that if there is a negative cycle – in which the edges sum to a negative
value – in the graph, then there is no shortest or cheapest path, since the total cost can be
decreased indefinitely by going around the cycle. Bellman-Ford is, however, able to detect
negative cycles and report their existence.
• Floyd-Warshall Algorithm
The Floyd-Warshall algorithm stands out in that, unlike the previous two algorithms, it is not
a single-source algorithm. That is, it calculates the shortest distance between every pair of
nodes in the graph, rather than only calculating from a single node. It works by breaking the
main problem into smaller ones, then combining the answers to solve the main shortest-path
problem. Floyd-Warshall is extremely useful for generating routes for multi-stop trips, as it
calculates the shortest path between all the relevant nodes. For this reason, much
route-planning software utilizes this algorithm, as it will provide you with the most optimized
route from any given location. Therefore, no matter where you currently are, Floyd-Warshall
will determine the fastest way to get to any other node on the graph.
Johnson’s Algorithm
Johnson’s algorithm works best with sparse graphs – ones with fewer edges – as its runtime
depends on the number of edges. So, the fewer the edges, the faster it will generate a route.
This algorithm varies from the rest as it relies on two other algorithms to determine the shortest
path. First, it uses Bellman-Ford to detect negative cycles and eliminate any negative edges.
Then, with this new graph, it relies on Dijkstra’s algorithm to calculate the shortest paths in the
original graph that was inputted.
Dijkstra’s shortest path algorithm is similar to Prim’s algorithm, as they both rely on
finding the shortest path locally to achieve the global solution. However, unlike Prim’s
algorithm, Dijkstra’s algorithm does not find the minimum spanning tree; it is designed to
find the shortest path in the graph from one vertex to the other remaining vertices in the graph.
Dijkstra’s algorithm can be performed on both directed and undirected graphs. Since the
shortest path can be calculated from a single source vertex to all the other vertices in the graph,
Dijkstra’s algorithm is also called the single-source shortest path algorithm. The output
obtained is called the shortest path spanning tree. In this unit, we will learn about the greedy
approach of Dijkstra’s algorithm.
Dijkstra’s Algorithm
Dijkstra’s algorithm is designed to find the shortest path between two vertices of a graph.
These two vertices could either be adjacent or the farthest points in the graph. The algorithm
starts from the source. The inputs taken by the algorithm are the graph G {V, E}, where V is
the set of vertices and E is the set of edges, and the source vertex S. And the output is the
shortest path spanning tree.
Algorithm
Declare two arrays − distance[] to store the distances from the source vertex to the other
vertices in graph and visited[] to store the visited vertices.
Set distance[S] to 0 and distance[v] = ∞, where v represents all the other vertices in the graph.
Add S to the visited[] array and find the adjacent vertices of S with the minimum distance.
The adjacent vertex of S, say A, has the minimum distance and is not in the visited array yet.
A is picked and added to the visited array, and the distance of A is changed from ∞ to the
assigned distance of A, say d1, where d1 < ∞. Repeat the process for the adjacent vertices of
the visited vertices until the shortest path spanning tree is formed.
Examples
To understand Dijkstra’s concept better, let us analyze the algorithm with the help of an
example graph −
Fig 4.2.1
Step 1
Initialize the distances of all the vertices as ∞, except the source node S.
Vertex S A B C D E
Distance 0 ∞ ∞ ∞ ∞ ∞
Now that the source vertex S is visited, add it into the visited array.
visited = {S}
Step 2
The vertex S has three adjacent vertices with various distances and the vertex with minimum
distance among them all is A. Hence, A is visited and the dist[A] is changed from ∞ to 6.
S→A=6
S→D=8
S→E=7
Vertex S A B C D E
Distance 0 6 ∞ ∞ 8 7
Visited = {S, A}
Fig 4.2.2
Step 3
There are two vertices in the visited array; therefore, the adjacent vertices must be
checked for both the visited vertices. Vertex S has two more adjacent vertices to be visited yet:
D and E. Vertex A has one adjacent vertex, B. Calculate the distances from S to D, E, and B
and select the minimum distance −
S → D = 8 and S → E = 7.
S → B = S → A + A → B = 6 + 9 = 15
Vertex S A B C D E
Distance 0 6 15 ∞ 8 7
Visited = {S, A, E}
Fig 4.2.3
Step 4
Calculate the distances to the unvisited adjacent vertices of all the visited vertices – S, A,
and E – and select the vertex with minimum distance.
S→D=8
S → B = 15
S → C = S → E + E → C = 7 + 5 = 12
Vertex S A B C D E
Distance 0 6 15 12 8 7
Visited = {S, A, E, D}
Fig 4.2.4
Step 5
Recalculate the distances of the unvisited vertices, and if a distance smaller than the existing
distance is found, replace the value in the distance array.
S → C = S → E + E → C = 7 + 5 = 12
S → C = S → D + D → C = 8 + 3 = 11
dist[C] = minimum (12, 11) = 11
S → B = S → A + A → B = 6 + 9 = 15
S → B = S → D + D → C + C → B = 8 + 3 + 12 = 23
dist[B] = minimum (15,23) = 15
Vertex S A B C D E
Distance 0 6 15 11 8 7
Visited = { S, A, E, D, C}
Fig 4.2.5
Step 6
The remaining unvisited vertex in the graph is B; with the minimum distance 15, it is added
to the output spanning tree.
Visited = {S, A, E, D, C, B}
Fig 4.2.6
The shortest path spanning tree is obtained as an output using Dijkstra’s algorithm.
Example
The program implements Dijkstra’s shortest path algorithm: it takes the cost adjacency
matrix as the input and prints the shortest paths as the output along with the minimum cost.
#include<stdio.h>
#include<limits.h>
#include<stdbool.h>
int min_dist(int[], bool[]);
void greedy_dijkstra(int[][6],int);
int min_dist(int dist[], bool visited[]){ // finding minimum dist
int minimum=INT_MAX,ind;
for(int k=0; k<6; k++) {
if(visited[k]==false && dist[k]<=minimum) {
minimum=dist[k];
ind=k;
}
}
return ind;
}
void greedy_dijkstra(int graph[6][6],int src){
int dist[6];
bool visited[6];
for(int k = 0; k<6; k++) {
dist[k] = INT_MAX;
visited[k] = false;
}
dist[src] = 0; // Source vertex dist is set 0
for(int k = 0; k<6; k++) {
int m=min_dist(dist,visited); // pick the unvisited vertex with minimum dist
visited[m]=true;
for(int j = 0; j<6; j++) {    // relax every edge leaving m (0 = no edge)
if(!visited[j] && graph[m][j] && dist[m] != INT_MAX
&& dist[m] + graph[m][j] < dist[j])
dist[j] = dist[m] + graph[m][j];
}
}
printf("Vertex dist from source vertex\n");
for(int k = 0; k<6; k++)
printf("%c %d\n", 'A' + k, dist[k]);
}
int main(){
/* Cost adjacency matrix (0 = no edge). The original listing omitted it;
this hypothetical matrix reproduces the output shown below. */
int graph[6][6] = {
{0, 1, 2, 0, 2, 0},
{1, 0, 0, 3, 0, 0},
{2, 0, 0, 0, 0, 0},
{0, 3, 0, 0, 0, 0},
{2, 0, 0, 0, 0, 1},
{0, 0, 0, 0, 1, 0}
};
greedy_dijkstra(graph, 0); // source vertex A
return 0;
}
Output
Vertex dist from source vertex
A 0
B 1
C 2
D 4
E 2
F 3
4.3 BELLMAN-FORD ALGORITHM
Consider a scenario where you are presented with a weighted graph. Your objective is to
determine the shortest path from a given source vertex to all other vertices. Initially, you might
consider implementing Dijkstra’s algorithm for this task. However, if the graph contains
negative weights, Dijkstra’s algorithm cannot be used. Therefore, we need a different algorithm
that can handle such situations. The Bellman-Ford algorithm is a suitable alternative to
Dijkstra’s algorithm as it accommodates negative edge weights.
This section provides a comprehensive discussion of the Bellman-Ford algorithm. It covers its
functionality, complexity, highlights the disparities between Dijkstra’s and Bellman-Ford
algorithms, and presents various applications of the Bellman-Ford algorithm. Before delving
into the details of the Bellman-Ford algorithm, it is important to understand why negative
weights in a graph pose a challenge and warrant caution.
Fig 4.3.1
The vertices B, C, and D in this illustration form a cycle with B as the starting and ending
nodes. This cycle also behaves as a negative cycle because the total value is -1.
How Bellman-Ford Algorithm Works
The Bellman-Ford algorithm is a single-source shortest-path algorithm that can handle negative
weight edges. It works by iteratively relaxing all edges in the graph, reducing the estimated
distance from the source vertex to all other vertices until the actual shortest path is found.
Here are the steps involved in the Bellman-Ford algorithm:
• Step – 1 Initialize the distance to the source vertex as 0, and the distance to all other
vertices as infinity.
• Step – 2 Relax all edges in the graph |V| – 1 times, where |V| is the number of vertices
in the graph. For each edge (u, v) with weight w, check if the distance from the source
vertex to v can be reduced by going through u. If so, update the distance to v to the new,
shorter distance.
• Step – 3 Check for negative weight cycles. If there is a negative weight cycle in the
graph, the algorithm will never converge and will keep reducing the distance to some
vertices with each iteration. To detect such cycles, repeat step 2 one more time. If any
distance is updated in this extra iteration, there must be a negative weight cycle in the
graph.
• Step – 4 If there is no negative weight cycle, the shortest distance to each vertex from
the source vertex has been found.
Fig 4.3.2
1. All distances are initialized to infinity, except the source vertex, whose distance is 0
(Fig 4.3.2).
2. Let all edges be processed in the following order: (B, E), (D, B), (B, D), (A, B), (A,
C), (D, C), (B, C), (E, D). When we relax all edges for the first time, we get the
following distances.
Fig 4.3.3
3. The first round of iteration guarantees to generate all shortest paths which are at most
one edge long. When we process all edges a second time, we get the following
distances. (The last row shows final values).
Fig 4.3.4
4. The second iteration guarantees to generate all shortest paths which are at most two
edges long. All edges are processed two more times by the Bellman Ford algorithm.
After the second iteration, the distances are minimized, therefore the third and fourth
iterations do not update the distances.
class Graph:
    def __init__(self, vertices):
        self.V = vertices   # number of vertices
        self.graph = []     # list of (u, v, w) edges

    def addEdge(self, u, v, w):
        self.graph.append((u, v, w))

    def printArr(self, dist):
        # Output format is assumed; the original listing omitted this helper
        print("Vertex distance from source")
        for i in range(self.V):
            print(i, dist[i])

    def BellmanFord(self, src):
        dist = [float("Inf")] * self.V
        dist[src] = 0
        # Relax every edge |V| - 1 times
        for _ in range(self.V - 1):
            for u, v, w in self.graph:
                if dist[u] != float("Inf") and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        # One extra pass: any further improvement means a negative cycle
        for u, v, w in self.graph:
            if dist[u] != float("Inf") and dist[u] + w < dist[v]:
                print("Graph contains negative weight cycle")
                return
        self.printArr(dist)
if __name__ == '__main__':
g = Graph(5)
g.addEdge(0, 1, -1)
g.addEdge(0, 2, 4)
g.addEdge(1, 2, 3)
g.addEdge(1, 3, 2)
g.addEdge(1, 4, 2)
g.addEdge(3, 2, 5)
g.addEdge(3, 1, 1)
g.addEdge(4, 3, -3)
g.BellmanFord(0)
Output:
Vertex distance from source
0 0
1 -1
2 2
3 -2
4 1
The space complexity of the algorithm is O(V), as we need to maintain an array of size V to
store the distances of each vertex from the source vertex.
Difference Between Dijkstra’s and Bellman-Ford Algorithms
Here is a table comparing Dijkstra’s and Bellman-Ford algorithms:

                  Dijkstra’s Algorithm               Bellman-Ford Algorithm
Implementation    Inspired by the greedy approach.   Inspired by the dynamic programming approach.
The Floyd-Warshall algorithm in C is a dynamic programming approach used to find the shortest
path between all pairs of vertices in a weighted graph. It is applicable to both directed and
undirected graphs, with the exception of graphs that contain negative weight cycles. This
algorithm proves to be the best choice when searching for the shortest path across every pair
of vertices in a graph, such as finding the shortest routes between cities in a state or country.
In this section, we will explore the workings of the Floyd-Warshall algorithm in C and delve
into its implementation details. We will also discuss its time complexity compared to other
popular algorithms like Bellman-Ford and Dijkstra’s shortest path algorithm, highlighting why
Floyd-Warshall is often the preferred option. Additionally, we will explore the various
applications of the Floyd-Warshall algorithm, showcasing its versatility and practical uses in
different scenarios.
In other words, the Floyd-Warshall algorithm is the best choice for finding the shortest path
across every pair of vertices in a graph. One restriction we have to follow is that the graph
should not contain any negative weight cycles. The Floyd-Warshall algorithm does support
negative weight edges in a directed graph, so long as the weighted sum of the edges forming
a cycle is not negative – and that is what is meant here by a negative weight cycle. If there
exists at least one such negative weight cycle, we could always just keep traversing this cycle
over and over while making the length of the path smaller and smaller; the length at some
point would approach negative infinity, which is wildly unreasonable.
Also, note that the algorithm cannot support negative weight edges in an undirected graph
at all. Such an edge forms a negative cycle in and of itself, since we can traverse back and
forth along that edge infinitely, as it is an undirected graph. You may wonder why we are
learning another algorithm when we could solve the same problem by running the Bellman-
Ford or Dijkstra’s shortest path algorithm from every vertex in the graph. Yes, you can, but
the main reason why we do not use Bellman-Ford or Dijkstra’s shortest path algorithm here
is their time complexity, which we will discuss later in this section. Now that we have
developed a fair understanding of what the Floyd-Warshall algorithm is and why we use it,
let us take our discussion ahead and see how it actually works.
Points to Remember:
• Negative weight cycle graphs are graphs where the sum of the weights of the edges in
some cycle is negative.
• A weighted graph is a graph in which each edge has some weight (numerical value)
associated with it.
How does Floyd Warshall Algorithm in C Work?
Given Graph:
Follow the steps mentioned below to find the shortest path between all the pairs of vertices:
• Step 1:
Create a matrix A0 of dimension V*V, where V is the number of vertices. The row and
column are indexed as i and j, respectively. i and j are the graph’s vertices.
The value of cell A[i][j] is the distance from the ith vertex to the jth vertex. If there
is no path between the ith vertex and the jth vertex, the cell is left as infinity.
• Step 2:
Now create matrix A1 using matrix A0. The elements in the first row and first column
are left as they are (here k = 0); every other cell is filled with
min(A0[i][j], A0[i][k] + A0[k][j]). This vertex k is used to compute the distance from
the source vertex to the destination vertex.
• Step 3:
Similarly, A2 is derived from A1. The elements in the second column and second row
remain unchanged. k is the 1st vertex in this stage (i.e. k = 1). The remaining steps are
similar to those in step 2.
Fig 4.4.5
For example: For A2[3,2], the direct distance from vertex 3 to 2 is infinity and the sum
of the distance from vertex 3 to 2 through vertex k(i.e. from vertex 3 to vertex 1 and
from vertex 1 to vertex 2) is 0. Since 0 is less than infinity, A2[3,2] is filled with 0.
• Step 4:
Similarly, the A3 and A4 matrices are also constructed. k is the 2nd vertex (i.e., k = 2)
when generating A3, and the 3rd vertex (i.e., k = 3) during the construction of A4.
Fig 4.4.6
• Step 5:
A4 displays the shortest path between any pair of vertices.
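The main() below calls a floydWarshall() function whose definition is not shown. The
following C sketch reconstructs it, together with the V and INF constants it needs (INF stands
in for the missing-edge entries), so that the program compiles and reproduces the Output
listed after it:

#include <stdio.h>
#define V 4
#define INF 99999   /* stands in for "no edge" */

void floydWarshall(int graph[V][V]) {
    int dist[V][V];
    /* Start from A0, the matrix of direct edge weights */
    for (int i = 0; i < V; i++)
        for (int j = 0; j < V; j++)
            dist[i][j] = graph[i][j];
    /* Allow each vertex k in turn as an intermediate stop */
    for (int k = 0; k < V; k++)
        for (int i = 0; i < V; i++)
            for (int j = 0; j < V; j++)
                if (dist[i][k] != INF && dist[k][j] != INF &&
                    dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
    for (int i = 0; i < V; i++)
        for (int j = 0; j < V; j++)
            if (i != j)
                printf("shortest path from %d to %d is %d\n", i, j, dist[i][j]);
}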
int main() {
int graph[V][V] = {{0,INF,-3,INF},
{5,0,4,INF},
{INF,INF,0,3},
{INF,-2,INF,0}};
floydWarshall(graph);
return 0;
} // End of Program
Output:
shortest path from 0 to 1 is -2
shortest path from 0 to 2 is -3
shortest path from 0 to 3 is 0
shortest path from 1 to 0 is 5
shortest path from 1 to 2 is 2
shortest path from 1 to 3 is 5
shortest path from 2 to 0 is 6
shortest path from 2 to 1 is 1
shortest path from 2 to 3 is 3
shortest path from 3 to 0 is 3
shortest path from 3 to 1 is -2
shortest path from 3 to 2 is 0
Johnson's algorithm is a shortest path algorithm that deals with the all-pairs shortest path
problem. The all-pairs shortest path problem takes in a graph with vertices and edges, and it
outputs the shortest path between every pair of vertices in that graph. Johnson's algorithm is
very similar to the Floyd-Warshall algorithm; however, Floyd-Warshall is most effective for
dense graphs (many edges), while Johnson's algorithm is most effective for sparse graphs (few
edges). The reason that Johnson's algorithm is better for sparse graphs is that its time complexity
depends on the number of edges in the graph, while Floyd-Warshall's does not. When there are
relatively few edges (the graph is sparse), it will run faster than the runtime of Floyd-Warshall. Johnson's
algorithm is interesting because it uses two other shortest path algorithms as subroutines. It
uses Bellman-Ford in order to reweight the input graph to eliminate negative edges and detect
negative cycles. With this new, altered graph, it then uses Dijkstra's shortest path algorithm to
calculate the shortest path between all pairs of vertices. The output of the algorithm is then the
set of shortest paths in the original graph.
Johnson's algorithm has three main steps.
1. A new vertex is added to the graph, connected by zero-weight edges to every other
vertex.
2. Bellman-Ford is run from this new vertex to compute, for each vertex v, a value h(v);
every edge weight is then reweighted as w(u, v) := w(u, v) + h(u) − h(v), which makes
all weights non-negative while preserving shortest paths.
3. The added vertex from step 1 is removed and Dijkstra's algorithm is run on
every node in the graph.
These three steps can be seen in the graphic below. Figure 4.5(a) shows step 1. Figure
4.5 (b) shows step 2. Figures 4.5 (c)-(g) show step 3, Dijkstra's algorithm being run on
each of the 5 vertices in the graph.
4.6 SUMMARY
In this unit we have discussed Dijkstra's algorithm, the Bellman-Ford algorithm, the Floyd-
Warshall algorithm, and Johnson's algorithm. An algorithm is essentially a set of
instructions, or a highly specific procedure of steps, for a computer to follow to solve a
particular problem or specific task. For those seeking help with efficient routing maps,
shortest path algorithms are the solution they desire. When using navigation apps to assist
you with your route planning, think of a shortest path algorithm as a blueprint designed for
computers to read, understand, and then act upon to produce the shortest possible map route
for users.
4.7 KEYWORDS
• Matrix
• dynamic programming
• polynomial-time
• Routing Information Protocol (RIP)
• Border Gateway Protocol (BGP)
4.9 REFERENCES
• Akiyama, J. and Kano, M., Factors and factorization of graphs, J. Graph Theory 9
• Alavi, A. and Behzad, M., Complementary graphs and edge-chromatic numbers, SIAM
J. Appl. Math.
• Alspach, B. and Reid, K. B., Degree frequencies in digraphs and tournaments, J. Graph
Theory
• Anderson, I., Perfect matching of a graph, J. Combin. Theory Ser. B 10
• Balakrishnan, R. and Ranganathan, K., A Textbook of Graph Theory, Springer-Verlag,
UNIT 5: NETWORK FLOWS
Structure:
5.0 Objectives
5.1 Introduction
5.2 Flow networks and flow augmenting paths
5.3 Ford-Fulkerson algorithm
5.4 Maximum flow-minimum cut theorem
5.5 Summary
5.6 Keywords
5.7 Questions for self-study
5.8 Reference
5.0 OBJECTIVES
5.1 INTRODUCTION
A Flow network is a directed graph where each edge has a capacity and a flow. They are
typically used to model problems involving the transport of items between locations, using a
network of routes with limited capacity. Examples include modeling traffic on a network of
roads, fluid in a network of pipes, and electricity in a network of circuit components.
For example, a company might want to ship packages from Los Angeles to New York City
using trucks to transport between intermediate cities. If there is only one truck for the route
connecting a pair of cities and each truck has a maximum load, then the graph describing the
transportation options will be a flow network. Each node will represent a city and each edge
will represent a truck route between those cities (e.g. a highway). The capacity for a particular
route will be the maximum load the truck for that route can carry. Using this model, the
company can decide how to split their packages between trucks so that the packages can reach
their destination using the available routes and trucks. The number of packages the company
decides to ship along a particular truck route is the flow for that route.
Imagine a courier service that wants to ship as many widgets as possible from city s to city t.
Unfortunately, there is no way to ship widgets directly from s to t, so the courier service must
ship the widgets using the intermediate cities a, b, c, and d. Particular pairs of cities are
connected by flights, which allow the transport of widgets between those cities. This
transportation network can be represented by the following directed graph, where nodes
represent cities and directed edges represent flights between those cities.
Obviously, any realistic airplane can't carry an unlimited number of widgets. Therefore, every
flight has a maximum number of widgets that it can carry, dependent on the size of its cargo
bay. This maximum is called the capacity for that flight. It is a number associated with each
edge in the graph above and denotes the maximum number of widgets that can be transported
between cities. The graph below is the graph above plus the corresponding capacities. For
example, the flight from b to c can carry a maximum of 9 widgets, so the edge b→c has
capacity 9.
Fig 5.2.2 Graph representing flights between cities with corresponding capacities
Now that the courier service has a representation of the flights it can use to transport widgets,
as well as the maximum number of widgets that can be moved between cities, it can start the
task of deciding how many widgets to transport on each flight. The number of widgets
transported along each flight is known as the flow for that flight. The naive solution would be
to just assign the maximum number of widgets possible to each flight. However, this violates
the common-sense constraint that the number of widgets flown into an intermediate city must
equal the number of widgets being flown out of that city. Otherwise, widgets are being left
behind (in the case that there are more flown in than flown out) or created out of thin air (in the
case there are more flown out than flown in), two scenarios which are not in the courier's interest
or power. Indeed, assigning the maximum flows possible leaves 14 widgets entering city d and
only 11 widgets leaving, meaning 3 widgets have vanished.
So, for every node in the graph other than s and t, the total flow leaving that node must be equal
to the total flow entering that node. One possible assignment that respects these two constraints
(an edge's flow may not exceed its capacity and the total flow entering a node must equal the
total flows leaving that node) is shown below.
Fig 5.2.3 One valid assignment of flows for the transport of widgets between s and t
A natural question is to ask whether this is the assignment that maximizes the number of
widgets that can ship from city s to city t. Since the number of widgets being shipped to city t is
simply the number of widgets leaving s or entering t, summing either one yields that this flow
assignment allows the shipment of 9 widgets. In fact, an optimal assignment, known as
the maximum flow, is 23 widgets, which is shown in the graph below. This is one flow
assignment (it is not necessarily unique) that maximizes the courier service's stated objective
of maximizing the number of widgets to ship from s to t. Other problems and objectives are
described in the section Flow Network Problems below.
Fig 5.2.4 Assignment of flows that maximizes the transport of widgets between s and t
• Definition
A flow in a network G = (V, E) with source s, sink t, and capacity function c is an assignment
of a value f(e) to each edge e satisfying
0 ≤ f(e) ≤ c(e) for each edge e ∈ E, and
Σ f(e) over edges e with end(e) = v  =  Σ f(e) over edges e with start(e) = v, for each vertex
v ∉ {s, t}, where start(e) and end(e) denote the start and end vertex of edge e, respectively.
The first constraint (called the feasibility condition) simply says that the flow f(e) along each
edge e must be nonnegative and may not exceed the capacity c(e) for that edge. The second
constraint (called the flow conservation condition) says that the flow into a vertex must equal
the flow out of that vertex, except at the source and sink vertices.
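To make the two conditions concrete, here is a minimal Python sketch that checks whether a
candidate assignment is a valid flow; the dictionary-based representation of edges is an
assumption made purely for illustration:

def is_valid_flow(capacity, flow, s, t):
    """Check the feasibility and flow conservation conditions.
    capacity, flow: dicts mapping a directed edge (u, v) to a number."""
    # Feasibility condition: 0 <= f(e) <= c(e) on every edge.
    if any(not (0 <= flow[e] <= capacity[e]) for e in capacity):
        return False
    # Flow conservation: inflow equals outflow at every vertex
    # other than the source s and the sink t.
    vertices = {v for edge in capacity for v in edge}
    for v in vertices - {s, t}:
        inflow = sum(flow[(a, b)] for (a, b) in capacity if b == v)
        outflow = sum(flow[(a, b)] for (a, b) in capacity if a == v)
        if inflow != outflow:
            return False
    return True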
The Ford-Fulkerson algorithm is an algorithm that tackles the max-flow min-cut problem. That
is, given a network with vertices and edges between those vertices that have certain weights,
how much "flow" can the network process at a time? Flow can mean anything, but typically it
means data through a computer network. It was discovered in 1956 by Ford and Fulkerson. This
algorithm is sometimes referred to as a method because parts of its protocol are not fully
specified and can vary from implementation to implementation. An algorithm typically refers
to a specific protocol for solving a problem, whereas a method is a more general approach to a
problem.
The Ford-Fulkerson algorithm assumes that the input will be a graph, G, along with a source
vertex, s, and a sink vertex, t. The graph is any representation of a weighted graph where
vertices are connected by edges of specified weights. There must also be a source vertex and
sink vertex to understand the beginning and end of the flow network. Ford-Fulkerson has a
complexity of O(|E| · f*), where f* is the maximum flow of the network. The Ford-Fulkerson
algorithm was eventually improved upon by the Edmonds-Karp algorithm, which does the
same thing in O(|V| · |E|²) time, independent of the maximum flow value.
• Intuition
The intuition behind the algorithm is quite simple (even though the implementation
details can obscure this). Imagine a flow network that is just a traffic network of cars.
Each road can hold a certain number of cars. This can be illustrated by the following
graphic.
• The intuition goes like this: as long as there is a path from the source to the sink
that can take some flow the entire way, we send it. This path is called
an augmenting path. We keep doing this until there are no more augmenting paths.
In the image above, we could start by sending 2 cars along the topmost path
(because only 2 cars can get through the last portion). Then we might send 3 cars
along the bottom path for a total of 5 cars. Finally, we can send 2 more cars along
the top path for two edges, send them down to the bottom path, and through to the sink.
The total number of cars sent is now 7, and it is the maximum flow.
Algorithm Pseudo-code
The pseudo-code for this method is quite short; however, there are some functions that bear
further discussion. The simple pseudo-code is below. This pseudo-code is not written in any
specific computer language. Instead, it is an informal, high-level description of the algorithm.
initialize flow to 0
path = findAugmentingPath(G, s, t)
while path exists:
    augment flow along path        # This is purposefully ambiguous for now
    G_f = createResidualGraph()
    path = findAugmentingPath(G_f, s, t)
return flow
Basically, what this simplified version says is that as long as there is a path from the source to
the sink that can handle more flow, send that flow. Here is a version of the pseudo-code that
explains the flow augmentation in more depth:
flow = 0
for each edge (u, v) in G:
    flow(u, v) = 0
while there is a path, p, from s -> t in residual network G_f:
    residual_capacity(p) = min(residual_capacity(u, v) : for (u, v) in p)
    flow = flow + residual_capacity(p)
    for each edge (u, v) in p:
        if (u, v) is a forward edge:
            flow(u, v) = flow(u, v) + residual_capacity(p)
        else:
            flow(u, v) = flow(u, v) - residual_capacity(p)
return flow
This algorithm is more well-defined. The only ambiguous point is the 'forward edge'
terminology on line 8. When a residual graph, Gf, is created, edges can be created that go in
the opposite direction when compared to the original graph. An edge is a 'forward edge' if the
edge existed in the original graph, G. If it is a reversal of an original edge, it is called a
'backwards edge.'
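As a concrete illustration, the following is a runnable Python sketch of the method just
described. Finding augmenting paths with breadth-first search (which makes this the
Edmonds-Karp variant rather than generic Ford-Fulkerson) and the dictionary-based edge
representation are assumptions made here for illustration:

from collections import deque

def max_flow(capacity, s, t):
    """Return the value of a maximum s-t flow.
    capacity: dict mapping a directed edge (u, v) to its capacity."""
    # Residual capacities start equal to the original capacities;
    # reverse (backward) edges start at 0.
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)
    adj = {}
    for (u, v) in residual:
        adj.setdefault(u, []).append(v)

    def find_augmenting_path():
        # BFS from s to t through edges with positive residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    if v == t:
                        path = []
                        while parent[v] is not None:
                            path.append((parent[v], v))
                            v = parent[v]
                        return list(reversed(path))
                    queue.append(v)
        return None   # no augmenting path left

    flow = 0
    path = find_augmenting_path()
    while path is not None:
        bottleneck = min(residual[e] for e in path)   # residual capacity of p
        for (u, v) in path:
            residual[(u, v)] -= bottleneck   # forward edge: use up capacity
            residual[(v, u)] += bottleneck   # backward edge: allow undoing flow
        flow += bottleneck
        path = find_augmenting_path()
    return flow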
• Residual Graphs
Residual graphs are an important middle step in calculating the maximum flow. As noted in
the pseudo-code, they are calculated at every step so that augmenting paths can be found from
the source to the sink. To understand how these are created and how they are used, we can use
the graph from the intuition section. However, one notion is important to understand before
looking at residual graphs: residual capacity. This term is used in the above pseudo-code, and
it plays an important role in residual graph creation. Residual capacity is defined as the new
capacity after a given flow has been taken away. In other words, for a given edge (u,v), the
residual capacity cf is defined as
cf(u,v) = c(u,v) − f(u,v), with flows skew-symmetric: f(u,v) = −f(v,u).
With these tools, it is possible to calculate the residual capacity of any edge, forward or
backward, in the flow network. Then, those residual capacities are used to make a residual
network, Gf. Taking the original intuition network and adding labels to the vertices, we now
have this:
1. In the forward direction, the edges now have a residual capacity equal to
cf (u,v)=c(u,v)−f(u,v).
The flow is equal to 2, so the residual capacity of (S, A) and (A, B) is reduced to 2, while
the edge (B, T) has a residual capacity of 0.
2. In the backward direction, the edges now have a residual capacity equal to
cf (v,u)=c(v,u)−f(v,u).
Because of flow preservation, this can be written as cf (v,u)=c(v,u)+f(u,v). And since the
capacity of those backward edges was initially 0, all of the backward edges (T, B), (B, A),
and (A, S) now have a residual capacity of 2. When a new residual graph is constructed
with these new edges, any edges with a residual capacity of 0—like (B, T)—are not
included.
Finally, a flow of 2 can be sent along the path [(S, A), (A, B), (B, C), (C, D), (D, T)] because
the minimum residual capacity along that path is 2. The final residual graph after this is done
is as follows:
Formally, a minimum cut is defined as the minimum sum of weights of the edges (minimum
number of edges, in the case of unweighted graphs) that must be removed to disconnect the
graph into two components. Karger's algorithm, a randomized algorithm used to find the
minimum cut of any given graph G, has been discussed in the Minimum Cut article; we would
strongly recommend referring to it to get an idea of its implementation.
The given graph G has six vertices and nine edges, where the first value on each edge represents
the flow through it (which is initially set to 0) and the second value represents its capacity.
For example, 0/5 written over edge A↔B means the capacity of this edge is 5 and
currently there is no flow along it. We can find the maximum flow through this network
by following the steps which have been explained briefly in Maximum Flow. After proceeding
through these steps, we find the maximum flow through the network.
The max-flow min-cut theorem is the network flow theorem which says that the maximum
flow from the source node to the sink node in a given graph is always equal to the size of the
minimum cut, that is, the minimum sum of weights of edges which, if removed, disconnect
the graph into two components.
• Intuition
In all types of networks (whether they carry data or some other object), the amount of flow that
can pass through the network is restricted by the weakest connection (an edge with
comparatively less capacity) between disjoint sets of the network. Even if other connections
can allow a huge amount of flow through them, that capacity can never be fully used. Let's
have a look at an example to understand this clearly.
Fig: 5.4.4 The amount of flow that can flow through the network
In the above-shown network, edges s→A and s→B have a capacity of 50 units each, but we can't
send that much flow through them because at a later stage we have edges A→E and B→E with
capacities of 3 and 5 respectively. Hence, the maximum flow we can have through this graph
is only 8. Another important observation in this graph is the size of the minimum cut is also 8,
which can be obtained by removing edges A→E and B→E with the total sum of weights as
8 as shown below
Fig: 5.4.5 removing edges A→E and B→E with the total sum of weights as 8
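Using the max_flow sketch from the Ford-Fulkerson discussion, this intuition can be checked
numerically. The edge set below is a hedged reconstruction of the pictured network (the figure
itself is not reproduced here), with a large capacity assumed on the final edge into the sink:

capacity = {('s', 'A'): 50, ('s', 'B'): 50,
            ('A', 'E'): 3, ('B', 'E'): 5,
            ('E', 't'): 100}          # assumed topology and sink edge
print(max_flow(capacity, 's', 't'))   # prints 8: the two cut edges A->E, B->E
                                      # (3 + 5) limit the whole network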
• Proof of Max-Flow Min-Cut Theorem
Before beginning with the proof, let's define some terms which we will use frequently while
proving the max-flow min-cut theorem. G - the given network. S - the set that includes the
source node s. T - the set that includes the sink node t. f - a function representing the flow
through the network.
Lemma - an auxiliary statement that is proved first and used as a basis to draw a conclusion.
Corollary - a statement which is a direct result of a fact (in our case, a result of a lemma).
Lemma 1:
For any flow f and cut (S,T) in a network, the value of the flow cannot exceed the capacity of
the cut: f(G) ≤ capacity(S,T).
Corollary 2:
Because of Lemma 1, for the maximum flow f(G)* and the minimum cut (S,T)* we have
f(G)* ≤ capacity((S,T)*).
The Ford-Fulkerson algorithm repeatedly searches for an augmenting path between s and t in
the residual graph which has been formed at each step of the process. Let c_min be the minimum
capacity of any edge along an augmenting path p from s to t. Then the flow is augmented as
f* = f + c_min
This process is repeated until there are no more augmenting paths in the residual network. Once
there is no augmenting path left, we denote the set of all vertices which are reachable from the
source by V and the set of all vertices which are not reachable from the source by V′. Clearly
the sink t can't be in set V, as there are no more paths between s and t. For any pair of
vertices u and v, where u is in set V and v is in set V′, the flow f(u,v) is maximized, as no
augmenting paths are left; the flow f(v,u) is 0 for the same reason. Therefore we can say
that
f(u,v) = capacity(u,v), for u ∈ V, v ∈ V′,
and by Corollary 2 we can conclude that
f* = capacity((S,T)*)
5.5: SUMMARY
In this unit we have discussed flow networks and flow-augmenting paths, the Ford-
Fulkerson algorithm, and the maximum-flow minimum-cut theorem. A flow network is a directed
graph where each edge has a capacity and a flow. They are typically used to model problems
involving the transport of items between locations, using a network of routes with limited
capacity. Examples include modeling traffic on a network of roads, fluid in a network of pipes,
and electricity in a network of circuit components.
5.6: KEYWORDS
5.8: REFERENCES
• Akiyama, J. and Kano, M., Factors and factorization of graphs, J. Graph Theory 9
• Alavi, A. and Behzad, M., Complementary graphs and edge-chromatic numbers, SIAM J.
Appl. Math.
• Alspach, B. and Reid, K. B., Degree frequencies in digraphs and tournaments, J. Graph
Theory
• Anderson, I., Perfect matching of a graph, J. Combin. Theory Ser. B 10
• Appel, K. and Haken, W., Every planar map is four colorable, Bull. Amer. Math. Soc. 82.
• Avery, P., Score sequences in oriented graphs, J. Graph Theory 15,
• Balakrishnan, R. and Ranganathan, K., A Textbook of Graph Theory, Springer-Verlag,
UNIT 6: PLANAR GRAPHS
Structure:
6.0 Objectives
6.1 Introduction
6.2 Planarity and planar embeddings
6.3 Euler's formula and Kuratowski's theorem
6.4 Dual graphs and duality theorems
6.5 Summary
6.6 Keywords
6.7 Questions for self-study
6.8 Reference
6.0 OBJECTIVES
6.1 INTRODUCTION
A planar graph is a graph which can be drawn in the plane without any edges crossing. Some
pictures of a planar graph might have crossing edges, but it’s possible to redraw the picture to
eliminate the crossings. For example, although the usual pictures of K4 and Q3 have crossing
edges, it’s easy to redraw them so that no edges cross. For example, a planar picture of Q3 is
shown below. However, if you fiddle around with drawings of K3,3 or K5, there doesn’t seem
to be any way to eliminate the crossings. We’ll see how to prove that these two graphs aren’t
planar.
Fig: 6.1. A planar graph
Why should we care? Planar graphs have some interesting mathematical properties, e.g. they
can be colored with only 4 colors. Also, as we’ll see later, we can use facts about planar graphs
to show that there are only 5 Platonic solids. There are also many practical applications with a
graph structure in which crossing edges are a nuisance, including design problems for circuits,
subways, utility lines. Two crossing connections normally means that the edges must be run at
different heights. This isn’t a big issue for electrical wires, but it creates extra expense for some
types of lines, e.g. burying one subway tunnel under another (and therefore deeper than it
would ordinarily need to be). Circuits, in particular, are easier to manufacture if their connections
live in fewer layers.
Let G(V,E) be a graph with V = {v1, v2,..., vn} and E = {e1, e2,..., em}. Let S be any surface
(like the plane or the sphere) and P = {p1, p2,..., pn} be a set of n distinct points of S, with pi
corresponding to vi, 1 ≤ i ≤ n. If ei = vjvk, draw a Jordan arc Ji on S from pj to pk such that
Ji does not pass through any other point of P. Then P ∪ {J1, J2,..., Jm} is called a drawing of G
on S, or a diagram representing G on S. The pi are called the points of the diagram and the Ji the lines
of the diagram. An embedding of a graph G on a surface S is a diagram of G drawn on the
surface such that the Jordan arcs representing any two edges of G do not intersect except at a
point representing a vertex of G. A graph is planar if it has an embedding on the plane. A graph
which has no embedding on the plane is nonplanar. That is, a graph G is said to be planar if
there exists some geometric representation of G which can be drawn on a plane such that no
two of its edges intersect and a graph that cannot be drawn on a plane without a crossover
between its edges is called nonplanar. In order that a graph G is nonplanar, we have to show
that of all possible geometric representations of G, none can be embedded in a plane.
Equivalently, a geometric graph G is planar if there exists a graph isomorphic to G that is
embedded in a plane. An embedding of a planar graph G on a plane is called a plane
representation of G. Figure 6.2.1 shows three diagrams of the same graph which is planar. The
two graphs in Figure 6.2.2 represent the same planar graph.
Figure 6.2.1 The three diagrams of the same graph which is planar
• Planar Embeddings
Graphs can be represented in a variety of ways, for instance, as an adjacency matrix or using
adjacency lists. In this chapter we explore another type of representation that is quite
different in nature, namely geometric representations of graphs. Geometric representations are
appealing because they allow us to visualize a graph along with a variety of its properties in a
succinct manner. There are many degrees of freedom in selecting the type of geometric objects
and the details of their geometry. This freedom allows us to tailor the representation to meet
specific goals, such as emphasizing certain structural aspects of the graph at hand or reducing
the complexity of the obtained representation. The most common type of geometric graph
representation is a drawing, where vertices are mapped to points and edges to curves. Making
such a map injective by avoiding edge crossings is desirable, both from a mathematically
aesthetic point of view and for the sake of the practical readability of the drawing. Those graphs
that allow such an embedding into the Euclidean plane are known as planar. Our goal in the
following is to study the interplay between abstract planar graphs and their plane embeddings.
Specifically, we want to answer the following questions:
o What is the combinatorial complexity of planar graphs (number of edges and faces)?
o Under which conditions are plane embeddings unique (in a certain sense)?
o How can we represent plane embeddings (in a data structure)?
o What is the geometric complexity of plane embeddings, that is, can we bound the size
of the coordinates used and the complexity of the geometric objects used to represent
edges?
Most definitions we use directly extend to multigraphs. But for simplicity, we use the term
“graph” throughout.
Theorem 6.2 (Jordan). Any simple closed curve C partitions the plane into exactly two regions
(connected open sets), each bounded by C.
Fig: 6.2.3 A Jordan curve and two points in one of its faces (left); a simple closed curve that
does not disconnect the torus (right).
Observe that, for instance, on the torus there are closed curves that do not disconnect the surface
(and so the theorem does not hold there).
Drawings. As a first criterion for a reasonable geometric representation of a graph, we would
like to have a clear separation between different vertices and also between a vertex and
nonincident edges. Formally, a drawing of a graph G = (V, E) in the plane is a function f that
assigns
a point f(v) ∈ R2 to every vertex v ∈ V and
a simple curve f({u, v}) : [0, 1] → R2 with endpoints f(u) and f(v) to every edge {u, v} ∈ E,
such that
(1) f is injective on V and
(2) f({u, v}) ∩ f(V) = {f(u), f(v)}, for every edge {u, v} ∈ E.
A common point f(e) ∩ f(e′) between two curves that represent distinct edges e, e′ ∈ E is called
a crossing if it is not a common endpoint of e and e′. For simplicity, when discussing a drawing
of a graph G = (V, E) it is common to treat vertices and edges as geometric objects. That is, a
vertex v ∈ V is treated as the point f(v) and an edge e ∈ E is treated as the curve f(e). For
instance, the last sentence of the previous paragraph may be phrased as “A common point of
two edges that is not a common endpoint is called a crossing.” Often it is convenient to make
additional assumptions about the interaction of edges in a drawing. For example, in a
nondegenerate drawing one may demand that no three edges share a single crossing or that
every pair of distinct edges intersects in at most finitely many points.
Planar vs. plane. A graph is planar if it admits a drawing in the plane without crossings. Such
a drawing is also called a crossing-free drawing or a (plane) embedding of the graph. A planar
graph together with a particular plane embedding is called a plane graph. Note the distinction
between “planar” and “plane”: the former refers to an abstract graph and indicates the
possibility of an embedding, whereas the latter refers to a concrete embedding (Figure 6.2.4).
A geometric graph is a graph together with a drawing, in which all edges are realized as
straight-line segments. Note that such a drawing is completely defined by the mapping for the
vertices. A plane geometric graph is also called a plane straight-line graph (PSLG). In contrast,
a plane graph in which the edges may form arbitrary simple curves is called a topological plane
graph. The faces of a plane graph are the maximally connected regions of the plane that do not
contain any point used by the embedding (as the image of a vertex or an edge). Each embedding
of a finite graph has exactly one unbounded face, also called outer or infinite face. Using
stereographic projection, it is not hard to show that the role of the unbounded face is not as
special as it may seem at first glance.
Euler’s Theorem: The following important result, due to Euler, gives a relation between the
number of vertices, edges, regions and components of a planar graph.
Theorem 6.3 If G is a planar graph with n vertices, m edges, f regions and k components, then
n−m+ f = k+1. (6.3.1)
Proof: We construct the graph G by the addition of successive edges starting from the null
graph (the empty graph) on n vertices. For this starting graph, k = n, m = 0, f = 1, so that (6.3.1)
is true (indeed, n − 0 + 1 = n + 1 = k + 1).
Let Gi−1 be the graph at the start of the ith stage and Gi be the graph obtained from Gi−1 by
addition of the ith edge e. If e connects two components of Gi−1, then f is not altered, m is
increased by 1 and k is reduced by 1, so that (6.3.1) holds for Gi as it holds for Gi−1. If e joins
two vertices of the same component of Gi−1, then k is unaltered, m is increased by 1 and f is
increased by 1, so that again (6.3.1) holds for Gi. For example, a plane drawing of K4 has
n = 4, m = 6, f = 4 and k = 1, and indeed 4 − 6 + 4 = 2 = k + 1. This relation between the number
of vertices, edges and regions was discovered by Euler [75] and is also called Euler's formula
for planar graphs.
The complete graph K5 and the complete bipartite graph K3,3 are called Kuratowski’s graphs,
after the Polish mathematician Kazimierz Kuratowski, who showed that K5 and K3,3 are nonplanar.
Theorem 6.4 The complete graph K5 with five vertices is nonplanar.
Proof Let the five vertices in the complete graph be named v1, v2, v3, v4, v5. Since in a
complete graph every vertex is joined to every other vertex by means of an edge, there is a
cycle v1v2v3v4v5v1 that is a pentagon. This pentagon divides the plane of the paper into two
regions, one inside and the other outside, Figure 6.3(a). Since vertex v1 is to be connected to
v3 by means of an edge, this edge may be drawn inside or outside the pentagon (without
intersecting the five edges drawn previously). Suppose we choose to draw the line from v1 to
v3 inside the pentagon, Figure 6.3(b). (Had we chosen outside, the same argument would
apply.) Now we have to draw an edge from v2 to v4 and another from v2 to v5. Since neither
of these edges can be drawn inside the pentagon without crossing over the edge already drawn,
we draw both these edges outside the pentagon, Figure 6.3(c).
The edge connecting v3 and v5 cannot be drawn outside the pentagon without crossing the
edge between v2 and v4. Therefore, v3 and v5 have to be connected with an edge inside the
pentagon, Figure 6.3(d). Now, we have to draw an edge between v1 and v4 and this cannot be
placed inside or outside the pentagon without a crossover. Thus, the graph cannot be embedded
in a plane.
Theorem 6.5 The complete bipartite graph K3,3 is nonplanar.
Proof The complete bipartite graph K3,3 has six vertices and nine edges. Let the vertices be
u1, u2, u3, v1, v2, v3. We have edges from every ui to each vj, 1 ≤ i, j ≤ 3. First we take the edges
from u1 to each of v1, v2 and v3. Then we take the edges from u2 to each of v1, v2 and v3,
Figure 6.4(a).
Thus we get three regions, namely I, II and III. Finally we have to draw the edges from u3
to each of v1, v2 and v3. We can draw the edge between u3 and v3 inside region II without any
crossover, Figure 6.4(b). But the edges between u3 and v1, and u3 and v2 drawn in any region
have a crossover with the previous edges. Thus the graph cannot be embedded in a plane. Hence
K3,3 is nonplanar.
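These two facts are also easy to double-check computationally. For instance, networkx ships a
planarity test, check_planarity() (based on the left-right planarity algorithm), so a quick
Python session along the following lines confirms the claims:

>>> import networkx as nx
>>> nx.check_planarity(nx.complete_graph(5))[0]               # K5
False
>>> nx.check_planarity(nx.complete_bipartite_graph(3, 3))[0]  # K3,3
False
>>> nx.check_planarity(nx.complete_graph(4))[0]               # K4 is planar
True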
We observe that the two graphs K5 and K3,3 have the following common properties. 1. Both are
regular. 2. Both are nonplanar. 3. Removal of one edge or one vertex makes each a planar graph.
4. K5 is the nonplanar graph with the smallest number of vertices, and K3,3 is the nonplanar
graph with the smallest number of edges. Thus both are the simplest nonplanar graphs. The
following result, given independently by Fary [77] and Wagner [260], implies that there is no
need to bend edges in drawing a planar graph to avoid edge intersections: every planar graph
has a plane embedding in which all edges are straight-line segments.
In the mathematical discipline of graph theory, the dual graph of a plane graph G is a graph
that has a vertex corresponding to each face of G, and an edge joining two neighboring faces
for each edge in G. The term "dual" is used because this property is symmetric, meaning that
if H is a dual of G, then G is a dual of H (if G is connected). The same notion of duality may
also be used for more general embeddings of graphs in manifolds. The notion described here
is different from the edge-to-vertex dual (line graph) of a graph and should not be confused
with it.
Fig: 6.4.1 The red graph is the dual graph of the blue graph.
• Properties
Fig: 6.4.2 Two red graphs are duals for the blue one, but they are not isomorphic.
The dual of a plane graph is a plane multigraph (it may have multiple edges).[1] If G is a
connected plane graph and G′ is the dual of G, then G is isomorphic to the dual of G′. Since
the dual graph depends on a particular embedding, the dual graph of a planar graph is not
unique, in the sense that the same planar graph can have non-isomorphic dual graphs. In the
picture, the red graphs are not isomorphic because the upper one has a vertex with degree 6
(corresponding to the outer face). However, if the graph is 3-connected, then Whitney showed
that the embedding, and thus the dual graph, is unique.[2] Because of the duality, any result
involving counting faces and vertices can be dualized by exchanging them.
• Algebraic dual
Let G be a connected graph. An algebraic dual of G is a graph G★ such that G and G★ have
the same set of edges, any cycle of G is a cut of G★, and any cut of G is a cycle of G★. Every
planar graph has an algebraic dual, which is in general not unique (any dual defined by a plane
embedding will do). The converse is actually true, as settled by Hassler Whitney in
Whitney's planarity criterion:[3] A connected graph G is planar if and only if it has an algebraic
dual. The same fact can be expressed in the theory of matroids: if M is the graphic matroid of
a graph G, then the dual matroid of M is a graphic matroid if and only if G is planar. If G is
planar, the dual matroid is the graphic matroid of the dual graph of G.
• Weak dual
The weak dual of a plane graph is the subgraph of the dual graph whose vertices correspond to
the bounded faces of the primal graph. A plane graph is outerplanar if and only if its weak dual
is a forest, and a plane graph is a Halin graph if and only if its weak dual is biconnected and
outerplanar. For any plane graph G, let G+ be the plane multigraph formed by adding a single
new vertex v in the unbounded face of G, and connecting v to each vertex of the outer face
(multiple times, if a vertex appears multiple times on the boundary of the outer face); then, G is
the weak dual of the (plane) dual of G+.[4][5]
6.5 SUMMARY
In this unit we have discussed planarity and planar embeddings, Euler's formula and
Kuratowski's theorem, and dual graphs and duality theorems. A planar graph is a graph which can
be drawn in the plane without any edges crossing. Some pictures of a planar graph might have
crossing edges, but it’s possible to redraw the picture to eliminate the crossings.
6.6 KEYWORDS
• bipartite graph
• nonplanar
• geometric graph
• nondegenerate drawing
6.7 QUESTIONS FOR SELF STUDY
6.8 REFERENCES
• Akiyama, J. and Kano, M., Factors and factorization of graphs, J. Graph Theory 9
• Alavi, A. and Behzad, M., Complementary graphs and edge-chromatic numbers, SIAM J.
Appl. Math.
• Alspach, B. and Reid, K. B., Degree frequencies in digraphs and tournaments, J. Graph
Theory
• Anderson, I., Perfect matching of a graph, J. Combin. Theory Ser. B 10
• Appel, K. and Haken, W., Every planar map is four colorable, Bull. Amer. Math. Soc. 82.
• Avery, P., Score sequences in oriented graphs, J. Graph Theory 15,
• Balakrishnan, R. and Ranganathan, K., A Textbook of Graph Theory, Springer-Verlag,
UNIT 7: COLORING OF GRAPHS
Structure:
7.0 Objectives
7.1 Introduction
7.2 Graph Coloring
7.3 Greedy Algorithm
7.4 Four-color theorem
7.5 Summary
7.6 Keywords
7.7 Questions
7.8 Reference
7.0 OBJECTIVES
At the end of this unit, you will be able to
7.1 INTRODUCTION
A graph coloring is an assignment of labels, called colors, to the vertices of a graph such that
no two adjacent vertices share the same color. The chromatic number χ(G) of a graph G is
the minimal number of colors for which such an assignment is possible. Other types of
colorings on graphs also exist, most notably edge colorings that may be subject to various
constraints.
The study of graph colorings has historically been linked closely to that of planar graphs and
the four color theorem, which is also the most famous graph coloring problem. That problem
provided the original motivation for the development of algebraic graph theory and the study
of graph invariants such as those discussed here. In modern times, many open problems
in algebraic graph theory deal with the relation between chromatic polynomials and their
graphs. Applications for solved problems have been found in areas such as computer science,
information theory, and complexity theory. Many day-to-day problems, like minimizing
conflicts in scheduling, are also equivalent to graph colorings.
For example, the chromatic number of the graph in the diagram above is 2, because its
vertices can be colored using just red and green. The chromatic number is defined as the
least number of colors needed to color the graph. Some families of graphs for which the
chromatic number is well understood include:
1) Cycle graphs
2) Planar graphs
3) Complete graphs
4) Bipartite graphs
5) Trees
1) Making a Schedule or Timetable: Suppose we want to make an exam schedule for a
university. We have a list of subjects and the students enrolled in each subject. Many subjects
would have common students (of the same batch, some backlog students, etc.). How do we
schedule the exams so that no two exams with a common student are scheduled at the same
time? What is the minimum number of time slots needed to schedule all exams? This problem
can be represented as a graph where every vertex is a subject and an edge between two vertices
means there is a common student. So this is a graph coloring problem where the minimum
number of time slots is equal to the chromatic number of the graph (see the sketch after this
list).
2) Mobile Radio Frequency Assignment: When frequencies are assigned to towers,
frequencies assigned to all towers at the same location must be different. How to assign
frequencies with this constraint? What is the minimum number of frequencies needed? This
problem is also an instance of graph coloring problem where every tower represents a vertex
and an edge between two towers represents that they are in range of each other.
3) Sudoku: Sudoku is also a variation of Graph coloring problem where every cell represents
a vertex. There is an edge between two vertices if they are in same row or same column or
same block.
4) Register Allocation: In compiler optimization, register allocation is the process of assigning
a large number of target program variables onto a small number of CPU registers. This problem
is also a graph coloring problem.
5) Bipartite Graphs: We can check if a graph is bipartite or not by coloring the graph using
two colors. If a given graph is 2-colorable, then it is bipartite, otherwise not.
6) Map Coloring: In geographical maps of countries or states, no two adjacent regions can be
assigned the same color. Four colors are sufficient to color any map (see the Four Color
Theorem).
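As a sketch of application 1 above, the exam-scheduling problem can be phrased directly as a
coloring problem in Python using networkx; the subject names and conflict pairs below are
invented purely for illustration:

import networkx as nx

# Hypothetical conflict graph: an edge means the two subjects share a student.
conflicts = [("Math", "Physics"), ("Math", "Chemistry"),
             ("Physics", "Biology"), ("Chemistry", "Biology")]
G = nx.Graph(conflicts)

# Greedy coloring: the color classes become the exam time slots.
slots = nx.coloring.greedy_color(G, strategy="largest_first")
print(slots)   # one valid result: {'Math': 0, 'Physics': 1,
               #                    'Chemistry': 1, 'Biology': 0}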
The greedy coloring strategy can be implemented as follows. The driver code in main() is as
given; a minimal Graph class providing addEdge() and greedyColoring() has been supplied so
that the program compiles and runs:

#include <iostream>
#include <list>
#include <vector>
using namespace std;

// Undirected graph supporting a greedy vertex coloring.
class Graph {
    int V;
    vector<list<int>> adj;
public:
    Graph(int V) : V(V), adj(V) {}
    void addEdge(int u, int v) { adj[u].push_back(v); adj[v].push_back(u); }
    // Color vertices 0..V-1 in order, assigning each the smallest
    // color not already used by one of its colored neighbors.
    void greedyColoring() {
        vector<int> color(V, -1);
        vector<bool> used(V, false);
        color[0] = 0;
        for (int u = 1; u < V; u++) {
            for (int w : adj[u]) if (color[w] != -1) used[color[w]] = true;
            int c = 0;
            while (used[c]) c++;   // first available color
            color[u] = c;
            for (int w : adj[u]) if (color[w] != -1) used[color[w]] = false;
        }
        for (int u = 0; u < V; u++)
            cout << "Vertex " << u << " ---> Color " << color[u] << "\n";
    }
};

int main()
{
    Graph g1(5);
    g1.addEdge(0, 1); g1.addEdge(0, 2); g1.addEdge(1, 2);
    g1.addEdge(1, 3); g1.addEdge(2, 3); g1.addEdge(3, 4);
    cout << "Coloring of graph 1 \n";
    g1.greedyColoring();

    Graph g2(5);
    g2.addEdge(0, 1); g2.addEdge(0, 2); g2.addEdge(1, 2);
    g2.addEdge(1, 4); g2.addEdge(2, 4); g2.addEdge(4, 3);
    cout << "\nColoring of graph 2 \n";
    g2.greedyColoring();
    return 0;
}
Output:
Coloring of graph 1
Vertex 0 ---> Color 0
Vertex 1 ---> Color 1
Vertex 2 ---> Color 2
Vertex 3 ---> Color 0
Vertex 4 ---> Color 1

Coloring of graph 2
Vertex 0 ---> Color 0
Vertex 1 ---> Color 1
Vertex 2 ---> Color 2
Vertex 3 ---> Color 0
Vertex 4 ---> Color 3
Time complexity: O(V² + E) in the worst case. Space complexity: O(V), for the color and
availability arrays.
Guthrie asserted that for such maps, no more than four colors would be required to color
the map so that no two adjacent parts were the same color. In other words:
If the regions of a map M are colored so that adjacent regions receive different colors, then no
more than 4 colors are required.
Every planar graph is 4-colorable (vertex coloring); moreover, a planar graph containing no
triangle as a subgraph needs only 3 colors (this is Grötzsch's theorem).
Mathematicians had been attempting for years to come up with a sophisticated proof (of the
four color theorem) along the lines of the Six Color Theorem or the Five Color Theorem, and
using the brute-force method almost appeared like hacking the process.
Every planar graph can be colored with at most four colors.
Graphs consist of vertices and edges. We want adjacent vertices (or regions) to receive
different colors.
How to color?
Take any map and divide it into a set of connected regions R1, R2, …, Rn with continuous
boundaries.
There must be some way to assign each region Ri a color from the set {R, G, B, Y}, such that
if two regions Ri and Rj are “touching” (i.e. they share some nonzero length of boundary
between them), they receive different colors.
Example –
1. A four-colored map is shown below:
A planar map
Here, as you can see, every region that touches another region has a different color than the
touching one & we required a total of a maximum of four colors to color this map – Red,
Green, Blue & yellow.
Map H
Here also, every region that touches another region has a different color than the touching
one, and a maximum of four colors (red, green, blue and yellow) are required to color this
map.
Kuratowski’s Theorem :
Kuratowski established the theorem giving a necessary and sufficient condition for planarity
in 1930. The theorem states that:
"A graph G is nonplanar if and only if G contains a sub-graph that is a subdivision
of either K3,3 or K5."
To prove this theorem, we’ll go through some definitions and make sure that both K3,3 and
K5 are non-planar. Let’s have a look at K3,3.
Proposition 1 – K3,3 is not planar.
Proof :
Now, we will prove it by contradiction.
Say, to the contrary, that K3,3 is planar. Then there is a plane embedding of K3,3.
By Euler’s Formula, v − e + f = 2, where v = number of vertices, e = number of edges, f =
number of faces.
As K3,3 is bipartite, it contains no odd cycles, and in particular no 3-cycles.
So each face of the embedding must be bounded by at least 4 edges of K3,3.
Moreover, each edge is counted twice among the boundaries of faces.
Hence, we must have: f ≤ 2e/4
⇒ f ≤ e/2
⇒ f ≤ 4.5 (since e = 9).
Now put this into Euler’s formula: 2 = v − e + f
⇒ 2 ≤ 6 − 9 + 4.5
⇒ 2 ≤ 1.5, which is obviously false.
So, we can say that K3,3 is a non-planar graph.
Proposition 2 – K5 is not planar.
Proof :
Every planar graph (with v ≥ 3) must satisfy e ≤ 3v − 6 (a corollary of Euler’s formula).
For K5, e = 10 and v = 5.
LHS : e = 10
RHS : 3v − 6 = 15 − 6 = 9
So we would need 10 ≤ 9, which is not true.
So, we can say that K5 is a non-planar graph.
Example :
1. Prove that : A planar graph’s sub-graphs are all planar.
Proof :
Let G be the graph and P be a sub-graph of G.
Since G is planar, there exists a planar embedding of G. In this planar embedding of G, we can
locate the vertices and edges of the sub-graph P.
This gives a planar embedding of P.
2. A non-planar graph’s subdivisions are all non-planar.
Proof :
Assume, to the contrary, that a planar embedding exists for P, a subdivision of the non-planar
graph G.
If we remove the vertices created by the edge-subdivisions and reconstruct each original edge
(without affecting the shape and position of the path), we obtain a planar embedding of G,
making G planar: a contradiction.
As a result, if G is non-planar, so is every subdivision P of G.
7.5 SUMMARY
In this unit we have discussed graph coloring in detail, covering the concept of the chromatic
number. At the end of this unit we also dealt in detail with the greedy algorithm and the four-
color theorem.
7.6 KEYWORDS
7.7 QUESTIONS
7.8 REFERENCES
UNIT 8: STRING MATCHING ALGORITHMS
Structure:
8.0 Objectives
8.1 Introduction
8.2 String matching algorithms
8.3 Knuth-Morris-Pratt algorithm
8.4 Boyer-Moore algorithm
8.5 Summary
8.6 Keywords
8.7 Questions
8.8 Reference
8.0 OBJECTIVES
At the end of this unit, you will be able to
• Explain String matching algorithms
• Discuss Knuth-Morris-Pratt algorithm
• Elucidate Boyer-Moore algorithm
8.1 INTRODUCTION
String matching algorithms have greatly influenced computer science and play an essential role
in various real-world problems. They help in performing time-efficient tasks in multiple
domains. These algorithms are useful when searching for a string within another string. String
matching is also used in database schemas and network systems.
Let us look at a few string-matching algorithms before proceeding to their applications in real
world. String Matching Algorithms can broadly be classified into two types of algorithms –
1. Exact String-Matching Algorithms
2. Approximate String-Matching Algorithms
8.2 STRING MATCHING ALGORITHMS
• Naive Approach: It slides the pattern over the text one position at a time and checks
for approximate matches. If one is found, it slides by 1 again to check for
subsequent approximate matches.
• Sellers Algorithm (Dynamic Programming)
• Shift or Algorithm (Bitmap Algorithm)
Applications of String-Matching Algorithms:
• Plagiarism Detection: The documents to be compared are decomposed into string
tokens and compared using string matching algorithms. Thus, these algorithms are
used to detect similarities between them and declare if the work is plagiarized or
original.
• Spam filters: Spam filters use string matching to discard spam. For example,
to categorize an email as spam or not, suspected spam keywords are searched in
the content of the email by string matching algorithms. Hence, the content is
classified as spam or not.
• Search engines or content search in large databases: To categorize and
organize data efficiently, string matching algorithms are used. Categorization is
done based on the search keywords. Thus, string matching algorithms make it
easier to find the information one is searching for.
• Intrusion Detection System: The data packets containing intrusion-related
keywords are found by applying string matching algorithms. All the malicious
code is stored in the database, and every incoming data is compared with stored
data. If a match is found, then the alarm is generated. It is based on exact string
matching algorithms where each intruded packet must be detected.
Pattern searching is an important problem in computer science. When we search for a string
in a notepad/word file, browser, or database, pattern-searching algorithms are used to show
the search results.
We have discussed the Naive pattern-searching algorithm in the previous post. The worst
case complexity of the Naive algorithm is O(m(n-m+1)). The time complexity of the KMP
algorithm is O(n+m) in the worst case.
KMP (Knuth Morris Pratt) Pattern Searching:
The Naive pattern-searching algorithm doesn’t work well in cases where we see many
matching characters followed by a mismatching character.
Examples:
1) txt[] = “AAAAAAAAAAAAAAAAAB”, pat[] = “AAAAB”
2) txt[] = “ABABABCABABABCABABABC”, pat[] = “ABABAC” (not a worst case, but a
bad case for Naive)
The KMP matching algorithm uses degenerating property (pattern having the same sub-
patterns appearing more than once in the pattern) of the pattern and improves the worst-case
complexity to O(n+m).
The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after some
matches), we already know some of the characters in the text of the next window. We take
advantage of this information to avoid matching the characters that we know will anyway
match.
Matching Overview
txt = “AAAAABAAABA”
pat = “AAAA”
We compare first window of txt with pat
txt = “AAAAABAAABA”
pat = “AAAA” [Initial position]
We find a match. This is same as Naive String Matching.
In the next step, we compare next window of txt with pat.
txt = “AAAAABAAABA”
pat = “AAAA” [Pattern shifted one position]
This is where KMP does optimization over Naive. In this second window, we only compare
the fourth A of the pattern with the fourth character of the current window of the text to decide
whether the current window matches or not. Since we know the first three characters will
anyway match, we skip matching them.
Need for Preprocessing
An important question arises from the above explanation: how do we know how many
characters to skip? To know this, we pre-process the pattern and prepare an integer array lps[]
that tells us the count of characters to be skipped.
Preprocessing Overview:
• KMP algorithm preprocesses pat[] and constructs an auxiliary lps[] of
size m (same as the size of the pattern) which is used to skip characters while
matching.
• Name lps indicates the longest proper prefix which is also a suffix. A proper
prefix is a prefix with a whole string not allowed. For example, prefixes of “ABC”
are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”. Suffixes of
the string are “”, “C”, “BC”, and “ABC”.
• We search for lps in subpatterns. More clearly we focus on sub-strings of patterns
that are both prefix and suffix.
• For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores the length of the
maximum matching proper prefix which is also a suffix of the sub-pattern
pat[0..i].
lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i].
Note: lps[i] could also be defined as the longest prefix which is also a proper suffix. We need
to use it properly in one place to make sure that the whole substring is not considered.
Examples of lps[] construction:
For the pattern “AAAA”, lps[] is [0, 1, 2, 3].
For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0].
For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5].
Preprocessing Algorithm:
In the preprocessing part,
• We calculate values in lps[]. To do that, we keep track of the length of the longest
prefix suffix value (we use len variable for this purpose) for the previous index
• We initialize lps[0] and len as 0.
• If pat[len] and pat[i] match, we increment len by 1 and assign the incremented
value to lps[i].
• If pat[i] and pat[len] do not match and len is not 0, we update len to lps[len-1]
• See computeLPSArray() in the below code for details
Illustration of preprocessing (or construction of lps[]):
pat[] = “AAACAAAA”
=> len = 0, i = 0:
• lps[0] is always 0, we move to i = 1
=> len = 0, i = 1:
• Since pat[len] and pat[i] match, do len++,
• store it in lps[i] and do i++.
• Set len = 1, lps[1] = 1, i = 2
=> len = 1, i = 2:
• Since pat[len] and pat[i] match, do len++,
• store it in lps[i] and do i++.
• Set len = 2, lps[2] = 2, i = 3
=> len = 2, i = 3:
• Since pat[len] and pat[i] do not match, and len > 0,
• Set len = lps[len-1] = lps[1] = 1
=> len = 1, i = 3:
• Since pat[len] and pat[i] do not match and len > 0,
• len = lps[len-1] = lps[0] = 0
=> len = 0, i = 3:
• Since pat[len] and pat[i] do not match and len = 0,
• Set lps[3] = 0 and i = 4
=> len = 0, i = 4:
• Since pat[len] and pat[i] match, do len++,
• Store it in lps[i] and do i++.
• Set len = 1, lps[4] = 1, i = 5
=> len = 1, i = 5:
• Since pat[len] and pat[i] match, do len++,
• Store it in lps[i] and do i++.
• Set len = 2, lps[5] = 2, i = 6
=> len = 2, i = 6:
• Since pat[len] and pat[i] match, do len++,
• Store it in lps[i] and do i++.
• len = 3, lps[6] = 3, i = 7
=> len = 3, i = 7:
• Since pat[len] and pat[i] do not match and len > 0,
• Set len = lps[len-1] = lps[2] = 2
=> len = 2, i = 7:
• Since pat[len] and pat[i] match, do len++,
• Store it in lps[i] and do i++.
• len = 3, lps[7] = 3, i = 8
We stop here as we have constructed the whole lps[].
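The preprocessing just traced can be written as a short Python function; this is a sketch
consistent with the walkthrough above, named computeLPSArray() to match the reference in
the text:

def computeLPSArray(pat):
    """lps[i] = length of the longest proper prefix of pat[0..i]
    that is also a suffix of pat[0..i]."""
    m = len(pat)
    lps = [0] * m
    length = 0   # length of the previous longest prefix suffix
    i = 1
    while i < m:
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]   # fall back; do not advance i
        else:
            lps[i] = 0
            i += 1
    return lps

print(computeLPSArray("AAACAAAA"))   # [0, 1, 2, 0, 1, 2, 3, 3], as traced above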
Unlike the Naive algorithm, where we slide the pattern by one and compare all characters at
each shift, we use a value from lps[] to decide the next characters to be matched. The idea is to
not match a character that we know will anyway match.
How to use lps[] to decide the next positions (or to know the number of characters to be
skipped)?
• We start the comparison of pat[j] with j = 0 with characters of the current window
of text.
• We keep matching characters txt[i] and pat[j] and keep incrementing i and j while
pat[j] and txt[i] keep matching.
• When we see a mismatch
• We know that characters pat[0..j-1] match with txt[i-j…i-1] (Note that j
starts with 0 and increments it only when there is a match).
• We also know (from the above definition) that lps[j-1] is the count of
characters of pat[0…j-1] that are both proper prefix and suffix.
• From the above two points, we can conclude that we do not need to
match these lps[j-1] characters with txt[i-j…i-1] because we know that
these characters will anyway match. Let us consider the above example
to understand this.
Illustration of the above algorithm for pat[] = “TEST” and pat[] = “AABA” (figures omitted).
In this section, we discuss the Boyer Moore pattern searching algorithm.
Like the KMP and Finite Automata algorithms, the Boyer Moore algorithm also preprocesses
the pattern.
Boyer Moore is a combination of the following two approaches.
1. Bad Character Heuristic
2. Good Suffix Heuristic
Both of the above heuristics can also be used independently to search a pattern in a text. Let us
first understand how two independent approaches work together in the Boyer Moore algorithm.
If we take a look at the Naive algorithm, it slides the pattern over the text one by one. KMP
algorithm does preprocessing over the pattern so that the pattern can be shifted by more than
one. The Boyer Moore algorithm does preprocessing for the same reason. It processes the
pattern and creates different arrays for each of the two heuristics. At every step, it slides the
pattern by the max of the slides suggested by each of the two heuristics. So it uses the greatest
offset suggested by the two heuristics at every step.
Unlike the previous pattern searching algorithms, the Boyer Moore algorithm starts
matching from the last character of the pattern.
Here we discuss the bad character heuristic; the good suffix heuristic is treated separately.
Bad Character Heuristic
The idea of bad character heuristic is simple. The character of the text which doesn’t match
with the current character of the pattern is called the Bad Character. Upon mismatch, we
shift the pattern until –
1. The mismatch becomes a match.
2. Pattern P moves past the mismatched character.
Case 1 – Mismatch becomes a match
We will lookup the position of the last occurrence of the mismatched character in the pattern,
and if the mismatched character exists in the pattern, then we’ll shift the pattern such that it
becomes aligned to the mismatched character in the text T.
case 1
Explanation: In the above example, we got a mismatch at position 3. Here our mismatching
character is “A”. Now we search for the last occurrence of “A” in the pattern. We find “A” at
position 1 in the pattern (displayed in blue), and this is its last occurrence. Now we shift the
pattern 2 positions so that the “A” in the pattern gets aligned with the “A” in the text.
Case 2 – Pattern moves past the mismatched character
We look up the position of the last occurrence of the mismatching character in the pattern,
and if the character does not exist, we shift the pattern past the mismatching character.
case2
Explanation:
Here we have a mismatch at position 7. The mismatching character “C” does not exist in the
pattern before position 7, so we shift the pattern past position 7, and eventually, in the above
example, we get a perfect match of the pattern (displayed in green). We do this because “C”
does not exist in the pattern, so at every shift before position 7 we would get a mismatch and
our search would be fruitless.
In the following implementation, we pre-process the pattern and store the last occurrence of
every possible character in an array of size equal to the alphabet size. If the character is not
present at all, this may result in a shift by m (the length of the pattern). Therefore, the bad
character heuristic takes O(n/m) time in the best case.
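A Python sketch of the bad character heuristic follows; the alphabet size of 256 and the shift
rule are the standard textbook choices, assumed here for illustration:

NO_OF_CHARS = 256

def badCharTable(pat):
    """Index of the last occurrence of each character in pat (-1 if absent)."""
    last = [-1] * NO_OF_CHARS
    for i, ch in enumerate(pat):
        last[ord(ch)] = i
    return last

def boyerMooreBadChar(pat, txt):
    """Return starting indices of occurrences of pat in txt,
    using only the bad character heuristic."""
    m, n = len(pat), len(txt)
    last = badCharTable(pat)
    matches = []
    s = 0   # shift of the pattern with respect to the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pat[j] == txt[s + j]:
            j -= 1   # compare from the last character backwards
        if j < 0:
            matches.append(s)
            # align the next text character with its last occurrence in pat
            s += m - last[ord(txt[s + m])] if s + m < n else 1
        else:
            # shift so the bad character aligns with its last occurrence
            # in the pattern; always shift by at least 1
            s += max(1, j - last[ord(txt[s + j])])
    return matches

print(boyerMooreBadChar("ABC", "ABAAABCD"))   # [4]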
8.5 SUMMARY
In this unit we have discussed string matching algorithms in detail, covering the Knuth-
Morris-Pratt algorithm. At the end of this unit we also covered the Boyer Moore algorithm in
detail.
8.6 KEYWORDS
String matching algorithms, KMP, Pre-processing, Search engines and Spam filters.
8.7 QUESTIONS
UNIT 9: RANDOM GRAPHS
Structure
9.0 Objective
9.1 Introduction
9.2 Erdös-Renyí model
9.3 Properties of random graph
9.4 Applications of random graph
9.5 Summary
9.6 Keywords
9.7 Questions
9.8 Reference.
9.0 OBJECTIVE
9.1 INTRODUCTION
The main property of an Erdős-Rényi random graph is that, given n nodes and m edges (or
probability p of an edge between each pair of nodes), everything else is unconditioned, i.e.,
random.
Both of these types of random graphs can be created using a function called erdos.renyi.game().
Let’s first make a G(n,m) random graph with n = 20 and m = 38, calculate the density and plot
it.
library(igraph)
g1=erdos.renyi.game(20,38,type="gnm")
g1
## [1] 1-- 3 2-- 4 4-- 6 5-- 6 1-- 7 5-- 8 6-- 8 3-- 9 6-- 9 1--10
## [11] 8--10 1--11 10--11 5--12 6--12 7--12 4--13 9--13 10--14 12--14
## [21] 4--15 5--15 1--16 7--16 11--16 6--17 11--17 12--17 6--18 7--19
graph.density(g1)
## [1] 0.2
plot(g1,layout=layout.circle)
This random graph will “look” different each time you run this code: different sets of nodes
will be connected (unless you’ve used the set.seed() function). However, the density will
always remain the same (#edges/#dyads = 38/[20*19/2] = 0.2).
Let’s contrast this now with a G(n,p) random graph with n = 20 and p = 0.2. We’ll print the
number of edges and the graph density, and plot the graph.
g2=erdos.renyi.game(20,0.2,type="gnp")
ecount(g2)
## [1] 32
graph.density(g2)
## [1] 0.1684211
plot(g2,layout=layout.circle)
Your output will look approximately like mine, but it’ll be a bit different. This is because now
the number of edges is a probabilistic outcome of each dyad having a p = 0.2 chance of being
connected. This means that if you run this code repeatedly (try it), you will get densities that
hover around 0.2.
Let’s now try to better understand the behavior of random graphs by creating an ensemble of
100 random graphs with some known property and calculating the mean density of these
graphs. We can do this by using “for-loops”
What we want to do is use the for-loop to calculate densities for 100 random graphs of n = 20
and p = 0.2, and take the mean of these values.
densities = vector(length=100)  # vector in which to store the densities
for (i in 1:100){
  r = erdos.renyi.game(20, p=0.2)  # generate a random graph
  densities[i] = graph.density(r)  # store its density as the ith element of the vector
}
densities        # print the resulting vector (I won't show this below)
mean(densities)  # calculate the mean density
The result should always be very close to 0.2. You can visualize this data by making a
histogram of the densities of your random graphs, and then compare it to the theoretical average
by drawing a line at density = 0.2. The peak of the histogram should be near the line.
hist(densities)
par(mfrow=c(3,3), mar=c(1,1,1,1))  # mfrow= sets the number of rows and columns within the plotting region; mar= sets the figure margins: c(bottom, left, top, right)
for (i in 1:9){
  r = erdos.renyi.game(20, p=0.2)
  plot(r, layout=layout.circle, edge.color="black", edge.width=2, vertex.color="red", vertex.label="")  # a bunch of arguments to make the figure look pretty
}
9.3 PROPERTIES OF RANDOM GRAPHS
So, why are we fooling around with random graphs anyway? Well, the main reason is that they
serve as a good null hypothesis for what the structure of a basic system of n components
and m connections (or probability p of connections) looks like, all else being equal. The great
thing about Erdős-Rényi random graphs is that the process to generate them is extremely
simple, and the properties of the resulting system are highly predictable and simple.
An example: we can’t know exactly what the mean degree or density of any given random
graph is, but we can know what the average values of those things are, given that we make
enough random graphs with the same property.
Here are some basic properties of random graphs of size n and probability of links p:
• Expected number of edges: E[m] = p · n(n−1)/2 (for n = 20 and p = 0.2, this is 38).
• Expected mean degree: E[k] = p(n−1) (here, 3.8).
• Expected clustering coefficient: C = p (here, 0.2).
Let’s confirm this by creating a bunch of random graphs and calculate their average properties.
Let’s do this with a set of random graphs with n = 20 and p = 0.2
#First, create a set of vectors in which you'll store the results of the simulations
m=vector(length=100)
mean.k=vector(length=100)
C.loc=vector(length=100)
C.glob=vector(length=100)
#Now, use a for-loop to create 100 random graphs, each time calculating m, the mean degree
#and the clustering coefficients
n=20
p=0.2
for (i in 1:100){
  r = erdos.renyi.game(n, p=p)
  m[i] = ecount(r)
  mean.k[i] = mean(degree(r))
  C.loc[i] = transitivity(r, type="localaverage")
  C.glob[i] = transitivity(r, type="global")
}
We can visualize the results with histograms and a line representing the expected value. We
can plot all four results at once using the par() function, which lets us set some graphical
parameters:
par(mfrow=c(2,2))
hist(m)
hist(mean.k)
hist(C.loc)
hist(C.glob)
abline(v=p, lty=2, col="red")  # expected global clustering coefficient, which is simply p
In graph theory, the Erdős-Rényi model is either of two closely related models for generating
random graphs.
In the G(n, M) model, a graph is chosen uniformly at random from the collection of all graphs
which have n nodes and M edges. For example, in the G(3, 2) model, each of the three possible
graphs on three vertices and two edges is included with probability 1/3.
In the G(n, p) model, a graph is constructed by connecting nodes randomly. Each edge is
included in the graph with probability p, independently of every other edge. Equivalently,
each graph on n vertices with M edges arises with probability p^M (1−p)^(C(n,2)−M).
The remainder of this section deals with the G(n, p) model, where n is the number of nodes to
be created and p defines the probability of joining each node to each other node.
Properties of G(n, p)
With the notation above, a graph in G(n, p) has on average C(n, 2) · p edges. The distribution
of the degree of any particular vertex is binomial:
P(deg(v) = k) = C(n−1, k) · p^k · (1−p)^(n−1−k)
• If np < 1, then a graph in G(n, p) will almost surely have no connected components
of size larger than O(log(n)).
• If np = 1, then a graph in G(n, p) will almost surely have a largest component whose
size is of order n^(2/3).
• If np → c > 1, where c is a constant, then a graph in G(n, p) will almost surely have
a unique giant component containing a positive fraction of the vertices. No other
component will contain more than O(log(n)) vertices.
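The three regimes can be observed empirically. The following Python sketch (using the
networkx library introduced below, with n = 1000 chosen arbitrarily) prints the size of the
largest component for values of np below, at, and above the threshold:

import networkx as nx

n = 1000
for np_target in (0.5, 1.0, 3.0):
    g = nx.erdos_renyi_graph(n, np_target / n)
    largest = max(len(c) for c in nx.connected_components(g))
    print(np_target, largest)
# Typically the largest component jumps from a handful of vertices
# for np < 1 to a giant component spanning most of the graph for np > 1.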
Next we describe the code used for making an ER graph. For the implementation of the code
below, you’ll need to install the networkx library as well as the matplotlib library. The
following code generates and draws such a graph:
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> G = nx.erdos_renyi_graph(50, 0.5)
>>> nx.draw(G, with_labels=True)
>>> plt.show()
Figure 1: For n=50, p=0.5
When considering the case of a smaller number of nodes (for example, 10), you can clearly
see the difference.
Using the codes for various probabilities, we can see the difference easily:
Python
>>> I= nx.erdos_renyi_graph(10,0)
>>> nx.draw(I, with_labels=True)
>>> plt.show()
Python
>>> K=nx.erdos_renyi_graph(10,0.25)
>>> nx.draw(K, with_labels=True)
>>> plt.show()
Python
>>>H= nx.erdos_renyi_graph(10,0.5)
>>> nx.draw(H, with_labels=True)
>>> plt.show()
This algorithm runs in O(n²) time. For sparse graphs (that is, for small values of p),
fast_gnp_random_graph() is a faster algorithm. The above examples demonstrate the use of
the Erdős-Rényi model to make random graphs with the networkx library in Python.
9.4 APPLICATIONS OF RANDOM GRAPH
9.5 SUMMARY
In this unit we have discussed the Erdős-Rényi model in detail and covered the properties of
random graphs. At the end of this unit we also covered applications of random graphs.
9.6 KEYWORDS
9.7 QUESTIONS
9.8 REFERENCES
1. "Introduction to Graph Theory" by Douglas B. West
2. "Graph Theory and Its Applications" by Jonathan L. Gross and Jay Yellen
3. "Networks, Crowds, and Markets: Reasoning About a Highly Connected
World" by David Easley and Jon Kleinberg.