Design and Analysis of Algorithms
LECTURE NOTES
B.Tech – CSE (Emerging Technologies) R-20
Department of CSE (Emerging Technologies)
(Data Science, Cyber Security and Internet of Things)
Vision
To make society a hub of emerging technologies and thereby capture opportunities in
new-age technologies.
Mission
To provide students a platform where independent learning and scientific study are
encouraged, with emphasis on the latest engineering techniques.
Quality Policy
To provide state-of-the-art infrastructure and expertise to impart quality education and a
research environment to students for a complete learning experience.
To offer quality, relevant and cost-effective programmes to produce engineers as per the
requirements of the industry.
COURSE OBJECTIVES:
1. To analyze performance of algorithms.
2. To choose the appropriate data structure and algorithm design method for a specified
application.
3. To understand how the choice of data structures and algorithm design methods impacts the
performance of programs.
4. To solve problems using algorithm design methods such as the greedy method, divide and
conquer, dynamic programming, backtracking and branch and bound.
5. To understand the differences between tractable and intractable problems and to introduce P
and NP classes.
UNIT-I Introduction: Algorithms, Pseudo code for expressing algorithms, performance analysis-
Space complexity, Time Complexity, Asymptotic notation- Big oh notation, omega notation,
theta notation and little oh notation. Divide and Conquer: General method. Applications- Binary
search, Quick sort, merge sort, Strassen’s matrix multiplication.
UNIT-II Disjoint set operations, Union and Find algorithms, AND/OR graphs, Connected
components, Bi-connected components.
Greedy method: General method, applications- Job sequencing with deadlines, Knapsack
problem, Spanning trees, Minimum cost spanning trees, Single source shortest path problem.
UNIT-III Dynamic Programming: General method, applications- All pairs shortest path
problem, Travelling sales person problem, Optimal binary search trees, 0/1 knapsack problem,
Reliability design.
UNIT-IV Backtracking: General method, applications- n-queen problem, Sum of subsets
problem, Graph coloring, Hamiltonian cycles.
UNIT-V Branch and Bound: General method, applications- Travelling sales person problem, 0/1
Knapsack problem- LC branch and Bound solution, FIFO branch and Bound solution.
NP-Hard and NP-Complete Problems: Basic concepts, Non deterministic algorithms, NP-Hard
and NP-Complete classes, NP-Hard problems, Cook's theorem.
TEXT BOOKS:
1. Fundamentals of Computer Algorithms, Ellis Horowitz, Sartaj Sahni and S. Rajasekaran,
Universities Press.
2. Design and Analysis of Algorithms, P. H. Dave, 2nd edition, Pearson Education.
REFERENCES:
1. Introduction to the Design and Analysis of Algorithms, A. Levitin, Pearson Education.
2. Algorithm Design: Foundations, Analysis and Internet Examples, M. T. Goodrich and
R. Tamassia, John Wiley and Sons.
3. Design and Analysis of Algorithms, S. Sridhar, Oxford University Press.
4. Design and Analysis of Algorithms, Aho, Ullman and Hopcroft, Pearson Education.
5. Foundations of Algorithms, R. Neapolitan and K. Naimipour, 4th edition.
UNIT-I
Introduction: Algorithms, Pseudo code for expressing algorithms, performance analysis-
Space complexity, Time Complexity, Asymptotic notation- Big oh notation, omega notation,
theta notation and little oh notation.
Divide and Conquer: General method. Applications- Binary search, Quick sort, merge sort,
Strassen’s matrix multiplication.
INTRODUCTION TO ALGORITHM
What is an Algorithm?
An algorithm is a set of steps to complete a task. More precisely, an algorithm is
"a set of steps to accomplish or complete a task that is described precisely enough that a
computer can run it".
For example:
Algorithm arrayMax(A, n)
// Input: array A of n integers
// Output: maximum element of A
{
   currentMax := A[0];
   for i := 1 to n - 1 do
      if A[i] > currentMax then
         currentMax := A[i];
   return currentMax;
}
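The same algorithm can be written as a runnable C function (a minimal sketch of our own; the sample data in main is made up for illustration):

#include <stdio.h>

/* Returns the maximum element of A[0..n-1]; assumes n >= 1. */
int arrayMax(const int A[], int n) {
    int currentMax = A[0];
    for (int i = 1; i <= n - 1; i++)   /* examine the remaining elements */
        if (A[i] > currentMax)         /* found a larger element */
            currentMax = A[i];
    return currentMax;
}

int main(void) {
    int A[] = {31, 41, 59, 26, 53};
    printf("%d\n", arrayMax(A, 5));    /* prints 59 */
    return 0;
}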
PERFORMANCE ANALYSIS:
• What are the Criteria for judging algorithms that have a more direct relationship to performance?
• computing time and storage requirements.
• The space complexity of an algorithm is the amount of memory it needs to run to completion.
• The time complexity of an algorithm is the amount of computer time it needs to run to
completion.
Space Complexity:
The space needed by each of these algorithms is seen to be the sum of the following components:
1. A fixed part that is independent of the characteristics (e.g., number, size) of the inputs and
outputs. This part typically includes the instruction space, space for simple variables and
fixed-size component variables, and space for constants.
2. A variable part that consists of the space needed by component variables whose size is dependent on
the particular problem instance being solved, the space needed by referenced variables (to the extent
that it depends on instance characteristics), and the recursion stack space.
The space requirement S(P) of any algorithm P may therefore be written as
S(P) = c + Sp(instance characteristics),
where 'c' is a constant.
Example 2:
Algorithm sum(a, n)
{
    s := 0.0;
    for i := 1 to n do
        s := s + a[i];
    return s;
}
The problem instances for this algorithm are characterized by n, the number of elements to be
summed. The space needed by 'n' is one word, since it is of type integer.
The space needed by 'a' is the space needed by variables of type array of floating point
numbers. This is at least 'n' words, since 'a' must be large enough to hold the 'n' elements to
be summed.
• So we obtain Ssum(n) >= n + 3 (n words for a[ ], and one word each for n, i and s).
Time Complexity:
• The time T(P) taken by a program P is the sum of the compile time and the run time
(execution time).
• The compile time does not depend on the instance characteristics. Also, we may assume that
a compiled program will be run several times without recompilation. This run time is denoted
by tp(instance characteristics).
• The number of steps any program statement is assigned depends on the kind of statement.
We introduce a variable, count, into the program with initial value 0. Statements to
increment count by the appropriate amount are introduced into the program.
This is done so that each time a statement in the original program is executed,
count is incremented by the step count of that statement.
1. If the count is zero to start with, then it will be 2n+3 on termination. So each invocation
of sum executes a total of 2n+3 steps, as the instrumented sketch below illustrates.
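The instrumentation can be made concrete in C (a sketch of our own; the array is assumed to keep its n values in positions a[1..n], matching the pseudocode's indexing):

#include <stdio.h>

static int count = 0;                  /* global step counter */

float sum(const float a[], int n) {
    float s = 0.0f;
    count++;                           /* step for s := 0.0 */
    for (int i = 1; i <= n; i++) {
        count++;                       /* step for each successful loop test */
        s = s + a[i];
        count++;                       /* step for s := s + a[i] */
    }
    count++;                           /* step for the final, failing loop test */
    count++;                           /* step for return s */
    return s;
}

int main(void) {
    float a[] = {0.0f, 1.0f, 2.0f, 3.0f};  /* n = 3, data in a[1..3] */
    sum(a, 3);
    printf("count = %d\n", count);         /* prints 9, i.e. 2n+3 for n = 3 */
    return 0;
}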
2. The second method to determine the step count of an algorithm is to build a table in
which we list the total number of steps contributed by each statement.
o First determine the number of steps per execution (s/e) of the statement and the
total number of times (i.e., frequency) each statement is executed.
o By combining these two quantities, the total contribution of all statements is
obtained, which is the step count for the entire algorithm.
Step table for Algorithm sum:

Statement               s/e   Frequency   Total steps
Algorithm sum(a, n)      0        -            0
{                        0        -            0
   s := 0.0;             1        1            1
   for i := 1 to n do    1       n+1          n+1
      s := s + a[i];     1        n            n
   return s;             1        1            1
}                        0        -            0
Total                                         2n+3
Complexity of Algorithms
The complexity of an algorithm M is the function f(n) which gives the running time and/or
storage space requirement of the algorithm in terms of the size ‘n’ of the input data. Mostly,
the storage space required by an algorithm is simply a multiple of the data size ‘n’.
The function f(n), which gives the running time of an algorithm, depends not only on the
size 'n' of the input data but also on the particular data. The complexity function f(n) for
certain cases is defined as follows:
1. Best Case : The minimum possible value of f(n) is called the best case.
2. Average Case : The expected value of f(n) over all inputs of a given size.
3. Worst Case : The maximum value of f(n) for any possible input.
ASYMPTOTIC NOTATION
The following notations are commonly used in performance analysis to characterize the
complexity of an algorithm:
1. Big–OH (O) ,
2. Big–OMEGA (Ω),
3. Big–THETA (Θ) and
4. Little–OH (o)
Our approach is based on the asymptotic complexity measure. This means that we don’t try to
count the exact number of steps of a program, but how that number grows with the size of the
input to the program. That gives us a measure that will work for different operating systems,
compilers and CPUs. The asymptotic complexity is written using big-O notation.
Intuitively, the notations correspond to comparisons of growth rates: O ≈ ≤, Ω ≈ ≥, Θ ≈ = and o ≈ <.
Big 'oh': the function f(n) = O(g(n)) iff there exist positive constants c and n0 such that
f(n) <= c*g(n) for all n, n >= n0.
Omega: the function f(n) = Ω(g(n)) iff there exist positive constants c and n0 such that
f(n) >= c*g(n) for all n, n >= n0.
Theta: the function f(n) = Θ(g(n)) iff there exist positive constants c1, c2 and n0 such that
c1*g(n) <= f(n) <= c2*g(n) for all n, n >= n0.
Big-O Notation
This notation gives a tight upper bound for the given function. Generally we represent it as
f(n) = O(g(n)). That means, at larger values of n, the upper bound of f(n) is g(n). For
example, if f(n) = n^4 + 100n^2 + 10n + 50 is the running time of the given algorithm, then
g(n) = n^4. That means g(n) gives the maximum rate of growth for f(n) at larger values of n.
O-notation is defined as O(g(n)) = {f(n): there exist positive constants c and n0 such that
0 <= f(n) <= c*g(n) for all n >= n0}. g(n) is an asymptotic upper bound for f(n). Our
objective is to give a rate of growth g(n) which is greater than or equal to the given
algorithm's rate of growth f(n).
In general, we do not consider lower values of n; the rate of growth at lower values of n is
not important. n0 is the point from which we consider the rate of growth for a given
algorithm; below n0 the rates of growth may be different.
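As a worked check (with constants of our own choosing, not from the text): for f(n) = n^4 + 100n^2 + 10n + 50, take g(n) = n^4, c = 2 and n0 = 11. For all n >= 11 we have 100n^2 + 10n + 50 <= n^4 (at n = 11, 100·121 + 110 + 50 = 12260 <= 14641), so f(n) <= 2n^4 = c*g(n), and hence f(n) = O(n^4).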
Omega — Ω notation
Similar to the above discussion, this notation gives a tight lower bound for the given
algorithm and we represent it as f(n) = Ω(g(n)). That means, at larger values of n, the
tight lower bound of f(n) is g(n).
For example, if f(n) = 100n^2 + 10n + 50, then f(n) = Ω(n^2).
The Ω notation can be defined as Ω(g(n)) = {f(n): there exist positive constants c and
n0 such that 0 <= c*g(n) <= f(n) for all n >= n0}. g(n) is an asymptotic lower bound for
f(n). Ω(g(n)) is the set of functions with the same or larger order of growth as g(n).
Theta — Θ notation
This notation decides whether the upper and lower bounds of a given function are the same or
not. The average running time of an algorithm is always between the lower bound and the
upper bound.
Note: for a given function (algorithm), if the rates of growth (bounds) for O and Ω are not the
same, then the rate of growth for the Θ case may not be the same either.
Now consider the definition of Θ notation. It is defined as Θ(g(n)) = {f(n): there exist
positive constants c1, c2 and n0 such that 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0}.
g(n) is an asymptotic tight bound for f(n). Θ(g(n)) is the set of functions with the same
order of growth as g(n).
Important Notes
For analysis (best case, worst case and average) we try to give upper bound (O) and lower
bound (Ω) and average running time (Θ). From the above examples, it should also be clear
that, for a given function (algorithm) getting upper bound (O) and lower bound (Ω) and
average running time (Θ) may not be possible always.
For example, if we are discussing the best case of an algorithm, then we try to give upper
bound (O) and lower bound (Ω) and average running time (Θ).
In the remaining chapters we generally concentrate on the upper bound (O), because knowing
the lower bound (Ω) of an algorithm is of little practical importance, and we use Θ notation
when the upper bound (O) and lower bound (Ω) are the same.
Little Oh Notation
The little oh notation is denoted as o. It is defined as follows: let f(n) and g(n) be
non-negative functions; then f(n) = o(g(n)) iff lim (n→∞) f(n)/g(n) = 0, i.e., f(n) grows
strictly more slowly than g(n), so g(n) is a strict (non-tight) upper bound of f(n).
General Method
If the subproblems are still relatively large, then divide and conquer is reapplied. The
generated subproblems are usually of the same type as the original problem.
The computing time of divide and conquer (DAndC) on an input of size n is described by the
recurrence
T(n) = T(1)                   n small
T(n) = a*T(n/b) + f(n)        otherwise
where a and b are constants and f(n) is the time for dividing the input and combining the
subsolutions. This is called the general divide-and-conquer recurrence.
Advantages of DAndC:
The time spent on solving the problem using DAndC is often smaller than that of other methods.
This technique is ideally suited for parallel computation.
This approach provides efficient algorithms for many problems in computer science.
The following theorem can be used to determine the running time of divide and conquer
algorithms. For a given program or algorithm, first we try to find the recurrence relation for
the problem. If the recurrence is of below form then we directly give the answer without
fully solving it.
If the recurrence is of the form T(n) = a*T(n/b) + Θ(n^k * log^p n), where a >= 1, b > 1,
k >= 0 and p is a real number, then we can directly give the answer as:
1. If a > b^k, then T(n) = Θ(n^(log_b a)).
2. If a = b^k:
   a. If p > -1, then T(n) = Θ(n^k * log^(p+1) n).
   b. If p = -1, then T(n) = Θ(n^k * log log n).
   c. If p < -1, then T(n) = Θ(n^k).
3. If a < b^k:
   a. If p >= 0, then T(n) = Θ(n^k * log^p n).
   b. If p < 0, then T(n) = O(n^k).
Merge Sort:
The merge sort splits the list to be sorted into two equal halves, and places them in separate
arrays. This sorting method is an example of the DIVIDE-AND-CONQUER paradigm i.e. it
breaks the data into two halves and then sorts the two half data sets recursively, and finally
merges them to obtain the complete sorted list. The merge sort is a comparison sort and has an
algorithmic complexity of O (n log n). Elementary implementations of the merge sort make use of
two arrays - one for each half of the data set.
[Figure: the complete procedure of merge sort.]
Algorithm MergeSort(low, high)
// Sorts the global array a[low:high] into ascending order using
// the auxiliary array temp[low:high].
{
    if (low < high) then
    {
        mid := (low + high)/2;        // Divide the list into two halves
        MergeSort(low, mid);
        MergeSort(mid + 1, high);     // Solve the sub-problems
        Merge(low, mid, high);        // Combine the solutions
    }
}

Algorithm Merge(low, mid, high)
// Merges the sorted sub-arrays a[low:mid] and a[mid+1:high].
{
    k := low; i := low; j := mid + 1;
    while ((i <= mid) and (j <= high)) do
    {
        if (a[i] <= a[j]) then
        {
            temp[k] := a[i]; i := i + 1; k := k + 1;
        }
        else
        {
            temp[k] := a[j]; j := j + 1; k := k + 1;
        }
    }
    while (i <= mid) do       // Copy the remaining elements of the left half
    {
        temp[k] := a[i]; i := i + 1; k := k + 1;
    }
    while (j <= high) do      // Copy the remaining elements of the right half
    {
        temp[k] := a[j]; j := j + 1; k := k + 1;
    }
    for k := low to high do   // Copy the merged result back into a[]
        a[k] := temp[k];
}

Algorithm Display(a, n)
// Prints the sorted array.
{
    write("the sorted array is:");
    for i := 0 to n - 1 do
        write(a[i]);
}
The time for the merging operation is proportional to n, so the computing time for merge
sort is described by the recurrence
T(n) = a                  n = 1
T(n) = 2T(n/2) + cn       n > 1
where c and a are constants.
If n is a power of 2 (n = 2^k), we can solve by successive substitution:
T(n) = 2T(n/2) + cn
     = 2[2T(n/4) + cn/2] + cn
     = 2^2 T(n/4) + 2cn
     = 2^3 T(n/8) + 3cn
     = 2^4 T(n/16) + 4cn
     ...
     = 2^k T(1) + kcn
     = an + cn log n          (since k = log2 n)
Therefore, T(n) = O(n log n).
Quick Sort
Quick Sort is an algorithm based on the DIVIDE-AND-CONQUER paradigm that selects a pivot
element and reorders the given list in such a way that all elements smaller than it are on one
side and those bigger than it are on the other. Then the sub lists are recursively sorted until
the list gets completely sorted. The average time complexity of this algorithm is O(n log n);
a minimal sketch follows.
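A minimal C sketch of quick sort (our own illustration; it uses the common Lomuto partition with the last element as the pivot, though other partition schemes and pivot choices work equally well):

/* Sorts a[low..high] in place. */
void quickSort(int a[], int low, int high) {
    if (low < high) {
        int pivot = a[high];               /* pivot: last element */
        int i = low - 1;
        for (int j = low; j < high; j++)
            if (a[j] <= pivot) {           /* grow the "small" region */
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        int t = a[i + 1]; a[i + 1] = a[high]; a[high] = t;
        int p = i + 1;                     /* pivot now in its final position */
        quickSort(a, low, p - 1);          /* sort the left sub list */
        quickSort(a, p + 1, high);         /* sort the right sub list */
    }
}

A call quickSort(a, 0, n - 1) sorts an n-element array.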
The auxiliary space used in the average case for implementing the recursive function calls is
O(log n), and hence quick sort proves to be a bit costly in space, especially for large data sets.
Its worst case has a time complexity of O(n^2), which can be very costly for large data sets.
Competitive sorting algorithms:
            Time complexity
Name        Best case   Average case   Worst case   Space complexity
Bubble      O(n)        -              O(n^2)       O(n)
Insertion   O(n)        O(n^2)         O(n^2)       O(n)
Selection   O(n^2)      O(n^2)         O(n^2)       O(n)
UNIT-II
Disjoint set operations, Union and Find algorithms, AND/OR graphs, Connected components,
Bi-connected components. Greedy method: General method, applications- Job sequencing with
deadlines, Knapsack problem, Spanning trees, Minimum cost spanning trees, Single source
shortest path problem.
Disjoint Sets: If Si and Sj, i ≠ j, are two sets, then there is no element that is in both Si and Sj.
For example: n=10 elements can be partitioned into three disjoint sets,
S1= {1, 7, 8, 9}
S2= {2, 5, 10}
S3= {3, 4, 6}
Tree representation of sets:
[Figure: the sets S1, S2 and S3 drawn as trees, each rooted at one of its elements.]
Disjoint set Union: means the combination of the elements of two disjoint sets. From the
above example, S1 U S2 = {1, 7, 8, 9, 5, 2, 10}.
For the tree representation of S1 U S2, simply make one of the trees a subtree
of the other.
[Figure: the two possible trees for S1 U S2 — the root of S2 made a child of the root of S1
(S1 U S2), or the root of S1 made a child of the root of S2 (S2 U S1).]
The union and find operations can be accomplished easily if, with each set name, we keep a
pointer to the root of the tree representing that set.
For presenting the union and find algorithms, we ignore the set names and identify sets just
by the roots of the trees representing them.
For example: if element 'i' is in a tree with root 'j', and 'j' has a pointer to entry 'k' in the
set name table, then the set name is just name[k].
If the sets contain the numbers 1 through n, the trees can be represented by an array P[1:n],
where n is the maximum number of elements and P[i] is the parent of node i (-1 marks a
root). For the trees above:

i    1   2   3   4   5   6   7   8   9   10
P   -1   5  -1   3  -1   3   1   1   1    5

A sketch of the corresponding simple algorithms follows.
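The simple union and find algorithms can be sketched in C over this parent array (our own rendering; p[0] is unused so that nodes are numbered 1 through 10, and the initial values reproduce the table above):

#include <stdio.h>

/* p[i] = parent of node i; a negative value marks a root. */
static int p[11] = {0, -1, 5, -1, 3, -1, 3, 1, 1, 1, 5};

void SimpleUnion(int i, int j) {   /* i and j must be roots */
    p[i] = j;                      /* make tree i a subtree of tree j */
}

int SimpleFind(int i) {
    while (p[i] >= 0)              /* climb parent links up to the root */
        i = p[i];
    return i;
}

int main(void) {
    printf("%d\n", SimpleFind(9)); /* 9 -> 1, and p[1] < 0, so prints 1 */
    return 0;
}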
If these simple algorithms process a sequence of n-1 unions Union(1,2), Union(2,3), ...,
Union(n-1,n) followed by the finds Find(1), Find(2), ..., Find(n), they perform badly,
because the unions create a degenerate (chain-like) tree.
The time taken for a single union (simple union) is O(1) (constant), so the n-1 unions take O(n).
The time taken by a find for an element at level i of a tree is O(i), so the n finds take O(n^2).
We can improve the performance of our union and find algorithms by avoiding the creation
of degenerate trees. For this we use a weighting rule for Union(i, j): if the number of nodes
in the tree with root i is less than the number in the tree with root j, then make j the parent
of i; otherwise make i the parent of j.
For implementing the weighting rule, we need to know how many nodes there are in every
tree. For this we maintain a count field in the root of every tree: if 'i' is a root node, then
count[i] is the number of nodes in that tree.
The time required by this weighted union algorithm is O(1). The height of the trees it creates
is bounded using a lemma: a tree with m nodes created by WeightedUnion has a level of at
most floor(log2 m) + 1.
The collapsing find algorithm is used to perform the find operation on trees created by
WeightedUnion; both algorithms are sketched after the following example.
Now process the following eight finds: Find(8), Find(8), ..., Find(8).
If SimpleFind is used, each Find(8) requires going up three parent links, for a total of 24
moves to process all eight finds. When CollapsingFind is used, the first Find(8) requires going
up three links and then resetting three links; each of the remaining seven finds requires going
up only one link. A total of 13 moves is required to process all eight finds.
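Both rules can be sketched in C (our own rendering; here the p value of a root stores the negative of the tree's node count, playing the role of the count field):

/* parent[i] = parent of node i; parent[root] = -(number of nodes). */
static int parent[11];

void WeightedUnion(int i, int j) {        /* i and j are roots */
    int temp = parent[i] + parent[j];     /* -(size of the combined tree) */
    if (parent[i] > parent[j]) {          /* tree i has fewer nodes */
        parent[i] = j;
        parent[j] = temp;
    } else {                              /* tree j has no more nodes than i */
        parent[j] = i;
        parent[i] = temp;
    }
}

int CollapsingFind(int i) {
    int root = i;
    while (parent[root] >= 0)             /* first locate the root */
        root = parent[root];
    while (i != root) {                   /* then repoint every node on the */
        int next = parent[i];             /* path directly at the root      */
        parent[i] = root;
        i = next;
    }
    return root;
}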
ALGORITHM:
1. Let G be a graph with only starting node INIT.
2. Repeat the following until INIT is labeled SOLVED or h(INIT) > FUTILITY:
a) Select an unexpanded node on the most promising path from INIT (call it NODE).
b) Generate the successors of NODE. If there are none, set h(NODE) = FUTILITY (i.e.,
NODE is unsolvable); otherwise, for each SUCCESSOR that is not an ancestor of
NODE do the following:
i. Add SUCCESSOR to G.
ii. If SUCCESSOR is a terminal node, label it SOLVED and set h(SUCCESSOR)
= 0.
iii. If SUCCESSOR is not a terminal node, compute its h value.
c) Propagate the newly discovered information up the graph by doing the following: let S
be the set of SOLVED nodes or nodes whose h values have been changed and need to have
those values propagated back to their parents. Initialize S to NODE. Until S is empty, repeat
the following:
i. Remove a node from S and call it CURRENT.
ii. Compute the cost of each of the arcs emerging from CURRENT. Assign the
minimum cost over its successors as its h.
iii. Mark the best path out of CURRENT by marking the arc that had the minimum
cost in step ii
iv. Mark CURRENT as SOLVED if all of the nodes connected to it through the
newly marked arc have been labeled SOLVED.
v. If CURRENT has been labeled SOLVED or its cost was just changed, propagate
its new cost back up through the graph by adding all of the ancestors of
CURRENT to S.
Node B is chosen for expansion. This process produces one new arc, the AND arc to E and F,
with a combined cost estimate of 10, so we update the f' value of D to 10. Going back one
more level, we see that this makes the AND arc B-C better than the arc to D, so it is labeled
as the current best path.
STEP 3:
We explore B first. This generates two new arcs, the ones to G and to H. Propagating their f'
values backward, we update f' of B to 6 (since that is the best we think we can do, which we
can achieve by going through G). This requires updating the cost of the AND arc B-C to
12 (6+4+2). After doing that, the arc to D is again the better path from A, so we record that as
the current best path, and either node E or node F will be chosen for expansion at step 4.
Connected Component:
A connected component of a graph can be obtained by using BFST (breadth first search and
traversal) or DFST (depth first search and traversal); the tree produced by such a traversal is
a spanning tree of the component.
Algorithm BFS(v)
// A bfs of G is begun at vertex v.
// For any node i, visited[i] = 1 if i has already been visited.
// The graph G and array visited[] are global; visited[] is initialized to 0.
{
    u := v;                  // q is a queue of unexplored vertices.
    visited[v] := 1;
    repeat
    {
        for all vertices w adjacent from u do
        {
            if (visited[w] = 0) then
            {
                Add w to q;          // w is unexplored.
                visited[w] := 1;
            }
        }
        if q is empty then return;   // No unexplored vertex.
        Delete u from q;             // Get the first unexplored vertex.
    } until (false);
}
The worst-case time and space complexities of BFS on a graph G(n, e) whose nodes are kept
in an adjacency list are T(n, e) = Θ(n + e) and S(n, e) = Θ(n).
A biconnected component of G is a maximal set of edges such that any two edges in the set lie on a common
simple cycle
Feasible solution:- Most problems have n inputs and its solution contains a subset of inputs
that satisfies a given constraint(condition). Any subset that satisfies the constraint is called
feasible solution.
Optimal solution: To find a feasible solution that either maximizes or minimizes a given
objective function. A feasible solution that does this is called optimal solution.
The greedy method suggests that an algorithm works in stages, considering one input at a
time. At each stage, a decision is made regarding whether a particular input is in an optimal
solution.
Greedy algorithms neither postpone nor revise their decisions (i.e., no backtracking).
Knapsack problem
The knapsack problem or rucksack (bag) problem is a problem in combinatorial optimization: Given a set of
items, each with a mass and a value, determine the number of each item to include in a collection so that the
total weight is less than or equal to a given limit and the total value is as large as possible
The problem can be stated as: maximize ∑(i=1..n) pi*xi subject to ∑(i=1..n) wi*xi <= m,
with 0 <= xi <= 1 for 1 <= i <= n.
That is, maximize the sum of the values of the items in the knapsack so that the sum of the
weights is less than or equal to the knapsack's capacity.
Greedy algorithm for knapsack
Algorithm GreedyKnapsack(m, n)
// p[1:n] and w[1:n] contain the profits and weights respectively of the
// n objects, ordered such that p[i]/w[i] >= p[i+1]/w[i+1]. m is the
// knapsack size and x[1:n] is the solution vector.
{
    for i := 1 to n do x[i] := 0.0;
    U := m;
    for i := 1 to n do
    {
        if (w[i] > U) then break;
        x[i] := 1.0;
        U := U - w[i];
    }
    if (i <= n) then x[i] := U/w[i];
}
For example, with n = 3, m = 20, (p1, p2, p3) = (25, 24, 15) and (w1, w2, w3) = (18, 15, 10):

(x1, x2, x3)        ∑ wi*xi                              ∑ pi*xi
(0, 1, 1/2)         1×15 + 1/2×10 = 20                   1×24 + 1/2×15 = 31.5
(1/2, 1/3, 1/4)     1/2×18 + 1/3×15 + 1/4×10 = 16.5      1/2×25 + 1/3×24 + 1/4×15 = 24.25
Analysis: If we do not consider the time taken for sorting the inputs, then all three greedy
strategies have complexity O(n).
To complete a job one has to process it on a machine for one unit of time. Only one machine
is available for processing jobs.
A feasible solution for this problem is a subset J of jobs such that each job in this subset can
be completed by its deadline.
The value of a feasible solution J is the sum of the profits of the jobs in J, i.e., ∑(i∈J) pi.
An optimal solution is a feasible solution with maximum value.
The problem involves identifying a subset of jobs which can be completed by their deadlines.
Therefore the problem suits the subset methodology and can be solved by the greedy method.
In the example, solution 3 is optimal. In this solution only jobs 1 and 4 are processed and
the value is 127. These jobs must be processed in the order j4 followed by j1: processing of
job 4 begins at time 0 and ends at time 1, and processing of job 1 begins at time 1 and ends
at time 2. Therefore both jobs are completed within their deadlines. The optimization measure
for determining the next job to be selected into the solution is the profit: the next job to
include is the one that increases ∑pi the most, subject to the constraint that the resulting J
is a feasible solution. Therefore the greedy strategy is to consider jobs in decreasing order
of profit.
The greedy algorithm is used to obtain an optimal solution. We must formulate an
optimization measure to determine how the next job is chosen; a sketch of the resulting
strategy is given below.
Note: the size of the subset J must be less than or equal to the maximum deadline in the given list.
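The strategy can be sketched in C (our own; it hardwires the instance discussed above, p = (100, 10, 15, 27) and d = (2, 1, 2, 1), and assigns each job, taken in decreasing-profit order, to the latest free slot on or before its deadline):

#include <stdio.h>

int main(void) {
    int d[]     = {0, 2, 1, 2, 1};      /* deadlines of jobs 1..4 */
    int order[] = {1, 4, 3, 2};         /* job numbers by decreasing profit */
    int slot[3] = {0, 0, 0};            /* slot[t] = job run in unit t (0 = free) */
    for (int k = 0; k < 4; k++) {
        int job = order[k];
        for (int t = d[job]; t >= 1; t--)     /* try the latest free slot first */
            if (slot[t] == 0) { slot[t] = job; break; }
    }
    for (int t = 1; t <= 2; t++)
        printf("slot %d -> job %d\n", t, slot[t]);   /* job 4, then job 1 */
    return 0;                                        /* total profit 127 */
}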
Graphs can be used to represent the highway structure of a state or country with
vertices representing cities and edges representing sections of highway.
The edges have assigned weights which may be either the distance between the 2
cities connected by the edge or the average time to drive along that section of
highway.
For example, if a motorist wishes to drive from city A to city B, then we must answer the
following questions:
o Is there a path from A to B?
o If there is more than one path from A to B, which is the shortest path?
The length of a path is defined to be the sum of the weights of the edges on that path.
Given a directed graph G(V, E) with weighted edges w(u, v), we have to find a shortest path
from the source vertex S ∈ V to every other vertex v ∈ V − {S}. Two classical algorithms are:
Bellman-Ford Algorithm
Dijkstra’s algorithm
Bellman-Ford Algorithm: allows negative-weight edges in the input graph. This algorithm
either finds a shortest path from the source vertex S ∈ V to every other vertex v ∈ V, or
detects a negative-weight cycle in G, in which case no solution exists. If no negative-weight
cycle is reachable from the source vertex S, the shortest paths to all vertices are well defined.
Dijkstra's algorithm: allows only non-negative weight edges in the input graph and finds a
shortest path from the source vertex S ∈ V to every other vertex v ∈ V.
Consider the above directed graph, if node 1 is the source vertex, then shortest path
from 1 to 2 is 1,4,5,2. The length is 10+15+20=45.
As an optimization measure we can use the sum of the lengths of all paths so far
generated.
If we have already constructed ‘i’ shortest paths, then using this optimization measure,
the next path to be constructed should be the next shortest minimum length path.
The greedy way to generate the shortest paths from Vo to the remaining vertices is to
generate these paths in non-decreasing order of path length.
For this, a shortest path to the nearest vertex is generated first. Then a shortest path to
the 2nd nearest vertex is generated, and so on.
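A C sketch of Dijkstra's algorithm over an adjacency matrix (our own; the small graph in main is made up, and INF marks a missing edge):

#include <stdio.h>

#define N   5
#define INF 1000000

void dijkstra(int cost[N][N], int src, int dist[N]) {
    int done[N] = {0};
    for (int v = 0; v < N; v++) dist[v] = cost[src][v];
    dist[src] = 0; done[src] = 1;
    for (int k = 1; k < N; k++) {
        int u = -1;
        for (int v = 0; v < N; v++)          /* nearest unfinished vertex */
            if (!done[v] && (u < 0 || dist[v] < dist[u])) u = v;
        if (u < 0 || dist[u] == INF) break;  /* the rest are unreachable */
        done[u] = 1;
        for (int v = 0; v < N; v++)          /* relax the edges leaving u */
            if (!done[v] && dist[u] + cost[u][v] < dist[v])
                dist[v] = dist[u] + cost[u][v];
    }
}

int main(void) {
    int cost[N][N], dist[N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) cost[i][j] = (i == j) ? 0 : INF;
    cost[0][1] = 10; cost[0][3] = 5; cost[3][1] = 3; cost[1][2] = 1;
    cost[3][2] = 9;  cost[3][4] = 2; cost[4][2] = 6;
    dijkstra(cost, 0, dist);
    for (int v = 0; v < N; v++)
        printf("dist[%d] = %d\n", v, dist[v]);   /* prints 0 8 9 5 7 */
    return 0;
}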
SPANNING TREE: A subgraph 'T' of a graph 'G' is called a spanning tree if
(i) it includes all the vertices of 'G', and
(ii) it is a tree.
The greedy method suggests that a minimum cost spanning tree can be obtained by
constructing the tree edge by edge. The next edge to be included in the tree is the edge that
results in a minimum increase in the sum of the costs of the edges included so far.
There are two basic algorithms for finding minimum-cost spanning trees, and both are greedy
Algorithms
Prim’s Algorithm
Kruskal’s Algorithm
Prim’s Algorithm: Start with any one node in the spanning tree, and repeatedly add the
cheapest edge, and the node it leads to, for which the node is not already in the spanning tree.
Notes: At every stage a decision is made about an edge of minimum cost to be included in
the spanning tree, chosen from the edges adjacent to the vertices already in the tree; at every
stage the subgraph obtained is a tree.
i) The algorithm starts with a tree that includes only the minimum cost edge of
G. Then edges are added to this tree one by one.
ii) The next edge (i, j) to be added is such that i is a vertex already included in the
tree, j is a vertex not yet included in the tree, and the cost of (i, j) is minimum
among all edges adjacent to the tree.
iii) With each vertex 'j' not yet included in the tree, we associate a value near(j).
near(j) is a vertex in the tree such that cost(j, near(j)) is minimum among all
choices for near(j).
iv) We define near(j) := 0 for all vertices 'j' that are already in the tree.
v) The next edge to include is defined by the vertex 'j' such that near(j) ≠ 0 and
cost(j, near(j)) is minimum. (A sketch in C appears below.)
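A C sketch of Prim's algorithm built around the near[] idea above (our own rendering; the driver uses the example graph of the Kruskal discussion below, with vertices renumbered 0..5):

#include <stdio.h>

#define N   6
#define INF 1000000

/* Returns the MST cost; t[e] receives the e-th selected edge. */
int prim(int cost[N][N], int t[N - 1][2]) {
    int near[N], intree[N] = {0}, mincost = 0;
    intree[0] = 1;                         /* grow the tree from vertex 0 */
    for (int v = 0; v < N; v++) near[v] = 0;
    for (int e = 0; e < N - 1; e++) {
        int j = -1;
        for (int v = 0; v < N; v++)        /* vertex nearest to the tree */
            if (!intree[v] && (j < 0 || cost[v][near[v]] < cost[j][near[j]]))
                j = v;
        t[e][0] = j; t[e][1] = near[j];
        mincost += cost[j][near[j]];
        intree[j] = 1;
        for (int v = 0; v < N; v++)        /* j may now be the nearest tree */
            if (!intree[v] && cost[v][j] < cost[v][near[v]])
                near[v] = j;               /* vertex for an outside vertex  */
    }
    return mincost;
}

int main(void) {
    int cost[N][N], t[N - 1][2];
    int e[][3] = {{0,1,10},{2,5,15},{3,5,20},{1,5,25},{0,3,30},
                  {2,4,35},{1,4,40},{0,4,45},{1,2,50},{4,5,55}};
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) cost[i][j] = INF;
    for (int k = 0; k < 10; k++) {
        cost[e[k][0]][e[k][1]] = e[k][2];
        cost[e[k][1]][e[k][0]] = e[k][2];
    }
    printf("mincost = %d\n", prim(cost, t));   /* prints 105 */
    return 0;
}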
Analysis: The time required by Prim's algorithm is directly proportional to the square of the
number of vertices: if a graph 'G' has 'n' vertices, then the time required by Prim's
algorithm is O(n^2).
Consider the above graph. Using Kruskal's method, the edges of this graph are considered
for inclusion in the minimum cost spanning tree in the order (1, 2), (3, 6), (4, 6), (2, 6), (1, 4),
(3, 5), (2, 5), (1, 5), (2, 3), and (5, 6). This corresponds to the cost sequence 10, 15, 20, 25,
30, 35, 40, 45, 50, 55. The first four edges are included in T. The next edge to be considered
is (1, 4). This edge connects two vertices already connected in T and so it is rejected. Next,
the edge (3, 5) is selected and that completes the spanning tree.
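The same run can be reproduced with a C sketch of Kruskal's method (our own; it reuses the simple find idea from the disjoint-set section and sorts the edge list by cost):

#include <stdio.h>
#include <stdlib.h>

typedef struct { int u, v, cost; } Edge;

static int parent[7];                       /* vertices 1..6; -1 marks a root */

static int find(int i) { while (parent[i] >= 0) i = parent[i]; return i; }

static int cmpEdge(const void *a, const void *b) {
    return ((const Edge *)a)->cost - ((const Edge *)b)->cost;
}

int main(void) {
    Edge e[] = {{1,2,10},{3,6,15},{4,6,20},{2,6,25},{1,4,30},
                {3,5,35},{2,5,40},{1,5,45},{2,3,50},{5,6,55}};
    int m = 10, mincost = 0, taken = 0;
    for (int i = 1; i <= 6; i++) parent[i] = -1;
    qsort(e, m, sizeof(Edge), cmpEdge);     /* consider edges in cost order */
    for (int i = 0; i < m && taken < 5; i++) {
        int ru = find(e[i].u), rv = find(e[i].v);
        if (ru != rv) {                     /* joins two components: take it */
            parent[ru] = rv;
            mincost += e[i].cost;
            taken++;
            printf("take (%d,%d)\n", e[i].u, e[i].v);
        }                                   /* otherwise it would form a cycle */
    }
    printf("mincost = %d\n", mincost);      /* 10+15+20+25+35 = 105 */
    return 0;
}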
Dynamic Programming
In the all pairs shortest path problem, we are to find a shortest path between every pair of
vertices in a directed graph G. That is, for every pair of vertices (i, j), we are to find a
shortest path from i to j as well as one from j to i. These two paths are the same when G is
undirected.
When no edge has a negative length, the all-pairs shortest path problem may be solved
by using Dijkstra’s greedy single source algorithm n times, once with each of the n vertices
as the source vertex.
The all pairs shortest path problem is to determine a matrix A such that A(i, j) is the length of a
shortest path from i to j. The matrix A can be obtained by solving n single-source problems
using the algorithm ShortestPaths. Since each application of this procedure requires O(n^2)
time, the matrix A can be obtained in O(n^3) time.
The dynamic programming solution, called Floyd's algorithm, runs in O(n^3) time. Floyd's
algorithm works even when the graph has negative length edges (provided there are no
negative length cycles).
A^k(i, j) = min { A^(k-1)(i, j), A^(k-1)(i, k) + A^(k-1)(k, j) },   1 <= k <= n,
with A^0(i, j) = c(i, j).
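A C sketch of Floyd's algorithm (our own; A is initialized to the cost matrix, with 0 on the diagonal and INF for missing edges):

#define N   4
#define INF 1000000

/* On entry A[i][j] = c(i, j); on exit A[i][j] = length of a shortest
   path from i to j. */
void allPaths(int A[N][N]) {
    for (int k = 0; k < N; k++)        /* allow vertex k as an intermediate */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (A[i][k] + A[k][j] < A[i][j])
                    A[i][j] = A[i][k] + A[k][j];
}

Each of the n iterations of the outer loop costs O(n^2), giving the O(n^3) bound stated above.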
[Figure: a directed graph on the vertices 1, 2, 3, 4 and its edge-cost matrix, used in the
travelling salesperson example below.]
g(2, ∅) = c21 = 5
g(3, ∅) = c31 = 6
g(4, ∅) = c41 = 8
Using equation (2) we obtain:
g(1, {2, 3, 4}) = min {c12 + g(2, {3, 4}), c13 + g(3, {2, 4}), c14 + g(4, {2, 3})}
g(2, {3, 4}) = min {c23 + g(3, {4}), c24 + g(4, {3})} = min {9 + g(3, {4}), 10 + g(4, {3})}
g(3, {4}) = c34 + g(4, ∅) = 12 + 8 = 20
g(4, {3}) = c43 + g(3, ∅) = 9 + 6 = 15
Therefore, g(2, {3, 4}) = min {9 + 20, 10 + 15} = min {29, 25} = 25
g(3, {2, 4}) = min {c32 + g(2, {4}), c34 + g(4, {2})}, where
g(2, {4}) = c24 + g(4, ∅) = 10 + 8 = 18 and g(4, {2}) = c42 + g(2, ∅) = 8 + 5 = 13.
Therefore, g(3, {2, 4}) = min {13 + 18, 12 + 13} = min {31, 25} = 25
g(4, {2, 3}) = min {c42 + g(2, {3}), c43 + g(3, {2})}
g(2, {3}) = c23 + g(3, ∅) = 9 + 6 = 15
g(3, {2}) = c32 + g(2, ∅) = 13 + 5 = 18
Therefore, g(4, {2, 3}) = min {8 + 15, 9 + 18} = min {23, 27} = 23
Finally, g(1, {2, 3, 4}) = min {c12 + g(2, {3, 4}), c13 + g(3, {2, 4}), c14 + g(4, {2, 3})}
= min {10 + 25, 15 + 25, 20 + 23} = min {35, 40, 43} = 35
The length of an optimal tour is therefore 35.
Let P (i) be the probability with which we shall be searching for 'ai'. Let Q (i) be the
probability of an un-successful search. Every internal node represents a point where a
successful search may terminate. Every external node represents a point where an
unsuccessful search may terminate.
The expected cost contribution from the internal node for 'ai' is P(i) * level(ai).
An unsuccessful search terminates with i = 0 (i.e., at an external node). Hence the cost
contribution for the external node Ei is Q(i) * (level(Ei) − 1).
Given a fixed set of identifiers, we wish to create a binary search tree organization. We
may expect different binary search trees for the same identifier set to have different
performance characteristics.
The computation of each of these c(i, j)'s requires us to find the minimum of m
quantities. Hence, each such c(i, j) can be computed in time O(m). The total time for all
c(i, j)'s with j − i = m is therefore O(nm − m^2).
The total time to evaluate all the c(i, j)'s and r(i, j)'s is therefore
∑(1<=m<=n) (nm − m^2) = O(n^3).
Example 1:
Let n = 4 and (a1, a2, a3, a4) = (do, if, read, while). Let P(1:4) = (3, 3, 1, 1) and
Q(0:4) = (2, 3, 1, 1, 1).
Solution:
Table for recording W(i, j), C(i, j) and R(i, j) (each cell lists W, C, R; the row number
is j − i and the column number is i):

j−i     i=0          i=1          i=2         i=3         i=4
0       2, 0, 0      3, 0, 0      1, 0, 0     1, 0, 0     1, 0, 0
1       8, 8, 1      7, 7, 2      3, 3, 3     3, 3, 4
2       12, 19, 1    9, 12, 2     5, 8, 3
3       14, 25, 2    11, 19, 2
4       16, 32, 2
This computation is carried out row-wise from row 0 to row 4. Initially, W(i, i) = Q(i),
C(i, i) = 0 and R(i, i) = 0 for 0 <= i <= 4.
Solving for C (0, n):
First, compute all C(i, j) such that j − i = 1; i.e., j = i + 1 for i = 0, 1, 2, 3, with
i < k <= j. Start with i = 0; then j = 1 and the only possible value is k = 1.
Second, compute all C(i, j) such that j − i = 2; i.e., j = i + 2 for i = 0, 1, 2, with
i < k <= j. Start with i = 0; then j = 2 and the possible values for k are 1 and 2.
W(0, 2) = P(2) + Q(2) + W(0, 1) = 3 + 1 + 8 = 12
C(0, 2) = W(0, 2) + min {(C(0, 0) + C(1, 2)), (C(0, 1) + C(2, 2))}
        = 12 + min {0 + 7, 8 + 0} = 19
R(0, 2) = 1
Next, with i = 1; then j = 3 and the possible values for k are 2 and 3.
W(1, 3) = P(3) + Q(3) + W(1, 2) = 1 + 1 + 7 = 9
C(1, 3) = W(1, 3) + min {[C(1, 1) + C(2, 3)], [C(1, 2) + C(3, 3)]}
        = 9 + min {0 + 3, 7 + 0} = 9 + 3 = 12
R(1, 3) = 2
Next, with i = 2; then j = 4 and the possible values for k are 3 and 4.
Fourth, compute all C(i, j) such that j − i = 4; i.e., j = i + 4 with i = 0 and i < k <= j.
Start with i = 0; then j = 4 and the possible values for k are 1, 2, 3 and 4.
From the table, R(0, 4) = 2, so the root of the optimal tree T04 is 'a2'. Hence the left
subtree is T01 and the right subtree is T24. The root of T01 is 'a1' and the root of T24 is 'a3'.
The left and right subtrees of T01 are T00 and T11 respectively. The left and right subtrees
of T24 are T22 and T34 respectively, and the root of T34 is 'a4'.
[Figure: the resulting optimal binary search tree — root a2 (if), left child a1 (do), right
child a3 (read), with a4 (while) as the right child of a3.]
We are given n objects and a knapsack. Each object i has a positive weight wi and a positive
value (profit) pi. The knapsack can carry a weight not exceeding m. Fill the knapsack so that
the value of the objects in it is maximized.
A solution to the knapsack problem can be obtained by making a sequence of decisions on
the variables x1, x2, . . . . , xn. A decision on variable xi involves determining which of the
values 0 or 1 is to be assigned to it. Let us assume that
decisions on the xi are made in the order xn, xn-1, ..., x1. Following a decision on xn,
we may be in one of two possible states: the capacity remaining is m and no profit has
accrued, or the capacity remaining is m − wn and a profit of pn has accrued. It is clear
that the remaining decisions xn-1, ..., x1 must be optimal with respect to the problem state
resulting from the decision on xn. Otherwise, xn, ..., x1 will not be optimal. Hence, the
principle of optimality holds.
fn(m) = max {fn-1(m), fn-1(m − wn) + pn}                — (1)
and, in general,
fi(y) = max {fi-1(y), fi-1(y − wi) + pi}                — (2)
Equation (2) can be solved for fn(m) by beginning with the knowledge f0(y) = 0 for all y
and fi(y) = −∞ for y < 0. Then f1, f2, ..., fn can be successively computed using
equation (2), as the sketch below illustrates.
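The Θ(mn) tabular computation of equation (2) for integer weights can be sketched in C (our own; the weights, profits and capacity in main are made-up example data):

#include <stdio.h>

#define NOBJ 3
#define M    6

int main(void) {
    int w[] = {0, 2, 3, 4};            /* weights of objects 1..3 */
    int p[] = {0, 1, 2, 5};            /* profits of objects 1..3 */
    int f[NOBJ + 1][M + 1];            /* f[i][y] = fi(y) */
    for (int y = 0; y <= M; y++) f[0][y] = 0;        /* f0(y) = 0 */
    for (int i = 1; i <= NOBJ; i++)
        for (int y = 0; y <= M; y++) {
            f[i][y] = f[i - 1][y];                   /* decision xi = 0 */
            if (y >= w[i] && f[i - 1][y - w[i]] + p[i] > f[i][y])
                f[i][y] = f[i - 1][y - w[i]] + p[i]; /* decision xi = 1 */
        }
    printf("fn(m) = %d\n", f[NOBJ][M]);    /* prints 6 (objects 1 and 3) */
    return 0;
}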
When the wi's are integer, we need to compute fi(y) for integer y, 0 <= y <= m. Since fi(y)
= −∞ for y < 0, these function values need not be computed explicitly. Since each fi
can be computed from fi-1 in Θ(m) time, it takes Θ(mn) time to compute fn. When
the wi's are real numbers, fi(y) is needed for real numbers y such that 0 <= y <= m. So, fi
cannot be explicitly computed for all y in this range. Even when the wi's are integer, the
explicit Θ(mn) computation of fn may not be the most efficient computation. So, we
explore an alternative method for both cases.
The fi(y) is an ascending step function; i.e., there are a finite number of y's, 0 = y1 < y2
< ... < yk, such that fi(y1) < fi(y2) < ... < fi(yk); fi(y) = −∞ for y < y1; fi(y) = fi(yk)
for y >= yk; and fi(y) = fi(yj) for yj <= y < yj+1. So, we need to compute only fi(yj),
1 <= j <= k. We use the ordered set Si = {(fi(yj), yj) | 1 <= j <= k} to represent fi(y). Each
member of Si is a pair (P, W), where P = fi(yj) and W = yj. Notice that S0 = {(0, 0)}. We
can compute Si+1 from Si by first computing
S1i = {(P + pi+1, W + wi+1) | (P, W) ∈ Si}.
Now, Si+1 can be computed by merging the pairs in Si and S1i together. Note that if Si+1
contains two pairs (Pj, Wj) and (Pk, Wk) with the property that Pj <= Pk and Wj >= Wk,
then the pair (Pj, Wj) can be discarded because of equation (2). Discarding or purging
rules such as this one are also known as dominance rules. Dominated tuples get purged; in
the above, (Pk, Wk) dominates (Pj, Wj).
The problem is to design a system that is composed of several devices connected in series.
Let ri be the reliability of device Di (that is, ri is the probability that device i will
function properly); then the reliability of the entire system is ∏ ri. Even if the individual
devices are very reliable (the ri's are very close to one), the reliability of the system may
not be very good. For example, if n = 10 and ri = 0.99 for 1 <= i <= 10, then ∏ ri = 0.904.
Hence, it is desirable to duplicate devices: multiple copies of the same device type are
connected in parallel.
If stage i contains mi copies of device Di, then the probability that all mi have a
malfunction is (1 − ri)^mi. Hence the reliability of stage i becomes 1 − (1 − ri)^mi.
Backtracking is a methodical (logical) way of trying out various sequences of decisions until
you find one that "works".
For example, sorting the array of integers in a[1:n] is a problem whose solution is expressible
as an n-tuple (x1, ..., xn), where xi is the index in 'a' of the ith smallest element.
The criterion function 'P' is the inequality a[xi] <= a[xi+1] for 1 <= i < n.
The set Si is finite and includes the integers 1 through n.
mi = size of set Si.
m = m1*m2*...*mn is the number of n-tuples that are possible candidates for satisfying the
function P.
A brute force approach would be to form all these n-tuples, evaluate (judge) each one with P,
and save those which yield the optimum. By using a backtracking algorithm we can get the
same answer with far fewer than m trials.
Many of the problems we solve using backtracking require that all the solutions satisfy a
complex set of constraints.
For any problem these constraints can be divided into two categories:
Explicit constraints: Explicit constraints are rules that restrict each xi to take on values only
from a given set.
Example: xi >= 0, or Si = {all non-negative real numbers}
xi = 0 or 1, or Si = {0, 1}
li <= xi <= ui, or Si = {a : li <= a <= ui}
The explicit constraint depends on the particular instance I of the problem being solved. All
tuples that satisfy the explicit constraints define a possible solution space for I.
Implicit Constraints:
The implicit constraints are rules that determine which of the tuples in the solution space of I
satisfy the criterion function. Thus implicit constraints describe the way in which the Xi must
relate to each other.
Applications of Backtracking:
N Queens Problem
Sum of subsets problem
Graph coloring
Hamiltonian cycles.
N-Queens Problem:
It is a classic combinatorial problem. The eight queens puzzle is the problem of placing eight
queens on an 8×8 chessboard so that no two queens attack each other; that is, no two of them
are on the same row, column, or diagonal.
The 8-queens puzzle is an example of the more general n-queens problem of placing n queens on
an n×n chessboard.
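A C sketch of the backtracking solution (our own rendering of the classic Place/NQueens scheme; x[k] holds the column of the queen in row k):

#include <stdio.h>
#include <stdlib.h>

#define N 8

static int x[N + 1];

/* Can a queen be placed in row k, column i? It must not share a
   column or a diagonal with any queen already placed in rows 1..k-1. */
int place(int k, int i) {
    for (int j = 1; j < k; j++)
        if (x[j] == i || abs(x[j] - i) == abs(j - k))
            return 0;
    return 1;
}

/* Try every safe column for the queen in row k; backtrack on failure. */
void nqueens(int k) {
    for (int i = 1; i <= N; i++) {
        if (place(k, i)) {
            x[k] = i;
            if (k == N) {                       /* all N queens placed */
                for (int j = 1; j <= N; j++) printf("%d ", x[j]);
                printf("\n");
            } else
                nqueens(k + 1);                 /* place the next queen */
        }
    }
}

int main(void) { nqueens(1); return 0; }        /* prints all 92 solutions */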
For example:
If n=4 (w1, w2, w3, w4)=(11,13,24,7) and m=31.
Then desired subsets are (11, 13, 7) & (24, 7).
The two solutions are described by the index vectors (1, 2, 4) and (3, 4). In general, all
solutions are k-tuples (x1, x2, ..., xk), 1 <= k <= n, and different solutions may have
different sized tuples.
wi = weight of item i, and m = the required sum. The bounding condition is that the partial
vector (x1, x2, ..., xk) cannot lead to an answer node unless
∑(i=1..k) wi*xi + ∑(i=k+1..n) wi >= m (the remaining items can still reach m) and
∑(i=1..k) wi*xi + w[k+1] <= m (adding the next item does not overshoot m).
Algorithm SumOfSub(s, k, r)
// s = sum of the items chosen so far; r = remaining sum w[k] + ... + w[n].
// Items are assumed to be in non-decreasing order of weight.
{
    x[k] := 1;                                    // generate left child
    if (s + w[k] = m) then write (x[1:k]);        // subset found
    else if (s + w[k] + w[k+1] <= m) then
        SumOfSub(s + w[k], k + 1, r − w[k]);
    if ((s + r − w[k] >= m) and (s + w[k+1] <= m)) then
    {
        x[k] := 0;                                // generate right child
        SumOfSub(s, k + 1, r − w[k]);
    }
}
Graph Coloring: The above map requires 4 colors. For many years it was known that five
colors were sufficient to color any map, but no map that required more than four colors had
ever been found. After several hundred years, this problem was solved by a group of
mathematicians with the help of a computer: they showed that 4 colors are sufficient.
Suppose we represent a graph by its adjacency matrix G[1:n, 1:n]
[Figure: an example graph and its adjacency matrix G[1:n, 1:n].]
Hamiltonian Cycles: the above graph contains the Hamiltonian cycle 1, 2, 8, 7, 6, 5, 4, 3, 1.
UNIT-V
Branch and Bound: General method, applications- Travelling sales person problem, 0/1
Knapsack problem- LC branch and Bound solution, FIFO branch and Bound solution.
NP-Hard and NP-Complete Problems: Basic concepts, Non deterministic algorithms, NP-Hard
and NP-Complete classes, NP-Hard problems, Cook's theorem.
Branch & Bound
Branch & Bound (B&B) is a general algorithm (or systematic method) for finding an optimal
solution to various optimization problems, especially in discrete and combinatorial
optimization.
The B&B strategy is very similar to backtracking in that a state space tree is used to solve
a problem.
The differences are that the B&B method:
does not limit us to any particular way of traversing the tree;
is used only for optimization problems;
is applicable to a wide variety of discrete combinatorial problems.
B&B is a rather general optimization technique that applies where the greedy method and
dynamic programming fail. It is much slower, however; indeed it often leads to exponential
time complexities in the worst case.
The term B&B refers to all state space search methods in which all children of the “E-
node” are generated before any other “live node” can become the “E-node”
Live node is a node that has been generated but whose children have not yet been
generated.
E-node is a live node whose children are currently being explored.
Dead node is a generated node that is not to be expanded or explored any further. All
children of a dead node have already been expanded.
In the two graph search strategies, BFS & D-search (DFS), the exploration of a new node
cannot begin until the node currently being explored is fully explored.
Both BFS & D-search (DFS) generalize to B&B strategies.
BFS like state space search will be called FIFO (First In First Out) search as the list of
live nodes is “First-in-first-out” list (or queue).
D-search (DFS) Like state space search will be called LIFO (Last In First Out) search
as the list of live nodes is a “last-in-first-out” list (or stack).
In backtracking, bounding function are used to help avoid the generation of sub-trees that
do not contain an answer node.
We will use 3 types of search strategies in branch and bound:
1) FIFO (First In First Out) search
2) LIFO (Last In First Out) search
3) LC (Least Cost) search
FIFO B&B:
FIFO Branch & Bound is a BFS-like search. In this, the children of the E-node (live nodes)
are inserted in a queue.
Implementation of the list of live nodes as a queue:
Delete() removes the head of the queue;
Add() adds the node to the end of the queue.
Assume that node '12' is an answer node. In FIFO search, we first take node '1' as the E-node.
LIFO B&B:
LIFO Branch & Bound is a D-search (DFS)-like search. In this, the children of the E-node
(live nodes) are inserted in a stack.
Implementation of the list of live nodes as a stack:
Delete() removes the top of the stack;
Add() adds the node to the top of the stack.
The search for an answer node can often be sped up by using an "intelligent" ranking
function, also called an approximate cost function "Ĉ".
Example:
8-puzzle
Cost function: Ĉ = g(x) +h(x)
where h(x) = the number of misplaced tiles
and g(x) = the number of moves so far
Assumption: moving one tile in any direction costs 1.
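This cost function can be sketched in C (our own; boards are 9-element arrays in row-major order, with 0 coding the blank):

/* h(x): the number of tiles out of place (the blank is not counted). */
int misplaced(const int board[9], const int goal[9]) {
    int h = 0;
    for (int i = 0; i < 9; i++)
        if (board[i] != 0 && board[i] != goal[i])
            h++;
    return h;
}

/* Ĉ(x) = g(x) + h(x), where g is the number of moves made so far. */
int cHat(const int board[9], const int goal[9], int g) {
    return g + misplaced(board, goal);
}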
State space tree for the travelling salesperson problem with n=4 and i0=i4=1
The above diagram shows tree organization of a complete graph with |V|=4.
Each leaf node ‘L’ is a solution node and represents the tour defined by the path from the root
to L.
Cost(1) = sum of all reduction elements = 4 + 5 + 6 + 2 + 1 = 18
Step-02:
Cost(4) = Cost(1) + sum of reduction elements + M[A, D] = 18 + 5 + 3 = 26
Thus, we have-
Step-03:
Cost(5)
= cost(3) + Sum of reduction elements + M[C,B]
= 25 + (13 + 8) + ∞
=∞
0/1 Knapsack problem: xi = 0 or 1, 1 <= i <= n.
Define two functions ĉ(x) and u(x) such that for every node x,
ĉ(x) <= c(x) <= u(x)
For speeding up the search process we need an intelligent ranking function for live nodes. Each time, the
next E-node is selected on the basis of this ranking function, which estimates the additional computation
(normally cost) needed to reach an answer node from the live node. LC-search (least cost search) is a kind
of search in which the live node with the least estimated cost of reaching an answer node is selected for
expansion; at each E-node the probability of its being an answer node is checked. BFS and D-search are
special cases of LC-search. An LC-search with bounding functions is known as LC Branch and Bound
search. The applications of LC Branch and Bound are the 0/1 knapsack problem and the travelling
salesman problem.
The search is mainly based on the state space tree: we find an upper bound value and a lower bound value
at each node to get the ranking value that identifies which node has the least cost.
Basic concepts:
NP = Nondeterministic Polynomial time.
The problems that have the best-known algorithms for their solutions have computing times
that cluster into two groups:
Group 1: problems whose solution times are bounded by a polynomial of a small degree.
Group 2: problems whose solution times are not bounded by any polynomial (simply
non-polynomial).
No one has been able to develop a polynomial time algorithm for any problem in the 2nd
group (i.e., group 2).
Algorithms whose computing times grow faster than any polynomial require such vast
amounts of time to execute that even moderate-size problems cannot be solved.
Theory of NP-Completeness:
The theory of NP-completeness shows that many of the problems with no known polynomial
time algorithm are computationally related. They form the two classes:
1. NP-Hard
2. NP-Complete
NP-Hard: if an NP-Hard problem can be solved in polynomial time, then all NP-Complete
problems can be solved in polynomial time.
All NP-Complete problems are NP-Hard, but some NP-Hard problems are not known to be
NP-Complete.
Algorithms which contain operations whose outcomes are not uniquely defined but are
limited to a specified set of possibilities are called nondeterministic algorithms.
The machine executing such operations is allowed to choose any one of these outcomes,
subject to a termination condition to be defined later.
For example, the nondeterministic 0/1 knapsack decision algorithm chooses a solution vector
and then verifies it:
    for i := 1 to n do x[i] := Choice(0, 1);   // nondeterministic choice
    W := ∑ w[i]*x[i]; P := ∑ p[i]*x[i];        // total weight and profit
    if ((W > m) or (P < r)) then Failure();
    else Success();
P is the set of all decision problems solvable by deterministic algorithms in polynomial time.
NP is the set of all decision problems solvable by nondeterministic algorithms in polynomial
time.
The most famous unsolved problem in computer science is whether P = NP or P ≠ NP.
In considering this problem, S. Cook formulated the following question: is there any single
problem in NP such that, if we showed it to be in P, then that would imply that P = NP?
Notation of Reducibility:
Let L1 and L2 be problems. Problem L1 reduces to L2 (written L1 ∝ L2) iff there is a way to
solve L1 by a deterministic polynomial time algorithm that uses a deterministic algorithm
solving L2 in polynomial time.
This implies that, if we have a polynomial time algorithm for L2, then we can solve L1 in
polynomial time.
> If the length of 'I' is 'n' and the time complexity of A is p(n) for some polynomial p(),
then the length of Q is O(p^3(n) log n) = O(p^4(n)). The time needed to construct Q is also
O(p^3(n) log n).
> There is a deterministic algorithm 'Z' to determine the outcome of 'A' on any input 'I':
algorithm Z computes 'Q' and then uses a deterministic algorithm for the satisfiability
problem to determine whether 'Q' is satisfiable.
> If O(q(m)) is the time needed to determine whether a formula of length 'm' is satisfiable,
then the complexity of 'Z' is O(p^3(n) log n + q(p^3(n) log n)).
> If satisfiability is in P, then q(m) is a polynomial function of m and the complexity of 'Z'
becomes O(r(n)) for some polynomial r(). Hence, if satisfiability is in P, then P = NP.