DAA – Design and Analysis of Algorithms
OBJECTIVES:
The student should be made to:
• Learn the algorithm analysis techniques.
• Become familiar with the different algorithm design techniques.
• Understand the limitations of algorithm power.
UNIT I INTRODUCTION 9
Notion of an Algorithm – Fundamentals of Algorithmic Problem Solving – Important
Problem Types – Fundamentals of the Analysis of Algorithm Efficiency – Analysis
Framework – Asymptotic Notations and its properties – Mathematical analysis for
Recursive and Non-recursive algorithms.
OUTCOMES:
At the end of the course, the student should be able to:
• Design algorithms for various computing problems.
• Analyze the time and space complexity of algorithms.
• Critically analyze the different algorithm design techniques for a given problem.
• Modify existing algorithms to improve efficiency.
Algorithm
An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate input in a finite amount of time.
Characteristics of an Algorithm
Finiteness
• The algorithm terminates after a finite number of steps.
Definiteness
• Each step must be rigorously and unambiguously specified (e.g., "stir until lumpy" is not definite).
Input
• Valid inputs must be clearly specified.
Output
• The algorithm can be proved to produce the correct output given a valid input.
Effectiveness
• Steps must be sufficiently simple and basic.
Notion of an Algorithm
Fundamentals of Algorithmic Problem Solving
Algorithm efficiency depends on the input size n, and for some algorithms it also depends on the type of input. Hence we distinguish best-, worst- and average-case efficiencies.
Worst-case efficiency: Efficiency (number of times the basic operation will be executed)
for the worst case input of size n. i.e. The algorithm runs the longest among all possible
inputs of size n.
Best-case efficiency: Efficiency (number of times the basic operation will be executed)
for the best case input of size n. i.e. The algorithm runs the fastest among all possible
inputs of size n.
Average-case efficiency: Average time taken (number of times the basic operation will be executed) over all possible (random) instances of the input of size n. NOTE: this is NOT the average of the worst and best cases.
Asymptotic Notation
To choose the best algorithm, we need to check the efficiency of each algorithm. Efficiency can be measured by computing the time complexity of each algorithm. Asymptotic notation is a shorthand way to represent time complexity.
Using asymptotic notation we can express the time complexity as "fastest possible", "slowest possible" or "average time". Notations such as Ω, O and Θ are called asymptotic notations.
Big oh Notation
The Big oh notation is denoted by 'O'. It is a method of representing the upper bound of an algorithm's running time. Using Big oh notation we can give the longest amount of time taken by the algorithm to complete.
Definition
Let f(n) and g(n) be two non-negative functions. Let n0 denote some value of the input size, and let c be some constant with c > 0. If
f(n) ≤ c * g(n) for all n > n0,
then f(n) is big oh of g(n), denoted f(n) ∈ O(g(n)). In other words, f(n) grows no faster than g(n) up to the constant factor c.
Omega Notation
The Omega notation is denoted by 'Ω'. It is a method of representing the lower bound of an algorithm's running time. Using Omega notation we can give the shortest amount of time taken by the algorithm to complete.
Definition
Let f(n) and g(n) be two non-negative functions. Let n0 denote some value of the input size, and let c be some constant with c > 0. If
f(n) ≥ c * g(n) for all n > n0,
then f(n) ∈ Ω(g(n)). In other words, f(n) grows at least as fast as g(n) up to the constant factor c.
Θ Notation
The theta notation is denoted by Θ. By this method the running time is bounded both above and below.
Definition
Let f(n) and g(n) be two non-negative functions. If there are two positive constants c1 and c2 such that
c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n > n0,
then f(n) ∈ Θ(g(n)).
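As a quick illustrative sketch (not from the original notes): for f(n) = 3n + 2 we can exhibit witness constants showing f(n) ∈ Θ(n), e.g. c1 = 3, c2 = 4, n0 = 2. A numeric check in Python:

def f(n):
    return 3 * n + 2

c1, c2, n0 = 3, 4, 2
# c1*n <= f(n) <= c2*n must hold for all n > n0
assert all(c1 * n <= f(n) <= c2 * n for n in range(n0 + 1, 1000))
print("3n + 2 is Theta(n) with c1 = 3, c2 = 4, n0 = 2")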
Properties of Order of Growth
1. If f1(n) ∈ O(g1(n)) and f2(n) ∈ O(g2(n)), then f1(n) + f2(n) ∈ O(max{g1(n), g2(n)}).
2. The basic efficiency classes are ordered as:
order log n < order n^k (k > 0) < order a^n (a > 1) < order n! < order n^n
Example 1: Finding the maximum element of an array A[0..n−1]
maxval ← A[0]
for i ← 1 to n − 1 do
    if A[i] > maxval then
        maxval ← A[i]
return maxval
Analysis
1. Input size = n
2. Basic operation: the comparison A[i] > maxval, executed n − 1 times, so the running time is Θ(n).
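A direct Python transcription of this pseudocode (a sketch added for concreteness):

def max_element(a):
    """Return the largest element of a non-empty list."""
    maxval = a[0]
    for i in range(1, len(a)):
        if a[i] > maxval:        # basic operation, executed n - 1 times
            maxval = a[i]
    return maxval

print(max_element([3, 9, 2, 7]))   # 9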
Brute Force
Brute force is a straightforward approach to solving a problem, usually directly based on
the problem statement and definitions of the concepts involved.
A brute-force algorithm to find the divisors of a natural number n would enumerate all integers from 1 to n and check whether each of them divides n without remainder. A brute-force approach for the eight queens puzzle would examine all possible
arrangements of 8 pieces on the 64-square chessboard, and, for each arrangement, check
whether each (queen) piece can attack any other.
While a brute-force search is simple to implement, and will always find a solution if it
exists, its cost is proportional to the number of candidate solutions – which in many
practical problems tends to grow very quickly as the size of the problem increases.
Therefore, brute-force search is typically used when the problem size is limited, or when
there are problem-specific heuristics that can be used to reduce the set of candidate
solutions to a manageable size. The method is also used when the simplicity of
implementation is more important than speed.
Closest-Pair Problem
The closest-pair problem calls for finding the two closest points in a set of n points. It is
the simplest of a variety of problems in computational geometry that deals with proximity
of points in the plane or higher-dimensional spaces. Points in question can represent such
physical objects as airplanes or post offices as well as database records, statistical
samples, DNA sequences, and so on. An air-traffic controller might be interested in two
closest planes as the most probable collision candidates. A regional postal service
manager might need a solution to the closest-pair problem to find candidate post-office
locations to be closed. One of the important applications of the closest-pair problem is
cluster analysis
in statistics. Based on n data points, hierarchical cluster analysis seeks to organize them in
a hierarchy of clusters based on some similarity metric. For numerical data, this metric is
usually the Euclidean distance; for text and other nonnumerical data, metrics such as the
Hamming distance (see Problem 5 in this section’s exercises) are used.
ALGORITHM BruteForceClosestPair(P)
//Finds the distance between the two closest points in the plane by brute force
//Input: A list P of n (n ≥ 2) points p1(x1, y1), . . . , pn(xn, yn)
//Output: The distance between the closest pair of points
d ← ∞
for i ← 1 to n − 1 do
    for j ← i + 1 to n do
        d ← min(d, sqrt((xi − xj)^2 + (yi − yj)^2))   //sqrt is the square-root function
return d
The basic operation of the algorithm is computing the square root. In the age of
electronic calculators with a square-root button, one might be led to believe that
computing the square root is as simple an operation as, say, addition or multiplication. Of
course, it is not. For starters, even for most integers, square roots are irrational numbers
that therefore can be found only approximately. Moreover, computing such
approximations is not a trivial matter. But, in fact, computing square roots in the loop can
be avoided! (Can you think how?) The trick is to realize that we can simply ignore the
square-root function and compare the values (xi − xj)^2 + (yi − yj)^2 themselves. We can do this because the smaller the number whose square root we take, the smaller its square root; or, as mathematicians say, the square-root function is strictly increasing. Then the
basic operation of the algorithm will be squaring a number. The number of times it will
be executed can be computed as follows:
C(n) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2 = 2 Σ_{i=1}^{n−1} (n − i) = 2[(n − 1) + (n − 2) + . . . + 1] = (n − 1)n ∈ Θ(n^2).
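A runnable Python sketch of this brute-force algorithm, using the squared-distance trick described above (names are illustrative):

def brute_force_closest_pair(points):
    """Return the smallest distance between any two points in the list.

    Compares squared distances inside the loop and takes one square
    root at the end, as suggested above."""
    n = len(points)
    best = float("inf")
    for i in range(n - 1):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            best = min(best, (xi - xj) ** 2 + (yi - yj) ** 2)   # basic operation: squaring
    return best ** 0.5

print(brute_force_closest_pair([(0, 0), (3, 4), (1, 1)]))   # 1.414... for (0,0)-(1,1)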
Convex-Hull Problems
A region (set of points) in the plane is convex if every line segment between
two points in the region is also in the region. The convex hull of a finite set of points P is
the smallest convex region containing P.
Theorem: The convex hull of a finite set of points P is a convex polygon whose vertices are a subset of P.
The convex hull problem is the problem of constructing the convex hull for a given set P.
Idea for Solving Convex Hull
Consider the straight line that goes through two points Pi and Pj . Suppose there are
points in P on both sides of this line. – This implies that the line segment between Pi and
Pj is not on the boundary of the convex hull.
Suppose all the points in P are on one side of the line (or on the line). – This implies that
the line segment between Pi and Pj is on the boundary of the convex hull.
Development of Idea for Convex Hull
The straight line through Pi = (xi, yi) and Pj = (xj, yj) can be defined by a nonzero solution of:
a*xi + b*yi = c
a*xj + b*yj = c
One solution is a = yj − yi, b = xi − xj, and c = xi*yj − yi*xj. The line segment from Pi to Pj is on the convex hull boundary if either a*x + b*y ≥ c or a*x + b*y ≤ c holds for all the points of P.
The brute-force algorithm is Θ(n^3).
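The following Python sketch implements this brute-force test directly (illustrative code, not from the notes): for every pair of points it checks whether all other points lie on one side of the line through them.

def brute_force_hull_edges(points):
    """Return the pairs of points whose segment lies on the convex hull boundary."""
    edges = []
    n = len(points)
    for i in range(n - 1):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = points[i], points[j]
            # Line a*x + b*y = c through points i and j (formulas above).
            a, b, c = yj - yi, xi - xj, xi * yj - yi * xj
            sides = [a * x + b * y - c for (x, y) in points]
            if all(s >= 0 for s in sides) or all(s <= 0 for s in sides):
                edges.append((points[i], points[j]))
    return edges

square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(brute_force_hull_edges(square))   # the four sides of the square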
Exhaustive Search
Exhaustive search requires searching all the possible solutions (typically combinatorial
objects) for the best solution.
Exhaustive search is simply a brute-force approach to combinatorial problems.
Basic Steps
Divide and Conquer is the best-known algorithm design strategy:
1. Divide the problem into two or more smaller subproblems.
2. Conquer the subproblems by solving them recursively.
3. Combine the solutions of the subproblems into the solution for the original problem.
The recursion bottoms out at subproblems of constant size. The analysis can be done using recurrence equations.
(Figure: a problem of size n is divided into subproblems; the solutions of subproblem 1 and subproblem 2 are combined into the solution of the original problem.)
Algorithm
Algorithm DAndC(P)
Begin
    if small(P) then
        return S(P)
    else
        divide P into smaller instances P1, P2, ..., Pk
        apply DAndC to each of these subproblems
        return combine(DAndC(P1), DAndC(P2), ..., DAndC(Pk))
End
Efficiency Analysis of Divide and Conquer
The computing time of Divide and Conquer(DAndC) on any input of size
n is described by the following recurrence relation.
T(n) = g(n)                                     if n is small
T(n) = T(n1) + T(n2) + ... + T(nk) + f(n)       otherwise
T(n) is the time for Divide and Conquer on any input size n.
g(n) is the time to compute the answer directly.
f(n) is the time for dividing P and combining the solutions.
Divide and Conquer Recurrence Relation
Suppose that a recursive algorithm divides a problem of size n into a subproblems, each of size n/b. Let T(n) be the number of operations required to solve a problem of size n.
T(n) = T(1)                    if n = 1
T(n) = a*T(n/b) + f(n)         if n > 1
where,
T(n) – time for size n
a – number of subproblems
n/b – size of each subproblem
f(n) – time required for dividing the problem into subproblems and combining their solutions
Merge Sort
Merge sort is one of the external sorting techniques and follows the divide-and-conquer strategy. Given a sequence of n elements A[1], A[2], ..., A[N], the basic idea is to split the list into two sublists A[1], ..., A[N/2] and A[(N/2)+1], ..., A[N]:
• If the list has even length, split it into two equal sublists.
• If the list has odd length, make the first sublist one entry longer than the second.
Then split both sublists in two, and continue until each sublist is of size one. Finally, merge the individual sublists to obtain a sorted list.
The time complexity of merge sort is Θ(n log n).
Algorithm
Algorithm mergesort(A[0..n−1], low, high)
Begin
    if low < high then
        mid ← (low + high) / 2
        mergesort(A, low, mid)
        mergesort(A, mid + 1, high)
        combine(A, low, mid, high)
    end if
End
Algorithm combine(A[0..n−1], low, mid, high)
Begin
    k ← low
    i ← low
    j ← mid + 1
    while (i ≤ mid and j ≤ high) do
        if A[i] ≤ A[j] then
            temp[k] ← A[i]
            i ← i + 1
        else
            temp[k] ← A[j]
            j ← j + 1
        end if
        k ← k + 1
    end while
    while (i ≤ mid) do
        temp[k] ← A[i]
        i ← i + 1
        k ← k + 1
    end while
    while (j ≤ high) do
        temp[k] ← A[j]
        j ← j + 1
        k ← k + 1
    end while
    for m ← low to high do        // copy the merged run back into A
        A[m] ← temp[m]
    end for
End
Analysis
The recurrence relation for the merge sort is
T(n) = T(n/2) + T(n/2) + cn
Where
T(n/2) - time taken by left sublist to get sorted
T(n/2) - time taken by right sublist to get sorted
cn - time taken for combining two sublists
Hence the average-, best- and worst-case complexity of merge sort is O(n log n).
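A compact runnable Python version of this algorithm (a sketch mirroring the pseudocode above, with the copy-back step made explicit):

def merge_sort(a, low=0, high=None):
    """In-place merge sort of a[low..high]."""
    if high is None:
        high = len(a) - 1
    if low < high:
        mid = (low + high) // 2
        merge_sort(a, low, mid)
        merge_sort(a, mid + 1, high)
        combine(a, low, mid, high)
    return a

def combine(a, low, mid, high):
    """Merge the sorted runs a[low..mid] and a[mid+1..high]."""
    temp = []
    i, j = low, mid + 1
    while i <= mid and j <= high:
        if a[i] <= a[j]:
            temp.append(a[i]); i += 1
        else:
            temp.append(a[j]); j += 1
    temp.extend(a[i:mid + 1])      # leftover of the left run
    temp.extend(a[j:high + 1])     # leftover of the right run
    a[low:high + 1] = temp         # copy the merged run back

print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]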
Quicksort
Quick Sort, as the name suggests, sorts any list very quickly. Quick sort is not stable, but it is very fast and requires very little additional space. It is based on the divide-and-conquer rule (it is also called partition-exchange sort). The algorithm divides the list into three main parts:
1. Elements less than the Pivot element
2. Pivot element
3. Elements greater than the pivot element
Algorithm
/* Hoare-style partition: returns an index q such that every element of
   a[p..q] is <= every element of a[q+1..r]. */
int partition(int a[], int p, int r)
{
    int pivot = a[p], i = p - 1, j = r + 1;
    while (1) {
        do { j--; } while (a[j] > pivot);
        do { i++; } while (a[i] < pivot);
        if (i < j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
        else return j;
    }
}

void quicksort(int a[], int p, int r)
{
    if (p < r) {
        int q = partition(a, p, r);
        quicksort(a, p, q);        /* recursion pattern matches Hoare partition */
        quicksort(a, q + 1, r);
    }
}
Binary Search
Binary search uses the divide-and-conquer strategy, which consists of the following major phases:
• Breaking the problem into several subproblems that are similar to the original problem but smaller in size.
• Solving the subproblems recursively.
• Combining the solutions of the subproblems to create a solution to the original problem.
Binary search is an efficient searching method. The element to be searched for in the list stored in array A[0..n−1] is called the key element. Let A[m] be the middle element of array A. Three conditions need to be tested while searching the array:
o If key = A[m], the desired element is present in the list.
o If key < A[m], search the left sublist.
o If key > A[m], search the right sublist.
This can be represented as
A[0], ..., A[m−1], A[m], A[m+1], ..., A[n−1]
Analysis
The basic operation in binary search is the comparison of the search key with the array elements. The efficiency of binary search is analyzed by counting the number of times the search key is compared with the array elements. The comparison is also called a three-way comparison because the algorithm determines whether the key is smaller than, equal to, or greater than A[m].
The worst-case complexity of binary search is given by
Cworst(n) = Cworst(n/2) + 1 for n > 1
Cworst(1) = 1
where
Cworst(n/2) – time required to search the left (or) right sublist
1 – the comparison made with the middle element
Solving the recurrence gives Cworst(n) = ⌊log2 n⌋ + 1 ∈ Θ(log n).
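An iterative Python sketch of this method (illustrative, not from the notes):

def binary_search(a, key):
    """Return the index of key in sorted list a, or -1 if absent."""
    low, high = 0, len(a) - 1
    while low <= high:
        m = (low + high) // 2
        if key == a[m]:            # three-way comparison
            return m
        elif key < a[m]:
            high = m - 1           # search the left sublist
        else:
            low = m + 1            # search the right sublist
    return -1

print(binary_search([2, 5, 7, 11, 13], 11))   # 3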
Multiplication of Large Integers
The asymptotic advantage of the divide-and-conquer multiplication algorithm notwithstanding, how practical is it? The answer depends, of course, on the computer system and the quality of the program implementing the algorithm, which might explain the rather wide disparity of reported results. On some machines, the divide-and-conquer algorithm has been reported to outperform the conventional method on numbers only 8 decimal digits long and to run more than twice as fast on numbers over 300 decimal digits long, the area of particular importance for modern cryptography. Whatever this "crossover point" happens to be on a particular machine, it is worth switching to the conventional algorithm once the multiplicands become smaller than the crossover point. Finally, if you program in an object-oriented language such as Java, C++, or Smalltalk, you should be aware that these languages have special classes for dealing with large integers.
Strassen's Matrix Multiplication
M1 = (A00 + A11) * (B00 + B11)
M2 = (A10 + A11) * B00
M3 = A00 * (B01 − B11)
M4 = A11 * (B10 − B00)
M5 = (A00 + A01) * B11
M6 = (A10 − A00) * (B00 + B01)
M7 = (A01 − A11) * (B10 + B11)
Thus, to multiply two 2 × 2 matrices, Strassen’s algorithm makes seven multiplications
and 18 additions/subtractions, whereas the brute-force algorithm requires eight
multiplications and four additions.
M(n) ∈ Θ(n^2.807)
A(n) ∈ Θ(n^2.807)
In other words, the number of additions has the same order of growth as the number of multiplications. This puts Strassen's algorithm in Θ(n^log2 7), which is a better efficiency class than the Θ(n^3) of the brute-force method.
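For concreteness, here is a small Python sketch of one 2 × 2 Strassen step. The recombination formulas C00 = M1 + M4 − M5 + M7, C01 = M3 + M5, C10 = M2 + M4, C11 = M1 − M2 + M3 + M6 are the standard ones; they are added here for completeness, since the notes above list only the M products.

def strassen_2x2(A, B):
    """One Strassen step on 2x2 matrices given as ((a00, a01), (a10, a11))."""
    (A00, A01), (A10, A11) = A
    (B00, B01), (B10, B11) = B
    M1 = (A00 + A11) * (B00 + B11)
    M2 = (A10 + A11) * B00
    M3 = A00 * (B01 - B11)
    M4 = A11 * (B10 - B00)
    M5 = (A00 + A01) * B11
    M6 = (A10 - A00) * (B00 + B01)
    M7 = (A01 - A11) * (B10 + B11)
    return ((M1 + M4 - M5 + M7, M3 + M5),      # 7 multiplications in total
            (M2 + M4, M1 - M2 + M3 + M6))

print(strassen_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))))   # ((19, 22), (43, 50))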
Closest-Pair Problem by Divide and Conquer
3. We must check each point of S1 lying in the strip against every point of S2 in the strip, and get the closest distance d_between.
   Note that for each point there can be only 6 candidate points of S2: the points must also lie in [yi − d, yi + d]. (A diagram of the worst case shows the six points of S2 relative to a point of S1.) So the time for this step is Θ(6 · n/2) = Θ(3n) = Θ(n).
4. To accomplish this we also need the points sorted along the y dimension. We do not want to sort from scratch for each recursive division, so we use a merge-sort approach; the cost of maintaining the sort along y is O(n).
5. The minimum distance is then min(d, d_between).
The recurrence relation is
T(n) = 2T(n/2) + M(n), where M(n) is linear in n.
Using the Master Theorem (a = 2, b = 2, d = 1),
T(n) ∈ O(n lg n).
Note that it has been shown that Ω(n lg n) is the best that can be done, so we have found one of the best solutions.
Convex-Hull Problem
Recall that the convex hull is the smallest convex polygon containing all the points of a set S of n points Pi = (xi, yi). The set of vertices defines the polygon, and the vertices are found among the original points.
Recall the brute-force algorithm: make all possible lines from pairs of points and then check whether the rest of the points all lie on the same side of the line. How much does it cost? There are n(n − 1)/2 such lines, and each is checked against the n − 2 remaining points, so the cost is cubic.
Algorithm
1. Sort the set of points S by the x-dimension, with ties resolved by the y-dimension.
2. Identify the first and last points of the sort, P1 and Pn.
   Note P1 and Pn are vertices of the hull.
   The ray P1Pn divides S into two sets of points: points to the left (S1) or to the right (S2) of the line (defined below).
   We need to find the upper and lower hulls; we do this recursively, as in the steps below.
   Note also that S1 or S2 could be empty.
3. For S1, find Pmax, the point with maximum distance from the line P1Pn; ties can be resolved by the point that maximizes the angle PmaxP1Pn.
   Note that the ray P1Pmax divides the points of S1 into left and right sets; the left points are S11.
   Similarly, PmaxPn identifies the left points S12 of S1.
   Pmax is a vertex of the hull.
   The points inside the triangle P1PmaxPn cannot be vertices of the hull.
   There are no points to the left of both P1Pmax and PmaxPn.
4. Recursively find the upper hull of the union of P1, S11 and Pmax, and of the union of Pmax, S12 and Pn.
5. Do the same to find the lower hull.
We need to identify whether point (x3, y3) is to the left or right of the ray defined by points (x1, y1) and (x2, y2). We use the sign of the determinant
│ x1 y1 1 │
│ x2 y2 1 │
│ x3 y3 1 │
whose absolute value is twice the area of the triangle, with the sign determined by the order of the three points. The sign has the properties we need. Sorting along the x-dimension costs Θ(n lg n). Finding Pmax costs Θ(n). The costs of determining the sets S1, S2, S11 and S12 are each Θ(n).
How many recursive calls are there in the worst case? O(n).
The worst-case cost is Θ(n^2), which beats the brute force's Θ(n^3).
We expect the average case to do much better because of the divide-and-conquer approach, much as quicksort does. In addition, for any reasonable and random distribution of points, many points inside the triangle are eliminated. In fact, for points chosen randomly in a circle the average-case cost is linear.
UNIT III DYNAMIC PROGRAMMING AND GREEDY TECHNIQUE 9
Computing a Binomial Coefficient – Warshall's and Floyd's algorithms – Optimal Binary Search Trees – Knapsack Problem and Memory functions. Greedy Technique – Prim's algorithm – Kruskal's Algorithm – Dijkstra's Algorithm – Huffman Trees.
Definition
Dynamic programming (DP) is a general algorithm design technique for solving problems
with overlapping sub-problems. This technique was invented by American mathematician
“Richard Bellman” in 1950s.
Key Idea
The key idea is to save answers of overlapping smaller sub-problems to avoid re-
computation.
Dynamic Programming Properties
• An instance is solved using the solutions for smaller instances.
• The solutions for a smaller instance might be needed multiple times, so store their
results in a table.
• Thus each smaller instance is solved only once.
• Additional space is used to save time.
Dynamic Programming vs. Divide & Conquer
LIKE divide & conquer, dynamic programming solves problems by combining solutions
to sub-problems. UNLIKE divide & conquer, sub-problems are NOT independent in
dynamic programming.
(Table comparing Divide & Conquer with Dynamic Programming omitted.)
Example: Fibonacci numbers
F(n) = 0 if n = 0
F(n) = 1 if n = 1
F(n) = F(n−1) + F(n−2) if n > 1
Algorithm F(n)
// Computes the nth Fibonacci number recursively by using its definitions
// Input: A non-negative integer n
// Output: The nth Fibonacci number
if n==0 || n==1 then
return n
else
return F(n-1) + F(n-2)
Algorithm F(n): Analysis
• It is too expensive because it repeats calculations of smaller Fibonacci numbers.
• Exponential order of growth: the call tree expands F(n) into F(n−1) and F(n−2), both of which recompute the same shared subproblems.
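A bottom-up dynamic-programming version (an illustrative Python sketch) stores each answer once, turning the exponential recursion into Θ(n) time:

def fib(n):
    """Bottom-up Fibonacci: each subproblem is solved exactly once."""
    if n < 2:
        return n
    prev, cur = 0, 1
    for _ in range(2, n + 1):
        prev, cur = cur, prev + cur    # reuse stored answers instead of recursing
    return cur

print([fib(i) for i in range(10)])     # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]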
Computing a Binomial Coefficient
Recursive definition:
C(n, k) = 1 if k = 0
C(n, k) = 1 if n = k
C(n, k) = C(n−1, k−1) + C(n−1, k) if n > k > 0
Solution
Using crude Divide & Conquer method we can have the algorithm as follows:
Algorithm binomial(n, k)
if k==0 || k==n
return 1
else
return binomial(n-1, k) + binomial(n-1, k-1)
Algorithm binomial(n, k): Analysis
• Re-computes values a large number of times.
• In the worst case (when k = n/2), the efficiency is O(2^n / n).
(For example, C(4,2) expands into C(3,2) and C(3,1), which in turn recompute shared values such as C(2,1).)
Using the dynamic programming method: this approach stores the values of C(n, k) as they are computed, i.e., it records the values of the binomial coefficients in a table of n+1 rows and k+1 columns, numbered from 0 to n and 0 to k respectively.

        0    1    2    …    k-1          k
  0     1
  1     1    1
  2     …
  …
  n-1                       C(n-1,k-1)   C(n-1,k)
  n                                      C(n,k)

Algorithm binomial(n, k)
for i ← 0 to n do
    for j ← 0 to min(i, k) do
        if j = 0 or j = i
            then A[i, j] ← 1
        else
            A[i, j] ← A[i−1, j−1] + A[i−1, j]
return A[n, k]
Algorithm binomial (n, k): Analysis
• Input size: n, k
• Basic operation: Addition
• Let A(n, k) be the total number of additions made by the algorithm in computing
C(n,k)
• The first k+1 rows of the table form a triangle while the remaining n-k rows form
a rectangle. Therefore we have two parts in A(n,k).
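A direct Python transcription of this table-filling scheme (an illustrative sketch):

def binomial(n, k):
    """Dynamic-programming binomial coefficient C(n, k)."""
    A = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, k) + 1):
            if j == 0 or j == i:
                A[i][j] = 1
            else:
                A[i][j] = A[i - 1][j - 1] + A[i - 1][j]   # basic operation: one addition
    return A[n][k]

print(binomial(4, 2))   # 6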
Warshall’s Algorithm
• Directed Graph: A graph whose every edge is directed is called directed graph
OR digraph
• Adjacency matrix: The adjacency matrix A = {aij} of a directed graph is the
boolean matrix that has
1 - if there is a directed edge from ith vertex to the jth vertex
0 - Otherwise
• Transitive Closure: Transitive closure of a directed graph with n vertices can be
defined as the n-by-n matrix T={tij}, in which the elements in the ith row (1≤ i ≤
n) and the jth column(1≤ j ≤ n) is 1 if there exists a nontrivial directed path (i.e., a
directed path of a positive length) from the ith vertex to the jth vertex, otherwise
tij is 0.
The transitive closure provides reachability information about a digraph.
Computing Transitive Closure:
• We can perform DFS/BFS starting at each vertex
• Performs traversal starting at the ith vertex.
• Gives information about the vertices reachable from the ith vertex
• Drawback: This method traverses the same graph several times.
• Efficiency: O(n(n + m))
• Alternatively, we can use dynamic programming: the Warshall’s Algorithm
Underlying idea of Warshall’s algorithm:
• Let A denote the initial boolean matrix.
• The element r(k)[i, j] in the ith row and jth column of matrix R(k) (k = 0, 1, …, n) is equal to 1 if and only if there exists a directed path from the ith vertex to the jth vertex with intermediate vertices, if any, numbered not higher than k.
• Recursive Definition:
• Case 1:
A path from vi to vj restricted to using only vertices from {v1,v2,…,vk} as
intermediate vertices does not use vk, Then
R(k) [ i, j ] = R(k-1) [ i, j ].
• Case 2:
A path from vi to vj restricted to using only vertices from {v1, v2, …, vk} as intermediate vertices does use vk. Then
R(k)[i, j] = R(k−1)[i, k] AND R(k−1)[k, j].
We conclude:
R(k)[ i, j ] = R(k-1) [ i, j ] OR (R(k-1) [ i, k ] AND R(k-1) [ k, j ] )
NOTE:
• If an element rij is 1 in R(k−1), it remains 1 in R(k).
• If an element rij is 0 in R(k−1), it has to be changed to 1 in R(k) if and only if the element in its row i and column k and the element in its row k and column j are both 1's in R(k−1).
Algorithm Warshall(A[1..n, 1..n])
//Computes the transitive closure of a digraph with adjacency matrix A
R(0) ← A
for k ← 1 to n do
    for i ← 1 to n do
        for j ← 1 to n do
            R(k)[i, j] ← R(k−1)[i, j] or (R(k−1)[i, k] and R(k−1)[k, j])
return R(n)
Example:
Find Transitive closure for the given digraph using Warshall’s algorithm.
(Digraph with vertices A, B, C, D and edges A→C, B→A, B→D, D→B, as recorded in the adjacency matrix below.)
Solution:
R(0) = A B C D
A 0 0 1 0
B 1 0 0 1
C 0 0 0 0
D 0 1 0 0
R(0), k = 1 (vertex A can be an intermediate node):
        A B C D              A B C D
    A   0 0 1 0          A   0 0 1 0
    B   1 0 0 1    =>    B   1 0 1 1
    C   0 0 0 0          C   0 0 0 0
    D   0 1 0 0          D   0 1 0 0

R1[2,3] = R0[2,3] OR (R0[2,1] AND R0[1,3]) = 0 OR (1 AND 1) = 1

R(1), k = 2 (vertices {A, B} can be intermediate nodes):
        A B C D              A B C D
    A   0 0 1 0          A   0 0 1 0
    B   1 0 1 1    =>    B   1 0 1 1
    C   0 0 0 0          C   0 0 0 0
    D   0 1 0 0          D   1 1 1 1

R2[4,1] = R1[4,1] OR (R1[4,2] AND R1[2,1]) = 0 OR (1 AND 1) = 1
R2[4,3] = R1[4,3] OR (R1[4,2] AND R1[2,3]) = 0 OR (1 AND 1) = 1
R2[4,4] = R1[4,4] OR (R1[4,2] AND R1[2,4]) = 0 OR (1 AND 1) = 1

R(2), k = 3 (vertices {A, B, C} can be intermediate nodes):
NO CHANGE (row C is all zeros).
        A B C D
    A   0 0 1 0
    B   1 0 1 1
    C   0 0 0 0
    D   1 1 1 1

R(3), k = 4 (vertices {A, B, C, D} can be intermediate nodes):
        A B C D              A B C D
    A   0 0 1 0          A   0 0 1 0
    B   1 0 1 1    =>    B   1 1 1 1
    C   0 0 0 0          C   0 0 0 0
    D   1 1 1 1          D   1 1 1 1

R4[2,2] = R3[2,2] OR (R3[2,4] AND R3[4,2]) = 0 OR (1 AND 1) = 1

R(4) – TRANSITIVE CLOSURE for the given graph:
        A B C D
    A   0 0 1 0
    B   1 1 1 1
    C   0 0 0 0
    D   1 1 1 1
Efficiency:
• Time efficiency is Θ(n3)
• Space efficiency: Requires extra space for separate matrices for recording
intermediate results of the algorithm.
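A Python sketch of Warshall's algorithm (using the space-saving in-place variant; illustrative code):

def warshall(adj):
    """Transitive closure of a digraph given as a 0/1 adjacency matrix."""
    n = len(adj)
    R = [row[:] for row in adj]            # R(0) is the adjacency matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    return R

# The example digraph above: A->C, B->A, B->D, D->B
A = [[0, 0, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 1, 0, 0]]
for row in warshall(A):
    print(row)       # matches the transitive closure computed above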
Floyd's Algorithm (All-Pairs Shortest Paths)
Weighted Graph
A weighted graph is a graph in which a weight (or distance) is given along each edge. A weighted graph can be represented by a weight matrix W:
W[i][j] = 0 if i = j
W[i][j] = ∞ if there is no edge between vertices i and j
W[i][j] = weight of the edge (i, j) otherwise
Floyd's algorithm computes a series of matrices D(0), D(1), ..., D(n). The series starts with D(0), which uses no intermediate vertices: D(0)[i][j] is the length of the direct edge from Vi to Vj. In the D(1) matrix, the shortest distances going through one intermediate vertex (paths of at most two edges) are given, and so on. We finally compute D(n), containing the lengths of the shortest paths among all paths that can use all n vertices as intermediates. From D(n) we can read off all-pairs shortest paths.
Example
Find the all-pairs shortest paths for the following graph.
(Weighted digraph on vertices 1, 2, 3 with edge weights, as recorded in D(0): w(1,2) = 8, w(1,3) = 5, w(2,1) = 2, w(3,2) = 1.)
Step 1:
             1   2   3
D(0)    1    0   8   5
        2    2   0   ∞
        3    ∞   1   0
D(0) is the weight (adjacency) matrix of the given graph.
Step 2:
Now node 1 is considered as the intermediate node. D(1) is calculated using intermediate vertex 1.
             1   2   3
D(1)    1    0   8   5
        2    2   0   7
        3    ∞   1   0
In D(0) there is no edge between vertices 2 and 3, but using the intermediate vertex we can travel from 2 to 3 with shortest path 2→1→3 = 2 + 5 = 7, so the entry in row 2, column 3 is replaced by 7.
Step 3:
Now node 2 is considered as the intermediate node. D(2) is calculated using intermediate vertices 1 and 2 (3→2→1 = 1 + 2 = 3 replaces the entry in row 3, column 1).
             1   2   3
D(2)    1    0   8   5
        2    2   0   7
        3    3   1   0
Step 4:
Now node 3 is considered as the intermediate node. D(3) is calculated using intermediate vertices 1, 2 and 3 (1→3→2 = 5 + 1 = 6 replaces the entry in row 1, column 2).
             1   2   3
D(3)    1    0   6   5
        2    2   0   7
        3    3   1   0
Algorithm
Algorithm AllPair(cost, A, n)
Begin
for i=1 to n do
for j=1 to n do
A[i,j] = cost[i,j]
end for
end for
for k=1 to n do
for i=1 to n do
for j=1 to n do
A[i,j] = min(A[i,j] , A[i,k] + A[k,j])
end for
end for
end for
End
Analysis
The basic operation is
D [ i , j ] = min { D [ i , j ] , D [ i , k ] + D [ k , j ] }
It has three nested for loops:
C(n) = Σ_{k=1}^{n} Σ_{j=1}^{n} Σ_{i=1}^{n} 1 = n^3
The time complexity of finding all-pairs shortest paths is Θ(n^3).
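A Python sketch of Floyd's algorithm, reproducing the example above (illustrative code):

INF = float("inf")

def floyd(W):
    """All-pairs shortest path lengths from weight matrix W."""
    n = len(W)
    D = [row[:] for row in W]                  # D(0) is the weight matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])   # basic operation
    return D

W = [[0, 8, 5],
     [2, 0, INF],
     [INF, 1, 0]]
for row in floyd(W):
    print(row)       # [0, 6, 5], [2, 0, 7], [3, 1, 0]: matches D(3) above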
Knapsack Problem
Definition
Given a set of n items of known weights w1,…,wn and values v1,…,vn and a knapsack of
capacity W, the problem is to find the most valuable subset of the items that fit into the
knapsack.
Knapsack problem is an OPTIMIZATION PROBLEM
Step 2:
Recursively define the value of an optimal solution in terms of solutions to smaller
problems.
Initial conditions:
V[ 0, j ] = 0 for j ≥ 0
V[ i, 0 ] = 0 for i ≥ 0
Recursive step:
V[i, j] = max { V[i−1, j], vi + V[i−1, j − wi] }   if j − wi ≥ 0
V[i, j] = V[i−1, j]                                if j − wi < 0
Step 3:
Bottom up computation using iteration
Example
Given the following instance (knapsack capacity W = 5):
item   weight   value
 1       2        3
 2       3        4
 3       4        5
 4       5        6
Solution:
Using the dynamic programming approach:

Step 1 (initial conditions): V[0, j] = 0 for j ≥ 0 and V[i, 0] = 0 for i ≥ 0, so row i = 0 and column j = 0 are filled with zeros.

Step 2 (i = 1, w1 = 2, v1 = 3):
j = 1: w1 > j, so case 1 holds: V[1,1] = V[0,1] = 0.
j = 2: w1 = j, so case 2 holds: V[1,2] = max{ V[0,2], 3 + V[0,0] } = max{0, 3} = 3.
j = 3, 4, 5: w1 < j, case 2: V[1,j] = max{ V[0,j], 3 + V[0,j−2] } = 3.
Row 1: 0 0 3 3 3 3.

Step 3 (i = 2, w2 = 3, v2 = 4):
j = 1: w2 > j, case 1: V[2,1] = V[1,1] = 0.
j = 2: w2 > j, case 1: V[2,2] = V[1,2] = 3.
j = 3: w2 = j, case 2: V[2,3] = max{ V[1,3], 4 + V[1,0] } = max{3, 4} = 4.
j = 4: case 2: V[2,4] = max{ V[1,4], 4 + V[1,1] } = max{3, 4} = 4.
j = 5: case 2: V[2,5] = max{ V[1,5], 4 + V[1,2] } = max{3, 7} = 7.
Row 2: 0 0 3 4 4 7.

Step 4 (i = 3, w3 = 4, v3 = 5):
j = 1, 2, 3: w3 > j, case 1: copy row 2, giving 0, 3, 4.
j = 4: w3 = j, case 2: V[3,4] = max{ V[2,4], 5 + V[2,0] } = max{4, 5} = 5.
j = 5: case 2: V[3,5] = max{ V[2,5], 5 + V[2,1] } = max{7, 5} = 7.
Row 3: 0 0 3 4 5 7.

Step 5 (i = 4, w4 = 5, v4 = 6):
j = 1, 2, 3, 4: w4 > j, case 1: copy row 3, giving 0, 3, 4, 5.
j = 5: w4 = j, case 2: V[4,5] = max{ V[3,5], 6 + V[3,0] } = max{7, 6} = 7.
Row 4: 0 0 3 4 5 7.

The completed table:
V[i,j]   j=0   1   2   3   4   5
i=0        0   0   0   0   0   0
1          0   0   3   3   3   3
2          0   0   3   4   4   7
3          0   0   3   4   5   7
4          0   0   3   4   5   7

The maximal value is V[4, 5] = 7.
Efficiency:
• Running time of the knapsack problem using the dynamic programming algorithm is O(n*W).
• Time needed to find the composition of an optimal solution is O(n + W).
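A bottom-up Python sketch of this table-filling scheme (illustrative code):

def knapsack(weights, values, W):
    """0/1 knapsack; V[i][j] = best value using the first i items with capacity j."""
    n = len(weights)
    V = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            if weights[i - 1] > j:                   # case 1: item i does not fit
                V[i][j] = V[i - 1][j]
            else:                                    # case 2: take the better option
                V[i][j] = max(V[i - 1][j],
                              values[i - 1] + V[i - 1][j - weights[i - 1]])
    return V[n][W]

print(knapsack([2, 3, 4, 5], [3, 4, 5, 6], 5))   # 7, as in the table above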
Memory Function
• Memory function combines the strength of top-down and bottom-up approaches
• It solves ONLY sub-problems that are necessary and does it ONLY ONCE.
The method:
• Uses top-down manner.
• Maintains table as in bottom-up approach.
• Initially, all the table entries are initialized with special “null” symbol to indicate
that they have not yet been calculated.
• Whenever a new value needs to be calculated, the method checks the corresponding entry in the table first:
• If entry is NOT “null”, it is simply retrieved from the table.
• Otherwise, it is computed by the recursive call whose result is then recorded in the
table.
Algorithm:
Algorithm MFKnap(i, j)
//Uses as global variables the input arrays Weights[1..n] and Values[1..n] and the
//table V[0..n, 0..W], whose entries are initialized with −1 ("null") except for
//row 0 and column 0, which are initialized with 0
if V[i, j] < 0
    if j < Weights[i]
        value ← MFKnap(i−1, j)
    else
        value ← max { MFKnap(i−1, j), Values[i] + MFKnap(i−1, j − Weights[i]) }
    V[i, j] ← value
return V[i, j]
Example:
Apply the memory function method to the following instance of the knapsack problem (capacity W = 5):
item   weight   value
 1       2        3
 2       3        4
 3       4        5
 4       5        6
Computation and remarks:
1. Initially, all table entries are initialized with a special "null" symbol (here the value −1) to indicate that they have not yet been calculated; row 0 and column 0 are 0.
   V[i,j]   j=0    1    2    3    4    5
   i=0        0    0    0    0    0    0
   1          0   −1   −1   −1   −1   −1
   2          0   −1   −1   −1   −1   −1
   3          0   −1   −1   −1   −1   −1
   4          0   −1   −1   −1   −1   −1
2. The top-level call MFKnap(4, 5) expands into MFKnap(3, 5) and 6 + MFKnap(3, 0); MFKnap(3, 5) expands into MFKnap(2, 5) and 5 + MFKnap(2, 1); MFKnap(2, 5) expands into MFKnap(1, 5) and 4 + MFKnap(1, 2). The first entry actually computed is V[1, 5] = max{ MFKnap(0, 5), 3 + MFKnap(0, 3) } = max{0, 3 + 0} = 3.
3. Similarly V[1, 2] = max{ MFKnap(0, 2), 3 + MFKnap(0, 0) } = 3 is computed and stored; any later request for V[1, 2] is retrieved from the table rather than recomputed.
4. Continuing this way, V[2, 5] = max{ V[1, 5], 4 + V[1, 2] } = max{3, 7} = 7, then V[3, 5] = max{7, 5 + V[2, 1]} = 7, and finally MFKnap(4, 5) returns V[4, 5] = max{7, 6 + V[3, 0]} = 7. Only the entries actually needed (here V[1,1], V[1,2], V[1,5], V[2,1], V[2,5], V[3,5], V[4,5]) are ever filled in.
Conclusion:
Optimal subset: { item 1, item 2 }
Efficiency:
• Time efficiency same as bottom up algorithm: O( n * W ) + O( n + W )
• Just a constant factor gain by using memory function
• Less space efficient than a space efficient version of a bottom-up algorithm
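A Python sketch of the memory-function method, mirroring MFKnap above (illustrative code):

def mf_knapsack(weights, values, W):
    """Top-down memoized 0/1 knapsack."""
    n = len(weights)
    V = [[-1] * (W + 1) for _ in range(n + 1)]    # -1 plays the role of "null"
    for i in range(n + 1):
        V[i][0] = 0
    for j in range(W + 1):
        V[0][j] = 0

    def mfknap(i, j):
        if V[i][j] < 0:                            # not yet computed
            if j < weights[i - 1]:
                value = mfknap(i - 1, j)
            else:
                value = max(mfknap(i - 1, j),
                            values[i - 1] + mfknap(i - 1, j - weights[i - 1]))
            V[i][j] = value
        return V[i][j]

    return mfknap(n, W)

print(mf_knapsack([2, 3, 4, 5], [3, 4, 5, 6], 5))   # 7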
Optimal Binary Search Tree
When we search for a word in a dictionary, looking up every required word sequentially becomes a time-consuming process. To perform this lookup more efficiently we can build a binary search tree with common words as key elements. We can make the binary search tree efficient by arranging frequently used words nearer to the root and less frequently used words away from the root; searching such an arrangement of the BST is simpler as well as more efficient.
The optimal binary search tree technique was invented for this purpose:
• The element with the higher probability of being sought should be placed nearer to the root of the BST.
• The element with lower probability should be placed away from the root.
The BST created with such an arrangement is called an Optimal Binary Search Tree (OBST).
Let [a1, a2, ..., an] be a set of identifiers such that a1 < a2 < ... < an.
Let P(i) be the probability with which we search for ai (successful search).
Let q(i) be the probability of searching for an element x such that ai < x < ai+1, where 0 ≤ i ≤ n (unsuccessful search).
The tree that is built with optimum cost from Σ_{i=1}^{n} P(i) and Σ_{i=0}^{n} q(i) is called the OBST.
To obtain the OBST for the key values using dynamic programming, we compute the cost table C[i, j] and the root table R[i, j].
Formula for calculating C[i, j]:
C[i, j] = min_{i ≤ k ≤ j} { C[i, k−1] + C[k+1, j] } + Σ_{s=i}^{j} P(s)
Assume that
C[i, i−1] = 0 ∀ i ranging from 1 to n + 1
C[i, i] = P(i) where 1 ≤ i ≤ n
The optimum binary search tree computation uses two tables:
o Cost table
o Root table
The cost table is constructed in this fashion (assume n = 3):
          0    1    2    3      (j ranging from 0 to n)
    1     0   P1
    2          0   P2
    3               0   P3
    4                    0      (i ranging from 1 to n + 1)
The root table is constructed in this fashion:
          1    2    3
    1     1
    2          2
    3               3
Fill R[i, i] with i. Fill R[i, j] with the k value for which the cost is minimum. The tables are filled up diagonal by diagonal.
Example
Find the OBST for the following nodes:
               do    if    int    while
Probabilities: 0.1   0.2   0.4    0.3
Solution
First, number the nodes (do, if, int, while) as 1, 2, 3, 4 respectively. There are 4 nodes, so n = 4.
Step 1
Initial cost table:
          0     1     2     3     4
    1     0    0.1
    2           0    0.2
    3                 0    0.4
    4                       0    0.3
    5                             0
C[1, 0] = 0
C[2, 1] = 0
C[3, 2] = 0        C[i, i−1] = 0 and C[n+1, n] = 0
C[4, 3] = 0
C[5, 4] = 0
C[1, 1] = 0.1
C[2, 2] = 0.2      C[i, i] = P(i)
C[3, 3] = 0.4
C[4, 4] = 0.3
Initial root table:
          1     2     3     4
    1     1
    2           2
    3                 3
    4                       4
R[1, 1] = 1
R[2, 2] = 2        R[i, i] = i
R[3, 3] = 3
R[4, 4] = 4
Compute C[i, j] = min_{i ≤ k ≤ j} { C[i, k−1] + C[k+1, j] } + Σ_{s=i}^{j} P(s)
Step 2
Compute C[1, 2]; k can be either 1 or 2 (i = 1, j = 2):
k = 1: C[1, 0] + C[2, 2] + (P1 + P2) = 0 + 0.2 + 0.3 = 0.5
k = 2: C[1, 1] + C[3, 2] + (P1 + P2) = 0.1 + 0 + 0.3 = 0.4
C[1, 2] = 0.4 with k = 2, so R[1, 2] = 2.
Compute C[2, 3]; k can be either 2 or 3 (i = 2, j = 3):
k = 2: C[2, 1] + C[3, 3] + (P2 + P3) = 0 + 0.4 + 0.6 = 1.0
k = 3: C[2, 2] + C[4, 3] + (P2 + P3) = 0.2 + 0 + 0.6 = 0.8
C[2, 3] = 0.8 with k = 3, so R[2, 3] = 3.
Compute C[3, 4]; k can be either 3 or 4 (i = 3, j = 4):
k = 3: C[3, 2] + C[4, 4] + (P3 + P4) = 0 + 0.3 + 0.7 = 1.0
k = 4: C[3, 3] + C[5, 4] + (P3 + P4) = 0.4 + 0 + 0.7 = 1.1
C[3, 4] = 1.0 with k = 3, so R[3, 4] = 3.
The cost table and root table are now:
Cost:                                    Root:
       0    1    2    3    4                  1    2    3    4
  1    0   0.1  0.4                      1    1    2
  2         0   0.2  0.8                 2         2    3
  3              0   0.4  1.0            3              3    3
  4                   0   0.3            4                   4
  5                        0
Step 3
Compute C[1, 3]; k can be 1, 2 or 3 (i = 1, j = 3); P1 + P2 + P3 = 0.7:
k = 1: C[1, 0] + C[2, 3] + 0.7 = 0 + 0.8 + 0.7 = 1.5
k = 2: C[1, 1] + C[3, 3] + 0.7 = 0.1 + 0.4 + 0.7 = 1.2
k = 3: C[1, 2] + C[4, 3] + 0.7 = 0.4 + 0 + 0.7 = 1.1
C[1, 3] = 1.1 with k = 3, so R[1, 3] = 3.
Compute C[2, 4]; k can be 2, 3 or 4 (i = 2, j = 4); P2 + P3 + P4 = 0.9:
k = 2: C[2, 1] + C[3, 4] + 0.9 = 0 + 1.0 + 0.9 = 1.9
k = 3: C[2, 2] + C[4, 4] + 0.9 = 0.2 + 0.3 + 0.9 = 1.4
k = 4: C[2, 3] + C[5, 4] + 0.9 = 0.8 + 0 + 0.9 = 1.7
C[2, 4] = 1.4 with k = 3, so R[2, 4] = 3.
Now the cost table and root table are updated as:
Cost:                                    Root:
       0    1    2    3    4                  1    2    3    4
  1    0   0.1  0.4  1.1                 1    1    2    3
  2         0   0.2  0.8  1.4            2         2    3    3
  3              0   0.4  1.0            3              3    3
  4                   0   0.3            4                   4
  5                        0
Step 4
Compute C[1, 4]; k can be 1, 2, 3 or 4 (i = 1, j = 4); P1 + P2 + P3 + P4 = 1.0:
k = 1: C[1, 0] + C[2, 4] + 1.0 = 0 + 1.4 + 1.0 = 2.4
k = 2: C[1, 1] + C[3, 4] + 1.0 = 0.1 + 1.0 + 1.0 = 2.1
k = 3: C[1, 2] + C[4, 4] + 1.0 = 0.4 + 0.3 + 1.0 = 1.7
k = 4: C[1, 3] + C[5, 4] + 1.0 = 1.1 + 0 + 1.0 = 2.1
C[1, 4] = 1.7 with k = 3, so R[1, 4] = 3.
Step 5
To build the tree: R[1, n] = R[1, 4] = 3 becomes the root.
Keys: 1 = do, 2 = if, 3 = int, 4 = while.
In general, key k = R[i, j] becomes the root Tk, with left subtree T(i, k−1) and right subtree T(k+1, j).
Here i = 1, j = 4 and k = 3, so the root is key 3 (int):
• Left subtree: keys 1..2, with root R[1, 2] = 2 (if), whose left child is key 1 (do).
• Right subtree: key 4 (while).
The resulting tree:
            int
           /    \
         if     while
        /
      do
This tree has the optimum cost C[1, 4] = 1.7.
Algorithm
Algorithm OBST(P[1..n])
Begin
    for i ← 1 to n do
        C[i, i−1] ← 0
        C[i, i] ← P[i]
        R[i, i] ← i
    end for
    C[n+1, n] ← 0
    for d ← 1 to n−1 do            // fill the tables diagonal by diagonal
        for i ← 1 to n−d do
            j ← i + d
            min ← ∞
            for k ← i to j do
                if (C[i, k−1] + C[k+1, j]) < min
                    min ← C[i, k−1] + C[k+1, j]
                    kmin ← k
                end if
            end for
            R[i, j] ← kmin
            sum ← P[i]
            for s ← i+1 to j do
                sum ← sum + P[s]
            end for
            C[i, j] ← min + sum
        end for
    end for
    for i ← 1 to n+1 do
        for j ← 0 to n do
            write C[i, j]
        end for
    end for
    for i ← 1 to n do
        for j ← 1 to n do
            write R[i, j]
        end for
    end for
End
Analysis
The basic operation of the OBST algorithm is the computation of C[i, j] by finding the minimizing k. This basic operation sits within three nested for loops; hence the time complexity is O(n^3).
Time Complexity – O(n3)
Space Complexity – O(n2)
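A Python sketch of this dynamic-programming scheme (using 1-based logical indices stored in slightly larger arrays, as in the pseudocode; illustrative code):

import math

def obst(P):
    """Optimum cost and root for an OBST with success probabilities P[1..n]."""
    n = len(P)
    p = [0.0] + list(P)                           # shift to 1-based indexing
    C = [[0.0] * (n + 2) for _ in range(n + 2)]   # C[i][i-1] = 0 by initialization
    R = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        C[i][i] = p[i]
        R[i][i] = i
    for d in range(1, n):                         # diagonal by diagonal
        for i in range(1, n - d + 1):
            j = i + d
            best, kmin = math.inf, i
            for k in range(i, j + 1):
                cost = C[i][k - 1] + C[k + 1][j]
                if cost < best:
                    best, kmin = cost, k
            R[i][j] = kmin
            C[i][j] = best + sum(p[i:j + 1])
    return C[1][n], R[1][n]

print(obst([0.1, 0.2, 0.4, 0.3]))   # cost ~ 1.7, root = 3 (int), as computed above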
Greedy Technique
A greedy algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage [1] with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time.
For example, a greedy strategy for the traveling salesman problem (which is of a high computational complexity) is the following heuristic: "At each stage visit an unvisited city nearest to the current city". This heuristic need not find a best solution but terminates in a reasonable number of steps; finding an optimal solution typically requires unreasonably many steps. In mathematical optimization, greedy algorithms solve combinatorial problems having the properties of matroids.
Prim’s algorithm
Start with a tree T1 consisting of one (any) vertex and “grow” the tree one vertex at a time, producing an MST through a series of expanding subtrees T1, T2, …, Tn. On each iteration, construct Ti+1 from Ti by adding the vertex not in Ti that is closest to those already in Ti (this is the “greedy” step!). Stop when all vertices are included.
Prim's Algorithm constructs a minimal spanning tree (MST) in a connected graph or component.
Minimal Spanning Tree
A minimal spanning tree of a weighted graph is a spanning tree that has the minimal sum of edge weights.
Prim's Algorithm solves this problem using the greedy technique: it builds the spanning tree by adding the minimal-weight edge to a vertex not already in the tree.
Algorithm Prim(G)
VT ← {v0}
ET ← ∅ // empty set
for i ← 1 to |V| − 1 do
    find the minimum-weight edge e* = (v*, u*) such that v* is in VT and u* is in V − VT
    VT ← VT ∪ {u*}
    ET ← ET ∪ {e*}
return ET
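A priority-queue-based Python sketch of this algorithm (the adjacency-list format {vertex: [(weight, neighbour), ...]} is an assumption of this example):

import heapq

def prim(graph, start):
    """Prim's MST for a connected undirected graph; returns (weight, vertex) pairs."""
    visited = {start}
    tree = []
    heap = list(graph[start])                    # (weight, neighbour) pairs
    heapq.heapify(heap)
    while heap and len(visited) < len(graph):
        w, u = heapq.heappop(heap)               # greedy step: cheapest crossing edge
        if u not in visited:
            visited.add(u)
            tree.append((w, u))                  # weight and newly attached vertex
            for edge in graph[u]:
                heapq.heappush(heap, edge)
    return tree

g = {"a": [(3, "b"), (1, "c")],
     "b": [(3, "a"), (2, "c")],
     "c": [(1, "a"), (2, "b")]}
print(prim(g, "a"))   # [(1, 'c'), (2, 'b')]: total weight 3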
Kruskal's Algorithm
Start with T = EMPTY SET
Keep track of connected components of graph with edges T
Initially components are single nodes
At each stage, add the cheapest edge that connects two nodes not already connected
The algorithm begins by sorting the graph’s edges in nondecreasing order of their
weights. Then, starting with the empty subgraph, it scans this sorted list, adding the next edge on
the list to the current subgraph if such an inclusion does not create a cycle and simply skipping
the edge otherwise.
Algorithm Kruskal(G)
//Kruskal’s algorithm for constructing a minimum spanning tree
//Input: A weighted connected graph G = ⟨V, E⟩
//Output: ET, the set of edges composing a minimum spanning tree of G
sort E in nondecreasing order of the edge weights w(e_i1) ≤ . . . ≤ w(e_i|E|)
ET ← ∅; ecounter ← 0 //initialize the set of tree edges and its size
k ← 0 //initialize the number of processed edges
while ecounter < |V| − 1 do
    k ← k + 1
    if ET ∪ {e_ik} is acyclic
        ET ← ET ∪ {e_ik}; ecounter ← ecounter + 1
return ET
Applying Prim’s and Kruskal’s algorithms to the same small graph by hand may create the
impression that the latter is simpler than the former.
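A Python sketch of Kruskal's algorithm, using a simple union-find structure for the acyclicity test (illustrative code; the edge format (weight, u, v) is an assumption of this example):

def kruskal(n, edges):
    """Kruskal's MST for vertices 0..n-1 and edges given as (weight, u, v)."""
    parent = list(range(n))

    def find(x):                        # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # scan edges in nondecreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                    # adding the edge creates no cycle
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

print(kruskal(3, [(3, 0, 1), (1, 0, 2), (2, 1, 2)]))   # [(1, 0, 2), (2, 1, 2)]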
Dijkstra's Algorithm
One of the main reasons for the popularity of Dijkstra's Algorithm is that it is one of the most
important and useful algorithms available for generating (exact) optimal solutions to a large class
of shortest path problems. The point being that this class of problems is extremely important
theoretically, practically, as well as educationally.
Indeed, it is safe to say that the shortest path problem is one of the most important generic
problem in such fields as OR/MS, CS and artificial intelligence (AI). One of the reasons for this
is that essentially any combinatorial optimization problem can be formulated as a shortest path
problem. Thus, this class of problems is extremely large and includes numerous practical
problems that have nothing to do with actual ("genuine") shortest path problems.
New classes of genuine shortest path problem are becoming very important these days in
connection with practical applications of Geographic Information Systems (GIS) such as on line
computing of driving directions. It is not surprising therefore that, for example, Microsoft has a
research project on algorithms for shortest path problems.
Consider the best-known algorithm for the single-source shortest-paths problem, called Dijkstra’s algorithm. This algorithm is applicable to undirected and directed graphs with nonnegative weights only. Since in most applications this condition is satisfied, the limitation has not impaired the popularity of Dijkstra’s algorithm. Dijkstra’s algorithm finds the shortest paths to a graph’s vertices in order of their distance from a given source. First, it finds the shortest path from the source to the vertex nearest to it, then to a second nearest, and so on.
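A priority-queue-based Python sketch of Dijkstra's algorithm (illustrative code; the adjacency-list format {vertex: [(weight, neighbour), ...]} is an assumption of this example):

import heapq

def dijkstra(graph, source):
    """Single-source shortest paths for nonnegative edge weights."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)          # vertex nearest to the source so far
        if d > dist[v]:
            continue                        # stale queue entry
        for w, u in graph[v]:
            if d + w < dist[u]:
                dist[u] = d + w
                heapq.heappush(heap, (dist[u], u))
    return dist

g = {"a": [(4, "b"), (1, "c")],
     "b": [],
     "c": [(2, "b")]}
print(dijkstra(g, "a"))    # {'a': 0, 'b': 3, 'c': 1}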
Huffman Trees
Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is not a prefix of the code assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bit stream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be “cccd” or “ccb” or “acd” or “ab”.
Huffman’s algorithm
Step 1 Initialize n one-node trees and label them with the symbols of the alphabet given. Record
the frequency of each symbol in its tree’s root to indicate the tree’s weight. (More generally, the
weight of a tree will be equal to the sum of the frequencies in the tree’s leaves.)
Step 2 Repeat the following operation until a single tree is obtained. Find two trees with the
smallest weight (ties can be broken arbitrarily, but see Problem 2 in this section’s exercises).
Make them the left and right subtree of a new tree and record the sum of their weights in the root
of the new tree as its weight.
A tree constructed by the above algorithm is called a Huffman tree. It defines—in the manner
described above—a Huffman code.
EXAMPLE Consider the five-symbol alphabet {A, B, C, D, _} with the following occurrence frequencies in a text made up of these symbols:
symbol     A     B     C     D     _
frequency  0.35  0.1   0.2   0.2   0.15
UNIT IV ITERATIVE IMPROVEMENT 9
The Simplex Method – The Maximum-Flow Problem – Maximum Matching in Bipartite Graphs – The Stable Marriage Problem.
INTRODUCTION:
Algorithm design technique for solving optimization problems
Start with a feasible solution
Repeat the following step until no improvement can be found:
change the current feasible solution to a feasible solution with a better value of the
objective function
Return the last feasible solution as optimal
Note: Typically, a change in a current solution is “small” (local search)
Major difficulty: Local optimum vs. global optimum
Important Examples
Simplex method
Ford-Fulkerson algorithm for maximum flow problem
Maximum matching of graph vertices
Gale-Shapley algorithm for the stable marriage problem
Linear Programming
Linear programming (LP) problem is to optimize a linear function of several variables
subject to linear constraints:
maximize (or minimize) c1 x1 + ...+ cn xn
subject to ai1x1+ ...+ ain xn ≤ (or ≥ or =) bi ,
i = 1,...,m , x1 ≥ 0, ... , xn ≥ 0
The function z = c1 x1 + ...+ cn xn is called the objective function;
constraints x1 ≥ 0, ... , xn ≥ 0 are called non-negativity constraints
Example
maximize 3x + 5y
subject to x + y ≤ 4
           x + 3y ≤ 6
           x ≥ 0, y ≥ 0
(Figure: the feasible region is bounded by the lines x + y = 4 and x + 3y = 6 and the coordinate axes; its extreme points are (0, 0), (4, 0), (3, 1) and (0, 2).)
The feasible region is the set of points defined by the constraints.
Geometric solution
Extreme Point Theorem: Any LP problem with a nonempty bounded feasible region has an optimal solution; moreover, an optimal solution can always be found at an extreme point of the problem's feasible region.
maximize 3x + 5y
subject to x + y ≤ 4
           x + 3y ≤ 6
           x ≥ 0, y ≥ 0
(Figure: level lines 3x + 5y = 10, 14, 20 swept across the feasible region with extreme points (0, 0), (4, 0), (3, 1), (0, 2); the maximum value 3x + 5y = 14 is attained at the extreme point (3, 1).)
Possible outcomes in solving an LP problem
has a finite optimal solution, which may not be unique
unbounded: the objective function of maximization (minimization) LP problem is
unbounded from above (below) on its feasible region
infeasible: there are no points satisfying all the constraints, i.e. the constraints are
contradictory
The Simplex Method
Simplex method is the classic method for solving LP problems, one of the most
important algorithms ever invented
Invented by George Dantzig in 1947 (Stanford University)
Based on the iterative improvement idea:
Generates a sequence of adjacent points of the problem’s feasible region with improving values
of the objective function until no further improvement is possible
Outline of the Simplex Method
Step 0 [Initialization] Present a given LP problem in standard form and set up an initial tableau.
Step 1 [Optimality test] If all entries in the objective row are nonnegative, stop: the tableau represents an optimal solution.
Step 2 [Find entering variable] Select (the most) negative entry in the objective row. Mark its column to indicate the entering variable and the pivot column.
Step 3 [Find departing variable]
• For each positive entry in the pivot column, calculate the θ-ratio by dividing that row's entry in the rightmost column by its entry in the pivot column. (If there are no positive entries in the pivot column, stop: the problem is unbounded.)
• Find the row with the smallest θ-ratio, and mark this row to indicate the departing variable and the pivot row.
Step 4 [Form the next tableau]
• Divide all the entries in the pivot row by its entry in the pivot column.
• Subtract from each of the other rows, including the objective row, the new pivot row multiplied by the entry in the pivot column of the row in question.
• Replace the label of the pivot row by the variable's name of the pivot column and go back to Step 1.
Example of Simplex Method
maximize
z = 3x + 5y + 0u + 0v
subject to
x+ y+ u =4
x + 3y + v =6
x≥0, y≥0, u≥0, v≥0
The three tableaux produced for this problem are:

          x     y     u     v
    u     1     1     1     0  |  4
    v     1     3     0     1  |  6
         −3    −5     0     0  |  0

          x     y     u     v
    u    2/3    0     1   −1/3 |  2
    y    1/3    1     0    1/3 |  2
        −4/3    0     0    5/3 | 10

          x     y     u     v
    x     1     0    3/2  −1/2 |  3
    y     0     1   −1/2   1/2 |  1
          0     0     2     1  | 14

The last tableau is optimal (no negative entries in the objective row): x = 3, y = 1, with maximum z = 14.
Basic feasible solutions
A basic solution to a system of m linear equations in n unknowns (n ≥ m) is obtained by setting n
– m variables to 0 and solving the resulting system to get the values of the other m variables.
The variables set to 0 are called nonbasic;
the variables obtained by solving the system are called basic.
A basic solution is called feasible if all its (basic) variables are nonnegative.
Example x+ y+u =4
x + 3y + v = 6
(0, 0, 4, 6) is basic feasible solution
(x, y are nonbasic; u, v are basic)
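As a quick sanity check of this example (an illustrative sketch using SciPy's linprog, which minimizes, so we negate the objective):

from scipy.optimize import linprog

# maximize 3x + 5y  <=>  minimize -3x - 5y
res = linprog(c=[-3, -5],
              A_ub=[[1, 1], [1, 3]], b_ub=[4, 6],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)    # [3. 1.] 14.0: optimum at (3, 1) with z = 14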
The Maximum-Flow Problem
Problem of maximizing the flow of a material through a transportation network (e.g., pipeline
system, communications or transportation networks)
Formally represented by a connected weighted digraph with n vertices numbered from 1 to n
with the following properties:
• contains exactly one vertex with no entering edges, called the source (numbered 1)
• contains exactly one vertex with no leaving edges, called the sink (numbered n)
• has a positive integer weight uij on each directed edge (i, j), called the edge capacity, indicating the upper bound on the amount of material that can be sent from i to j through this edge
A flow is an assignment of values xij to the edges such that 0 ≤ xij ≤ uij and, for every vertex other than the source and the sink, the sum of incoming flows equals the sum of outgoing flows. In other words, the total amount of material entering an intermediate vertex must be equal to the total amount of material leaving the vertex. This condition is called the flow-conservation requirement.
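The Ford-Fulkerson idea mentioned in the introduction (keep augmenting along paths until none remain) can be sketched in Python with BFS-chosen augmenting paths, i.e. the Edmonds-Karp variant (illustrative code; the dict-of-dicts capacity format is an assumption of this example):

from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap is a dict-of-dicts of residual capacities."""
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:              # BFS for a shortest augmenting path
            v = q.popleft()
            for u, c in cap[v].items():
                if c > 0 and u not in parent:
                    parent[u] = v
                    q.append(u)
        if t not in parent:
            return flow                           # no augmenting path left
        bottleneck, v = float("inf"), t           # find the path's bottleneck capacity
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t                                     # push flow and update residual edges
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v].setdefault(parent[v], 0)
            cap[v][parent[v]] += bottleneck       # residual (backward) edge
            v = parent[v]
        flow += bottleneck

cap = {1: {2: 3, 3: 2}, 2: {4: 2}, 3: {4: 3}, 4: {}}
print(max_flow(cap, 1, 4))    # 4: 2 units via 1-2-4 and 2 units via 1-3-4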
Maximum Matching in Bipartite Graphs
Our goal is to find the maximum matching in a graph. Note that a maximal matching can
be found very easily — just keep adding edges to the matching until no more can be
added.
We will construct a 1directed graph G0(V0, E0), in which V0 which contains all the
nodes of V along with a source nodes and a sink node t. For every edge in E, we add a
directed edge in E0 from X to Y . Finally we add a directed edge from s to all nodes in X
and from all nodes of Y to t. Each edge is given unit capacity.
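Equivalently, the augmenting-path idea can be coded directly without an explicit flow network. A Python sketch of the standard alternating-path method (illustrative code, not from the notes):

def max_bipartite_matching(adj, X):
    """Maximum matching; adj maps each x in X to its neighbours in Y."""
    match = {}                                 # y -> matched x

    def try_augment(x, seen):
        for y in adj.get(x, []):
            if y in seen:
                continue
            seen.add(y)
            # y is free, or y's current partner can be re-matched elsewhere
            if y not in match or try_augment(match[y], seen):
                match[y] = x
                return True
        return False

    size = 0
    for x in X:
        if try_augment(x, set()):
            size += 1
    return size, match

adj = {"x1": ["y1", "y2"], "x2": ["y1"]}
print(max_bipartite_matching(adj, ["x1", "x2"]))
# (2, {'y1': 'x2', 'y2': 'x1'})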
The Stable marriage Problem
In mathematics, economics, and computer science, the stable marriage problem (SMP) is
the problem of finding a stable matching between two sets of elements given a set of
preferences for each element. A matching is a mapping from the elements of one set to the
elements of the other set. A matching is stable whenever it is not the case that both:
a. some given element A of the first matched set prefers some given element B of the second
matched set over the element to which A is already matched, and
b. B also prefers A over the element to which B is already matched
In other words, a matching is stable when there does not exist any alternative pairing (A, B) in
which both A and B are individually better off than they would be with the element to which
they are currently matched.
The stable marriage problem is commonly stated as:
Given n men and n women, where each person has ranked all members of the opposite
sex with a unique number between 1 and n in order of preference, marry the men and
women together such that there are no two people of opposite sex who would both rather
have each other than their current partners. If there are no such people, all the marriages
are "stable".
A marriage matching M is a set of n (m, w) pairs whose members are selected from disjoint n-
element sets Y and X in a one-one fashion, i.e., each man m from Y is paired with exactly one
woman w from X and vice versa.
This algorithm guarantees that:
Everyone gets married
Once a woman becomes engaged, she is always engaged to someone. So, at the end, there
cannot be a man and a woman both unengaged, as he must have proposed to her at some
point (since a man will eventually propose to everyone, if necessary) and, being
unengaged, she would have had to have said yes.
The marriages are stable
Let Alice be a woman and Bob be a man who are both engaged, but not to each other.
Upon completion of the algorithm, it is not possible for both Alice and Bob to prefer each
other over their current partners. If Bob prefers Alice to his current partner, he must have
proposed to Alice before he proposed to his current partner. If Alice accepted his
proposal, yet is not married to him at the end, she must have dumped him for someone
she likes more, and therefore doesn't like Bob more than her current partner. If Alice
rejected his proposal, she was already with someone she liked more than Bob.
Stable marriage algorithm
Input: A set of n men and a set of n women, along with rankings of the women by each man and rankings of the men by each woman, with no ties allowed in the rankings.
Output: A stable marriage matching.
Step 0 Start with all the men and women being free.
Step 1 While there are free men, arbitrarily select one of them and do the following:
Proposal: The selected free man m proposes to w, the next woman on his preference list (the highest-ranked woman who has not rejected him before).
Response: If w is free, she accepts the proposal and is matched with m. If she is not free, she compares m with her current mate. If she prefers m to him, she accepts m's proposal, making her former mate free; otherwise, she simply rejects m's proposal, leaving m free.
Step 2 Return the set of n matched pairs.
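A Python sketch of this Gale-Shapley procedure (illustrative code; the preference-dict format is an assumption of this example):

def gale_shapley(men_prefs, women_prefs):
    """Stable matching; each prefs dict maps a person to an ordered preference list."""
    free_men = list(men_prefs)
    next_choice = {m: 0 for m in men_prefs}       # next woman each man will propose to
    engaged = {}                                  # woman -> man
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]          # next woman on m's list
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m                        # w is free: she accepts
        elif rank[w][m] < rank[w][engaged[w]]:    # w prefers m to her current mate
            free_men.append(engaged[w])
            engaged[w] = m
        else:
            free_men.append(m)                    # w rejects m
    return engaged

men = {"bob": ["alice", "carol"], "dan": ["alice", "carol"]}
women = {"alice": ["bob", "dan"], "carol": ["dan", "bob"]}
print(gale_shapley(men, women))   # {'alice': 'bob', 'carol': 'dan'}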
UNIT V COPING WITH THE LIMITATIONS OF ALGORITHM POWER 9
Methods for Establishing Lower Bounds
Trivial lower bounds
Information-theoretic arguments (decision trees)
Adversary arguments
Problem reduction
Decision Trees
Decision tree — A convenient model of algorithms involving
Comparisons in which:
Internal nodes represent comparisons
Leaves represent outcomes (or input cases)
Decision tree for 3-element insertion sort
Decision Trees and Sorting Algorithms
Any comparison-based sorting algorithm can be represented by a decision tree (for each fixed n):
• Number of leaves (outcomes) ≥ n!
• Height of a binary tree with n! leaves ≥ ⌈log2 n!⌉
• Minimum number of comparisons in the worst case ≥ ⌈log2 n!⌉ for any comparison-based sorting algorithm, since the longest path represents the worst case and its length is the height
• ⌈log2 n!⌉ ≈ n log2 n (by Stirling's approximation)
This lower bound is tight (mergesort or heapsort achieve it).
Ex. Prove that 5 (or 7) comparisons are necessary and sufficient for sorting 4 keys (or 5 keys, respectively).
Adversary Arguments
Adversary argument: It’s a game between the adversary and the (unknown) algorithm. The
adversary has the input and the algorithm asks questions to the adversary about the input. The
adversary tries to make the algorithm work the hardest by adjusting the input (consistently). It
wins the “game” after the lower bound time (lower bound proven) if it is able to come up with
two different inputs.
Example 1: “Guessing” a number between 1 and n using yes/no questions ("Is it larger than x?").
Adversary strategy: put the number in the larger of the two subsets generated by the last question.
Example 2: Merging two sorted lists of size n.
Adversary strategy: keep the ordering b1 < a1 < b2 < a2 < … < bn < an in mind and answer comparisons consistently.
Claim: Any merging algorithm requires at least 2n − 1 comparisons to output the above ordering (because it has to compare each pair of adjacent elements in the ordering).
Ex: Design an adversary to prove that finding the smallest element in a set of n elements
requires at least n-1 comparisons.
Reduction from Q to P: Given a set X = {x1, …, xn} of numbers (i.e. an instance of the element uniqueness problem), we form an instance of MST in the Cartesian plane: Y = {(0, x1), …, (0, xn)}. Then, from an MST for Y we can easily (i.e. in linear time) determine whether the elements in X are unique.
Classifying Problem Complexity
Is the problem tractable, i.e., is there a polynomial-time (O(p(n)) for some polynomial p) algorithm that solves it?
Possible answers:
yes (give example polynomial time algorithms)
no
o because it’s been proved that no algorithm exists at all (e.g., Turing’s halting
problem)
o because it has been proved that any algorithm for it would require exponential
time
unknown. How to classify their (relative) complexity using reduction?
Class NP
NP (nondeterministic polynomial): class of decision problems whose proposed solutions can be
verified in polynomial time = solvable by a nondeterministic polynomial algorithm
A nondeterministic polynomial algorithm is an abstract two-stage procedure that:
Generates a solution of the problem (on some input) by guessing
Checks whether this solution is correct in polynomial time
By definition, it solves the problem if it’s capable of generating and verifying a solution on one
of its tries
Why this definition? Because it led to the development of the rich theory called
"computational complexity".
we have:
P ⊆ NP
Big (million dollar) question: P = NP ?
NP-Complete Problems
A decision problem D is NP-complete if it is as hard as any problem in NP, i.e.,
D is in NP
every problem in NP is polynomial-time reducible to D
Other NP-complete problems are obtained through polynomial-time reductions from a known
NP-complete problem. Most of these problems are of a combinatorial nature.
P = NP ? Dilemma Revisited
P = NP would imply that every problem in NP, including all NP-complete problems,
could be solved in polynomial time
If a polynomial-time algorithm for just one NP-complete problem is discovered, then
every problem in NP can be solved in polynomial time, i.e. P = NP
Most but not all researchers believe that P ≠ NP, i.e., that P is a proper subset of NP.
If P ≠ NP, then the NP-complete problems are not in P, although many of them are very
useful in practice.
Introduction:
Backtracking and branch-and-bound are two algorithm design techniques for solving
problems in which the number of choices grows at least exponentially with the instance size.
Both techniques construct a solution one component at a time trying to terminate the process as
soon as one can ascertain that no solution can be obtained as a result of the choices already
made. This approach makes it possible to solve many large instances of NP-hard problems in an
acceptable amount of time.
Both backtracking and branch-and-bound use a state-space tree, a rooted tree whose
nodes represent partially constructed solutions to the problem. Both techniques terminate a node
as soon as it can be guaranteed that no solution to the problem can be obtained by considering
choices that correspond to the node's descendants.
Backtracking
Backtracking constructs its state-space tree in the depth-first search fashion in the
majority of its applications. If the sequence of choices represented by a current node of the state-
space tree can be developed further without violating the problem’s constraints, it is done by
considering the first remaining legitimate option for the next component. Otherwise, the method
backtracks by undoing the last component of the partially built solution and replaces it by the
next alternative.
2-Queens Problem
94
[Board diagrams: the possible placements of two queens on a 2 × 2 board, each marked illegal;
every placement puts the queens in the same row, column, or diagonal, so the 2-queens problem
has no solution.]
4-Queens Problem
The solution for the 4-queens problem is easily obtained using the backtracking
algorithm.
The aim of the problem is to place 4 queens on a 4 × 4 chessboard in such a way that no
two queens attack each other.
Solving Procedure
Step 1:
Let us assume the queens are placed row by row. The first queen Q1 is placed in the 1st
row, at (1,1).
[Board: Q1 at (1,1)]
Step 2:
The second queen has to be placed in the 2nd row. It is not possible to place Q2 at the
following squares:
(2,1) – placing the queens in the same column
(2,2) – placing the queens in the same diagonal
So Q2 can be placed at (2,3).
[Board: Q1 at (1,1), Q2 at (2,3)]
Step 3:
The third queen has to be placed in the 3rd row. It is not possible to place Q3 at any of
the squares:
(3,1) – same column as Q1
(3,2) – same diagonal as Q2
(3,3) – same column as Q2
(3,4) – same diagonal as Q2
[Board: Q1 at (1,1), Q2 at (2,3); every square of row 3 is marked X]
So Q3 cannot be placed in the 3rd row, and this branch will not give a solution. We
backtrack to the previous step, i.e., step 2: instead of placing Q2 at (2,3), Q2 is placed
at (2,4).
[Board: Q1 at (1,1), Q2 at (2,4)]
Step 4:
The third queen has to be placed in the 3rd row. It is not possible to place Q3 at the
following squares:
(3,1) – same column as Q1
(3,3) – same diagonal as Q2
(3,4) – same column as Q2
So Q3 has to be placed at (3,2).
[Board: Q1 at (1,1), Q2 at (2,4), Q3 at (3,2)]
Step 5:
The fourth queen has to be placed in the 4th row. It is not possible to place Q4 at any of
the squares:
(4,1) – same column as Q1
(4,2) – same column as Q3
(4,3) – same diagonal as Q3
(4,4) – same column as Q2
[Board: Q1 at (1,1), Q2 at (2,4), Q3 at (3,2); every square of row 4 is marked X]
Backtracking to the previous solution:
Q4 cannot be placed in the 4th row, so this branch will not give a solution. We backtrack
to the previous step, i.e., step 4. Instead of (3,2), Q3 could be tried at (3,3) or (3,4), but
those squares are also attacked (same diagonal or same column as the earlier queens). So
we backtrack further: no placement of Q2 works while Q1 is at (1,1), and Q1 itself is
moved to (1,2). Repeating the same process then places Q2 at (2,4) and Q3 at (3,1).
[Board: Q1 at (1,2), Q2 at (2,4), Q3 at (3,1)]
Step 6:
The fourth queen Q4 can now be placed at (4,3), and no queen attacks another. The
solution is (1,2), (2,4), (3,1), (4,3).
[Board: Q1 at (1,2), Q2 at (2,4), Q3 at (3,1), Q4 at (4,3)]
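The whole walkthrough can be reproduced by a small backtracking routine. The Python sketch below is illustrative (the function names solve_queens and place_ok are my own); it returns both solutions of the 4-queens problem, including the one found above:

def solve_queens(n):
    # x[k] = column of the queen in row k (1-based, as in the walkthrough)
    x = [0] * (n + 1)
    solutions = []

    def place_ok(k, col):
        # The queen in row k, column col must not share a column or a
        # diagonal with any queen already placed in rows 1..k-1.
        for j in range(1, k):
            if x[j] == col or abs(x[j] - col) == abs(j - k):
                return False
        return True

    def extend(k):
        if k > n:
            solutions.append(x[1:])
            return
        for col in range(1, n + 1):
            if place_ok(k, col):
                x[k] = col
                extend(k + 1)  # backtracking happens implicitly on return
    extend(1)
    return solutions

print(solve_queens(4))  # [[2, 4, 1, 3], [3, 1, 4, 2]], as in the walkthrough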
Basic Terminologies
Solution Space
Tuples that satisfy the constraints
The solution space can be organized into a tree
State Space
State space is the set of all paths from the root node to the other nodes.
State Space Tree
State Space Tree is the tree organization of the solution space.
In the backtracking technique, while solving a given problem, a tree is constructed based
on the choices made.
Such a tree with all possible solutions is called a state-space tree.
Promising and Non-Promising Node
A node in the state-space tree is said to be promising if it corresponds to a partially
constructed solution that may still lead to a complete solution.
The nodes which cannot lead to a solution are called non-promising nodes.
[Figure: state-space tree of the backtracking search for the 4-queens problem. Nodes are
labelled with the column chosen for each queen; the branches under Q1 = 1 (e.g., Q2 = 3, and
Q2 = 4 followed by Q3 = 2) end in "not a solution", while the branches under Q1 = 2 and
Q1 = 3 each reach a leaf marked "solution".]
8 – Queens Problem
The solution for the 8-queens problem is easily obtained using the backtracking algorithm.
The aim of the problem is to place 8 queens on an 8 × 8 chessboard in such a way that no
two queens attack each other.
The following constraints are used to place queens on the chess board.
No two queens should be placed on the same diagonal
No two queens should be placed on the same column
No two queens should be placed on the same row
Let us assume two queens are placed at positions ( i , j ) and ( k , l ). The two queens are
said to be on the same diagonal if one of the following conditions is satisfied:
i + j = k + l
i – j = k – l
The above two equations can be rearranged as follows:
i – k = l – j
i – k = j – l
Combining these two equations:
Abs ( i – k ) = Abs ( j – l )
If this condition is true, then the queens are in the same diagonal.
If at any stage we are unable to place a queen, we backtrack and change the position of
the previous queen.
This is repeated till we place all the 8 queens on the board.
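As a quick check, the diagonal condition derived above can be coded directly (the helper name same_diagonal is hypothetical):

def same_diagonal(i, j, k, l):
    # Queens at (i, j) and (k, l) share a diagonal iff |i - k| == |j - l|
    return abs(i - k) == abs(j - l)

print(same_diagonal(1, 1, 3, 3))  # True: the main diagonal
print(same_diagonal(2, 3, 3, 2))  # True: an anti-diagonal
print(same_diagonal(1, 1, 2, 3))  # False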
Algorithm
Algorithm Nqueens(k, n)
// Places queens k through n on the board; x[1..k-1] already hold safe columns
Begin
for i := 1 to n do
if place(k, i) then
x[k] := i
if (k = n) then
write (x[1:n])
else
Nqueens(k + 1, n)
end for
End
Algorithm place(k, i)
// Returns true if a queen may occupy row k, column i
Begin
for j := 1 to k - 1 do
if (x[j] = i) or (Abs(x[j] - i) = Abs(j - k)) then
return false
end for
return true
End
Each call to place examines at most k - 1 previously placed queens, so a single feasibility
check costs O(k - 1) time; the backtracking search as a whole takes exponential time in the
worst case.
[Board diagrams: the eight queens Q1 through Q8 are placed one row at a time; after each
placement the partial 8 × 8 board produced by the backtracking search is shown, ending with
all eight queens placed without attacking one another.]
[Figure: state-space tree of the backtracking search for the 8-queens problem. Nodes are
labelled with the column chosen for each queen (Q1 = 1, Q2 = 3, Q3 = 6, ...); several branches
end in "not a solution" and force backtracking, until node 21 reaches a complete placement
marked "solution".]
Analysis
Each feasibility check (procedure place) costs O(k - 1) time; the overall worst-case running
time of the backtracking search for the 8-queens problem is exponential.
Sum of Subsets
The sum of subsets is solved using the backtracking method.
Problem Definition
Given n distinct positive numbers (weights) Wi, 1 ≤ i ≤ n, and m, find all subsets of the
Wi whose sum is m.
The element Xi of the solution vector is either one or zero, depending on whether the
weight Wi is included or not.
Xi = 1 , Wi is included
Xi = 0 , Wi is not included
Solution to Sum of Subsets Problem
Sort the weights in ascending order.
The root of the space tree represents the starting point, with no decision about the given
elements.
The left and right children represent the inclusion and exclusion of Xi in a set.
A node is expanded only if it satisfies the following condition:
Σ (i = 1 to k) Wi·Xi + W(k+1) ≤ m
A bounded node can be identified with the following condition. The choice for the
bounding function is Bk(X1, …, Xk) = true iff
Σ (i = 1 to k) Wi·Xi + Σ (i = k+1 to n) Wi ≥ m
Backtrack from a bounded node and find an alternative solution.
Thus a path from the root to a node on the ith level of the tree indicates which of the
first i numbers have been included in the subsets represented by that node.
We can terminate a node as non-promising if either of the following two inequalities holds:
S' + W(k+1) > m (S' is already too large)
S' + Σ (i = k+1 to n) Wi < m (S' is too small: even all the remaining weights cannot reach m)
where S' = Σ (i = 1 to k) Wi·Xi
Constraints
Implicit Constraints
No two elements in the subset can be the same.
Explicit Constraints
The tuple values can be any value between 1 and n, and they need to be in ascending
order.
Procedure
Let ‘S’ be a set of elements and ‘m’ be the expected sum of subsets.
Step 1
Start with an empty set.
Step 2
Add the next element from the list to the subset.
Step 3
If the subset has sum m, then stop with that subset as the solution.
Step 4
If the subset is not feasible, or if we have reached the end of the set, then
backtrack through the subset until we find the most suitable value.
Step 5
If the subset is feasible, then repeat step 2.
Step 6
If we have visited all the elements without finding a suitable subset and no
backtracking is possible, then stop without a solution.
Example
Consider a set S = {3, 5, 6, 7} and m = 15. Solve it to obtain the sum of subsets.
Subset        Sum        Action
3             3 < 15     Add next element
3, 5          8 < 15     Add next element
3, 5, 6       14 < 15    Add next element
3, 5, 6, 7    21 > 15    Backtrack (sum exceeds m)
3, 5, 7       15 = 15    Condition satisfied; solution obtained
Solution set X = { 1, 1, 0, 1 }
Algorithm
Algorithm SumOfSub(s, k, r)
// s = sum of the weights included so far, k = index of the next weight,
// r = sum of the remaining weights W[k..n]; the weights are sorted ascending.
// Initial call: SumOfSub(0, 1, W[1] + … + W[n])
Begin
X[k] := 1
if (s + W[k] = m) then
write X[1..k]
else if (s + W[k] + W[k+1] ≤ m) then
SumOfSub(s + W[k], k + 1, r - W[k])
if (s + r - W[k] ≥ m) and (s + W[k+1] ≤ m) then
X[k] := 0
SumOfSub(s, k + 1, r - W[k])
End
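For reference, here is a runnable sketch of the same backtracking idea in Python, assuming the weights are given in ascending order (the function name sum_of_subsets is my own, and the pruning is slightly simplified):

def sum_of_subsets(w, m):
    # w must be sorted in ascending order; x[i] = 1 iff w[i] is included
    n = len(w)
    x = [0] * n
    results = []

    def backtrack(k, s, r):
        # s = sum of the weights included so far,
        # r = sum of the weights w[k..n-1] still available
        if s == m:                       # a subset summing to m is found
            results.append([w[i] for i in range(n) if x[i]])
            return
        if k == n:
            return
        if s + w[k] <= m:                # left branch: include w[k]
            x[k] = 1
            backtrack(k + 1, s + w[k], r - w[k])
            x[k] = 0
        if s + (r - w[k]) >= m:          # right branch: exclude w[k], but only
            backtrack(k + 1, s, r - w[k])  # if the rest can still reach m

    backtrack(0, 0, sum(w))
    return results

print(sum_of_subsets([3, 5, 6, 7], 15))  # [[3, 5, 7]], as in the example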
Branch-and-Bound
It is an algorithm design technique that enhances the idea of generating a state-space tree
with the idea of estimating the best value obtainable from a current node of the decision tree: if
such an estimate is not superior to the best solution seen up to that point in the processing, the
node is eliminated from further consideration.
A feasible solution is a point in the problem's search space that satisfies all the
problem's constraints, while an optimal solution is a feasible solution with the best value of
the objective function. Compared to backtracking, branch-and-bound requires two additional
items:
1) A way to provide, for every node of a state-space tree, a bound on the best value of the
objective function on any solution that can be obtained by adding further components to
the partial solution represented by the node.
2) The value of the best solution seen so far.
If this information is available, we can compare a node's bound with the value of the best
solution seen so far: if the bound value is not better than the best solution seen so far (i.e., not
smaller for a minimization problem and not larger for a maximization problem), the node is
nonpromising and can be terminated, because no solution obtained from it can yield a better
solution than the one already available.
In general, we terminate a search path at the current node in a state-space tree of a branch
& bound algorithm for any one of the following three reasons:
1) The value of the node’s bound is not better than the value of the best solution seen so far.
2) The node represents no feasible solutions because the constraints of the problem are
already violated.
3) The subset of feasible solutions represented by the node consists of a single point. Compare
the value of the objective function for this feasible solution with that of the best solution seen
so far, and update the latter with the former if the new solution is better.
Assignment problem
It is a problem in which each job is assigned to a person: no two jobs can be assigned to
the same person, and no two persons can be assigned the same job.
Example
            Job 1   Job 2   Job 3   Job 4
Person a      9       2       7       8
Person b      6       4       3       7
Person c      5       8       1       8
Person d      7       6       9       4
Lower bound: any solution to this problem will have total cost at least 2 + 3 + 1 + 4 = 10
(the sum of the row minima), or alternatively 5 + 2 + 1 + 4 = 12 (the sum of the column minima).
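These bounds are just the sums of the row minima and of the column minima, which a few lines of code can confirm (a sketch; the cost matrix is the one from the table above):

cost = [
    [9, 2, 7, 8],  # person a
    [6, 4, 3, 7],  # person b
    [5, 8, 1, 8],  # person c
    [7, 6, 9, 4],  # person d
]
row_bound = sum(min(row) for row in cost)        # 2 + 3 + 1 + 4 = 10
col_bound = sum(min(col) for col in zip(*cost))  # 5 + 2 + 1 + 4 = 12
print(row_bound, col_bound)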
Knapsack problem
Given n items of known weights Wi and values Vi, i = 1, 2, …, n, and a knapsack of
capacity W, find the most valuable subset of the items that fit in the knapsack.
Traveling salesman problem
The goal is to visit all the cities exactly once, returning to the start, with minimum total cost.
In the state-space tree, the list of vertices in a node specifies the beginning part of the
Hamiltonian circuits represented by that node.
A lower bound can be obtained as lb = ⌈s/2⌉, where s is the sum, over all n cities, of the
distances from each city to its two nearest cities.
For the example graph: lb = ⌈[(1+5) + (3+6) + (1+2) + (3+5) + (2+3)]/2⌉ = ⌈31/2⌉ = 16,
which here equals the length of the optimal tour.
Approximation Algorithms for NP-hard Problems
These are combinatorial optimization problems which fall under the NP-hard problems; for the
algorithms that approximate them, an accuracy ratio and a performance ratio have to be
calculated. Two such algorithms for the traveling salesman problem are the nearest-neighbour
algorithm and the twice-around-the-tree algorithm, and a greedy algorithm is used for the
continuous (fractional) version of the knapsack problem.
Approximation algorithms are often used to find approximate solutions to difficult
problems of combinatorial optimization. The performance ratio is the principal metric for
measuring the accuracy of such algorithms. The idea is to apply a fast (i.e., polynomial-time)
approximation algorithm to get a solution that is not necessarily optimal but hopefully close
to it.
Accuracy measures:
The accuracy ratio of an approximate solution sa is
r(sa) = f(sa) / f(s*) for minimization problems
r(sa) = f(s*) / f(sa) for maximization problems
where f(sa) and f(s*) are the values of the objective function f for the approximate solution sa
and an actual optimal solution s*.
The performance ratio of the algorithm A is the lowest upper bound of r(sa) over all instances.
The nearest-neighbour algorithm is a greedy method for approximating a solution to the
traveling salesman problem. Its performance ratio is unbounded above, even for the important
subset of Euclidean graphs.
Starting at some city, always go to the nearest unvisited city and, after visiting all the
cities, return to the starting one.
Note: the nearest-neighbour tour may depend on the starting city.
Accuracy: RA = ∞ (unbounded above); in the textbook's example (not reproduced here), the
tour can be made arbitrarily bad by making the length of the edge AD arbitrarily large.
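A minimal sketch of the nearest-neighbour heuristic, assuming the instance is given as a symmetric distance matrix (the function name and the 4-city instance are illustrative):

def nearest_neighbour_tour(dist, start=0):
    # dist[i][j] = distance between cities i and j (symmetric matrix)
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[last][c])  # greedy step
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)  # return to the starting city
    return tour

d = [[0, 1, 2, 9],
     [1, 0, 4, 3],
     [2, 4, 0, 5],
     [9, 3, 5, 0]]
print(nearest_neighbour_tour(d))  # [0, 1, 3, 2, 0] for this instance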
Twice-around-the-tree is an approximation algorithm for the TSP with a performance ratio
of 2 for Euclidean graphs. The algorithm is based on modifying a walk around a minimum
spanning tree by shortcuts.
Stage 1: Construct a minimum spanning tree of the graph (e.g., by Prim’s or Kruskal’s
algorithm)
Stage 2: Starting at an arbitrary vertex, create a path that goes twice around the tree
and returns to the same vertex
Stage 3: Create a tour from the circuit constructed in Stage 2 by making shortcuts to avoid
visiting intermediate vertices more than once
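The three stages can be sketched compactly in Python, using a simple O(n^2) version of Prim's algorithm for Stage 1 and a preorder walk of the tree for Stages 2 and 3 (helper names are my own; this is an illustrative sketch, not the textbook's code):

def twice_around_the_tree(dist, start=0):
    n = len(dist)
    # Stage 1: Prim's algorithm; parent[v] = v's neighbour in the MST
    in_tree = {start}
    parent = {}
    best = {v: (dist[start][v], start) for v in range(n) if v != start}
    while len(in_tree) < n:
        v = min(best, key=lambda u: best[u][0])  # cheapest vertex to attach
        parent[v] = best[v][1]
        in_tree.add(v)
        del best[v]
        for u in best:
            if dist[v][u] < best[u][0]:
                best[u] = (dist[v][u], v)
    children = {v: [] for v in range(n)}
    for v, p in parent.items():
        children[p].append(v)
    # Stages 2 and 3: a preorder walk of the MST visits every vertex once,
    # which is exactly the twice-around walk with the shortcuts applied
    tour, stack = [], [start]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    tour.append(start)
    return tour

d = [[0, 1, 2, 9],
     [1, 0, 4, 3],
     [2, 4, 0, 5],
     [9, 3, 5, 0]]
print(twice_around_the_tree(d))  # a tour visiting every city exactly once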
A sensible greedy algorithm for the knapsack problem is based on processing an input's
items in descending order of their value-to-weight ratios. For the continuous version, the
algorithm always yields an exact optimal solution.
Greedy Algorithm for Knapsack Problem
Step 1: Order the items in decreasing order of relative values: v1/w1≥… ≥ vn/wn
Step 2: Select the items in this order skipping those that don’t fit into the knapsack
Accuracy
RA is unbounded (e.g., n = 2, C = m, w1=1, v1=2, w2=m, v2=m)
yields exact solutions for the continuous version
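A sketch of the greedy procedure, applied to the bad instance from the accuracy note (the function name is my own):

def greedy_knapsack(items, W):
    # items: list of (value, weight) pairs; W: knapsack capacity
    order = sorted(items, key=lambda it: it[0] / it[1], reverse=True)  # Step 1
    taken, total_value, total_weight = [], 0, 0
    for v, w in order:             # Step 2: skip items that do not fit
        if total_weight + w <= W:
            taken.append((v, w))
            total_value += v
            total_weight += w
    return total_value, taken

# The bad instance from the accuracy note (capacity m = 100):
items = [(2, 1), (100, 100)]  # v1/w1 = 2 beats v2/w2 = 1
print(greedy_knapsack(items, 100))  # value 2, while the optimum is 100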
Approximation Scheme for Knapsack Problem
Step 1: Order the items in decreasing order of relative values: v1/w1≥… ≥ vn/wn
Step 2: For a given integer parameter k, 0 ≤ k ≤ n, generate all subsets of k items or
fewer, and for each of those that fit the knapsack, add the remaining items in decreasing
order of their value-to-weight ratios
Step 3: Find the most valuable subset among the subsets generated in Step 2 and return it
as the algorithm’s output
• Accuracy: f(s*) / f(sa) ≤ 1 + 1/k for any instance of size n
• Time efficiency: O(k · n^(k+1))
• There are fully polynomial schemes: algorithms with polynomial running time as
functions of both n and k
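A sketch of the scheme, again on the two-item instance above (illustrative code, not the textbook's):

from itertools import combinations

def knapsack_scheme(items, W, k):
    # Try every subset of at most k items as a seed, complete it greedily,
    # and keep the best completed solution.
    order = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    best_value, best_items = 0, []
    idx = range(len(order))
    for r in range(k + 1):
        for seed in combinations(idx, r):
            weight = sum(order[i][1] for i in seed)
            if weight > W:
                continue                  # the seed itself must fit
            chosen = list(seed)
            for i in idx:                 # greedy completion (Step 2)
                if i not in seed and weight + order[i][1] <= W:
                    chosen.append(i)
                    weight += order[i][1]
            value = sum(order[i][0] for i in chosen)
            if value > best_value:
                best_value = value
                best_items = [order[i] for i in chosen]
    return best_value, best_items

items = [(2, 1), (100, 100)]
print(knapsack_scheme(items, 100, 1))  # k = 1 already finds the optimum 100 here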