CIR 305 Design and Analysis of Algorithms - Notes
YEAR 3 SEMESTER 1
COURSE NOTES
• An algorithm is “any well-defined computational procedure that takes some input and produces
some output.”
• Algorithms are an essential component not just of inherently electronic domains such as
electronic commerce or the infrastructure of the Internet, but also of science and industry, as
exemplified by the Human Genome Project or an airline that wishes to minimize costs by
optimizing crew assignments.
• Studying algorithms allows us both to understand (and put to use) the capabilities of computers,
and to communicate our ideas about computation to others.
• Algorithms are a basis for design: they serve as building blocks for new technology, and provide
a language in which to discuss aspects of that technology.
• Algorithm design deals with how computational procedures are organised to accept input and
produce output in the most efficient and effective manner.
• The field of computer science, which studies efficiency of algorithms, is known as analysis of
algorithms.
Definition of algorithm
An Algorithm is a finite sequence of instructions, each of which has a clear meaning and can be
performed with a finite amount of effort in a finite length of time.
No matter what the input values may be, an algorithm terminates after executing a finite number
of instructions
Characteristics of Algorithm
Input: there are zero or more quantities, which are externally supplied;
Output: at least one quantity is produced;
Definiteness: each instruction must be clear and unambiguous;
Finiteness: if we trace out the instructions of an algorithm, then for all cases the algorithm will
terminate after a finite number of steps;
Effectiveness: every instruction must be sufficiently basic that it can in principle be carried out
by a person using only pencil and paper. It is not enough that each operation be definite, but it
must also be feasible.
The question of whether a certain algorithm is effective depends on the context in which it is to be used.
Often relevant are the following two concerns:
• Correctness. An algorithm is said to be correct if, for every possible input, the algorithm halts
with the desired output. Usually we will want our algorithms to be correct, though there are
occasions in which incorrect algorithms are useful, if their rate of error can be controlled.
• Efficiency. The best algorithms are ones which not just accomplish the desired task, but use
minimal resources while doing so. The resources of interest may include time, money, space in
memory, or any number of other “costs.”
Algorithm Design Paradigms: the general approaches to the construction of efficient solutions to
problems.
Although more than one technique may be applicable to a specific problem, it is often the case that an
algorithm constructed by one approach is clearly superior to equivalent solutions built using alternative
techniques.
1. Brute force
2. Greedy algorithms
3. Divide-and-conquer, decrease-and-conquer
4. Dynamic programming
5. Transform-and-conquer
6. Backtracking and branch-and-bound
Brute Force
Brute force is a straightforward approach to solve a problem based on the problem’s statement
and definitions of the concepts involved.
It is considered one of the easiest approaches to apply and is useful for solving small-size
instances of a problem.
Examples:
Bubble sort
Sequential search
Exhaustive search: Traveling Salesman Problem, Knapsack problem.
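As a small illustration of the brute-force approach, here is a minimal sequential-search sketch in Python (the function name and sample data are our own):

```python
def sequential_search(a, target):
    """Brute-force sequential search: examine every element in turn,
    directly following the problem statement."""
    for i, value in enumerate(a):
        if value == target:
            return i          # index of the first match
    return -1                 # target not present

print(sequential_search([7, 5, 2, 4, 1, 6, 3, 0], 4))   # -> 3
```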
Greedy Algorithms
The solution is constructed through a sequence of steps, each expanding a partially constructed
solution obtained so far.
At each step the choice made must be locally optimal – this is the central point of this technique.
Greedy techniques are mainly used to solve optimization problems. They do not always give the best
solution.
Examples:
Divide-and-Conquer, Decrease-and-Conquer
Given an instance of the problem to be solved, split this into several smaller sub-instances (of
the same problem),
independently solve each of the sub-instances,
and then combine the sub-instance solutions so as to yield a solution for the original instance.
With the divide-and-conquer method the size of the problem instance is reduced by a factor (e.g. half the
input size), while with the decrease-and-conquer method the size is reduced by a constant.
Examples:
Insertion sort
Binary Tree traversals: inorder, preorder and postorder (recursion)
Computing the length of the longest path in a binary tree (recursion)
Computing Fibonacci numbers (recursion)
Reversing a queue (recursion)
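One of the recursive examples above, reversing a queue, can be sketched as follows (a minimal Python sketch; the helper name is our own):

```python
from collections import deque

def reverse_queue(q: deque) -> None:
    """Recursively reverse a queue in place (decrease-and-conquer:
    each call reduces the problem size by one element)."""
    if not q:                      # base case: an empty queue is already reversed
        return
    front = q.popleft()            # remove the front element
    reverse_queue(q)               # reverse the remaining n-1 elements
    q.append(front)                # put the old front at the back

q = deque([1, 2, 3, 4])
reverse_queue(q)
print(list(q))                     # -> [4, 3, 2, 1]
```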
Dynamic Programming
Examples:
Transform-and-Conquer
Backtracking and Branch-and-Bound
This method is used for state-space search problems. State-space search problems are problems where the
problem representation consists of:
initial state
goal state(s)
a set of intermediate states
a set of operators that transform one state into another.
a cost function – evaluates the cost of the operations (optional)
a utility function – evaluates how close is a given state to the goal state (optional)
The solving process is based on the construction of a state-space tree, whose nodes represent
states, the root represents the initial state, and one or more leaves are goal states. Each edge is labeled
with some operator.
Analysis of Algorithms
Analysis of an algorithm is the examination of an algorithm in order to determine the space and time
it requires, for the purpose of evaluating its performance.
Complexity of algorithms
The complexity of an algorithm is determined by the memory space and the time required by an
algorithm to complete its task successfully.
The complexity of an algorithm depends on the size of the input that the algorithm is expected to
work on.
A good algorithm is expected to take a small memory space and the shortest time possible.
In cases where both small memory space and the shortest time cannot be achieved concurrently, a
compromise should be reached that enables an algorithm to perform optimally.
Space complexity: Determines how much space is required by an algorithm
Time complexity: Determines how much time is required to run the algorithm
Worst-case complexity
Determines the maximum space or time required by an algorithm to complete a given task.
The worst-case running time of an algorithm is an upper bound on the running time for any input.
Best-case complexity
Determines the minimum space or time required by an algorithm to complete a given task.
Average-case complexity
Determines the average space or time required by an algorithm to complete a given task.
The average-case running time of an algorithm is an estimate of the running time for an "average"
input.
Computation of average-case running time entails knowing all possible input sequences, the
probability distribution of occurrence of these sequences, and the running times for the individual
sequences. Often it is assumed that all inputs of a given size are equally likely.
Amortized complexity
Here the time required to perform a sequence of (related) operations is averaged over all the
operations performed.
Amortized analysis can be used to show that the average cost of an operation is small, if one
averages over a sequence of operations, even though a simple operation might be expensive.
Amortized analysis guarantees the average performance of each operation in the worst case.
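A classic illustration of amortized analysis is appending to a dynamic array that doubles its capacity when full. The following simulation (our own sketch, not from the notes) counts the element copies performed:

```python
def simulate_appends(n: int) -> int:
    """Simulate n appends to a dynamic array that doubles its capacity
    when full, counting the element copies each append performs."""
    capacity, size, total_copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:          # expensive append: reallocate and copy
            total_copies += size
            capacity *= 2
        size += 1
        total_copies += 1             # cost of writing the new element
    return total_copies

# Although a single append can copy `size` elements, the amortized
# cost per append stays below 3 copies:
print(simulate_appends(1000) / 1000)
```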
Approaches to analysing the running time of algorithms include:
• Using functions
• Experimental studies
• Counting primitive operations
• Asymptotic analysis
Using functions
Some of the most important functions that are used to analyze algorithms are:
• Polynomial functions
• Logarithmic functions
• Exponential functions
Polynomial Functions
A polynomial function is defined as: f(n) = a0 + a1·n + a2·n^2 + a3·n^3 + … + ad·n^d, where
a0, a1, …, ad are constants, called the coefficients of the polynomial, and ad ≠ 0. The integer d,
which indicates the highest power in the polynomial, is called the degree of the polynomial.
Examples: f(n) = 2 + 5n + n^2; f(n) = 1 + n^3; f(n) = 1; f(n) = n; f(n) = n^2.
Running times that are polynomials with small degree are generally better than polynomial running times
with large degree.
The Constant Function
The constant function is defined as: f(n) = c, for some fixed constant c, such as c = 5, c = 27, or
c = 2^10.
That is, for any argument n, the constant function f(n) assigns the value c. In other words, it
doesn't matter what the value of n is; f(n) will always be equal to the constant value c.
The simplest constant function is g(n) = 1. Any other constant function, f(n) = c, can be written as
the constant c times g(n).
The Linear Function
The linear function is defined as: f(n) = n. That is, given an input value n, the linear function f
assigns the value n itself.
This function arises in algorithm analysis any time we have to do a single basic operation for each
of n elements.
For example, comparing a number x to each element of an array of size n will require n
comparisons.
The linear function also represents the best running time we can hope to achieve for any
algorithm that processes a collection of n objects that are not already in the computer's memory,
since reading in the n objects itself requires n operations.
The Quadratic Function
The quadratic function is defined as: f(n) = n^2.
That is, given an input value n, the function f assigns the product of n with itself (in other words,
"n squared").
The main reason why the quadratic function appears in the analysis of algorithms is that
there are many algorithms that have nested loops, where the inner loop performs a linear
number of operations and the outer loop is performed a linear number of times.
Thus, in such cases, the algorithm performs n · n = n^2 operations.
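The nested-loop pattern described above can be sketched as follows (a hypothetical counting example):

```python
def count_pairs_ops(n: int) -> int:
    """Nested loops: the outer loop runs n times and the inner loop
    performs a linear number of operations, giving n * n operations."""
    ops = 0
    for i in range(n):        # outer loop: n iterations
        for j in range(n):    # inner loop: n operations per outer iteration
            ops += 1          # one primitive operation
    return ops

print(count_pairs_ops(10))    # -> 100, i.e. n^2 for n = 10
```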
The Cubic Function and Other Polynomials
The cubic function is defined as: f(n) = n^3, which assigns to an input value n the product of n
with itself three times.
This function appears less frequently in the context of algorithm analysis than the constant, linear,
and quadratic functions.
The Logarithm Function
The logarithm function is defined as: f(n) = log_b n, for some constant b > 1.
Since computers store integers in binary, the most common base for the logarithm function in
computer science is 2.
The N-Log-N Function
The n-log-n function is defined as: f(n) = n log n, that is, the function that assigns to an input n the
value of n times the logarithm base-two of n.
This function grows a little faster than the linear function and a lot slower than the quadratic
function.
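A quick way to see these growth rates side by side is to tabulate the functions for a few input sizes (an illustrative sketch):

```python
import math

# Compare how the common running-time functions grow as n increases.
functions = [
    ("log n",   lambda n: math.log2(n)),
    ("n",       lambda n: n),
    ("n log n", lambda n: n * math.log2(n)),
    ("n^2",     lambda n: n ** 2),
    ("n^3",     lambda n: n ** 3),
]

for n in (16, 256, 4096):
    row = ", ".join(f"{name}={f(n):,.0f}" for name, f in functions)
    print(f"n={n}: {row}")
```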
Experimental studies
If an algorithm has been implemented, you can study its running time by executing it on various
test inputs and recording the actual time spent in each execution.
Such measurements can be taken in an accurate manner by using system calls that are built into
the language or operating system (for example, by using the System.currentTimeMillis()
method or calling the run-time environment with profiling enabled).
Such tests assign a specific running time to a specific input size, but you should determine the
general dependence of running time on the size of the input.
In order to determine this dependence, you should perform several experiments on many
different test inputs of various sizes.
Then you can visualize the results of such experiments by plotting the performance of each run
of the algorithm as a point with x-coordinate equal to the input size, n, and y-coordinate equal to
the running time, t.
From this visualization and the data that supports it, you can perform a statistical analysis that
seeks to fit the best function of the input size to the experimental data.
To be meaningful, this analysis requires that you choose good sample inputs and test enough of
them to be able to make sound statistical claims about the algorithm's running time.
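Such an experimental study can be sketched in Python (bubble sort is just an arbitrary algorithm under test here; `time.perf_counter` plays the role of the timing system call mentioned above):

```python
import time

def bubble_sort(a):
    """A simple quadratic sort used as the algorithm under test."""
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

# Run the algorithm on inputs of increasing size n and record
# (n, t) pairs, which could then be plotted as described above.
for n in (250, 500, 1000):
    data = list(range(n, 0, -1))        # worst-case input: reverse-sorted
    start = time.perf_counter()
    bubble_sort(data)
    elapsed = time.perf_counter() - start
    print(f"n={n}: {elapsed:.4f} s")
```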
Primitive Operations
Because of the limitations of experimental analysis, you may wish to analyze a particular algorithm
without performing experiments on its running time, by performing an analysis directly on the high-level
pseudo-code instead. You can define a set of primitive operations such as the following:
• Assigning a value to a variable
• Calling a method
• Performing an arithmetic operation (for example, adding two numbers)
• Comparing two numbers
• Indexing into an array
• Following an object reference
• Returning from a method
Asymptotic analysis
Asymptotic analysis enables data structures and algorithms to be analyzed using a mathematical
notation for functions that disregards constant factors.
You characterize the running times of algorithms by using functions that map the size of the
input, n, to values that correspond to the main factor that determines the growth rate in terms of n.
This approach allows you to focus attention on the primary "big-picture" aspects in a running
time function. In addition, the same approach lets you characterize space usage for data structures
and algorithms, where you define space usage to be the total number of memory cells used.
Asymptotic notation
It is a way to describe the characteristics of a function in the limit.
It describes the rate of growth of functions.
Focus on what’s important by abstracting away low-order terms and constant factors.
It is a way to compare the "sizes" of functions, in terms of:
O ≈ ≤ : Big-Oh
Ω ≈ ≥ : Big-Omega
Θ ≈ = : Big-Theta
o ≈ < : Small-oh
ω ≈ > : Small-omega
Big-Oh
Definition:
Let f(n) and g(n) be functions mapping nonnegative integers to real numbers.
We say that f(n) is O(g(n)) if there is a real constant c > 0 and an integer constant n0 ≥ 1 such that
f(n) ≤ c·g(n), for every integer n ≥ n0.
In this case, g(n) is an asymptotic upper bound for f(n).
Example 1: 8n − 2 is O(n).
Justification:
By the big-Oh definition, you need to find a real constant c > 0 and an integer constant n0 ≥ 1
such that 8n − 2 ≤ cn for every integer n ≥ n0.
It is easy to see that a possible choice is c = 8 and n0 = 1. This is one of infinitely many choices
available, because any real number greater than or equal to 8 will work for c, and any integer
greater than or equal to 1 will work for n0.
The big-Oh notation allows you to say that a function f(n) is "less than or equal to" another
function g(n) up to a constant factor and in the asymptotic sense as n grows toward infinity.
This ability comes from the fact that the definition uses "≤" to compare f(n) to g(n) times a
constant, c, for the asymptotic cases when n ≥ n0.
The big-Oh notation allows you to ignore constant factors and lower order terms and focus on the main
components of a function that affect its growth.
Example 2: 5n^4 + 3n^3 + 2n^2 + 4n + 1 is O(n^4).
Justification: Note that 5n^4 + 3n^3 + 2n^2 + 4n + 1 ≤ (5 + 3 + 2 + 4 + 1)n^4 = cn^4, for c = 15,
when n ≥ n0 = 1.
Example 3: If f(n) = a0 + a1·n + a2·n^2 + … + ad·n^d is a polynomial of degree d with nonnegative
coefficients, then f(n) is O(n^d).
Justification: Note that, for n ≥ 1, you have 1 ≤ n ≤ n^2 ≤ … ≤ n^d; hence,
a0 + a1·n + a2·n^2 + … + ad·n^d ≤ (a0 + a1 + a2 + … + ad)·n^d.
Therefore, you can show f(n) is O(n^d) by defining c = a0 + a1 + … + ad and n0 = 1.
Thus, the highest-degree term in a polynomial is the term that determines the asymptotic growth rate of
that polynomial.
Example 4: 5n^2 + 3n log n + 2n + 5 is O(n^2).
Justification: 5n^2 + 3n log n + 2n + 5 ≤ (5 + 3 + 2 + 5)n^2 = cn^2, for c = 15, when n ≥ n0 = 2
(note that n log n is zero for n = 1).
Example 5: 20n^3 + 10n log n + 5 is O(n^3).
Justification: 20n^3 + 10n log n + 5 ≤ 35n^3, for n ≥ 2.
Example 6: 3 log n + 2 is O(log n).
Justification: 3 log n + 2 ≤ 5 log n, for n ≥ 2. Note that log n is zero for n = 1; that is why
n ≥ n0 = 2 is used in this case.
Example 7: 2n + 100 log n is O(n).
Justification: 2n + 100 log n ≤ 102n, for n ≥ n0 = 2; hence, you can take c = 102 in this case.
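The constants in these justifications can be spot-checked numerically; a minimal sketch (the helper `holds` is our own):

```python
import math

def holds(f, g, c, n0, upto=10_000):
    """Check that f(n) <= c * g(n) for all n0 <= n < upto,
    i.e. that (c, n0) witness f(n) being O(g(n)) on this range."""
    return all(f(n) <= c * g(n) for n in range(n0, upto))

# 8n - 2 is O(n), with c = 8 and n0 = 1.
assert holds(lambda n: 8 * n - 2, lambda n: n, c=8, n0=1)

# 5n^2 + 3n log n + 2n + 5 is O(n^2), with c = 15 and n0 = 2.
assert holds(lambda n: 5 * n**2 + 3 * n * math.log2(n) + 2 * n + 5,
             lambda n: n**2, c=15, n0=2)

print("all witnesses verified")
```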
Big-Omega
Just as the big-Oh notation provides an asymptotic way of saying that a function is "less than or equal to"
another function, the following notations provide an asymptotic way of saying that a function grows at a
rate that is "greater than or equal to" that of another.
Let f(n) and g(n) be functions mapping nonnegative integers to real numbers. We say that f(n) is Ω(g(n))
(pronounced "f(n) is big-Omega of g(n)") if g(n) is O(f(n)), that is, there is a real constant c > 0 and an
integer constant n0 ≥ 1 such that f(n) ≥ c·g(n), for n ≥ n0.
This definition allows us to say asymptotically that one function is greater than or equal to another, up to
a constant factor.
Big-Theta
Big-Theta is a notation that allows us to say that two functions grow at the same rate, up to constant
factors. We say that f(n) is Θ (g(n)) (pronounced "f(n) is big-Theta of g(n)") if f(n) is O(g(n)) and f(n) is
Ω(g(n)), that is, there are real constants c′ > 0 and c′′ > 0, and an integer constant n0 ≥ 1 such that
c′·g(n) ≤ f(n) ≤ c′′·g(n), for n ≥ n0.
Recurrences and Ways of Solving Them
Recursion is a particularly powerful kind of reduction, which can be described loosely as follows:
If the given instance of the problem is small or simple enough, just solve it.
Otherwise, reduce the problem to one or more simpler instances of the same problem.
Recursion is generally expressed in terms of recurrences. When an algorithm calls itself, it can often be
described by its running time using a recurrence equation which describes the overall running time of a
problem of size n in terms of the running time on smaller inputs.
The Substitution Method
In the substitution method, we guess the form of the solution and then substitute the guessed solution for
the function when applying the inductive hypothesis to smaller values; hence the name "substitution
method". This method is powerful, but we must be able to guess the form of the answer in order to apply it.
Example: T(n) = 2T(n/2) + bn log n (assuming the base case T(n) = b for n < 2).
This looks very similar to the recurrence equation for the merge sort routine, so we might make the
following as our first guess:
First guess: T(n) ≤ cn log n, for some constant c > 0. We can certainly choose c large enough to make this
true for the base case, so consider the case when n ≥ 2. If we assume our first guess as an inductive
hypothesis that is true for input sizes smaller than n, then we have:
T(n) = 2T(n/2) + bn log n
≤ 2(c(n/2) log(n/2)) + bn log n
= cn(log n − log 2) + bn log n
= cn log n − cn + bn log n.
But there is no way that we can make this last line less than or equal to cnlogn for n ≥ 2. Thus, this first
guess was not sufficient. Let us therefore try:
Better guess: T(n) ≤ cn log^2 n, for some constant c > 0. We can again choose c large enough to make this
true for the base case, so consider the case when n ≥ 2. If we assume this guess as an inductive
hypothesis that is true for input sizes smaller than n, then we have:
T(n) = 2T(n/2) + bn log n
≤ 2(c(n/2) log^2(n/2)) + bn log n
= cn(log n − 1)^2 + bn log n
= cn log^2 n − 2cn log n + cn + bn log n
≤ cn log^2 n,
provided c ≥ b. Thus, we have shown that T(n) is indeed O(n log^2 n) in this case.
Just because one inductive hypothesis for T(n) does not work, that does not necessarily imply that
another one proportional to this one will not work.
Example: Consider the following recurrence equation (assuming the base case T(n) = b for n < 2):
T(n) = 2T(n/2) + log n
This recurrence is the running time for bottom-up heap construction, which is O(n).
If we try to prove this fact with the most straightforward inductive hypothesis, we will run into
some difficulties. In particular, consider the following:
First guess: T(n) ≤ cn, for some constant c > 0.
We can choose c large enough to make this true for the base case, so consider the case when
n ≥ 2.
If we assume this guess as an inductive hypothesis that is true for input sizes smaller than n, then
we have:
T(n) = 2T(n/2) + log n
≤ 2(c(n/2)) + log n
= cn + log n
But there is no way that we can make this last line less than or equal to cn for n > 2.
Thus, this first guess was not sufficient, even though T(n) is indeed O(n). Still, we can show this
fact is true by using:
Better guess: T(n) ≤ c(n − log n), for some constant c > 0.
We can again choose c large enough to make this true for the base case; in fact, we can show
that it is true any time n < 8. So consider the case when n ≥ 8. If we assume this
guess as an inductive hypothesis that is true for input sizes smaller than n, then we have:
T(n) = 2T(n/2) + log n
≤ 2c((n/2) − log(n/2)) + log n
= cn − 2c log n + 2c + log n
≤ c(n − log n),
provided c ≥ 3 and n ≥ 8. Thus, we have shown that T(n) is indeed O(n) in this case.
Example:
Solution:
Use the substitution method: guess that the solution is T(n) = O(n log n).
We need to prove T(n) ≤ cn log n for some c and all n > n0.
From the recurrence equation, it can be inferred that T(n) ≥ n, so T(n) = Ω(n).
Example:
Solve the recurrence relation T(n) = 3T(n/3 + 5) + n/2. Use the substitution method: guess that the
solution is T(n) = O(n log n).
Solution:
We need to prove T(n) ≤ cn log n for some c and all n > n0.
T(n) ≤ 3·c(n/3 + 5)·log(n/3 + 5) + n/2
If we take n0 = 15, then for all n > n0, n/3 + 5 ≤ 2n/3, so
T(n) ≤ 3·c(2n/3)·log(2n/3) + n/2 ≤ 2cn log n + n/2, which is O(n log n).
Similarly, by guessing T(n) ≥ c1·n log n for some c1 and all n > n0 and substituting in the recurrence, we
get T(n) = Ω(n log n).
The Iterative Substitution Method
In the iterative substitution method, we iteratively apply the recurrence equation to itself, searching
for a pattern. Note that the general form of the equation shifts to the base case,
T(n) = b,
when n = 2^i, that is, when i = log n, which implies:
T(n) = bn + bn log n.
In other words, T(n) is O(n log n).
In a general application of the iterative substitution technique,
we hope that we can determine a general pattern for T(n) and
that we can also figure out when the general form of T(n) shifts to the base case.
Example: Solve the recurrence relation
T(n) = 2·T(n/2) + 7, for n > 1
T(n) = 2, for n = 1
Solution: We first start by labeling the main part of the relation as Step 1:
T(n) = 2·T(n/2) + 7
Step 2: Figure out what T(n/2) is; everywhere you see n, replace it with n/2:
T(n/2) = 2·T(n/2^2) + 7
Now substitute this back into the last T(n) definition (last line from Step 1):
T(n) = 2·(2·T(n/2^2) + 7) + 7
= 2^2·T(n/2^2) + 3·7
Step 3: Let's expand the recursive call, this time T(n/2^2):
T(n/2^2) = 2·T(n/2^3) + 7
Now substitute this back into the last T(n) definition (last line from Step 2):
T(n) = 2^2·(2·T(n/2^3) + 7) + 3·7
= 2^3·T(n/2^3) + 7·7
Step 4: From this, first, we notice that the power of 2 always matches i, the current step. Second, we
notice that the multiples of 7 match the relation 2^i − 1: 1, 3, 7, 15. So, we can write a general
solution describing the state of T(n):
T(n) = 2^i·T(n/2^i) + (2^i − 1)·7
The expansion does not continue indefinitely;
it stops when n has been cut in half so many times that it is effectively 1.
Then, the original definition of the recurrence relation gives us the terminating condition
T(1) = 2, and we restrict the size to n = 2^i.
When
1 = n/2^i
2^i = n
log2 2^i = log2 n
i·log2 2 = log2 n
i = log2 n
Now we can substitute this "last value for i" back into our general Step 4 equation:
T(n) = 2^(log2 n)·T(1) + (2^(log2 n) − 1)·7
= 2n + 7(n − 1)
= 9n − 7
In other words, T(n) is O(n).
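The closed form implied by the pattern above (the power of 2 matches i, and the multiples of 7 follow 2^i − 1) can be spot-checked numerically, assuming the base case T(1) = 2 from the example:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n: int) -> int:
    """The recurrence T(n) = 2*T(n/2) + 7 with base case T(1) = 2,
    evaluated for n a power of two."""
    if n == 1:
        return 2
    return 2 * T(n // 2) + 7

# Substituting i = log2 n into the pattern 2^i * T(n/2^i) + (2^i - 1)*7
# gives 2n + 7(n - 1) = 9n - 7 for n a power of two.
for n in (1, 2, 4, 8, 16, 1024):
    assert T(n) == 9 * n - 7

print(T(1024))   # -> 9209
```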
Example:
Imagine that you have a recursive program whose run time is described by the following recurrence
relation:
T(n) = 1, for n = 1
Solve the relation with iterated substitution and use your solution to determine a tight big-oh bound.
Solution:
Now substitute this back into the last T(n) definition (last line from step 1):
Step 3: figure out what T(n/22) is:
Now substitute this back into the last T(n) definition (last line from step 2):
The parameter to the recursive call in the last line will equal 1 when i = log2 n (use the same analysis as
the previous example). In which case T (1) = 1 by the original definition of the recurrence relation.
Now we can substitute this back into our general Step 4 equation:
Example:
Write a recurrence relation and solve the recurrence relation for the following fragment of program:
Write T(n) in terms of T for fractions of n, for example, T(n/k) or T(n - k).
int findmax(int a[], int start, int end) {
    if (start == end)                       // base case: one element
        return (a[start]);
    int mid = (start + end) / 2;
    int max1 = findmax(a, start, mid);      // recursive call: T(n/2)
    int max2 = findmax(a, mid+1, end);      // recursive call: T(n/2) + 1
    if (max1 > max2)
        return (max1);
    else
        return (max2);
}
Solution:
T(n) = 2, for n = 1
T(n) = 2T(n/2) + 8, for n > 1
Expanding by iterated substitution gives:
T(n) = 2^i·T(n/2^i) + 8·( Σ_{k=0}^{i−1} 2^k )
= 2^i·T(n/2^i) + 8·(2^i − 1)
If n = 2^i, then i = log n, so T(n/2^i) = T(1) = 2 and
T(n) = 2n + 8(n − 1) = 10n − 8.
In other words, T(n) is O(n).
The Recursion Tree Method
Just like the iterative substitution method, this technique uses repeated substitution to solve a
recurrence equation,
but it differs from the iterative substitution method in that, rather than being an algebraic
approach, it is a visual approach.
In using the recursion tree method, we draw a tree R where each node represents a different
substitution of the recurrence equation.
Thus, each node in R has a value of the argument n of the function T(n) associated with it.
In addition, we associate an overhead with each node v in R, defined as the value of the
non-recursive part of the recurrence equation for v.
For divide-and-conquer recurrences, the overhead corresponds to the running time needed to merge
the subproblem solutions coming from the children of v.
The recurrence equation is then solved by summing the overheads associated with all the nodes
of R.
This is commonly done by first summing values across the levels of R and
then summing up these partial sums for all the levels of R.
Example: Consider the recurrence
T(n) = b, if n < 3
T(n) = 3T(n/3) + bn, if n ≥ 3
In the recursion tree R for this recurrence, each internal node v has three children and
has a size and an overhead associated with it,
which corresponds to the time needed to merge the sub-problem solutions
produced by v's children.
(In the recursion-tree figure, the overheads of the nodes of each level of R sum to bn.)
Since the depth of R is log_3 n, the overheads sum to bn·log_3 n; hence T(n) is O(n log n).
The Master Method
This method does not require mathematical sophistication in order to be used effectively.
It is a method for solving divide-and-conquer recurrence equations that is quite general
and does not require explicit use of induction to apply correctly.
The master method is a "cook-book" method for determining the asymptotic
characterization of a wide variety of recurrence equations.
It is used for recurrence equations of the form:
T(n) = c, if n < d
T(n) = aT(n/b) + f(n), if n ≥ d
where d > 1 is an integer constant, a > 0, c > 0, and b > 1 are real constants, and f(n) is a
function that is positive for n ≥ d.
The master method for solving such recurrence equations involves simply writing down
the answer based on whether one of the three cases applies.
Each case is distinguished by comparing f(n) to the special function n^(log_b a):
1. If there is a small constant ε > 0 such that f(n) is O(n^(log_b a − ε)), then T(n) is Θ(n^(log_b a)).
2. If there is a constant k ≥ 0 such that f(n) is Θ(n^(log_b a) · log^k n), then T(n) is
Θ(n^(log_b a) · log^(k+1) n).
3. If there are small constants ε > 0 and σ < 1 such that f(n) is Ω(n^(log_b a + ε)) and
a·f(n/b) ≤ σ·f(n) for n ≥ d, then T(n) is Θ(f(n)).
Case 1 characterizes the situation where f(n) is polynomially smaller than the special function
n^(log_b a);
Case 2 characterizes the situation when f(n) is asymptotically close to the special function; and
Case 3 characterizes the situation when f(n) is polynomially larger than the special function.
The usage of the master method is illustrated with a few examples (each with the
assumption that T(n) = 1 for n < d).
Example: Consider the recurrence T(n) = 2T(n/2) + n log n. In this case, n^(log_b a) = n^(log_2 2) = n.
Thus, we are in Case 2, with k = 1, for f(n) is Θ(n log n). This means that T(n) is Θ(n log^2 n)
by the master method.
Example: Consider the recurrence T(n) = T(n/3) + n. In this case, n^(log_b a) = n^(log_3 1) = n^0 = 1.
Thus, we are in Case 3, for f(n) is Ω(n^(0+ε)), for ε = 1, and af(n/b) = n/3 = (1/3)f(n). This means
that T(n) is Θ(n) by the master method.
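The Case 2 prediction can be spot-checked numerically: evaluating T(n) = 2T(n/2) + n log n for powers of two, the ratio T(n)/(n log^2 n) should settle toward a constant (a small sketch with an assumed base case T(1) = 1):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n: int) -> float:
    """The Case 2 example: T(n) = 2*T(n/2) + n*log2(n), with an assumed
    base case T(1) = 1, evaluated for n a power of two."""
    if n == 1:
        return 1.0
    return 2 * T(n // 2) + n * math.log2(n)

# The master method predicts T(n) is Theta(n * log^2 n), so the ratio
# T(n) / (n * log2(n)**2) should approach a constant as n grows.
for i in (4, 8, 12, 16):
    n = 2 ** i
    print(f"n=2^{i}: ratio = {T(n) / (n * math.log2(n) ** 2):.3f}")
```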
Design and analysis of Divide and Conquer Algorithms
Algorithm DandC(P)
{
    if small(P)
        then return S(P)
    else
    {
        divide P into smaller instances P1, P2, ..., Pk
        apply DandC to each sub-problem
        return combine(DandC(P1), DandC(P2), ..., DandC(Pk))
    }
}
Merge sort is a divide-and-conquer algorithm with three steps:
1. Divide: Split the sequence into two sub-sequences of roughly equal size. The dividing process
ends when we have split the sub-sequences down to a single item. A sequence of length one is
trivially sorted.
2. Conquer: Sort each subsequence (by calling MergeSort recursively on each subsequence).
3. Combine: Merge the two sorted sub-sequences into a single sorted list.
The key operation where all the work is done is in the combine stage, which
merges together two sorted lists into a single sorted list.
It turns out that the merging process is quite easy to implement.
Example: input 7 5 2 4 1 6 3 0 → output 0 1 2 3 4 5 6 7.
(The accompanying figure traces the splitting of the sequence down to single elements and the
merging of the sorted halves back together.)
Merge sort
We'll assume that the procedure that merges two sorted lists is available; its
implementation is given later.
Because the algorithm is called recursively on sublists, in addition to passing in the array
itself, we will pass in two indices, which indicate the first and last indices of the subarray
that are to be sorted.
The call MergeSort(A, p, r) will sort the sub-array A[ p..r ] and return the sorted result in
the same subarray.
The overview of the algorithm
If r = p, then this means that there is only one element to sort, and we may return
immediately.
Otherwise (if p < r) there are at least two elements, and we will invoke the divide-and-
conquer.
We find the index q, midway between p and r, namely q = ( p + r ) / 2 (rounded down to
the nearest integer).
Then we split the array into subarrays A[ p..q ] and A[ q + 1...r ] .
Call MergeSort recursively to sort each subarray.
Finally, we invoke a procedure called Merge which merges these two subarrays into a
single sorted array.
Merging
Merging is used to describe the procedure called Merge that merges two sorted lists.
Merge(A, p, q, r) assumes that the left subarray, A[ p..q ] , and the right subarray, A [ q +
1 ..r ] , have already been sorted.
We merge these two subarrays by copying the elements to a temporary working array
called B.
For convenience, we will assume that the array B has the same index range as A, that is,
B[ p..r ]
We have two indices i and j, that point to the current elements of each subarray.
We move the smaller element into the next position of B (indicated by index k) and then
increment the corresponding index (either i or j).
When we run out of elements in one array, then we just copy the rest of the other array
into B.
Finally, we copy the entire contents of B back into A.
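The MergeSort and Merge procedures described above can be sketched in Python as follows (index conventions as in the text, with inclusive bounds p..r):

```python
def merge(A, p, q, r):
    """Merge the sorted subarrays A[p..q] and A[q+1..r] (inclusive bounds)
    into a temporary array B, then copy the merged range back into A."""
    B = [None] * len(A)                 # working array with the same index range
    i, j = p, q + 1                     # current element of each subarray
    for k in range(p, r + 1):           # k is the next position in B
        if j > r or (i <= q and A[i] <= A[j]):
            B[k] = A[i]; i += 1         # take from the left subarray
        else:
            B[k] = A[j]; j += 1         # take from the right subarray
    A[p:r + 1] = B[p:r + 1]             # copy the merged contents back into A

def merge_sort(A, p, r):
    """Sort the subarray A[p..r] in place, as in MergeSort(A, p, r)."""
    if p < r:                           # at least two elements
        q = (p + r) // 2                # midway index, rounded down
        merge_sort(A, p, q)             # sort the left subarray
        merge_sort(A, q + 1, r)         # sort the right subarray
        merge(A, p, q, r)               # merge the two sorted halves

A = [7, 5, 2, 4, 1, 6, 3, 0]
merge_sort(A, 0, len(A) - 1)
print(A)                                # -> [0, 1, 2, 3, 4, 5, 6, 7]
```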
Now, how do we describe the running time of the entire MergeSort algorithm?
We will do this through the use of a recurrence, that is, a function that is defined
recursively in terms of itself.
To avoid circularity, the recurrence for a given value of n is defined in terms of values
that are strictly smaller than n.
Finally, a recurrence has some basis values (e.g. for n = 1), which are defined explicitly.
Let’s see how to apply this to MergeSort.
Let T( n ) denote the worst case running time of MergeSort on an array of length
n.
For concreteness we could count whatever we like: number of lines of
pseudocode, number of comparisons, number of array accesses, since these will
only differ by a constant factor.
Since all of the real work is done in the Merge procedure, we will count the total
time spent in the Merge procedure.
First observe that if we call MergeSort with a list containing a single element,
then the running time is a constant. Since we are ignoring constant factors, we can
just write T(n) = 1.
When we call MergeSort with a list of length n > 1, e.g. MergeSort(A, p, r), where
r − p + 1 = n, the algorithm first computes q = (p + r)/2.
The subarray A[p..q] contains q − p + 1 elements; you can verify that it is of size n/2.
Thus the remaining subarray A[q+1..r] has n/2 elements in it.
How long does it take to sort the left subarray?
We do not know this, but because n/2< n for n >1 , we can express this as T(n/ 2).
Similarly, we can express the time that it takes to sort the right subarray as
T(n/ 2).
Finally, to merge both sorted lists takes n time.
In conclusion we have
T(n) = 1, if n = 1
T(n) = 2T(n/2) + n, otherwise.
Solving the above recurrence we can see that merge sort has a time complexity of Θ (n log n).
QUICKSORT
Worst-case running time: O(n^2).
Expected running time: O(n log n).
Sorts in place.
Description of quicksort
Quicksort is based on the three-step process of divide-and-conquer.
To sort the subarray A[p..r]:
Divide: Partition A[p..r] into two (possibly empty) subarrays A[p..q−1] and
A[q+1..r], such that each element in the first subarray A[p..q−1] is ≤ A[q],
and A[q] is ≤ each element in the second subarray A[q+1..r].
Conquer: Sort the two subarrays by recursive calls to QUICKSORT.
Combine: No work is needed to combine the subarrays, because they are sorted
in place.
Perform the divide step by a procedure PARTITION, which returns the index q that
marks the position separating the subarrays.
QUICKSORT(A, p, r)
if p < r
then q ← PARTITION(A, p, r )
QUICKSORT(A, p, q − 1)
QUICKSORT(A, q + 1, r)
Partitioning
Partitions subarray A [p . . . r] by the following procedure:
PARTITION(A, p, r)
x ← A[r]
i ← p − 1
for j ← p to r − 1
    do if A[j] ≤ x
        then i ← i + 1
             exchange A[i] ↔ A[j]
exchange A[i + 1] ↔ A[r]
return i + 1
PARTITION always selects the last element A[r] in the subarray A[p..r] as the pivot,
the element around which to partition.
As the procedure executes, the array is partitioned into four regions, some of which may be empty:
A[p .. i] : known to be ≤ pivot
A[i+1 .. j-1] : known to be > pivot
A[j .. r-1] : not yet examined
A[r] : the pivot
Example trace on the subarray 8 1 6 4 0 3 9 5 with pivot x = A[r] = 5:
8 1 6 4 0 3 9 5    initially, i = p − 1 and j = p
8 1 6 4 0 3 9 5    j = 1: 8 > 5, nothing moves
1 8 6 4 0 3 9 5    j = 2: 1 ≤ 5, so i advances and A[i] ↔ A[j]
1 8 6 4 0 3 9 5    j = 3: 6 > 5, nothing moves
1 4 6 8 0 3 9 5    j = 4: 4 ≤ 5, so i advances and A[i] ↔ A[j]
1 4 0 8 6 3 9 5    j = 5: 0 ≤ 5, so i advances and A[i] ↔ A[j]
1 4 0 3 6 8 9 5    j = 6: 3 ≤ 5, so i advances and A[i] ↔ A[j]
1 4 0 3 6 8 9 5    j = 7: 9 > 5, nothing moves
1 4 0 3 5 8 9 6    finally, A[i + 1] ↔ A[r] places the pivot; PARTITION returns i + 1 = 5
The index j disappears because it is no longer needed once the for loop is exited.
Performance of Quicksort
The running time of Quicksort depends on the partitioning of the subarrays:
If the subarrays are balanced, then Quicksort can run as fast as mergesort.
If they are unbalanced, then Quicksort can run as slowly as insertion sort.
Worst case
Occurs when the subarrays are completely unbalanced.
Have 0 elements in one subarray and n − 1 elements in the other subarray.
Get the recurrence
T(n) = T(n − 1) + T(0) + Θ(n)
     = T(n − 1) + Θ(n)
     = Θ(n²).
Same running time as insertion sort.
In fact, the worst-case running time of Quicksort occurs when it takes an already-sorted array as input, whereas insertion sort runs in O(n) time in this case.
Best case
Occurs when the subarrays are completely balanced every time.
Each subarray has ≤ n/2 elements.
Get the recurrence
T(n) = 2T (n/2) + Θ (n) = O(n lgn).
Balanced partitioning
QuickSort’s average running time is much closer to the best case than to the worst case.
Imagine that PARTITION always produces a 9-to-1 split.
Get the recurrence
T(n) ≤ T(9n/10) + T(n/10) + Θ(n) = O(n lg n).
Intuition: look at the recursion tree.
It’s like the one for T(n) = T(n/3) + T(2n/3) + O(n).
Except that here the constants are different; we get log_10 n full levels and log_{10/9} n levels that are nonempty.
As long as it’s a constant, the base of the log doesn’t matter in asymptotic notation.
Any split of constant proportionality will yield a recursion tree of depth O(logn).
Heap data structure
The (binary) heap data structure is an array object that we can view as a nearly complete
binary tree.
Each node of the tree corresponds to an element of the array.
The tree is completely filled on all levels except possibly the lowest, which is filled from
the left up to a point.
The root of the tree is A[1], and given the index i of a node, we can easily compute the
indices of its parent, left child, and right child:
PARENT(i) = ⌊i/2⌋, LEFT(i) = 2i, RIGHT(i) = 2i + 1.
(a) The max-heap viewed as a binary tree, and (b) its array representation:
A = 16 14 10 8 7 9 3 2 4 1 (indices 1 to 10).
The root is A[1] = 16; its children are A[2] = 14 and A[3] = 10, the children of node 2 are A[4] = 8 and A[5] = 7, and so on down to the leaves.
Thus, the largest element in a max-heap is stored at the root, and the subtree rooted at a
node contains values no larger than that contained at the node itself.
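With the 1-based indexing used here, the parent and child indices are simple arithmetic. A small C sketch (the function names are ours; a 0-based C array would shift these formulas by one):

```c
#include <assert.h>

/* Index arithmetic for a 1-based binary heap. */
static int heap_parent(int i) { return i / 2; }     /* floor(i/2) */
static int heap_left(int i)   { return 2 * i; }
static int heap_right(int i)  { return 2 * i + 1; }
```

For the ten-element heap 16 14 10 8 7 9 3 2 4 1, node 4 (value 8) has parent 2 (value 14) and children 8 and 9 (values 2 and 4).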
Min-heap
The min-heap property is that for every node i other than the root, A[PARENT(i)] ≤ A[i].
The smallest element in a min-heap is at the root.
The height of a node in a heap is the number of edges on the longest simple downward
path from the node to a leaf and
The height of the heap is the height of its root.
Height of a heap of n elements which is based on a complete binary tree is O(log n).
MAX-HEAPIFY(A, i)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l ≤ heap-size[A] and A[l] > A[i]
4.    then largest ← l
5.    else largest ← i
6. if r ≤ heap-size[A] and A[r] > A[largest]
7.    then largest ← r
8. if largest ≠ i
9.    then exchange A[i] ↔ A[largest]
10.        MAX-HEAPIFY(A, largest)
At each step, the largest of the elements A[i], A[LEFT(i)], and A[RIGHT(i)] is
determined, and its index is stored in largest.
If A[i] is largest, then the subtree rooted at node i is already a max-heap and the
procedure terminates.
Otherwise, one of the two children has the largest element, and A[i ] is swapped with
A[largest], which causes node i and its children to satisfy the max-heap property.
The node indexed by largest, however, now has the original value A[i], and thus the
subtree rooted at largest might violate the max-heap property.
Consequently, we call MAX-HEAPIFY recursively on that subtree
Example: MAX-HEAPIFY(A, 2) on the array A = 16 4 10 14 7 9 3 2 8 1.
(a) A[2] = 4 violates the max-heap property; its largest child is A[4] = 14, so A[2] and A[4] are exchanged.
(b) The array is now 16 14 10 4 7 9 3 2 8 1; node 4 now violates the property, so A[4] = 4 is exchanged with its larger child A[9] = 8.
(c) The final array 16 14 10 8 7 9 3 2 4 1 satisfies the max-heap property at every node.
Building a heap
Build-Max-Heap(A)
1. for i ← ⌊n/2⌋ downto 1
2. do MAX-HEAPIFY(A, i)
Example: running Build-Max-Heap on the array
4 1 3 2 16 9 10 14 8 7
(a) i = 5: A[5] = 16 already satisfies the max-heap property; nothing changes.
(b) i = 4: A[4] = 2 is exchanged with its larger child A[8] = 14.
(c) i = 3: A[3] = 3 is exchanged with its larger child A[7] = 10.
(d) i = 2: A[2] = 1 is exchanged with A[5] = 16, and then with A[10] = 7.
(e) i = 1: A[1] = 4 is exchanged with A[2] = 16, then with A[4] = 14, then with A[9] = 8.
(f) The resulting max-heap: 16 14 10 8 7 9 3 2 4 1.
We can derive a tighter bound by observing that the time for MAX-HEAPIFY to run at a
node varies with the height of the node in the tree, and the heights of most nodes are
small.
Our tighter analysis relies on the properties that an n-element heap has height ⌊log n⌋ and at most ⌈n/2^(h+1)⌉ nodes of any height h.
The total cost of BUILD-MAX-HEAP is thus bounded by T(n) = O(n).
HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← n downto 2
3. do exchange A[1] ↔ A[i]
4.    heap-size[A] ← heap-size[A] − 1
5.    MAX-HEAPIFY(A, 1)
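The three procedures can be written in C with 0-based indices. A sketch (`heapsort_arr` is a name chosen here to avoid clashing with the `heapsort` that some C libraries already provide):

```c
#include <assert.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Restore the max-heap property at index i (0-based), given that the
   subtrees rooted at its children are already max-heaps. */
static void max_heapify(int A[], int heap_size, int i) {
    int l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < heap_size && A[l] > A[largest]) largest = l;
    if (r < heap_size && A[r] > A[largest]) largest = r;
    if (largest != i) {
        swap_int(&A[i], &A[largest]);
        max_heapify(A, heap_size, largest);
    }
}

/* Build a max-heap bottom-up, starting from the last internal node. */
static void build_max_heap(int A[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)
        max_heapify(A, n, i);
}

/* Sort A[0..n-1]: repeatedly move the maximum to the end and shrink the heap. */
void heapsort_arr(int A[], int n) {
    build_max_heap(A, n);
    for (int i = n - 1; i >= 1; i--) {
        swap_int(&A[0], &A[i]);
        max_heapify(A, i, 0);   /* the heap now occupies A[0..i-1] */
    }
}
```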
Example: HEAPSORT on the max-heap 16 14 10 8 7 9 3 2 4 1.
After each exchange of A[1] with A[i] and the call MAX-HEAPIFY(A, 1), the largest remaining element joins the sorted suffix (shown after the bar):
(a) 14 8 10 4 7 9 3 2 1 | 16
(b) 10 8 9 4 7 1 3 2 | 14 16
(c) 9 8 3 4 7 1 2 | 10 14 16
(d) 8 7 3 4 2 1 | 9 10 14 16
(e) 7 4 3 1 2 | 8 9 10 14 16
(f) 4 2 3 1 | 7 8 9 10 14 16
(g) 3 2 1 | 4 7 8 9 10 14 16
(h) 2 1 | 3 4 7 8 9 10 14 16
(i) 1 | 2 3 4 7 8 9 10 14 16
The final sorted array: 1 2 3 4 7 8 9 10 14 16
The HEAPSORT procedure takes time O(n log n), since the call to BUILD-MAX-HEAP takes time O(n) and each of the n − 1 calls to MAX-HEAPIFY takes time O(log n).
Binary Search
Suppose we have ‘n’ records ordered by keys so that x1 < x2 < … < xn.
Given an element ‘x’, binary search is used to find the corresponding element in the list. If ‘x’ is present, we determine a value ‘j’ such that a[j] = x (successful search). If ‘x’ is not in the list, then j is set to zero (unsuccessful search).
In binary search we jump into the middle of the file, where we find key a[mid], and compare ‘x’ with a[mid]. If x = a[mid] then the desired record has been found.
If x < a[mid] then ‘x’ must be in that portion of the file that precedes a[mid], if it is there at all.
Similarly, if a[mid] < x, then further search is only necessary in that part of the file which follows a[mid].
If we apply this procedure recursively, finding the middle key a[mid] of the un-searched portion of the file, then every unsuccessful comparison of ‘x’ with a[mid] will eliminate roughly half the un-searched portion from consideration.
Since the array size is roughly halved after each comparison between ‘x’ and a[mid], and since an array of length ‘n’ can be halved only about log2 n times before reaching a trivial length, the worst case complexity of binary search is about log2 n.
Algorithm BINSRCH(a, n, x)
low and high are integer variables such that each time through the loop either ‘x’ is
found or low is increased by at least one or high is decreased by at least one.
Thus we have two sequences of integers approaching each other and eventually low will
become greater than high causing termination in a finite number of steps if ‘x’ is not
present.
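The body of BINSRCH is not reproduced above; an iterative C version consistent with this description might look like the following sketch (the probe counter is an addition of ours, so the counts can be compared with the example that follows):

```c
#include <assert.h>

/* Iterative binary search over a[1..n] (1-based, matching the notes).
   Returns j with a[j] == x, or 0 on an unsuccessful search.
   If probes is non-NULL, it receives the number of element comparisons. */
int binsrch(const int a[], int n, int x, int *probes) {
    int low = 1, high = n, count = 0;
    while (low <= high) {
        int mid = (low + high) / 2;
        count++;                    /* one three-way comparison with a[mid] */
        if (x == a[mid]) {
            if (probes) *probes = count;
            return mid;             /* successful search */
        }
        if (x < a[mid]) high = mid - 1;
        else            low = mid + 1;
    }
    if (probes) *probes = count;
    return 0;                       /* unsuccessful search */
}
```

On the nine-element table below, searching for 82 probes a[5] = 9, a[7] = 54, and a[8] = 82: three comparisons, as the example states.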
Index 1 2 3 4 5 6 7 8 9
Elements -15 -6 0 7 9 23 54 82 101
2. Searching for x = 82
Number of comparisons = 3
3. Searching for x = 42
Number of comparisons = 3
Continuing in this manner the number of element comparisons needed to find each of
nine elements is:
Index 1 2 3 4 5 6 7 8 9
Elements -15 -6 0 7 9 23 54 82 101
Comparisons 3 2 3 4 1 3 2 3 4
If x < a[1], a[1] < x < a[2], a[2] < x < a[3], a[5] < x < a[6], a[6] < x < a[7], or a[7] < x < a[8], the algorithm requires 3 element comparisons to determine that ‘x’ is not present.
For all of the remaining possibilities BINSRCH requires 4 element comparisons.
Thus the average number of element comparisons for an unsuccessful search is:
(3 + 3 + 3 + 4 + 4 + 3 + 3 + 3 + 4 + 4) / 10 = 34/10 = 3.4
The time complexity for a successful search is O(log n) and for an unsuccessful
search is Θ(log n).
Successful searches: Θ(1) best, Θ(log n) average, Θ(log n) worst.
Unsuccessful searches: Θ(log n) best, average, and worst.
Selection sorting
Selection sort divides the input into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupies the rest of the list.
Initially, the sorted sublist is empty and the unsorted sublist is the entire input list.
The algorithm proceeds by finding the smallest (or largest, depending on sorting order)
element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted
element (putting it in sorted order), and moving the sublist boundaries one element to the
right.
Selection Sort
23 78 45 8 32 56 Original List
8 78 45 23 32 56 After pass 1
8 23 45 78 32 56 After pass 2
8 23 32 78 45 56 After pass 3
8 23 32 45 78 56 After pass 4
8 23 32 45 56 78 After pass 5
Implementation
/* a[0] to a[n-1] is the array to sort; n is the number of elements */
int i, j;
int iMin;

/* advance the boundary of the sorted sublist one step at a time */
for (j = 0; j < n-1; j++) {
    /* assume the first unsorted element is the minimum */
    iMin = j;
    for (i = j+1; i < n; i++) {
        /* if this element is less, then it is the new minimum */
        if (a[i] < a[iMin]) {
            /* found new minimum; remember its index */
            iMin = i;
        }
    }
    if (iMin != j)
    {
        swap(a[j], a[iMin]);
    }
}
Analysis
Selection sort is not difficult to analyze compared to other sorting algorithms since none
of the loops depend on the data in the array.
Selecting the lowest element requires scanning all n elements (this takes n − 1
comparisons) and then swapping it into the first position.
Finding the next lowest element requires scanning the remaining n − 1 elements and so
on, for (n − 1) + (n − 2) + ... + 2 + 1 = n(n - 1) / 2 ∈ Θ(n2) comparisons (using arithmetic
progression).
Each of these scans requires at most one swap, for a total of at most n − 1 swaps (the final element is already in place).
Insertion Sort
23 78 45 8 32 56 Original List
23 78 45 8 32 56 After pass 1
23 45 78 8 32 56 After pass 2
8 23 45 78 32 56 After pass 3
8 23 32 45 78 56 After pass 4
8 23 32 45 56 78 After pass 5
Algorithm for insertion sort
Insertion sort iterates, consuming one input element each repetition, and growing a sorted
output list.
Each iteration, insertion sort removes one element from the input data, finds the location
it belongs within the sorted list, and inserts it there.
It repeats until no input elements remain.
Sorting is typically done in-place, by iterating up the array, growing the sorted list behind
it.
At each array-position, it checks the value there against the largest value in the sorted list
(which happens to be next to it, in the previous array-position checked).
If larger, it leaves the element in place and moves to the next.
If smaller, it finds the correct position within the sorted list, shifts all the larger values up
to make a space, and inserts into that correct position.
The most common variant of insertion sort, which operates on arrays, can be described as
follows:
1. Suppose there exists a function called Insert designed to insert a value into a sorted
sequence at the beginning of an array.
It operates by beginning at the end of the sequence and shifting each element one
place to the right until a suitable position is found for the new element.
The function has the side effect of overwriting the value stored immediately after
the sorted sequence in the array.
2. To perform an insertion sort, begin at the left-most element of the array and invoke Insert to insert each element encountered into its correct position.
The ordered sequence into which the element is inserted is stored at the beginning
of the array in the set of indices already examined.
Each insertion overwrites a single value: the value being inserted.
Pseudocode of the complete algorithm follows, where the arrays are zero-based:
for i ← 1 to length(A)
j←i
while j > 0 and A[j-1] > A[j]
swap A[j] and A[j-1]
j←j-1
end while
end for
The outer loop runs over all the elements except the first one, because the single-element
prefix A[0:1] is trivially sorted, so the invariant that the first i+1 entries are sorted is true
from the start.
The inner loop moves element A[i] to its correct place so that after the loop, the first i+2
elements are sorted.
for i = 1 to length(A)
x = A[i]
j=i-1
while j >= 0 and A[j] > x
A[j+1] = A[j]
j=j-1
end while
A[j+1] = x
end for
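The second variant maps almost line-for-line onto C (a sketch; the function name is ours):

```c
#include <assert.h>

/* In-place insertion sort of A[0..n-1], following the second pseudocode
   variant: shift the larger elements right, then drop x into the gap. */
void insertion_sort(int A[], int n) {
    for (int i = 1; i < n; i++) {
        int x = A[i];          /* the element to insert */
        int j = i - 1;
        while (j >= 0 && A[j] > x) {
            A[j + 1] = A[j];   /* shift right to make room */
            j--;
        }
        A[j + 1] = x;
    }
}
```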
The new inner loop shifts elements to the right to clear a spot for x = A[i].
The idea is that to sort an array of n elements, A[0..n-1], one first orders the subarray
A[0..n-2] and then inserts the last element (with index n-1).
function insertionSortR(A, n)
    if n > 1
        insertionSortR(A, n-1)
        x = A[n-1]
        j = n-2
        while j >= 0 and A[j] > x
            A[j+1] = A[j]
            j = j-1
        end while
        A[j+1] = x
    end if
end function
The best case input is an array that is already sorted; in this case insertion sort has a linear running time (i.e., O(n)).
During each iteration, the first remaining element of the input is only compared with the right-most element of the sorted subsection of the array.
The set of all worst case inputs consists of all arrays where each element is the
smallest or second-smallest of the elements before it.
In these cases every iteration of the inner loop will scan and shift the entire sorted
subsection of the array before inserting the next element.
This gives insertion sort a quadratic running time (i.e., O(n2)).
The average case is also quadratic, which makes insertion sort impractical for sorting
large arrays.
However, insertion sort is one of the fastest algorithms for sorting very small arrays, even
faster than quicksort;
good quicksort implementations use insertion sort for arrays smaller than a certain threshold, including when such small arrays arise as subproblems; the exact threshold must be determined experimentally and depends on the machine, but is commonly around ten.
Brute-force search
A brute-force algorithm to find the divisors of a natural number n would enumerate all integers from 1 to n, and check whether each of them divides n without remainder.
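The divisor example fits in a few lines of C; every candidate from 1 to n is checked, which is exactly the cost proportional to the number of candidates (a sketch; the function name is ours):

```c
#include <assert.h>

/* Brute force: collect every d in 1..n that divides n without remainder.
   Writes the divisors into out[] and returns how many were found. */
int divisors(int n, int out[]) {
    int count = 0;
    for (int d = 1; d <= n; d++)
        if (n % d == 0)
            out[count++] = d;
    return count;
}
```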
The brute-force method for finding an item in a table — namely, check all entries of the
latter, sequentially — is called linear search.
While a brute-force search is simple to implement, and will always find a solution if it
exists, its cost is proportional to the number of candidate solutions – which in many
practical problems tends to grow very quickly as the size of the problem increases.
Therefore, brute-force search is typically used when the problem size is limited, or when
there are problem-specific heuristics that can be used to reduce the set of candidate
solutions to a manageable size.
The method is also used when the simplicity of implementation is more important than
speed.
This is the case, for example, in critical applications where any errors in the algorithm
would have very serious consequences; or when using a computer to prove a
mathematical theorem.
Heuristic search
The simplest of heuristic search algorithms, the pure heuristic search, expands nodes in order of their heuristic values h(n).
It maintains a closed list of those nodes that have already been expanded, and an open list of those nodes that have been generated but not yet expanded.
The algorithm begins with just the initial state on the open list.
At each cycle, a node on the open list with the minimum h(n) value is expanded, generating all of its children, and is placed on the closed list.
The heuristic function is applied to the children, and they are placed on the open list in
order of their heuristic values. The algorithm continues until a goal state is chosen for
expansion.
In a graph with cycles, multiple paths will be found to the same node, and the first path
found may not be the shortest.
When a shorter path is found to an open node, the shorter path is saved and the longer
one is discarded.
When a shorter path to a closed node is found, the node is moved back to the open list and the shorter path is associated with it.
The main drawback of pure heuristic search is that since it ignores the cost of the path so
far to node n, it does not find optimal solutions.
Breadth-first search, uniform-cost search, and pure heuristic search are all special cases of a
more general algorithm called best-first search.
In each cycle of a best-first search, the node that is best according to some cost function
is chosen for expansion.
These best-first search algorithms differ only in their cost functions: the depth of node n for breadth-first search, g(n) for uniform-cost search, and h(n) for pure heuristic search.
A* Algorithm
The A* algorithm expands nodes in order of the cost function f(n) = g(n) + h(n), where g(n) is the cost of the path from the initial state to node n and h(n) is the heuristic estimate of the cost of a path from node n to a goal.
Thus, f(n) estimates the lowest total cost of any solution path going through node n.
At each point a node with lowest f value is chosen for expansion.
Ties among nodes of equal f value should be broken in favor of nodes with lower h
values.
The algorithm terminates when a goal is chosen for expansion.
The A* algorithm finds an optimal path to a goal if the heuristic function h(n) is admissible, meaning it never overestimates the actual cost.
Greedy algorithm
Control abstraction
Algorithm Greedy(a, n)
{
    solution := Ø;
    for i := 1 to n do
    {
        x := Select(a);
        if Feasible(solution, x) then
            solution := Union(solution, x);
    }
    return solution;
}
Procedure Greedy describes the essential way that a greedy based algorithm will look,
once a particular problem is chosen and the functions select, feasible and union are
properly implemented.
The function select selects an input from ‘a’, removes it and assigns its value to ‘x’.
Feasible is a Boolean valued function, which determines if ‘x’ can be included into the
solution vector.
The function Union combines ‘x’ with solution and updates the objective function.
Knapsack problem
The problem is stated as:
maximize ∑ pi xi (summed over i = 1 to n)
subject to ∑ wi xi ≤ m (summed over i = 1 to n), with 0 ≤ xi ≤ 1 and pi > 0, wi > 0 for 1 ≤ i ≤ n.
Algorithm
If the objects have already been sorted into non-increasing order of p[i] / w[i], then the algorithm given below obtains solutions corresponding to this strategy.
Algorithm GreedyKnapsack(m, n)
{
    for i := 1 to n do x[i] := 0.0;
    U := m;
    for i := 1 to n do
    {
        if (w[i] > U) then break;
        x[i] := 1.0; U := U − w[i];
    }
    if (i ≤ n) then x[i] := U / w[i];
}
Running time: once the objects are sorted, the algorithm above requires only O(n) time.
Example: n = 3, m = 20, (p1, p2, p3) = (25, 24, 15) and (w1, w2, w3) = (18, 15, 10).
1. First, we try to fill the knapsack by selecting the objects in some order:
X1 X2 X3 ∑wixi ∑pixi
1/2 1/3 1/4 18 x 1/2 + 15 x 1/3 + 10 x ¼ = 16.5 25 x 1/2 + 24 x 1/3 + 15 x 1/4 = 24.25
2. Select the object with the maximum profit first (p = 25). So, x1 = 1 and the profit earned is 25. Now, only 2 units of space are left, so select the object with the next largest profit (p = 24). So, x2 = 2/15.
X1 X2 X3 ∑wixi ∑pixi
1 2/15 0 18 x 1 + 15 x 2/15 = 20 25 x 1 + 24 x 2/15 = 28.2
3. Select the object with the minimum weight first (w = 10), so x3 = 1; then 10 units remain, so x2 = 2/3.
X1 X2 X3 ∑wixi ∑pixi
0 2/3 1 15 x 2/3 + 10 x 1 = 20 24 x 2/3 + 15 x 1 = 31
4. Sort the objects in non-increasing order of the ratio pi / wi.
Select the object with the maximum pi / wi ratio, so x2 = 1 and the profit earned is 24.
Now, only 5 units of space are left, so select the object with the next largest pi / wi ratio,
so x3 = ½ and the profit earned is 7.5.
X1 X2 X3 ∑wixi ∑pixi
0 1 1/2 15 x 1 + 10 x 1/2 = 20 24 x 1 + 15 x 1/2 = 31.5
This solution is the optimal solution.
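The ratio-based strategy can be sketched in C, assuming the items are already sorted by pi/wi (the function name is ours; the data are from the example above):

```c
#include <assert.h>

/* Greedy fractional knapsack. Items must be pre-sorted in non-increasing
   order of p[i]/w[i]. Fills x[] with the chosen fractions and returns
   the total profit earned. */
double greedy_knapsack(double m, int n,
                       const double p[], const double w[], double x[]) {
    double U = m, profit = 0.0;
    for (int i = 0; i < n; i++) x[i] = 0.0;
    for (int i = 0; i < n; i++) {
        if (w[i] > U) {             /* only a fraction of this object fits */
            x[i] = U / w[i];
            profit += p[i] * x[i];
            break;
        }
        x[i] = 1.0;                 /* take the whole object */
        U -= w[i];
        profit += p[i];
    }
    return profit;
}
```

With the example items sorted by ratio, (p, w) = (24, 15), (15, 10), (25, 18) and m = 20, the returned profit is the optimal 31.5.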
Huffman Codes
Using a standard coding scheme, a file of 58 characters using 3 bits for each character requires 174 bits to represent. This is shown in the tree below.
Representing the code by a binary tree, each character is a leaf at depth 3:
(Figure: a binary tree of depth 3 whose leaves are the characters a (frequency 10), e (15), i (12), s (3), t (4), sp (13), and nl (1); each left branch is labelled 0 and each right branch 1.)
The representation of each character can be found by starting at the root and recording
the path.
Use a 0 to indicate the left branch and a 1 to indicate the right branch.
If the character ci is at depth di and occurs fi times, the cost of the code is
equal to ∑difi
With this representation the total number of bits is 3x10 + 3x15 + 3x12 + 3x3 + 3x4 +
3x13 + 3x1 = 174
Huffman's algorithm
We maintain a forest of trees. The weight of a tree is equal to the sum of the frequencies of its
leaves.
Given a number of characters, select the two trees T1 and T2, of smallest weight, and form a new
tree with sub-trees T1 and T2.
Repeating the process we will get an optimal Huffman coding tree.
The initial forest with the weight of each tree is as follows:
a (10), e (15), i (12), s (3), t (4), sp (13), nl (1)
The two trees with the lowest weight, s (3) and nl (1), are merged, creating a new tree with root T1 and weight 4.
The total weight of the new tree is the sum of the weights of the old trees.
The forest after the first merge:
a (10), e (15), i (12), t (4), sp (13), T1 (4), where T1 has children s and nl.
Again select the two trees of smallest weight.
This happens to be T1 and t, which are merged into a new tree with root T2 and weight 8:
a (10), e (15), i (12), sp (13), T2 (8), where T2 has children T1 and t.
In the next step, merge T2 and a, creating T3 with weight 10 + 8 = 18. The result of this operation is:
e (15), i (12), sp (13), T3 (18), where T3 has children T2 and a.
After the third merge, the two trees of lowest weight are the single-node trees representing i and the blank space.
These trees are merged into a new tree with root T4 and weight 12 + 13 = 25:
e (15), T4 (25), T3 (18), where T4 has children i and sp.
The fifth step is to merge the trees with roots e and T3. The result of this step is:
T4 (25), T5 (33), where T5 has children T3 and e.
Finally, the optimal tree is obtained by merging the two remaining trees. The optimal tree, with root T6 and weight 58, has children T5 (33) and T4 (25).
Bit representation of the Huffman code: label each left branch of the tree with 0 and each right branch with 1, and read off the code for a character by following the path from the root T6 to its leaf. This yields e = 01, i = 10, sp = 11, a = 001, t = 0001, s = 00000, and nl = 00001.
The number of bits required to represent each character in Huffman coding, and the total number
of bits to represent the entire file are represented in the table below:
Character Code Frequency Total bits per character
a 001 10 30
e 01 15 30
i 10 12 24
s 00000 3 15
t 0001 4 16
spaces 11 13 26
New line 00001 1 5
Total 146
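The total of 146 bits can be checked mechanically: the cost ∑ di·fi of a Huffman code equals the sum of the weights of all merged trees. A small C sketch (quadratic minimum-selection, which is adequate for seven symbols; the function name is ours):

```c
#include <assert.h>

/* Total encoded length of an optimal Huffman code for the given weights.
   Repeatedly merge the two smallest weights; the answer is the sum of all
   merged weights. The array w[] is modified in place. */
long huffman_cost(long w[], int n) {
    long total = 0;
    while (n > 1) {
        /* find the indices a, b of the two smallest weights */
        int a = 0, b = 1;
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (int i = 2; i < n; i++) {
            if (w[i] < w[a])      { b = a; a = i; }
            else if (w[i] < w[b]) { b = i; }
        }
        long merged = w[a] + w[b];
        total += merged;
        /* replace the two trees by the merged tree and shrink the forest */
        w[a] = merged;
        w[b] = w[n - 1];
        n--;
    }
    return total;
}
```

For the frequencies 10, 15, 12, 3, 4, 13, 1 the merges cost 4, 8, 18, 25, 33, 58, summing to 146, matching the table.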
Graph algorithms
Basic Definitions:
Graph G is a pair (V, E), where V is a finite set (set of vertices) and E is a finite set of
pairs from V (set of edges). We will often denote n := |V|, m := |E|.
Graph G can be directed, if E consists of ordered pairs, or undirected, if E consists of
unordered pairs. If (u, v) ϵ E, then vertices u, and v are adjacent.
We can assign weight function to the edges: wG(e) is a weight of edge e ϵ E.
The graph which has such function assigned is called weighted.
Degree of a vertex v is the number of vertices u for which (u, v) ϵ E (denote deg(v)). The
number of incoming edges to a vertex v is called in–degree of the vertex (denote
indeg(v)).
The number of outgoing edges from a vertex is called out-degree (denote outdeg(v)).
Representation of Graphs:
Adjacency matrix: a two-dimensional n × n array a, where
a[i, j] = 1, if (vi, vj) ϵ E,
a[i, j] = 0, otherwise
The matrix is symmetric in case of undirected graph, while it may be asymmetric if the
graph is directed.
We may consider various modifications. For example, for weighted graphs we may have
a[i, j] = wG(vi, vj), if (vi, vj) ϵ E,
a[i, j] = default, otherwise
where default is some sensible value based on the meaning of the weight function (for example, if the weight function represents length, then default can be ∞, meaning a value larger than any other value).
Adjacency List: An array Adj [1 . . . . . . . n] of pointers where for 1 ≤ v ≤ n, Adj [v]
points to a linked list containing the vertices which are adjacent to v (i.e. the vertices that
can be reached from v by a single edge).
If the edges have weights then these weights may also be stored in the linked list
elements.
Example: the directed graph with V = {1, 2, 3} and E = {(1, 1), (1, 2), (1, 3), (2, 3), (3, 2)}:
Adjacency matrix:
1 1 1
0 0 1
0 1 0
Adjacency list:
1 → 1 → 2 → 3
2 → 3
3 → 2
A path is a sequence of vertices (v1, v2, . . . . . . , vk), where for all i, (vi, vi+1) ϵ E.
A path is simple if all vertices in the path are distinct.
A (simple) cycle is a sequence of vertices (v1, v2, . . . . . . , vk, vk+1 = v1), where for all i,
(vi, vi+1) ϵ E and all vertices in the cycle are distinct except pair v1, vk+1.
A tree is a connected graph without cycles; if we add any edge into a tree T, then the new graph will contain a cycle.
Minimum Spanning Trees (MST):
A spanning tree for a connected graph is a tree whose vertex set is the same as the vertex
set of the given graph, and whose edge set is a subset of the edge set of the given graph.
i.e., any connected graph will have a spanning tree.
Weight of a spanning tree w(T) is the sum of weights of all edges in T.
The Minimum spanning tree (MST) is a spanning tree with the smallest possible weight.
(Figure: a weighted graph G, and the minimum spanning tree obtained from G.)
Two greedy algorithms are commonly used to find an MST: Kruskal's algorithm and Prim's algorithm. Both differ in their methodology, but both eventually end up with the MST.
Kruskal's Algorithm
Take a graph with 'n' vertices, keep on adding the shortest (least cost) edge, while avoiding the creation of cycles, until (n − 1) edges have been added.
Sometimes two or more edges may have the same cost.
The order in which the edges are chosen, in this case, does not matter.
Different MSTs may result, but they will all have the same total cost, which will always be
the minimum cost.
Algorithm:
The algorithm for finding the MST, using the Kruskal’s method is as follows:
Algorithm Kruskal(E, cost, n, t)
{
    Construct a heap out of the edge costs;
    for i := 1 to n do parent [i] := -1;
    // Each vertex is in a different set.
    i := 0; mincost := 0.0;
while ((i < n -1) and (heap not empty)) do
{
Delete a minimum cost edge (u, v) from the heap and re-heapify using Adjust;
j := Find (u); k := Find (v);
if (j ≠ k) then
{
i := i + 1;
t [i, 1] := u; t [i, 2] := v;
mincost :=mincost + cost [u, v];
Union (j, k);
}
}
if (i ≠ n-1) then write ("no spanning tree");
else return mincost;
}
Running time:
The number of finds is at most 2e, and the number of unions at most n-1.
Including the initialization time for the trees, this part of the algorithm has a complexity that
is just slightly more than O(n + e).
We can add at most n-1 edges to tree T.
So, the total time for operations on T is O(n).
Summing up the various components of the computing times, we get O(n + e log e) as the asymptotic complexity.
Example 1: Consider the graph below
(Figure: a weighted graph on the vertices 1 to 6; its edges, in non-decreasing order of cost, are listed below.)
Cost 10 15 20 25 30 35 40 45 50 55
edge (1, 2) (3, 6) (4, 6) (2,6) (1,4) (3, 5) (2,5) (1, 5) (2, 3) (5, 6)
The edge set T together with the vertices of G define a graph that has up to n connected
components.
Let us represent each component by a set of vertices in it.
These vertex sets are disjoint.
To determine whether the edge (u, v) creates a cycle, we need to check whether u and v are in the
same vertex set.
If so, then a cycle is created.
If not then no cycle is created.
Hence two Finds on the vertex sets suffice.
When an edge is included in T, two components are combined into one and a union is to be
performed on the two sets.
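Putting the pieces together, Kruskal's method fits in a compact C sketch (sorting stands in for the edge heap of the pseudocode, the simplest form of Find/Union is used, and the names and the 64-vertex cap are choices made here):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int u, v, cost; } Edge;

/* Union-find over vertex sets 1..n: parent[x] == -1 marks a root. */
static int find_root(const int parent[], int x) {
    while (parent[x] > 0) x = parent[x];
    return x;
}

static int cmp_edge(const void *p, const void *q) {
    return ((const Edge *)p)->cost - ((const Edge *)q)->cost;
}

/* Kruskal: returns the MST cost, or -1 if the graph is disconnected. */
int kruskal(Edge e[], int m, int n) {
    int parent[64], taken = 0, mincost = 0;
    for (int i = 1; i <= n; i++) parent[i] = -1;
    qsort(e, m, sizeof(Edge), cmp_edge);     /* cheapest edges first */
    for (int k = 0; k < m && taken < n - 1; k++) {
        int j = find_root(parent, e[k].u);
        int l = find_root(parent, e[k].v);
        if (j != l) {                        /* different sets: no cycle */
            parent[j] = l;                   /* union the two components */
            mincost += e[k].cost;
            taken++;
        }
    }
    return taken == n - 1 ? mincost : -1;
}
```

On the edge list of Example 1 this accepts (1,2), (3,6), (4,6), (2,6) and (3,5), for a total cost of 105.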
Edge     Cost   Edge set                        Remarks
(none)   -      {1}, {2}, {3}, {4}, {5}, {6}    initially, each vertex is its own tree
(1, 2)   10     {1, 2}, {3}, {4}, {5}, {6}      vertices 1 and 2 are in different sets, so the edge is included
(3, 6)   15     {1, 2}, {3, 6}, {4}, {5}        vertices 3 and 6 are in different sets, so the edge is included
(4, 6)   20     {1, 2}, {3, 4, 6}, {5}          vertices 4 and 6 are in different sets, so the edge is included
(2, 6)   25     {1, 2, 3, 4, 6}, {5}            vertices 2 and 6 are in different sets, so the edge is included
Continuing, edge (1, 4) of cost 30 is rejected because 1 and 4 are already in the same set, and edge (3, 5) of cost 35 is included, completing the spanning tree with mincost = 105.
Example 2: Find a minimum spanning tree of the following graph:
(Figure: a weighted graph on the vertices A to K, with the edges labelled 1 to 17 in order of increasing weight.)
Solution:
(Figure: the resulting minimum spanning tree, consisting of the edges labelled 1, 2, 3, 4, 5, 7, 8, 10, 11 and 15.)
By using Kruskal's algorithm, a minimum spanning tree of the graph can be found as follows:
Edge 1 is added to T.
Edge 2 is added to T.
Edge 3 is added to T.
Edge 4 is added to T.
Edge 5 is added to T.
Edge 6 is not added as it forms a cycle with edges 2; 1; 5.
Edge 7 is added to T.
Edge 8 is added to T.
Edge 9 is not added as it forms a cycle with edges 4; 5; 1; 2; 3.
Edge 10 is added to T.
Edge 11 is added to T.
Edge 12 is not added as it forms a cycle with edges 5; 1; 2; 3.
Edge 13 is not added as it forms a cycle with edges 7; 10.
Edge 14 is not added as it forms a cycle with edges 4; 5; 8.
Edge 15 is added to T.
Edge 16 is not added as it forms a cycle with edges 3; 7.
Edge 17 is not added as it forms a cycle with edges 15; 4; 5; 1; 2; 7:11.
Prim's Algorithm
A slight modification of the spanning tree algorithm yields a very simple algorithm for finding an MST.
In the spanning tree algorithm, any vertex not in the tree but connected to it by an edge can be added.
To find a minimal cost spanning tree, we must be selective: we must always add a new vertex for which the cost of the new edge is as small as possible.
This simple modified algorithm is called Prim's algorithm for finding a minimal cost spanning tree.
Prim's algorithm is an example of a greedy algorithm.
Algorithm Prim(E, cost, n, t)
// E is the set of edges in G. cost [1:n, 1:n] is the cost
// adjacency matrix of an n vertex graph such that cost [i, j] is
// either a positive real number or ∞ if no edge (i, j) exists.
// A minimum spanning tree is computed and stored as a set of
// edges in the array t [1:n-1, 1:2]. (t [i, 1], t [i, 2]) is an edge in
// the minimum-cost spanning tree. The final cost is returned.
{
Let (k, l) be an edge of minimum cost in E;
mincost := cost [k, l];
t [1, 1] := k; t [1, 2] := l;
for i :=1 to n do // Initialize near
if (cost [i, l] < cost [i, k]) then near [i] := l;
else near [i] := k;
near [k] :=near [l] := 0;
for i:=2 to n - 1 do // Find n - 2 additional edges for t.
{
Let j be an index such that near [j] ≠ 0 and
cost [j, near [j]] is minimum;
t [i, 1] := j; t [i, 2] := near [j];
mincost := mincost + cost [j, near [j]];
near [j] := 0
for k:= 1 to n do // Update near[].
if ((near [k] ≠ 0) and (cost [k, near [k]] > cost [k, j]))
then near [k] := j;
}
return mincost;
}
Running time:
We do the same set of operations with dist as in Dijkstra's algorithm (initialize the structure, m times decrease a value, n − 1 times select the minimum). Therefore, we get O(n²) time when we implement dist with an array, and O(n + |E| log n) when we implement it with a heap. For each vertex u in the graph we dequeue it and check all its neighbors in Θ(1 + deg(u)) time; therefore the running time is O(n + |E| log n) when a heap is used.
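A direct C transcription of the near[]-array version above (1-based indices, INF standing in for ∞; the O(n²) behaviour is visible as the two nested scans per step, and the 8-wide rows are a choice made here for the 7-vertex example):

```c
#define INF 1000000   /* stands in for "no edge" (infinity) */

/* Prim's algorithm on a 1-based cost adjacency matrix (n <= 7 here),
   mirroring the near[]-array pseudocode. Returns the MST cost. */
int prim(int n, int cost[][8]) {
    int nearv[8], mincost, k = 1, l = 2;
    /* let (k, l) be an edge of minimum cost in E */
    for (int i = 1; i <= n; i++)
        for (int j = i + 1; j <= n; j++)
            if (cost[i][j] < cost[k][l]) { k = i; l = j; }
    mincost = cost[k][l];
    for (int i = 1; i <= n; i++)            /* initialize nearv */
        nearv[i] = (cost[i][l] < cost[i][k]) ? l : k;
    nearv[k] = nearv[l] = 0;                /* 0 marks "already in the tree" */
    for (int t = 2; t <= n - 1; t++) {      /* find n - 2 additional edges */
        int j = 0;                          /* pick j with nearv[j] != 0 and */
        for (int i = 1; i <= n; i++)        /* cost[i][nearv[i]] minimal     */
            if (nearv[i] != 0 &&
                (j == 0 || cost[i][nearv[i]] < cost[j][nearv[j]]))
                j = i;
        mincost += cost[j][nearv[j]];
        nearv[j] = 0;
        for (int i = 1; i <= n; i++)        /* update nearv[] */
            if (nearv[i] != 0 && cost[i][nearv[i]] > cost[i][j])
                nearv[i] = j;
    }
    return mincost;
}
```

On the cost matrix of Example 1 below the MST cost comes out as 10, in agreement with the stepwise trace.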
EXAMPLE 1:
Use Prim’s Algorithm to find a minimal spanning tree for the graph shown below starting with the vertex A.
(Figure: a weighted graph on the vertices A to G.)
The cost adjacency matrix is:
0 3 6 ∞ ∞ ∞ ∞
3 0 2 4 ∞ ∞ ∞
6 2 0 1 4 2 ∞
∞ 4 1 0 2 ∞ 4
∞ ∞ 4 2 0 2 1
∞ ∞ 2 ∞ 2 0 1
∞ ∞ ∞ 4 1 1 0
The stepwise progress of Prim's algorithm is as follows (Status 0 marks a vertex already in the tree):
Step 1:
Vertex   A  B  C  D  E  F  G
Status   0  1  1  1  1  1  1
Dist.    0  3  6  ∞  ∞  ∞  ∞
Next     *  A  A  A  A  A  A

Step 2:
Vertex   A  B  C  D  E  F  G
Status   0  0  1  1  1  1  1
Dist.    0  3  2  4  ∞  ∞  ∞
Next     *  A  B  B  A  A  A

Step 3:
Vertex   A  B  C  D  E  F  G
Status   0  0  0  1  1  1  1
Dist.    0  3  2  1  4  2  ∞
Next     *  A  B  C  C  C  A

Step 4:
Vertex   A  B  C  D  E  F  G
Status   0  0  0  0  1  1  1
Dist.    0  3  2  1  2  2  4
Next     *  A  B  C  D  C  D

Step 5:
Vertex   A  B  C  D  E  F  G
Status   0  0  0  0  0  1  1
Dist.    0  3  2  1  2  2  1
Next     *  A  B  C  D  C  E

Step 6:
Vertex   A  B  C  D  E  F  G
Status   0  0  0  0  0  1  0
Dist.    0  3  2  1  2  1  1
Next     *  A  B  C  D  G  E

Step 7:
Vertex   A  B  C  D  E  F  G
Status   0  0  0  0  0  0  0
Dist.    0  3  2  1  2  1  1
Next     *  A  B  C  D  G  E

The resulting minimum spanning tree has cost 3 + 2 + 1 + 2 + 1 + 1 = 10.