Unit - I Final
INTRODUCTION
Informal Definition:
An algorithm is any well-defined computational procedure that takes some value or set of values as input and produces some value or set of values as output. Thus, an algorithm is a sequence of computational steps that transforms the input into the output.
Formal Definition:
An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. In addition, all algorithms should satisfy the following criteria:
1. Input: zero or more quantities are externally supplied.
2. Output: at least one quantity is produced.
3. Definiteness: each instruction is clear and unambiguous.
4. Finiteness: the algorithm terminates after a finite number of steps for all cases.
5. Effectiveness: every instruction must be basic enough to be carried out, in principle, by a person using only pencil and paper.
After devising an algorithm, we need to convert it into a program using a programming language. A program is the expression of an algorithm in a programming language.
Advanced Data Structures and Algorithm Analysis
Performance Analysis
There are many criteria upon which we can judge an algorithm. For instance,
1. Does it do what we want it to do?
2. Does it work correctly according to the original specifications of the task?
3. Is there documentation that describes how to use it and how it works?
4. Are procedures created in such a way that they perform logical sub-functions?
5. Is the code readable?
There are other criteria for judging algorithms that have a more direct relation to performance.
1. Space Complexity
2. Time Complexity
Space Complexity:
The space complexity of an algorithm is the amount of memory it needs to run to completion.
The space needed by an algorithm is the sum of the following components:
1. A fixed part that is independent of the characteristics (e.g., number, size) of the inputs and outputs. This part typically includes:
i. the instruction space (i.e., space for the code),
ii. space for simple variables and fixed-size component variables (also called aggregates), and
iii. space for constants.
2. A variable part that consists of the space needed by component variables whose size is dependent on the particular problem instance being solved, the space needed by referenced variables (to the extent that it depends on instance characteristics), and the recursion stack space.
Example :
Algorithm sum( a, n )
{
s := 0.0;
for i := 1 to n do
s := s+a[ i ];
return s;
}
The problem instances for this algorithm are characterized by n, the number of elements to be summed. The space needed by n is one word, since it is of type integer.
The space needed by a is the space needed by variables of type array of floating-point numbers. This is at least n words, since a must be large enough to hold the n elements to be summed.
So, we obtain S_sum(n) ≥ (n + 3) [n words for a[ ], one each for n, i and s].
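As an illustrative sketch (not part of the original pseudocode), the algorithm translates directly to Python; the function name array_sum is hypothetical:

```python
def array_sum(a, n):
    """Sum the first n elements of array a.

    Fixed space: the code, constants, and simple variables s and i.
    Variable space: the n-element array a, so S_sum(n) >= n + 3 words.
    """
    s = 0.0
    for i in range(n):  # i ranges over the n instance elements
        s = s + a[i]
    return s
```

The space for a grows with the instance size n, while everything else the function uses is a fixed number of words.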
Time Complexity:
The time complexity of an algorithm is the amount of computer time it needs to run to completion.
The time t(P) taken by a program P is the sum of the compile time and the run time (execution time).
The compile time does not depend on the instance characteristics. Also, we may assume that a compiled program will be run several times without recompilation. The run time is denoted by tP (instance characteristics).
Running Time:
The number of primitive steps that are executed is known as the running time. Except for the time taken by function calls, most statements require roughly the same amount of time.
Example:
y=m*x+b
a = 5 / 9 * ( t – 32 )
1. We introduce a variable, count, into the program with initial value 0. Statements to increment count by the appropriate amount are introduced into the program. This is done so that each time a statement in the original program is executed, count is incremented by the step count of that statement.
Example:
Algorithm sum(a,n)
{
s := 0.0;
count := count+1; // for assignment statement
for i :=1 to n do
{
count := count+1; // control part of for loop
s := s+a[ i ];
count := count+1; // assignment statement
}
count := count+1; // last time of for loop
count := count+1; // return statement
return s;
}
If the count is zero to start with, then it will be 2n+3 on termination. So each
invocation of sum executes a total of 2n+3 steps.
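The count-annotated pseudocode can be mirrored in Python to check the 2n + 3 figure; this is an illustrative sketch and the name instrumented_sum is hypothetical:

```python
def instrumented_sum(a, n):
    """Return (sum, step count), mirroring the count-annotated pseudocode."""
    count = 0
    s = 0.0
    count += 1            # assignment s := 0.0
    for i in range(n):
        count += 1        # control part of the for loop, each iteration
        s = s + a[i]
        count += 1        # assignment inside the loop
    count += 1            # final (failing) test of the for loop
    count += 1            # return statement
    return s, count
```

For n = 5 the counter ends at 2(5) + 3 = 13, matching the analysis above.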
2. The second method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement.
First determine the number of steps per execution (s/e) of each statement and the total number of times (i.e., the frequency) each statement is executed.
By combining these two quantities, the total contribution of each statement is obtained; summing these contributions gives the step count for the entire algorithm.
Example (step table for Algorithm sum):

Statement               s/e   Frequency   Total steps
Algorithm sum(a, n)      0        -            0
{                        0        -            0
  s := 0.0;              1        1            1
  for i := 1 to n do     1      n + 1        n + 1
    s := s + a[i];       1        n            n
  return s;              1        1            1
}                        0        -            0
Total                                        2n + 3
Asymptotic notations
Types of Analysis:
• Worst case
– Provides an upper bound on running time.
– An absolute guarantee that the algorithm would not run longer, no matter what the
inputs are.
• Best case
– Provides a lower bound on running time.
– Input is the one for which the algorithm runs the fastest.
• Average case
– Provides a prediction about the running time.
– Assumes that the input is random.
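The three cases can be seen concretely in linear search, where the comparison count depends on where (or whether) the key occurs; this sketch is illustrative and the names are hypothetical:

```python
def linear_search(a, key):
    """Return (index, comparisons made).

    Best case: key is at a[0], so only 1 comparison (lower bound).
    Worst case: key is absent, so len(a) comparisons (upper bound).
    Average case: about len(a) / 2 comparisons for a random position.
    """
    comparisons = 0
    for i, x in enumerate(a):
        comparisons += 1
        if x == key:
            return i, comparisons
    return -1, comparisons
```
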
Asymptotic Analysis
• To compare two algorithms with running times f(n) and g(n), we need a rough measure
that characterizes how fast each function grows.
• Compare functions in the limit, that is, asymptotically ( i.e., for large values of n )
1. Asymptotic Notations
1.1. The Big O (O) notation:
The Big O notation is useful when we only have an upper bound on the time complexity of an algorithm. Many times we can easily find an upper bound by simply looking at the algorithm.
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c×g(n) for all n ≥ n0 }
Example 7.1:
f(n) = 2n + 3
2n + 3 ≤ 10 n ∀ n ≥ 1
Here, c=10, n0=1, g(n)=n
=> f(n) = O(n)
Also, 2n + 3 ≤ 2n + 3n = 5n ∀ n ≥ 1, so c = 5 and n0 = 1 work as well.
O(1) < O(log n) < O(√n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(3ⁿ) < O(nⁿ)
1.2. The Omega (Ω) notation:
Ω (g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ c×g(n) ≤ f(n) for
all n ≥ n0 }.
Let us consider the same insertion sort example here. The time complexity of insertion sort can be written as Ω(n), but this is not very useful information about insertion sort, since we are generally interested in the worst case and sometimes in the average case.
Example 7.2:
f(n) = 2n + 3
2n + 3 ≥ n ∀ n ≥ 1
Here, c=1, n0=1, g(n)=n
=> f(n) = Ω(n)
Also, f(n) = Ω(log n)
f(n) = Ω(√n)
1.3. The Theta (Θ) notation:
Dropping lower-order terms is always fine because there will always be an n0 after which Θ(n³) has higher values than Θ(n²), irrespective of the constants involved. For a given function g(n), we denote by Θ(g(n)) the following set of functions:
Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1×g(n) ≤ f(n) ≤ c2×g(n) for all n ≥ n0 }
The above definition means, if f(n) is theta of g(n), then the value f(n) is always
between c1×g(n) and c2×g(n) for large values of n (n ≥ n0). The definition of theta also
requires that f(n) must be non-negative for values of n greater than n0.
Example 7.4:
f(n) = 2n + 3
1 * n ≤ 2n + 3 ≤ 5n ∀ n ≥ 1
Here, c1=1, c2 = 5, n0=1, g(n)=n
=> f(n) = Θ(n)
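The constants in Example 7.4 can be checked numerically; the sketch below (illustrative only) verifies that 1·n ≤ 2n + 3 ≤ 5n holds for every n from n0 = 1 up to a test limit:

```python
def f(n):
    """f(n) = 2n + 3, the function bounded in Example 7.4."""
    return 2 * n + 3

# Collect every n in [1, 10000) where the sandwich c1*n <= f(n) <= c2*n fails.
violations = [n for n in range(1, 10_000)
              if not (1 * n <= f(n) <= 5 * n)]
```

An empty violations list is consistent with f(n) = Θ(n) with c1 = 1, c2 = 5, n0 = 1 (a finite check illustrates, but does not prove, the asymptotic claim).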
Example 7.5:
f(n) = 2n² + 3n + 4
2n² + 3n + 4 ≤ 2n² + 3n² + 4n² = 9n² ∀ n ≥ 1
=> f(n) = O(n²)
Also, 2n² + 3n + 4 ≥ 1 × n² ∀ n ≥ 1
=> f(n) = Ω(n²)
Hence f(n) = Θ(n²).
Example 7.6:
f(n) = n² log n + n
n² log n ≤ n² log n + n ≤ 10 n² log n ∀ n ≥ 2
Hence f(n) = Ω(n² log n), f(n) = O(n² log n), and therefore f(n) = Θ(n² log n).
Example 7.7:
f(n) = n! = 1 × 2 × 3 × 4 × … × n
1 × 1 × 1 × … × 1 ≤ 1 × 2 × 3 × 4 × … × n ≤ n × n × n × … × n
1 ≤ n! ≤ nⁿ
Hence f(n) = Ω(1) and f(n) = O(nⁿ). (Here we cannot find a tight bound Θ, since the lower and upper bounds do not match.)
AVL TREES
An AVL tree defined as a self-balancing Binary Search Tree (BST) where the difference
between heights of left and right subtrees for any node cannot be more than one.
The difference between the heights of the left subtree and the right subtree for any node is known
as the balance factor of the node.
The AVL tree is named after its inventors, Georgy Adelson-Velsky and Evgenii Landis, who
published it in their 1962 paper “An algorithm for the organization of information”.
A tree is an AVL tree when the difference between the heights of the left and right subtrees of every node is less than or equal to 1.
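The balance-factor check can be sketched in Python; the Node class and function names here are hypothetical, illustrative choices:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    """Height of a subtree; the empty tree has height -1 by convention."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """height(left subtree) - height(right subtree)."""
    return height(node.left) - height(node.right)

def is_avl(node):
    """A tree is AVL-balanced if every node's balance factor is in {-1, 0, 1}."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_avl(node.left) and is_avl(node.right))
```
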
Case-01:
After the operation, every balance factor is still in {-1, 0, 1}. The AVL tree is balanced, and the operation is concluded.
Case-02:
After the operation, some node has a balance factor outside {-1, 0, 1}. The AVL tree is imbalanced, and rotations are then performed to restore balance.
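The rotations themselves are constant-time pointer rearrangements. This is an illustrative Python fragment (the Node class and function names are hypothetical); each function returns the new root of the rotated subtree:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    """Right rotation: fixes a left-left imbalance at y."""
    x = y.left
    y.left = x.right   # x's right subtree becomes y's left subtree
    x.right = y        # y becomes x's right child
    return x           # x is the new subtree root

def rotate_left(x):
    """Left rotation: fixes a right-right imbalance at x."""
    y = x.right
    x.right = y.left
    y.left = x
    return y
```

Left-right and right-left imbalances are handled by composing these two single rotations.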
2. Deletion: After performing a deletion on an AVL tree, the balance factor of each affected node is checked, and the tree is rebalanced if required.
Time Complexities:
Because the height of an AVL tree with n nodes is O(log n), search, insertion, and deletion each take O(log n) time in the worst case.
B-TREES
B-trees are self-balancing tree data structures that are commonly used in databases and file
systems. They are optimized for storing and retrieving large amounts of data on disk, where disk
access is much slower than memory access.
Structure:
Nodes: B-trees consist of nodes, each of which can hold multiple keys and pointers to child
nodes.
Children: The number of children an internal node has is always one more than the number of keys it contains.
Leaf Nodes: Nodes at the bottom level of the tree, holding the actual data entries.
Properties:
Balanced:
All leaf nodes are at the same level, ensuring consistent performance for all operations.
A B-tree is defined by its minimum degree t: every node except the root must contain at least t − 1 keys.
Maximum Keys:
Every node can contain at most 2t − 1 keys, and hence at most 2t children.
Advantages:
B-trees minimize the number of disk accesses required for search, insertion, and deletion
operations.
High Performance:
Their balanced structure guarantees logarithmic time complexity for these operations, making
them suitable for large datasets.
Dynamic:
B-trees adapt gracefully to data modifications, making them useful for dynamic environments.
Applications:
Databases:
B-trees are widely used as the underlying data structure for indexing in relational databases.
File Systems:
Many file systems use B-trees to manage the directory structure and file location information.
Properties of B Tree
The following are some important properties of a B Tree:
1. Every node has at most m children, where m is the order of the B Tree.
2. A node having k children contains k − 1 keys.
3. Every non-leaf node, excluding the root, must have at least ⌈m/2⌉ children.
4. The root node must have at least two children if it is not a leaf node.
5. Unlike other trees, a B Tree grows upwards toward the root: insertion always happens at a leaf node, and the height increases only when the root splits.
6. The time complexity of all the operations on a B Tree is O(log n), where n is the number of data elements present in the B Tree.
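Search in a B Tree descends one node per level, scanning the sorted keys of each node to pick a child. The sketch below is illustrative (the names BTreeNode and btree_search are hypothetical, and real implementations would read nodes from disk):

```python
class BTreeNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted keys, at most m - 1 of them
        self.children = children or []  # empty for a leaf, len(keys)+1 otherwise

def btree_search(node, key):
    """Return True if key occurs in the subtree rooted at node."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1                          # find the first key >= search key
    if i < len(node.keys) and node.keys[i] == key:
        return True                     # key found in this node
    if not node.children:
        return False                    # leaf reached: key is absent
    return btree_search(node.children[i], key)
```

Each level inspects one node, so the number of node visits (and hence disk accesses) is proportional to the tree height, O(log n).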
Insertion Operation
Inserting an element into a B Tree involves two main events:
1. Searching for the appropriate node where the element will be inserted.
2. Splitting of the node, if required.
Step 1: If the Tree is empty, a root node is allocated, and we will insert the key.
Step 2: We will then update the allowed number of keys in the node.
Step 3: We will then search for the appropriate node for the insertion of the element.
Step 4: If the node is full, we will follow the steps shown below.
Step 4.1: Insert the element into the node in increasing key order.
Step 4.2: Once the data elements exceed the allowed limit, we will split the node at the median.
Step 4.3: We will then push the median key upwards, making the left keys a left child node and the right keys a right child node.
Step 5: If the node is not full, insert the element into it in increasing key order.
Let us understand the steps mentioned above with the illustrations shown below.
Suppose that the following are some data elements that need to be inserted in a B Tree: 7,
8, 9, 10, 11, 16, 21, and 18.
1. Since the maximum degree of a node in the tree is 3, the maximum number of keys per node will be 3 - 1 = 2.
2. We will insert the first data element, i.e., 7, into the tree. Since the tree is empty, a root node is allocated and 7 becomes its first key.
3. We will insert the next data element, i.e., 8, into the tree. Since 8 is greater than 7, it will
be inserted to the right of 7 in the same node.
4. Similarly, we will insert another data element, 9, into the same node, to the right of 8. However, since the maximum number of keys per node can only be 2, the node will split, pushing the median key 8 upward and making 7 the key of the left child node and 9 the key of the right child node.
5. We will insert the next data element, i.e., 10, into the tree. Since 10 is greater than 9, it
will be inserted as a key on the right of the node containing 9 as a key.
6. We will now insert another data element, 11, into the tree. Since 11 is greater than 10, it should be inserted to the right of 10. However, the maximum number of keys per node cannot be more than 2; therefore 10, being the median, will be pushed up to the root node, to the right of 8, splitting 9 and 11 into two separate nodes.
7. We will now insert data element 16 into the tree. Since 16 is greater than 11, it will be
inserted as a key on the right of the node consisting of 11 as a key.
8. The next data element that we will insert into the tree is 21. Element 21 should be inserted to the right of 16; however, that would exceed the maximum number of keys per node. Therefore, a split will occur, pushing the median key 16 upward and splitting the left and right keys into separate nodes. But this will again violate the limit in the parent node; therefore, a second split will push the median key 10 upward into a new root node and make 8 and 11 its children.
9. At last, we will insert data element 18 into the tree. Since 18 is greater than 16 but less
than 21, it will be inserted as the left key in the node consisting of 21.
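The repeated "split at the median" step in the walkthrough above can be sketched as a small helper; the function name split_node is a hypothetical, illustrative choice:

```python
def split_node(keys):
    """Split an overfull node's sorted key list at the median.

    Returns (left_keys, median, right_keys): the median key is pushed up
    to the parent, and the remaining keys become the left and right children.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]
```

For the walkthrough's order-3 tree, splitting [7, 8, 9] pushes 8 up (step 4), and splitting [9, 10, 11] pushes 10 up (step 6).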
Deleting an element on a B-tree consists of three main events: searching the node where the key to
be deleted exists, deleting the key and balancing the tree if required.
While deleting from a B-tree, a condition called underflow may occur. Underflow occurs when a node contains fewer than the minimum number of keys it should hold.
Inorder Predecessor
The largest key in the left subtree of a node is called its inorder predecessor.
Inorder Successor
The smallest key in the right subtree of a node is called its inorder successor.
Deletion Operation
Before going through the steps below, one must know this fact about a B tree of degree m:
A node (except the root node) should contain a minimum of ⌈m/2⌉ - 1 keys (i.e., 1 key when m = 3, as in the examples below).
Case I
The key to be deleted lies in the leaf. There are two cases for it.
The deletion of the key does not violate the property of the minimum number of keys a node
should hold.
In the tree below, deleting 32 does not violate the above properties.
The deletion of the key violates the property of the minimum number of keys a node should hold.
In this case, we borrow a key from its immediate neighboring sibling node in the order of left to
right.
First, visit the immediate left sibling. If the left sibling node has more than the minimum number of keys, borrow a key from it.
In the tree below, deleting 31 results in the above condition. Let us borrow a key from the left
sibling node.
If both the immediate sibling nodes have only the minimum number of keys, then merge the node with either the left sibling node or the right sibling node. This merging is done through the parent node.
Case II
If the key to be deleted lies in the internal node, the following cases occur.
The key deleted from the internal node is replaced by its inorder predecessor if the left child has more than the minimum number of keys.
The key deleted from the internal node is replaced by its inorder successor if the right child has more than the minimum number of keys.
If both children have exactly the minimum number of keys, then merge the left and the right children. After merging, if the parent node has fewer than the minimum number of keys, then look to the siblings as in Case I.
Case III
In this case, the height of the tree shrinks. If the target key lies in an internal node, and deleting the key leaves the node with fewer than the minimum required number of keys, then look for the inorder predecessor and the inorder successor. If both children contain only the minimum number of keys, borrowing cannot take place; this leads to Case II(3), i.e., merging the children.
Again, look to a sibling to borrow a key. But if the sibling also has only the minimum number of keys, merge the node with the sibling along with the parent, and arrange the children accordingly (in increasing order).