
Advanced Data Structures and Algorithm Analysis

INTRODUCTION

Informal Definition:
An Algorithm is any well-defined computational procedure that takes some value or set of
values as input and produces some value or set of values as output. Thus, an algorithm is a
sequence of computational steps that transforms the input into the output.

Formal Definition:
An Algorithm is a finite set of instructions that, if followed, accomplishes a particular task.
In addition, all algorithms should satisfy the following criteria.

1. INPUT : Zero or more quantities are externally supplied.
2. OUTPUT : At least one quantity is produced.
3. DEFINITENESS : Each instruction is clear and unambiguous.
4. FINITENESS : If we trace out the instructions of an algorithm, then for all cases,
the algorithm terminates after a finite number of steps.
5. EFFECTIVENESS : Every instruction must be very basic so that it can be carried out, in
principle, by a person using only pencil and paper.

Issues in the study of algorithms:

The study of algorithms includes many important and active areas of research. These are:
 How to devise algorithms : creating an algorithm.
 How to validate algorithms : checking correctness.
 How to analyze algorithms : time and space complexity.
 How to test a program : checking for errors.

After devising an algorithm, we need to convert it into a program using a programming language.
A Program is the expression of an algorithm in a programming language.


Performance Analysis
There are many criteria upon which we can judge an algorithm. For instance,
1. Does it do what we want it to do?
2. Does it work correctly according to the original specifications of the task?
3. Is there documentation that describes how to use it and how it works?
4. Are procedures created in such a way that they perform logical sub-functions?
5. Is the code readable?

There are other criteria for judging algorithms that have a more direct relation to performance.
1. Space Complexity
2. Time Complexity

Space Complexity:
The space complexity of an algorithm is the amount of memory it needs to run to
completion.
The Space needed by each of these algorithms is seen to be the sum of the following
components,
1. A fixed part that is independent of the characteristics (eg: number, size) of the inputs and
outputs.
i. This part typically includes the instruction space (ie. Space for the code).
ii. Space for simple variable and fixed-size component variables (also called aggregate).
iii. Space for constants.
2. A variable part that consists of the space needed by component variables whose size is
dependent on the particular problem instance being solved, the space needed by referenced
variables (to the extent that it depends on instance characteristics), and the recursion stack
space.

 The space requirement S( P ) of any algorithm P may therefore be written as,

S( P ) = C + Sp (instance characteristics)

where 'C' is a constant.


Example :
Algorithm sum( a, n )
{
    s := 0.0;
    for i := 1 to n do
        s := s + a[ i ];
    return s;
}

 The problem instances for this algorithm are characterized by n, the number of elements
to be summed. The space needed by 'n' is one word, since it is of type integer.
 The space needed by 'a' is the space needed by variables of type array of floating point
numbers.
 This is at least 'n' words, since 'a' must be large enough to hold the 'n' elements to be
summed.
 So, we obtain Ssum(n) ≥ ( n + 3 ) [ n for a[ ], one each for n, i & s ]
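As an illustration, the pseudocode can be rendered in Python (my translation; the text itself uses algorithmic pseudocode), with the fixed and variable space parts annotated:

```python
def algorithm_sum(a, n):
    """Python rendering of Algorithm sum(a, n) from the text."""
    # Fixed part: s, i and n each need one word, independent of the input.
    s = 0.0
    # Variable part: the array a needs at least n words, so the total
    # space requirement is Ssum(n) >= n + 3 for this algorithm.
    for i in range(1, n + 1):
        s = s + a[i - 1]
    return s
```

Only the array contributes instance-dependent space; everything else is the fixed constant C in S(P) = C + Sp.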

Time Complexity:
The time complexity of an algorithm is the amount of computer time it needs to run to
completion.
The time T( P ) taken by a program P is the sum of the compile time and the run
time (execution time).
The compile time does not depend on the instance characteristics. Also, we may assume
that a compiled program will be run several times without recompilation. The run time is denoted
by tp (instance characteristics).
Running Time:
The number of primitive steps that are executed is known as the running time. Except for
the time of executing a function call, most statements require roughly the same amount of time.
Example:
y=m*x+b
a = 5 / 9 * ( t – 32 )


A program step is loosely defined as a syntactically or semantically meaningful segment of
a program that has an execution time that is independent of the instance characteristics. A step
is as machine-independent as possible.
The number of steps any problem statement is assigned depends on the kind of statement.
1. Comments count as zero steps.
2. An assignment statement which does not involve any calls to other algorithm is counted
as one step.
3. In an iterative statement such as for, while, and repeat-until statements, we consider the
step counts only for the control part of the statement.
We can determine the number of steps needed by a program to solve a particular problem
instance in one of two ways.

1. We introduce a variable, count, into the program with initial value 0. Statements
to increment count by the appropriate amount are introduced into the program.
This is done so that each time a statement in the original program is executed,
count is incremented by the step count of that statement.
Example:
Algorithm sum( a, n )
{
    s := 0.0;
    count := count + 1;     // for assignment statement
    for i := 1 to n do
    {
        count := count + 1; // control part of for loop
        s := s + a[ i ];
        count := count + 1; // assignment statement
    }
    count := count + 1;     // last time of for loop
    count := count + 1;     // return statement
    return s;
}

If the count is zero to start with, then it will be 2n+3 on termination. So each
invocation of sum executes a total of 2n+3 steps.
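The same instrumentation can be replayed in Python (an illustrative translation) to confirm the 2n + 3 step count:

```python
def sum_with_count(a, n):
    """Algorithm sum with an explicit step counter, as in the text."""
    count = 0
    s = 0.0
    count += 1              # assignment s := 0.0
    for i in range(1, n + 1):
        count += 1          # control part of the for loop (test is true)
        s = s + a[i - 1]
        count += 1          # assignment inside the loop
    count += 1              # last (false) test of the for loop
    count += 1              # return statement
    return s, count
```

For n = 3 the counter ends at 2·3 + 3 = 9, matching the step count derived above.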


2. The second method to determine the step count of an algorithm is to build a table in
which we list the total number of steps contributed by each statement.
 First determine the number of steps per execution (s/e) of the statement and the total
number of times (i.e., frequency) each statement is executed.
 By combining these two quantities, the total contribution of all statements, the step
count for the entire algorithm, is obtained.

Example:

Statement                     s/e   Frequency   Total

1. Algorithm Sum(a,n)          0        -         0
2. {                           0        -         0
3.     s := 0.0;               1        1         1
4.     for i := 1 to n do      1      n + 1     n + 1
5.         s := s + a[ i ];    1        n         n
6.     return s;               1        1         1
7. }                           0        -         0
                                      Total     2n + 3

Performance evaluation can be loosely divided into two major phases.


1. A Priori Estimate / Performance Analysis
2. A Posteriori Testing / Performance Measurement


Asymptotic notations

• What is the goal of analysis of algorithms?


– To compare algorithms mainly in terms of running time but also in terms of other
factors (e.g., memory requirements, programmer's effort etc.)

Types of Analysis:

• Worst case
– Provides an upper bound on running time.
– An absolute guarantee that the algorithm would not run longer, no matter what the
inputs are.
• Best case
– Provides a lower bound on running time.
– Input is the one for which the algorithm runs the fastest.

• Average case
– Provides a prediction about the running time.
– Assumes that the input is random.

Lower Bound ≤ Running Time ≤ Upper Bound

How do we compare algorithms?


• We need to define a number of objective measures.
1. Compare execution times?
Not good: times are specific to a particular computer.
2. Count the number of statements executed?
Not good: number of statements vary with the programming language as well as the
style of the individual programmer.


Ideal Solution:

• Express running time as a function of the input size n ( i.e., f( n ) ).


• Describes behavior of function in the limit.
• Compare different functions corresponding to running times.
• The notations describe different rate-of-growth relations between the defining function
and the defined set of functions.
• Such an analysis is independent of machine time, programming style, etc.

• Asymptotic refers to the study of a function f as n approaches infinity.


• Asymptotic notation is defined for functions over the natural numbers.
– Ex: f( n ) = Θ( n² ).
– Describes how f( n ) grows in comparison to n².
– Θ, O, Ω, o, ω

Asymptotic Analysis
• To compare two algorithms with running times f(n) and g(n), we need a rough measure
that characterizes how fast each function grows.
• Compare functions in the limit, that is, asymptotically ( i.e., for large values of n )


1. Asymptotic Notations

The main idea of asymptotic analysis is to have a measure of efficiency of algorithms


that doesn’t depend on machine specific constants, and doesn’t require algorithms to
be implemented and time taken by programs to be compared. Asymptotic notations
are mathematical tools to represent time complexity of algorithms for asymptotic
analysis. The following three asymptotic notations are mostly used to represent time
complexity of algorithms.

1.1. The Big O Notation:


The Big O notation defines an upper bound of an algorithm; it bounds
a function only from above. For example, consider the case of
Insertion Sort. It takes linear time in the best case and quadratic time
in the worst case. We can safely say that the time complexity of
Insertion Sort is O(n²). Note that O(n²) also covers linear time.

The Big O notation is useful when we only have an upper bound on the time
complexity of an algorithm. Often we can easily find an upper bound
by simply looking at the algorithm.

For a given function g(n), we denote by O(g(n)) the following set of functions:

O(g(n)) = { f(n) : there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for
all n ≥ n0 }

Example 7.1:
f(n) = 2n + 3

2n + 3 ≤ 10 n ∀ n ≥ 1
Here, c=10, n0=1, g(n)=n
=> f(n) = O(n)

Also, 2n + 3 ≤ 2 n + 3n
2n + 3 ≤ 5 n ∀ n ≥ 1

And, 2n + 3 ≤ 2n² + 3n²
2n + 3 ≤ 5n²
=> f(n) = O(n²)

O(1) < O(log n) < O(√n) < O(n) < O(n log n) < O(n²) < O(n³) < O(2ⁿ) < O(3ⁿ) < O(nⁿ)
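The definition can be spot-checked numerically. The helper below (a hypothetical name, for illustration) verifies 0 ≤ f(n) ≤ c·g(n) over a finite range of n; this is a sanity check, not a proof:

```python
def is_big_o_witness(f, g, c, n0, upto=1000):
    """Check 0 <= f(n) <= c * g(n) for every n in [n0, upto]."""
    return all(0 <= f(n) <= c * g(n) for n in range(n0, upto + 1))

f = lambda n: 2 * n + 3

# c = 10, n0 = 1, g(n) = n    =>  f(n) = O(n)
assert is_big_o_witness(f, lambda n: n, c=10, n0=1)
# c = 5,  n0 = 1, g(n) = n^2  =>  f(n) = O(n^2) as well
assert is_big_o_witness(f, lambda n: n * n, c=5, n0=1)
```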
1.2. The Omega (Ω) notation:

Just as Big O notation provides an asymptotic upper bound on


a function, Ω notation provides an asymptotic lower bound.

Ω notation can be useful when we have a lower bound on the time
complexity of an algorithm. As discussed previously, the
best case performance of an algorithm is generally not useful, so
the omega notation is the least used of the three notations.

For a given function g(n), we denote by Ω(g(n)) the set of


functions.

Ω (g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ c×g(n) ≤ f(n) for
all n ≥ n0 }.

Let us consider the same insertion sort example here. The time complexity of insertion
sort can be written as Ω(n), but this is not very useful information about insertion sort,
as we are generally interested in the worst case and sometimes in the average case.

Example 7.2:
f(n) = 2n + 3

2n + 3 ≥ n ∀ n ≥ 1
Here, c=1, n0=1, g(n)=n
=> f(n) = Ω(n)
Also, f(n) = Ω(log n)
f(n) = Ω(√n)
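The Ω bounds in Example 7.2 can be spot-checked the same way, again over a finite range of n (a hypothetical helper, not a proof):

```python
import math

def is_omega_witness(f, g, c, n0, upto=1000):
    """Check 0 <= c * g(n) <= f(n) for every n in [n0, upto]."""
    return all(0 <= c * g(n) <= f(n) for n in range(n0, upto + 1))

f = lambda n: 2 * n + 3

assert is_omega_witness(f, lambda n: n, c=1, n0=1)            # f(n) = Ω(n)
assert is_omega_witness(f, lambda n: math.log(n), c=1, n0=1)  # f(n) = Ω(log n)
assert is_omega_witness(f, lambda n: math.sqrt(n), c=1, n0=1) # f(n) = Ω(√n)
```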

1.3. The Theta (Θ) notation:

The theta notation bounds a function from above and below,
so it defines the exact asymptotic behaviour. A simple way to
get the theta notation of an expression is to drop low-order terms
and ignore leading constants. For example, consider the
following expression.

3n³ + 6n² + 6000 = Θ(n³)

Dropping lower order terms is always fine because there will always be an n0 after which
n³ has higher values than n², irrespective of the constants involved. For a given
function g(n), we denote by Θ(g(n)) the following set of functions.

Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1×g(n) ≤
f(n) ≤ c2×g(n) for all n ≥ n0}
The above definition means, if f(n) is theta of g(n), then the value f(n) is always
between c1×g(n) and c2×g(n) for large values of n (n ≥ n0). The definition of theta also
requires that f(n) must be non-negative for values of n greater than n0.

Example 7.4:
f(n) = 2n + 3

1 * n ≤ 2n + 3 ≤ 5n ∀ n ≥ 1
Here, c1=1, c2 = 5, n0=1, g(n)=n
=> f(n) = Θ(n)

Example 7.5:
f(n) = 2n² + 3n + 4
2n² + 3n + 4 ≤ 2n² + 3n² + 4n²
2n² + 3n + 4 ≤ 9n²
f(n) = O(n²)

also, 2n² + 3n + 4 ≥ 1 · n²
f(n) = Ω(n²)

=> 1 · n² ≤ 2n² + 3n + 4 ≤ 9n² ∀ n ≥ 1

Here, c1=1, c2=9, n0=1, g(n) = n²
=> f(n) = Θ(n²)
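A two-sided check along the same lines confirms the Θ witnesses above over a finite range of n (illustrative helper, not a proof):

```python
def is_theta_witness(f, g, c1, c2, n0, upto=1000):
    """Check 0 <= c1*g(n) <= f(n) <= c2*g(n) for every n in [n0, upto]."""
    return all(0 <= c1 * g(n) <= f(n) <= c2 * g(n)
               for n in range(n0, upto + 1))

f = lambda n: 2 * n * n + 3 * n + 4

# c1 = 1, c2 = 9, n0 = 1, g(n) = n^2  =>  f(n) = Θ(n^2)
assert is_theta_witness(f, lambda n: n * n, c1=1, c2=9, n0=1)
```

The same call with f(n) = 2n + 3, g(n) = n, c1 = 1, c2 = 5 reproduces Example 7.4.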

Example 7.6:
f(n) = n² log n + n
1 · n² log n ≤ n² log n + n ≤ 10 · n² log n ∀ n ≥ 2
=> f(n) = Ω(n² log n), O(n² log n), and hence Θ(n² log n)

Example 7.7:
f(n) = n!
     = 1 × 2 × 3 × 4 × … × n
1 × 1 × 1 × … × 1 ≤ 1 × 2 × 3 × 4 × … × n ≤ n × n × n × … × n
1 ≤ n! ≤ nⁿ
=> f(n) = Ω(1) and O(nⁿ) (here we cannot find a tight bound Θ)
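The sandwich 1 ≤ n! ≤ nⁿ can be spot-checked numerically for small n:

```python
import math

# A finite check of 1 <= n! <= n^n for the first few n (not a proof).
for n in range(1, 15):
    assert 1 <= math.factorial(n) <= n ** n
```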

AVL TREES
An AVL tree is defined as a self-balancing Binary Search Tree (BST) in which the difference
between the heights of the left and right subtrees of any node cannot be more than one.

The difference between the heights of the left subtree and the right subtree for any node is known
as the balance factor of the node.

The AVL tree is named after its inventors, Georgy Adelson-Velsky and Evgenii Landis, who
published it in their 1962 paper “An algorithm for the organization of information”.

Example of AVL Trees:

A tree is an AVL tree when the differences between the heights of the left and right subtrees
for every node are less than or equal to 1.

Operations on an AVL Tree:


1. Insertion: After performing an insertion operation on an AVL tree, the balance factor of each
node is checked.

The following two cases are possible:

Case-01:

After the operation, the tree remains balanced (every balance factor is -1, 0, or +1). No
rotations are needed, and the operation is concluded.

Case-02:

After the operation, the tree is imbalanced (some node has a balance factor outside the
range -1 to +1). Rotations are then performed to rebalance the tree.


Cases Of Imbalance And Their Balancing Using Rotation Operations-

Case-01 (Left-Left): the new node was inserted into the left subtree of the left child of the
imbalanced node. A single right rotation restores balance.

Case-02 (Right-Right): the new node was inserted into the right subtree of the right child.
A single left rotation restores balance.

Case-03 (Left-Right): the new node was inserted into the right subtree of the left child. A
left rotation on the left child followed by a right rotation restores balance.

Case-04 (Right-Left): the new node was inserted into the left subtree of the right child. A
right rotation on the right child followed by a left rotation restores balance.

2. Deletion: After performing a deletion operation on an AVL tree, the balance factor of each
node is checked, and the tree is rebalanced if required.

3. Searching: It is similar to performing a search in a BST.


Advantages Of AVL Trees
 It is always height balanced.
 Its height never goes beyond O(log N), where N is the number of nodes.
 It gives better search performance than an ordinary binary search tree.
 It has self-balancing capabilities.

Time Complexities:

Search, insertion, and deletion in an AVL tree each take O(log N) time, where N is the
number of nodes.

Construct an AVL Tree with the following elements

21, 26, 30, 9, 4, 14, 28, 18,15,10, 2, 3, 7
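As a check on the exercise, the following Python sketch of AVL insertion (illustrative; helper names such as rotate_left and bf are my own, not from the text) inserts the elements above and verifies that every balance factor stays within -1..+1:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def h(n):
    return n.height if n else 0

def bf(n):
    return h(n.left) - h(n.right)   # balance factor of a node

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    y.height = 1 + max(h(y.left), h(y.right))
    x.height = 1 + max(h(x.left), h(x.right))
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    x.height = 1 + max(h(x.left), h(x.right))
    y.height = 1 + max(h(y.left), h(y.right))
    return y

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.height = 1 + max(h(node.left), h(node.right))
    b = bf(node)
    if b > 1 and key < node.left.key:     # Left-Left  -> right rotation
        return rotate_right(node)
    if b < -1 and key > node.right.key:   # Right-Right -> left rotation
        return rotate_left(node)
    if b > 1 and key > node.left.key:     # Left-Right
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1 and key < node.right.key:   # Right-Left
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

keys = [21, 26, 30, 9, 4, 14, 28, 18, 15, 10, 2, 3, 7]
root = None
for k in keys:
    root = insert(root, k)

def balanced(n):
    return n is None or (abs(bf(n)) <= 1 and balanced(n.left) and balanced(n.right))

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []

assert balanced(root)                 # every balance factor is -1, 0, or +1
assert inorder(root) == sorted(keys)  # still a valid BST
```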



B-TREES
B-trees are self-balancing tree data structures that are commonly used in databases and file
systems. They are optimized for storing and retrieving large amounts of data on disk, where disk
access is much slower than memory access.

Here's a breakdown of the key aspects of B-trees:

Structure:

 Nodes: B-trees consist of nodes, each of which can hold multiple keys and pointers to child
nodes.

 Keys: Keys within a node are stored in sorted order.

 Children: The number of child nodes a node has is always one more than the number of
keys it contains.

 Root: The topmost node of the tree.

 Internal Nodes: Nodes between the root and leaf nodes.

 Leaf Nodes: Nodes at the bottom level of the tree, holding the actual data entries.

Properties:

 Balanced:

All leaf nodes are at the same level, ensuring consistent performance for all operations.

 Minimum Degree (t):

A B-tree is defined by its minimum degree t: every node except the root must contain at
least t - 1 keys.

 Maximum Keys:

Each node can hold a maximum of 2t - 1 keys.

Advantages:

 Efficient Disk Access:

B-trees minimize the number of disk accesses required for search, insertion, and deletion
operations.


 High Performance:

Their balanced structure guarantees logarithmic time complexity for these operations, making
them suitable for large datasets.

 Dynamic:

B-trees adapt gracefully to data modifications, making them useful for dynamic environments.

Applications:

 Databases:

B-trees are widely used as the underlying data structure for indexing in relational databases.

 File Systems:

Many file systems use B-trees to manage the directory structure and file location information.

Properties of B Tree
The following are some important properties of a B Tree:

1. Every node has at most m children, where m is the order of the B Tree.
2. A node having K children contains K - 1 keys.
3. Every non-leaf node, excluding the root node, must have at least ⌈m/2⌉ child nodes.
4. The root node must have at least two children if it is not a leaf node.
5. Unlike other trees, the height of a B Tree grows upward toward the root node, and
insertion happens at the leaf nodes.
6. The time complexity of all the operations of a B Tree is O(log n), where 'n' is the number of
data elements present in the B Tree.
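Properties 1-3 can be packaged into a tiny helper (illustrative; the name btree_limits is my own) that computes the key and child limits for a given order m:

```python
import math

def btree_limits(m):
    """Key/child limits implied by the order-m properties above."""
    return {
        "max_children": m,                          # property 1
        "max_keys": m - 1,                          # property 2
        "min_children_internal": math.ceil(m / 2),  # property 3
        "min_keys_non_root": math.ceil(m / 2) - 1,
    }

# For an order-4 B Tree: at most 3 keys per node, and every internal
# node other than the root has at least 2 children.
assert btree_limits(4) == {"max_children": 4, "max_keys": 3,
                           "min_children_internal": 2, "min_keys_non_root": 1}
```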

The following is an example of a B Tree of order 4:


Let us now learn how the insertion works in a B Tree.

Insertion Operation in a B Tree


Insertion of a data element in a B Tree contains two main events:

1. Searching for the appropriate node where the element will be inserted.
2. Splitting of the node, if required.

The Insertion Operation always follows the Bottom-Up approach.

Step 1: If the Tree is empty, a root node is allocated, and we will insert the key.

Step 2: We will then update the allowed number of keys in the node.

Step 3: We will then search for the appropriate node for the insertion of the element.

Step 4: If the node is filled, we will follow the steps shown below.

Step 4.1: Insert the data elements in ascending order.

Step 4.2: Once the data elements exceed their limit, we will split the node at the median.

Step 4.3: We will then push the median key upwards, making the left keys the left child
nodes and the right keys the right child nodes.

Step 5: If the node is not full, we will follow the below steps.

Step 5.1: We will insert the key into the node in ascending order.


Let us understand the steps mentioned above with the illustrations shown below.

Suppose that the following are some data elements that need to be inserted in a B Tree: 7,
8, 9, 10, 11, 16, 21, and 18.

1. Since the maximum degree (order) of a node in this tree is 3, the maximum number of
keys per node will be 3 - 1 = 2.

2. We will start by inserting data element 7 in the empty tree.

3. We will insert the next data element, i.e., 8, into the tree. Since 8 is greater than 7, it will
be inserted to the right of 7 in the same node.

4. Similarly, we will insert another data element, 9, into the tree, in the same node to the right
of 8. However, since the maximum number of keys per node can only be 2, the node will
split, pushing the median key 8 upward, making 7 the key of the left child node and 9 the
key of the right child node.

5. We will insert the next data element, i.e., 10, into the tree. Since 10 is greater than 9, it
will be inserted as a key on the right of the node containing 9 as a key.


6. We will now insert another data element, 11, into the tree. Since 11 is greater than 10, it
should be inserted to the right of 10. However, as we know, the maximum number of keys
per node cannot be more than 2; therefore 10, being the median, will be pushed up to the
root node to the right of 8, splitting 9 and 11 into two separate nodes.

7. We will now insert data element 16 into the tree. Since 16 is greater than 11, it will be
inserted as a key on the right of the node consisting of 11 as a key.

8. The next data element that we will insert into the tree is 21. Element 21 should be inserted
to the right of 16; however, it would exceed the maximum number of keys per node.
Therefore, a split will occur, pushing the median key 16 upward and splitting the left and
right keys into separate nodes. But this will again violate the maximum-keys limit in the
root; therefore, another split will push the median key 10 upward into a new root node
and make 8 and 16 its children.


9. At last, we will insert data element 18 into the tree. Since 18 is greater than 16 but less
than 21, it will be inserted as the left key in the node containing 21.

10. Hence the resulting B Tree has 10 at the root, 8 and 16 as its children, and leaves
containing 7, 9, 11, and 18, 21.
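The insertion sequence above can be replayed with a minimal order-3 B-tree sketch in Python (an illustrative implementation; the node layout and helper names such as btree_insert are my own, not from the text):

```python
class BNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for leaf nodes

    def is_leaf(self):
        return not self.children

MAX_KEYS = 2  # order m = 3  =>  at most m - 1 = 2 keys per node

def btree_insert(root, key):
    split = _insert_into(root, key)
    if split is None:
        return root
    mid, left, right = split            # the old root overflowed:
    return BNode([mid], [left, right])  # grow the tree upward

def _insert_into(node, key):
    if node.is_leaf():
        node.keys.append(key)
        node.keys.sort()
    else:
        i = sum(k < key for k in node.keys)   # child to descend into
        split = _insert_into(node.children[i], key)
        if split is not None:
            mid, left, right = split
            node.keys.insert(i, mid)
            node.children[i:i + 1] = [left, right]
    if len(node.keys) > MAX_KEYS:             # overflow: split at the median
        mid = node.keys[1]
        left = BNode(node.keys[:1], node.children[:2])
        right = BNode(node.keys[2:], node.children[2:])
        return (mid, left, right)
    return None

tree = BNode()
for k in [7, 8, 9, 10, 11, 16, 21, 18]:
    tree = btree_insert(tree, k)

assert tree.keys == [10]                            # median 10 ended at the root
assert [c.keys for c in tree.children] == [[8], [16]]
```

Running the sequence reproduces the result described in step 10: root 10, children 8 and 16, leaves 7, 9, 11, and 18, 21.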


Deletion from a B-tree

Deleting an element on a B-tree consists of three main events: searching the node where the key to
be deleted exists, deleting the key and balancing the tree if required.

While deleting from a B-tree, a condition called underflow may occur. Underflow occurs when a
node contains fewer than the minimum number of keys it should hold.

The terms to be understood before studying deletion operation are:

Inorder Predecessor

The largest key in the left subtree of a node is called its inorder predecessor.

Inorder Successor

The smallest key in the right subtree of a node is called its inorder successor.

Deletion Operation

Before going through the steps below, one must know these facts about a B tree of degree m
(the values in parentheses are for m = 3).

A node can have a maximum of m children. (i.e. 3)

A node can contain a maximum of m - 1 keys. (i.e. 2)

A node should have a minimum of ⌈m/2⌉ children. (i.e. 2)

A node (except the root node) should contain a minimum of ⌈m/2⌉ - 1 keys. (i.e. 1)

There are three main cases for deletion operation in a B tree.

Case I

The key to be deleted lies in the leaf. There are two cases for it.

The deletion of the key does not violate the property of the minimum number of keys a node
should hold.

In the tree below, deleting 32 does not violate the above properties.


The deletion of the key violates the property of the minimum number of keys a node should hold.
In this case, we borrow a key from its immediate neighboring sibling node in the order of left to
right.

First, visit the immediate left sibling. If the left sibling node has more than a minimum number of
keys, then borrow a key from this node.

Else, check to borrow from the immediate right sibling node.

In the tree below, deleting 31 results in the above condition. Let us borrow a key from the left
sibling node.


If both the immediate sibling nodes already have a minimum number of keys, then merge the node
with either the left sibling node or the right sibling node. This merging is done through the parent
node.

Deleting 30 results in the following case.


Case II

If the key to be deleted lies in the internal node, the following cases occur.

The key deleted from an internal node is replaced by its inorder predecessor if the left child has
more than the minimum number of keys.


The key deleted from an internal node is replaced by its inorder successor if the right child has
more than the minimum number of keys.

If both children have exactly the minimum number of keys, then merge the left and the right
children.

After merging if the parent node has less than the minimum number of keys then, look for the
siblings as in Case I.


Case III

In this case, the height of the tree shrinks. If the target key lies in an internal node, and the deletion
of the key leaves the node with fewer than the minimum required number of keys, then look for the
inorder predecessor and the inorder successor. If both children contain only the minimum number
of keys, borrowing cannot take place. This leads to Case II(3), i.e. merging the children.

Again, look for the sibling to borrow a key. But, if the sibling also has only a minimum number of
keys then, merge the node with the sibling along with the parent. Arrange the children accordingly
(increasing order).
