Datastructure Material
An Abstract Data Type (ADT) is a type (or class) for objects whose behaviour
is defined by a set of values and a set of operations. The definition of an ADT
only mentions what operations are to be performed, not how these
operations will be implemented. It does not specify how data will be
organized in memory or what algorithms will be used for implementing
the operations. It is called "abstract" because it gives an implementation-independent
view. The process of providing only the essentials and
hiding the details is known as abstraction.
The user of a data type need not know how that data type is implemented; for
example, we have been using the int, float and char data types only with the
knowledge of the values they can take and the operations that can be
performed on them, without any idea of how these types are
implemented. So a user only needs to know what a data type can do, not
how it does it. We can think of an ADT as a black box which hides the
inner structure and design of the data type. Now we'll define three ADTs,
namely the List ADT, Stack ADT and Queue ADT.
List ADT
A list contains elements of the same type arranged in sequential order, and the
following operations can be performed on the list.
get() – Return an element from the list at any given position.
insert() – Insert an element at any position in the list.
remove() – Remove the first occurrence of any element from a non-empty list.
removeAt() – Remove the element at a specified location from a non-empty list.
replace() – Replace an element at any position with another element.
size() – Return the number of elements in the list.
isEmpty() – Return true if the list is empty, otherwise return false.
isFull() – Return true if the list is full, otherwise return false.
Queue ADT
A queue contains elements of the same type arranged in sequential order.
Operations take place at both ends: insertion is done at the rear and deletion
is done at the front. The following operations can be performed (a minimal C
sketch of this interface follows the list):
enqueue() – Insert an element at the end of the queue.
dequeue() – Remove and return the first element of queue, if the queue
is not empty.
peek() – Return the element of the queue without removing it, if the
queue is not empty.
size() – Return the number of elements in the queue.
isEmpty() – Return true if the queue is empty, otherwise return false.
isFull() – Return true if the queue is full, otherwise return false.
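Below is a minimal C sketch of how this Queue ADT might be implemented with a
fixed-size circular array. The names queue[], front, rear and itemCount are
illustrative assumptions, not part of the ADT definition itself.
#include <stdio.h>
#include <stdbool.h>
#define MAX 6

int queue[MAX];     /* storage for the queue elements      */
int front = 0;      /* index of the current front element  */
int rear = -1;      /* index of the current rear element   */
int itemCount = 0;  /* number of elements currently stored */

bool isEmpty() { return itemCount == 0; }
bool isFull()  { return itemCount == MAX; }
int  size()    { return itemCount; }

/* peek() - return the front element without removing it */
int peek() { return queue[front]; }

/* enqueue() - insert an element at the rear of the queue */
void enqueue(int data) {
   if(!isFull()) {
      rear = (rear + 1) % MAX;   /* wrap around the array */
      queue[rear] = data;
      itemCount++;
   }
}

/* dequeue() - remove and return the first element of the queue */
int dequeue() {
   int data = queue[front];
   front = (front + 1) % MAX;    /* wrap around the array */
   itemCount--;
   return data;
}

int main() {
   enqueue(10); enqueue(20); enqueue(30);
   printf("front = %d, size = %d\n", peek(), size());   /* front = 10, size = 3 */
   printf("dequeued = %d\n", dequeue());                /* dequeued = 10 */
   return 0;
}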
From these definitions, we can clearly see that the definitions do not
specify how these ADTs will be represented and how the operations will
be carried out. There can be different ways to implement an ADT, for
example, the List ADT can be implemented using arrays, or singly linked
list or doubly linked list. Similarly, stack ADT and Queue ADT can be
implemented using arrays or linked lists.
Asymptotic Analysis:
Asymptotic analysis of an algorithm refers to defining mathematical bounds
on its run-time performance. Using asymptotic analysis, we can
conclude the best-case, average-case and worst-case behaviour of an
algorithm.
Asymptotic Notations
Following are the commonly used asymptotic notations to calculate the
running time complexity of an algorithm.
Ο Notation
Ω Notation
θ Notation
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an
algorithm's running time. It measures the worst-case time complexity, or
the longest amount of time an algorithm can possibly take to complete.
For example, for a function f(n)
Ο(f(n)) = { g(n) : there exist constants c > 0 and n0 such that g(n) ≤ c.f(n) for all n > n0 }
Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an
algorithm's running time. It measures the best case time complexity or
the best amount of time an algorithm can possibly take to complete.
For example, for a function f(n)
Ω(f(n)) = { g(n) : there exist constants c > 0 and n0 such that g(n) ≥ c.f(n) for all n > n0 }
Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and
the upper bound of an algorithm's running time. It is represented as
follows −
θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) }
constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n²)
cubic − Ο(n³)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
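To make the growth rates listed above concrete, here is a small C sketch (the value
of n and the loop bodies are illustrative only) showing code fragments whose running
times are constant, linear and quadratic in n.
#include <stdio.h>

int main() {
   int n = 1000;
   long count = 0;
   int i, j;

   /* constant time, O(1): a fixed number of operations */
   count = count + 1;

   /* linear time, O(n): one loop over n items */
   for(i = 0; i < n; i++)
      count++;

   /* quadratic time, O(n^2): two nested loops over n items */
   for(i = 0; i < n; i++)
      for(j = 0; j < n; j++)
         count++;

   printf("operations counted: %ld\n", count);
   return 0;
}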
Interpolation Search:
There are cases where the location of the target data may be known in
advance. For example, in the case of a telephone directory, if we want to
search for the telephone number of Morphius, plain linear search and even
binary search will seem slow, as we can directly jump to the memory space
where the names starting with 'M' are stored.
Positioning in Binary Search
In binary search, if the desired data is not found, the rest of the list
is divided into two parts, lower and higher, and the search is carried out in
either of them.
Even when the data is sorted, binary search does not take advantage of the
actual key values to probe the position of the desired data; interpolation search does.
If a match occurs, then the index of the item is returned. To split the list
into two parts, we use the following method −
mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])
where −
A = list
Lo = lowest index of the list
Hi = highest index of the list
A[n] = value stored at index n in the list
X = value being searched
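As a worked example of the probe formula, take the sorted list used in the C program
below, A = {10, 14, 19, 26, 27, 31, 33, 35, 42, 44}, and search for X = 33. With
Lo = 0, Hi = 9, A[Lo] = 10 and A[Hi] = 44, the formula gives
mid = 0 + ((9 - 0) / (44 - 10)) * (33 - 10) = (9 * 23) / 34 = 6 (using integer arithmetic).
Since A[6] = 33, the target is found in a single probe.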
If the middle item is greater than the target item, then the probe position is
recalculated in the sub-array to the left of the middle item.
Otherwise, the target is searched for in the sub-array to the right of the middle
item. This process continues on the sub-array until the size of the
sub-array reduces to zero.
The runtime complexity of the interpolation search algorithm is Ο(log (log n)) in
favorable situations (uniformly distributed data), as compared to Ο(log n) for binary search.
Algorithm
As interpolation search is an improvement over binary search, the steps below
search for the 'target' data value's index using position probing −
Step 1 − Start searching data from middle of the list.
Step 2 − If it is a match, return the index of the item, and exit.
Step 3 − If it is not a match, probe position.
Step 4 − Divide the list using the probing formula and find the new middle.
Step 5 − If data is greater than middle, search in higher sub-list.
Step 6 − If data is smaller than middle, search in lower sub-list.
Step 7 − Repeat until match.
Pseudocode
A → Array list
N → Size of A
X → Target Value

Procedure Interpolation_Search()
   Set Lo → 0
   Set Mid → -1
   Set Hi → N-1

   While X does not match
      if Lo equals Hi OR A[Lo] equals A[Hi]
         EXIT: Failure, Target not found
      end if

      Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])

      if A[Mid] = X
         EXIT: Success, Target found at Mid
      else
         if A[Mid] < X
            Set Lo to Mid+1
         else if A[Mid] > X
            Set Hi to Mid-1
         end if
      end if
   End While
End Procedure
Implementation in C
#include <stdio.h>
#define MAX 10

// sorted array on which interpolation search is performed
int list[MAX] = { 10, 14, 19, 26, 27, 31, 33, 35, 42, 44 };

int find(int data) {
   int lo = 0;
   int hi = MAX - 1;
   int mid = -1;
   int comparisons = 1;
   int index = -1;

   while(lo <= hi) {
      printf("Comparison %d\n", comparisons);
      printf("lo : %d, list[%d] = %d\n", lo, lo, list[lo]);
      printf("hi : %d, list[%d] = %d\n", hi, hi, list[hi]);
      comparisons++;

      // probe the position using the interpolation formula
      mid = lo + (((double)(hi - lo) / (list[hi] - list[lo])) * (data - list[lo]));
      printf("mid = %d\n", mid);

      // data found
      if(list[mid] == data) {
         index = mid;
         break;
      } else if(list[mid] < data) {
         lo = mid + 1;   // data is in the upper half
      } else {
         hi = mid - 1;   // data is in the lower half
      }
   }
   return index;
}

int main() {
   // find location of 33
   int location = find(33);

   if(location != -1)
      printf("\nElement found at location: %d\n", location);
   else
      printf("\nElement not found.\n");
   return 0;
}
If we compile and run the above program, it will produce the following
result −
Output
Comparison 1
lo : 0, list[0] = 10
hi : 9, list[9] = 44
mid = 6

Element found at location: 6
Time Complexity: The worst-case time complexity of search and insert operations on a
Binary Search Tree is O(h), where h is the height of the tree. In the worst case, we may
have to travel from the root to the deepest leaf node. The height of a skewed tree may
become n, and the time complexity of search and insert operations may then become O(n).
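To illustrate the O(h) behaviour described above, here is a minimal C sketch of search
and insert on a binary search tree; the struct and function names are illustrative, not
taken from the text.
#include <stdio.h>
#include <stdlib.h>

struct node {
   int key;
   struct node *left, *right;
};

// search for 'key' starting at 'root'; the cost is proportional to the height h
struct node* search(struct node *root, int key) {
   if(root == NULL || root->key == key)
      return root;                      // empty subtree or key found
   if(key < root->key)
      return search(root->left, key);   // key is smaller: go left
   return search(root->right, key);     // key is larger: go right
}

// insert 'key' into the subtree rooted at 'root' and return the new root
struct node* insert(struct node *root, int key) {
   if(root == NULL) {
      struct node *n = malloc(sizeof(struct node));
      n->key = key;
      n->left = n->right = NULL;
      return n;
   }
   if(key < root->key)
      root->left = insert(root->left, key);
   else if(key > root->key)
      root->right = insert(root->right, key);
   return root;
}

int main() {
   struct node *root = NULL;
   int keys[] = { 50, 30, 70, 20, 40 };
   for(int i = 0; i < 5; i++)
      root = insert(root, keys[i]);
   printf("%s\n", search(root, 40) ? "40 found" : "40 not found");
   return 0;
}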
Dictionary ADT: Store elements so that they can be quickly located using keys. Typically,
an item carries useful additional information in addition to its key. Examples – bank
accounts with SSN as the key, or student records with UMID or uniqname as the key.
Dictionary ADT types: Log File, Ordered Dictionary, Hash Table and Skip List.
A dictionary stores items as key–element pairs (k, e); k and e may be of any type, and
k and e may be the same. In general, items with the same key may be stored in the same dictionary.
Unordered vs Ordered:
Ordered – the relative order is determined by a comparator between keys; a total order
relation is defined on the keys.
Unordered – no order relation is assumed on the keys; only equality testing between keys is used.
Dictionary ADT: Log File
Definition: an implementation of the Dictionary ADT using a sequence to store items in
arbitrary order. It is obviously an Unordered Dictionary. It is a useful implementation
for the case with many insertions and few searches, and is implemented as an array
(vector) or a linked list.
Applications of Log Files: database systems, file systems and security audit trails.
Dictionary ADT: Ordered Dictionary
Definition: an implementation of the Dictionary ADT in which the usual operations may
be used and there exists an order relationship between keys. It is a useful implementation
for few insertions/removals but many searches.
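A log-file dictionary can be sketched in C as a plain array of key–element pairs. The
names Item, insertItem and findElement below are illustrative assumptions (integer keys
and short string elements), not a fixed API.
#include <stdio.h>
#include <string.h>

#define CAPACITY 100

struct Item {
   int  key;            // e.g. an account number or student ID
   char element[32];    // the associated information
};

struct Item dict[CAPACITY];
int count = 0;

// insertItem: append the (key, element) pair at the end - O(1)
void insertItem(int key, const char *element) {
   if(count < CAPACITY) {
      dict[count].key = key;
      strncpy(dict[count].element, element, sizeof(dict[count].element) - 1);
      dict[count].element[sizeof(dict[count].element) - 1] = '\0';
      count++;
   }
}

// findElement: linear scan over the unordered sequence - O(n)
const char* findElement(int key) {
   for(int i = 0; i < count; i++)
      if(dict[i].key == key)
         return dict[i].element;
   return NULL;   // key not present
}

int main() {
   insertItem(1234, "bank account A");
   insertItem(5678, "bank account B");
   const char *e = findElement(5678);
   printf("%s\n", e ? e : "not found");
   return 0;
}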
Priority Queue:
Overview
A priority queue is a more specialized data structure than a queue. Like an
ordinary queue, a priority queue has the same methods, but with a major
difference: in a priority queue, items are ordered by key value, so that the item
with the lowest key value is at the front and the item with the highest key value
is at the rear, or vice versa. So we assign a priority to an item based on
its key value: the lower the value, the higher the priority. Following are the
principal methods of a priority queue.
Basic Operations
insert / enqueue − add an item to the queue at the position determined by its priority.
remove / dequeue − remove and return the item with the highest priority from the queue.
peek() − return the highest-priority item without removing it.
isFull() / isEmpty() − check whether the queue is full or empty.
Demo Program
PriorityQueueDemo.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAX 6

int intArray[MAX];
int itemCount = 0;

// peek - return the highest-priority (smallest) item, stored at the end
int peek(){
   return intArray[itemCount - 1];
}
bool isEmpty(){
   return itemCount == 0;
}
bool isFull(){
   return itemCount == MAX;
}
int size(){
   return itemCount;
}

// insert the item so that the array stays sorted in descending order
void insert(int data){
   int i = 0;
   if(!isFull()){
      if(itemCount == 0){
         intArray[itemCount++] = data;
      } else {
         // start from the right end of the queue
         for(i = itemCount - 1; i >= 0; i--){
            // if data is larger, shift the existing item to the right
            if(data > intArray[i]){
               intArray[i+1] = intArray[i];
            } else {
               break;
            }
         }
         // insert the data at the freed position
         intArray[i+1] = data;
         itemCount++;
      }
   }
}

// remove and return the highest-priority (smallest) item
int removeData(){
   return intArray[--itemCount];
}
int main() {
   /* insert 5 items */
   insert(3);
   insert(5);
   insert(9);
   insert(1);
   insert(12);

   // ------------------
   // index : 0  1 2 3 4
   // ------------------
   // queue : 12 9 5 3 1

   insert(15);

   // ---------------------
   // index : 0  1  2 3 4 5
   // ---------------------
   // queue : 15 12 9 5 3 1

   if(isFull()){
      printf("Queue is full!\n");
   }

   // remove one item
   int num = removeData();
   printf("Element removed: %d\n", num);

   // ---------------------
   // index : 0  1  2 3 4
   // ---------------------
   // queue : 15 12 9 5 3

   // insert one more item
   insert(16);

   // ----------------------
   // index : 0  1  2  3 4 5
   // ----------------------
   // queue : 16 15 12 9 5 3

   // as the queue is full, these elements will not be inserted
   insert(17);
   insert(18);

   // ----------------------
   // index : 0  1  2  3 4 5
   // ----------------------
   // queue : 16 15 12 9 5 3

   printf("Element at front: %d\n", peek());

   printf("----------------------\n");
   printf("index : 5 4 3 2 1 0\n");
   printf("----------------------\n");
   printf("Queue: ");

   while(!isEmpty()){
      int n = removeData();
      printf("%d ", n);
   }
   printf("\n");

   return 0;
}
If we compile and run the above program, it will produce the following
result −
Queue is full!
Element removed: 1
Element at front: 3
----------------------
index : 5 4 3 2 1 0
----------------------
Queue: 3 5 9 12 15 16
Tower of Hanoi:
Tower of Hanoi is a mathematical puzzle which consists of three towers
(pegs) and more than one ring, as depicted below.
These rings are of different sizes and are stacked in ascending
order, i.e. the smaller one sits over the larger one. There are other
variations of the puzzle where the number of disks increases, but the
tower count remains the same.
Rules
The mission is to move all the disks to another tower without
violating the sequence of arrangement. The rules to be followed for
Tower of Hanoi are −
Only one disk can be moved among the towers at any given time.
Only the topmost disk of a tower can be moved.
No large disk can sit over a small disk.
Algorithm
To write an algorithm for Tower of Hanoi, first we need to learn how to
solve this problem with a smaller number of disks, say 1 or 2. We mark the
three towers with the names source, destination and aux (the latter is only used
to help move the disks). If we have only one disk, then it can easily be moved
from the source to the destination peg.
If we have 2 disks −
First, we move the smaller (top) disk to the aux peg.
Then, we move the larger (bottom) disk to the destination peg.
And finally, we move the smaller disk from aux to the destination peg.
Our ultimate aim is to move disk n from source to destination and then
put all the other (n-1) disks onto it. We can imagine applying the same approach
recursively for every given set of disks.
START
Procedure Hanoi(disk, source, dest, aux)
   IF disk == 1, THEN
      move disk from source to dest
   ELSE
      Hanoi(disk - 1, source, aux, dest)   // Step 1
      move disk from source to dest        // Step 2
      Hanoi(disk - 1, aux, dest, source)   // Step 3
   END IF
END Procedure
STOP
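The recursive procedure above can be sketched directly in C. This is a minimal
illustration of the three steps (the function and peg names are illustrative), not a
full program with input handling.
#include <stdio.h>

// move 'disk' disks from peg 'source' to peg 'dest' using peg 'aux' as helper
void hanoi(int disk, char source, char dest, char aux) {
   if(disk == 1) {
      printf("Move disk 1 from %c to %c\n", source, dest);
      return;
   }
   hanoi(disk - 1, source, aux, dest);                          // Step 1: move n-1 disks out of the way
   printf("Move disk %d from %c to %c\n", disk, source, dest);  // Step 2: move the largest disk
   hanoi(disk - 1, aux, dest, source);                          // Step 3: move the n-1 disks on top of it
}

int main() {
   hanoi(3, 'S', 'D', 'A');   // 3 disks, pegs S (source), D (destination), A (aux)
   return 0;
}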
Program (Bubble Sort):
#include <stdio.h>
#include <stdbool.h>
#define MAX 10
// unsorted input array (matches the output shown below)
int list[MAX] = {1, 8, 4, 6, 0, 3, 5, 2, 7, 9};

void display(){
   int i;
   printf("[");
   for(i = 0; i < MAX; i++){
      printf("%d ", list[i]);
   }
   printf("]\n");
}

void bubbleSort() {
   int temp;
   int i, j;
   bool swapped = false;

   // outer loop: one pass over the list per iteration
   for(i = 0; i < MAX - 1; i++) {
      swapped = false;
      // inner loop: compare and swap adjacent items
      for(j = 0; j < MAX - 1 - i; j++) {
         if(list[j] > list[j+1]) {
            temp = list[j];
            list[j] = list[j+1];
            list[j+1] = temp;
            swapped = true;
         }
      }
      // if no swap occurred, the list is already sorted
      if(!swapped) {
         break;
      }
   }
}

int main() {
   printf("Input Array: ");
   display();
   printf("\n");
   bubbleSort();
   printf("Output Array: ");
   display();
   return 0;
}
If we compile and run the above program, it will produce the following
result −
Output
Input Array: [1 8 4 6 0 3 5 2 7 9 ]
Output Array: [0 1 2 3 4 5 6 7 8 9 ]
ADS Array: The array data type is the simplest structured data type. It is such a useful
data type because it gives you, as a programmer, the ability to assign a single name to a
homogeneous collection of instances of one abstract data type and to refer to each instance
in the collection by an integer name (index).
Dangling Pointer:
In simple words, we can say that a dangling pointer is a pointer that is not pointing to a
valid object of the appropriate type, and it can be the cause of undefined behavior.
In the image, Pointer1 and Pointer2 point to valid memory objects, but Pointer3 points
to a memory object that has already been deallocated. So Pointer3 becomes a dangling
pointer; if you try to access Pointer3, you will get an undefined result or a segmentation fault.
Important causes of the dangling pointers in C language
There a lot of cause to arise the dangling pointers but here I am describing some
common cause that creates the dangling pointers.
A local variable’s scope and lifetime belong to their block where it is declared.
Whenever control comes to the block than memory is allocated to the
local variable and freed automatically upon exit from the block.
If a local variable is referred to outside of its lifetime, the behavior is undefined. The
value of a pointer becomes indeterminate when the variable it points to reaches the
end of its lifetime.
In the below code, we have tried to read the value of Data (integer variable) outside of
their block (scope) through the pointer (piData), so the value of piData is
indeterminate.
#include <stdio.h>

int main(void)
{
    int *piData;

    {   // block
        int Data = 5;       // local variable, lives only inside this block
        piData = &Data;
    }

    // Data's lifetime has ended: reading *piData is undefined behavior
    printf("%d", *piData);

    return 0;
}
In the code below, Data has no scope beyond the function. If you try to read the value of
Data after calling Fun() through the pointer, you may still get the correct value (5),
but any function called thereafter will overwrite the stack storage allocated for Data
with other values, and the pointer would no longer work correctly.
So in the code below, piData is a dangling pointer that points to memory which is
no longer available.
#include <stdio.h>

int *Fun()
{
    int Data = 5;     // local variable

    return &Data;     // returns the address of a local variable
}

int main()
{
    int *piData = Fun();   // piData is now a dangling pointer

    printf("%d", *piData); // undefined behavior

    return 0;
}
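One common way to remove this particular dangling pointer, assuming the intent is simply
for the value to outlive the call, is to give Data static storage duration so that it is
not destroyed when Fun() returns. A minimal sketch:
#include <stdio.h>

int *Fun()
{
    static int Data = 5;   // static storage duration: survives after Fun() returns

    return &Data;          // returning this address is now safe
}

int main()
{
    int *piData = Fun();
    printf("%d", *piData); // prints 5, no undefined behavior
    return 0;
}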
Algorithms are designed using two approaches: the top-down and the bottom-up
approach. In the top-down approach, the complex module is divided into submodules.
On the other hand, the bottom-up approach begins with elementary modules and then
combines them further. The primary purpose of an algorithm is to operate on the data
contained in the data structure; in other words, an algorithm is used to perform
operations on the data inside the data structure.
A complicated algorithm is split into small parts called modules, and the
process of splitting is known as modularization. Modularization significantly
reduces the complications of designing an algorithm and makes the process
easier to design and implement. Modular programming is the technique
of designing and writing a program in the form of functions, where each
function is distinct from the others and works independently. The content of
each function is cohesive, and there is low coupling between the modules.
Comparison of the top-down and bottom-up approaches:
Basic – The top-down approach breaks a massive problem into smaller subproblems, whereas
the bottom-up approach solves the fundamental low-level modules first and then integrates
them into a larger one.
Process – In the top-down approach the submodules are analysed in isolation, whereas the
bottom-up approach examines what data is to be encapsulated and implies the concept of
information hiding.
Communication – Communication between modules is not required in the top-down approach,
whereas the bottom-up approach needs a specific amount of communication.
Redundancy – The top-down approach may contain redundant information, whereas redundancy
can be eliminated in the bottom-up approach.
Programming languages – Structure/procedure-oriented programming languages (such as C)
follow the top-down approach, whereas object-oriented programming languages (such as C++
and Java) follow the bottom-up approach.
Main use – The top-down approach is mainly used in module documentation, test case
creation, code implementation and debugging, whereas the bottom-up approach is mainly
used in testing.
Thus, the top-down method begins with an abstract design, and then this design is
sequentially refined to create more concrete levels until there is no requirement for
additional refinement.
Definition of Bottom-up Approach
The bottom-up approach works the opposite way: the most elementary modules are designed
and implemented first and are then combined and composed to form higher-level modules
until the complete system is built.
Breadth First Search (BFS) Traversal:
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it.
Insert it in a queue.
Rule 2 − If no adjacent vertex is found, remove the first vertex from the
queue.
Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty.
We start by visiting S (the starting node) and mark it as visited.
From A we have D as an unvisited adjacent node. We mark it as visited and enqueue it.
At this stage, we are left with no unmarked (unvisited) nodes. But as per
the algorithm we keep on dequeuing in order to get all unvisited nodes.
When the queue gets emptied, the program is over.
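A minimal C sketch of this queue-based traversal is given below. The adjacency matrix,
the small example graph and the helper names are illustrative assumptions, not the graph
from the figures.
#include <stdio.h>
#include <stdbool.h>

#define N 5   // vertices: 0=S, 1=A, 2=B, 3=C, 4=D

int  adj[N][N];        // adjacency matrix
bool visited[N];
int  queue[N], qfront = 0, qrear = 0;

void enqueue(int v) { queue[qrear++] = v; }
int  dequeue()      { return queue[qfront++]; }
bool queueEmpty()   { return qfront == qrear; }

void addEdge(int u, int v) { adj[u][v] = adj[v][u] = 1; }

void bfs(int start) {
   visited[start] = true;
   printf("%d ", start);       // visit and display the start vertex
   enqueue(start);

   while(!queueEmpty()) {
      int u = dequeue();       // Rule 2: take the first vertex from the queue
      for(int v = 0; v < N; v++) {
         if(adj[u][v] && !visited[v]) {
            visited[v] = true; // Rule 1: visit, mark, display, enqueue
            printf("%d ", v);
            enqueue(v);
         }
      }
   }
}

int main() {
   addEdge(0, 1); addEdge(0, 2); addEdge(0, 3);
   addEdge(1, 4); addEdge(2, 4); addEdge(3, 4);
   bfs(0);        // prints a breadth-first ordering starting from S (vertex 0)
   printf("\n");
   return 0;
}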
Depth First Search (DFS) Traversal:
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it.
Push it onto a stack.
Rule 2 − If no adjacent vertex is found, pop a vertex from the stack. (This
will pop all the vertices from the stack which do not have adjacent
unvisited vertices.)
Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.
We choose B, mark it as visited and put it onto the stack.
Here B does not have any unvisited adjacent node. So, we pop B from the stack.
As C does not have any unvisited adjacent node, we keep popping the
stack until we find a node that has an unvisited adjacent node. In this
case, there is none, and we keep popping until the stack is empty.
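A minimal C sketch of this stack-based traversal is shown below, using the same
illustrative example graph as the BFS sketch above; the helper names are assumptions.
#include <stdio.h>
#include <stdbool.h>

#define N 5   // vertices: 0=S, 1=A, 2=B, 3=C, 4=D

int  adj[N][N];
bool visited[N];
int  stack[N], top = -1;

void push(int v)  { stack[++top] = v; }
int  pop()        { return stack[top--]; }
int  peekTop()    { return stack[top]; }
bool stackEmpty() { return top == -1; }

void addEdge(int u, int v) { adj[u][v] = adj[v][u] = 1; }

// return an unvisited vertex adjacent to u, or -1 if there is none
int unvisitedAdjacent(int u) {
   for(int v = 0; v < N; v++)
      if(adj[u][v] && !visited[v])
         return v;
   return -1;
}

void dfs(int start) {
   visited[start] = true;
   printf("%d ", start);
   push(start);

   while(!stackEmpty()) {
      int v = unvisitedAdjacent(peekTop());
      if(v == -1) {
         pop();               // Rule 2: no adjacent unvisited vertex, pop
      } else {
         visited[v] = true;   // Rule 1: visit, mark, display, push
         printf("%d ", v);
         push(v);
      }
   }
}

int main() {
   addEdge(0, 1); addEdge(0, 2); addEdge(0, 3);
   addEdge(1, 4); addEdge(2, 4); addEdge(3, 4);
   dfs(0);        // prints a depth-first ordering starting from S (vertex 0)
   printf("\n");
   return 0;
}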
Spanning Tree
Removing one edge from the spanning tree will make the graph
disconnected, i.e. the spanning tree is minimally connected.
Adding one edge to the spanning tree will create a circuit or loop, i.e. the
spanning tree is maximally acyclic.
Applications of spanning trees include cluster analysis. Two of the most important
minimum spanning tree algorithms are −
Kruskal's Algorithm
Prim's Algorithm
Kruskal's Algorithm
In Kruskal's algorithm, edges are added in increasing order of cost, skipping any edge
that would create a circuit. The next cost is 3, and the associated edges are A,C and
C,D. We add them again −
Next cost in the table is 4, and we observe that adding it will create a
circuit in the graph. −
We ignore it. In the process we shall ignore/avoid all edges that create a
circuit.
We observe that edges with cost 5 and 6 also create circuits. We ignore
them and move on.
Now we are left with only one node to be added. Between the two least
cost edges available 7 and 8, we shall add the edge with cost 7.
By adding edge S,A we have included all the nodes of the graph and we
now have minimum cost spanning tree.
Prim's Algorithm
Step 1 - Remove all loops and parallel edges
Remove all loops and parallel edges from the given graph. In case of parallel edges,
keep the one which has the least cost associated with it and remove all the others.
Step 2 - Choose any arbitrary node as the root node
In this case, we choose S as the root node of Prim's spanning tree. This node is
arbitrarily chosen, so any node can be the root node. One may wonder why any node can
be a root node. The answer is that in the spanning tree all the nodes of a graph are
included, and because the graph is connected, there must be at least one edge which
will join the chosen node to the rest of the tree.
Now, the tree S-7-A is treated as one node and we check for all edges
going out from it. We select the one which has the lowest cost and
include it in the tree.
After this step, S-7-A-3-C tree is formed. Now we'll again treat it as a
node and will check all the edges again. However, we will choose only
the least cost edge. In this case, C-3-D is the new edge, which is less
than other edges' cost 8, 6, 4, etc.
After adding node D to the spanning tree, we now have two edges going
out of it having the same cost, i.e. D-2-T and D-2-B. Thus, we can add
either one. But the next step will again yield edge 2 as the least cost.
Hence, we are showing a spanning tree with both edges included.
We may find that the output spanning tree of the same graph produced by the two
different algorithms is the same.
Step-02:
Find all the edges that connect the tree to new vertices, then find the least-weight
edge among those edges and include it in the existing tree.
If including that edge creates a cycle, then reject that edge and look for the next
least-weight edge.
Step-03:
Keep repeating Step-02 until all the vertices are included and the Minimum Spanning Tree
(MST) is obtained; a minimal C sketch of this procedure is given below.
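The sketch below implements Prim's algorithm under these steps, assuming an
adjacency-matrix graph and a linear search for the least-weight connecting edge; the
example graph and variable names are illustrative, not taken from the figures.
#include <stdio.h>
#include <stdbool.h>

#define V 5
#define INF 999999

// adjacency matrix of an example weighted graph (0 means no edge)
int graph[V][V] = {
   {0, 2, 0, 6, 0},
   {2, 0, 3, 8, 5},
   {0, 3, 0, 0, 7},
   {6, 8, 0, 0, 9},
   {0, 5, 7, 9, 0}
};

int main() {
   int  key[V];       // cheapest edge weight connecting each vertex to the tree
   int  parent[V];    // the tree edge chosen for each vertex
   bool inTree[V];

   for(int v = 0; v < V; v++) { key[v] = INF; inTree[v] = false; parent[v] = -1; }
   key[0] = 0;        // start from vertex 0 (the arbitrarily chosen root)

   for(int count = 0; count < V; count++) {
      // pick the cheapest vertex not yet in the tree (linear search)
      int u = -1;
      for(int v = 0; v < V; v++)
         if(!inTree[v] && (u == -1 || key[v] < key[u]))
            u = v;
      inTree[u] = true;

      // update the cheapest connecting edge of every neighbour still outside the tree
      for(int v = 0; v < V; v++)
         if(graph[u][v] && !inTree[v] && graph[u][v] < key[v]) {
            key[v] = graph[u][v];
            parent[v] = u;
         }
   }

   for(int v = 1; v < V; v++)
      printf("edge %d - %d  weight %d\n", parent[v], v, graph[v][parent[v]]);
   return 0;
}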
Time Complexity-
Analysis
The complexity of this algorithm (Dijkstra's shortest-path algorithm, illustrated in the
example below) depends entirely on the implementation of the Extract-Min function. If
Extract-Min is implemented using linear search, the complexity of this algorithm
is O(V² + E).
Example
Let us consider vertex 1 and vertex 9 as the start and destination vertices
respectively. Initially, all the vertices except the start vertex are marked
by ∞ and the start vertex is marked by 0. The table below shows how the distance
estimate of each vertex (one row per vertex) is updated after each iteration of the
algorithm (one column per iteration).
1 0 0 0 0 0 0 0 0 0
2 ∞ 5 4 4 4 4 4 4 4
3 ∞ 2 2 2 2 2 2 2 2
4 ∞ ∞ ∞ 7 7 7 7 7 7
5 ∞ ∞ ∞ 11 9 9 9 9 9
6 ∞ ∞ ∞ ∞ ∞ 17 17 16 16
7 ∞ ∞ 11 11 11 11 11 11 11
8 ∞ ∞ ∞ ∞ ∞ 16 13 13 13
9 ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ 20
Hence, the minimum distance of vertex 9 from vertex 1 is 20. And the
path is
1→ 3→ 7→ 8→ 6→ 9
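A minimal C sketch of this shortest-path procedure (Dijkstra's algorithm with Extract-Min
implemented by linear search, as analysed above) is given below. The small example graph
is an illustrative assumption, not the 9-vertex graph from the table.
#include <stdio.h>
#include <stdbool.h>

#define V 5
#define INF 999999

// adjacency matrix of an example weighted graph (0 means no edge)
int graph[V][V] = {
   {0, 4, 1, 0, 0},
   {4, 0, 2, 5, 0},
   {1, 2, 0, 8, 10},
   {0, 5, 8, 0, 2},
   {0, 0, 10, 2, 0}
};

int main() {
   int  dist[V];      // current shortest distance estimate from the source
   bool done[V];      // true once a vertex's distance is final

   for(int v = 0; v < V; v++) { dist[v] = INF; done[v] = false; }
   dist[0] = 0;       // source vertex

   for(int count = 0; count < V; count++) {
      // Extract-Min by linear search: the unfinished vertex with the smallest dist
      int u = -1;
      for(int v = 0; v < V; v++)
         if(!done[v] && (u == -1 || dist[v] < dist[u]))
            u = v;
      done[u] = true;

      // relax every edge going out of u
      for(int v = 0; v < V; v++)
         if(graph[u][v] && !done[v] && dist[u] + graph[u][v] < dist[v])
            dist[v] = dist[u] + graph[u][v];
   }

   for(int v = 0; v < V; v++)
      printf("distance from 0 to %d = %d\n", v, dist[v]);
   return 0;
}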
Example
The following example shows how Bellman-Ford algorithm works step by
step. This graph has a negative edge but does not have any negative
cycle, hence the problem can be solved using this technique.
At the time of initialization, all the vertices except the source are marked
by ∞ and the source is marked by 0.
In the first step, all the vertices which are reachable from the source are
updated by minimum cost. Hence, vertices a and h are updated.
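A minimal C sketch of the Bellman-Ford relaxation steps is shown below. The edge list is
an illustrative example with one negative edge and no negative cycle, not the graph from
the figures.
#include <stdio.h>

#define V 5
#define E 6
#define INF 999999

struct Edge { int u, v, w; };

// example directed edge list: one negative edge, but no negative cycle
struct Edge edges[E] = {
   {0, 1, 6}, {0, 2, 7}, {1, 3, 5},
   {2, 3, -3}, {1, 4, 2}, {3, 4, 4}
};

int main() {
   int dist[V];
   for(int i = 0; i < V; i++) dist[i] = INF;
   dist[0] = 0;                       // source vertex

   // relax every edge V-1 times
   for(int i = 1; i < V; i++)
      for(int j = 0; j < E; j++) {
         int u = edges[j].u, v = edges[j].v, w = edges[j].w;
         if(dist[u] != INF && dist[u] + w < dist[v])
            dist[v] = dist[u] + w;
      }

   // one more pass: if any distance still improves, a negative cycle exists
   for(int j = 0; j < E; j++)
      if(dist[edges[j].u] != INF && dist[edges[j].u] + edges[j].w < dist[edges[j].v])
         printf("Graph contains a negative weight cycle\n");

   for(int i = 0; i < V; i++)
      printf("distance from 0 to %d = %d\n", i, dist[i]);
   return 0;
}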
The Floyd–Warshall algorithm is used to find the shortest paths between all pairs of
vertices in a graph, where each edge in the graph has a weight which may be positive or
negative (provided there are no negative cycles). The biggest advantage of using this
algorithm is that all the shortest distances between any two vertices can be calculated
in O(V³), where V is the number of vertices in the graph.
dist[i][k] represents the shortest path from i to k that uses only the first k vertices
as intermediates, and dist[k][j] represents the shortest path from k to j. The shortest
path from i to j through k is then the concatenation of the shortest path from i to k
and the shortest path from k to j.
Floyd–Warshall's Algorithm
Algorithm-
Create a |V| x |V| matrix M   // it represents the distance between every pair of vertices as given

For each cell (i, j) in M do −
   if i == j
      M[ i ][ j ] = 0            // for all diagonal elements, value = 0
   else if (i, j) is an edge in E
      M[ i ][ j ] = weight(i, j) // if there exists a direct edge between the vertices, value = weight of the edge
   else
      M[ i ][ j ] = infinity     // if there is no direct edge between the vertices, value = ∞

for k from 1 to |V|
   for i from 1 to |V|
      for j from 1 to |V|
         if M[ i ][ j ] > M[ i ][ k ] + M[ k ][ j ]
            M[ i ][ j ] = M[ i ][ k ] + M[ k ][ j ]
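A compact C sketch of the triple loop above is shown below, assuming a small adjacency
matrix where INF marks a missing edge; the example matrix is illustrative, not the graph
from the problem that follows.
#include <stdio.h>

#define V 4
#define INF 999999

int main() {
   // initial distance matrix of an example directed graph (INF = no direct edge)
   int M[V][V] = {
      {0,   5,   INF, 10 },
      {INF, 0,   3,   INF},
      {INF, INF, 0,   1  },
      {INF, INF, INF, 0  }
   };

   // try every vertex k as an intermediate vertex
   for(int k = 0; k < V; k++)
      for(int i = 0; i < V; i++)
         for(int j = 0; j < V; j++)
            if(M[i][k] + M[k][j] < M[i][j])
               M[i][j] = M[i][k] + M[k][j];

   // print the final all-pairs shortest path distances
   for(int i = 0; i < V; i++) {
      for(int j = 0; j < V; j++)
         printf("%7d", M[i][j]);
      printf("\n");
   }
   return 0;
}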
Time Complexity-
Floyd-Warshall Algorithm is best suited for dense graphs since its complexity depends
only on the number of vertices in the graph.
For sparse graphs, Johnson’s Algorithm is more suitable.
Problem-
Consider the following directed weighted graph-
Using Floyd-Warshall Algorithm, find the shortest path distance between every pair of
vertices.
Solution-
Step-01:
Remove all the self loops and parallel edges (keeping the edge with the lowest weight)
from the graph, if any.
In our case, we don't have any self loops or parallel edges.
Step-02:
Now, write the initial distance matrix representing the distance between every pair of
vertices as mentioned in the given graph in the form of weights.
NOTE
Since we have a total of 4 vertices in the given graph, we will have a total of 4 matrices
of order 4 x 4 in our solution (excluding the initial distance matrix).
Diagonal elements of each matrix will always be 0.
The last matrix D4 represents the shortest path distance between every pair of vertices.