DS Notes-1
LECTURE NOTES
UNIT-I
INTRODUCTION TO ALGORITHMS AND DATA STRUCTURES
Definition: - An algorithm is a Step By Step process to solve a problem, where each step
indicates an intermediate task. Algorithm contains finite number of steps that leads to
the solution of the problem.
Properties / Characteristics of an Algorithm:
An algorithm has the following basic properties:
● Input-Output:- An algorithm takes zero or more inputs and produces the required
output. This is the basic characteristic of an algorithm.
● Finiteness:- An algorithm must terminate in a countable number of steps.
● Definiteness: Each step of an algorithm must be stated clearly and unambiguously.
● Effectiveness: Each and every step in an algorithm can be converted into a
programming language statement.
● Generality: An algorithm is a generalized one. It works on all sets of inputs and
provides the required output. In other words, it is not restricted to a single input
value.
Data Structure involves two complementary goals. The first goal is to identify and develop
useful mathematical entities and operations, and to determine what class of problems can be
solved by using these entities and operations. The second goal is to determine representations
for those abstract entities and to implement the abstract operations on those concrete representations.
Primitive data structures are directly supported by the language, i.e., operations are performed
directly on these data items.
Ex: integers, characters, real numbers, etc.
Non-primitive data types are not defined by the programming language, but are instead created
by the programmer.
Linear data structures organize their data elements in a linear fashion, where data
elements are attached one after the other. Linear data structures are very easy to implement,
since the memory of the computer is also organized in a linear fashion. Some commonly used
linear data structures are arrays, linked lists, stacks and queues.
In nonlinear data structures, data elements are not organized in a sequential fashion.
Data structures like multidimensional arrays, trees, graphs, tables and sets are some examples
of widely used nonlinear data structures.
Operations on the Data Structures:
Following operations can be performed on the data structures:
1. Traversing
2. Searching
3. Inserting
4. Deleting
5. Sorting
6. Merging
1. Traversing- It is used to access each data item exactly once so that it can be processed.
2. Searching- It is used to find out the location of the data item if it exists in the given
collection of data items.
3. Inserting- It is used to add a new data item in the given collection of data items.
4. Deleting- It is used to delete an existing data item from the given collection of data
items.
5. Sorting- It is used to arrange the data items in some order, i.e. in ascending or
descending order in case of numerical data and in dictionary order in case of alphanumeric
data.
6. Merging- It is used to combine the data items of two sorted files into a single file in sorted
form.
STACKS AND QUEUES
STACKS
A stack is a linear data structure. A stack is a list of elements in which an element may be inserted
or deleted only at one end, called the top of the stack. The stack principle is LIFO (last in, first out):
the element inserted last onto the stack is the element deleted first from the stack.
As items can be added or removed only from the top, the last item to be added to a stack
is the first item to be removed.
Operations on stack:
While performing push and pop operations, the following tests must be conducted on the stack:
a) whether the stack is empty b) whether the stack is full
1. Push: The push operation is used to add new elements to the stack. At the time of
addition, first check whether the stack is full. If the stack is full, it generates an error message "stack
overflow".
2. Pop: The pop operation is used to delete elements from the stack. At the time of deletion,
first check whether the stack is empty. If the stack is empty, it generates an error message "stack
underflow".
All insertions and deletions take place at the same end, so the last element added to the
stack will be the first element removed from the stack. When a stack is created, the stack base
remains fixed while the stack top changes as elements are added and removed. The most
accessible element is the top and the least accessible element is the bottom of the stack.
Representation of Stack (or) Implementation of stack:
A stack can be represented in two ways:
1. Stack using array
2. Stack using linked list
Initially top = -1. To insert (push) an element into the stack, first check whether the stack is full,
i.e. top >= size-1. If it is not full, increment the top value, i.e. top = top + 1, and store the element
at stack[top].
void push()
{
    int x;
    if(top >= size-1)
    {
        printf("\n\nStack Overflow..");
        return;
    }
    else
    {
        printf("\n\nEnter data: ");
        scanf("%d", &x);
        top = top + 1;
        stack[top] = x;
        printf("\n\nData Pushed into the stack");
    }
}

Algorithm: Procedure for push():
Step 1: START
Step 2: if top>=size-1 then
        Write "Stack is Overflow"
Step 3: Otherwise
        3.1: read data value 'x'
        3.2: top=top+1
        3.3: stack[top]=x
Step 4: END
2. Pop(): When an element is taken off the stack, the operation is performed by
pop(). The below figure shows a stack initially with three elements and shows the deletion of
elements using pop().
To delete an element from the stack, first check whether the stack is empty, i.e.
top == -1. If it is not empty, remove the element at stack[top] and decrement the top value,
i.e. top = top - 1.
3. display(): This operation displays the elements in the stack. To display the
elements, first check whether the stack is empty, i.e. top == -1. If it is not empty, display
the list of elements in the stack from the top down to position 0.
void display()
{
    if(top == -1)
    {
        printf("Stack is Underflow");
    }
    else
    {
        printf("Display elements are: ");
        for(i = top; i >= 0; i--)
            printf("%d ", stack[i]);
    }
}

Algorithm: Procedure for display():
Step 1: START
Step 2: if top==-1 then
        Write "Stack is Underflow"
Step 3: otherwise
        3.1: print "Display elements are"
        3.2: for i = top down to 0
             print stack[i]
Step 4: END
Applications of stack:
1. Stack is used by compilers to check for balancing of parentheses, brackets and braces.
2. Stack is used to evaluate a postfix expression.
3. Stack is used to convert an infix expression into postfix/prefix form.
4. In recursion, all intermediate arguments and return values are stored on the processor’s
stack.
5. During a function call the return address and arguments are pushed onto a stack and on
return they are popped off.
Converting and evaluating Algebraic expressions:
An algebraic expression can be represented using three different notations. They are infix,
postfix and prefix notations:
Infix: It is the form of an arithmetic expression in which we fix (place) the arithmetic operator in
between the two operands.
Example: A + B
Prefix: It is the form of an arithmetic notation in which we fix (place) the arithmetic
operator before (pre) its two operands. The prefix notation is called as polish notation.
Example: + A B
Postfix: It is the form of an arithmetic expression in which we fix (place) the arithmetic operator
after (post) its two operands. The postfix notation is called as suffix notation and is also referred
to reverse polish notation.
Example: A B +
QUEUES
A queue is a linear data structure in which insertions are made at one end (the rear) and
deletions at the other end (the front), so the first element added is the first removed
(FIFO: first in, first out). More real-world examples can be seen as queues at ticket
windows, at bus stops, and in our college library.
The operations for a queue are analogous to those for a stack; the difference is that
insertions go at the end of the list rather than the beginning.
Operations on QUEUE:
A queue is an object, or more specifically an abstract data type (ADT), that allows the
following operations:
● Enqueue or insertion: inserts an element at the end (rear) of the queue.
● Dequeue or deletion: deletes an element from the start (front) of the queue.
Queue operations work as follows:
1. Two pointers called FRONT and REAR are used to keep track of the first and last
elements in the queue.
2. When initializing the queue, we set the value of FRONT and REAR to 0.
3. On enqueuing an element, we increase the value of the REAR index and place the new
element in the position pointed to by REAR.
4. On dequeuing an element, we return the value pointed to by FRONT and increase the
FRONT index.
5. Before enqueuing, we check whether the queue is already full.
6. Before dequeuing, we check whether the queue is already empty.
7. When enqueuing the first element, we set the value of FRONT to 1.
8. When dequeuing the last element, we reset the values of FRONT and REAR to 0.
Again insert another element 33 to the queue. The status of the queue is:
Now, delete an element. The element deleted is the element at the front of the queue.So the
status of the queue is:
Again, delete an element. The element to be deleted is always pointed to by the FRONT
pointer. So, 22 is deleted. The queue status is as follows:
Now, insert new elements 44 and 55 into the queue. The queue status is:
Next insert another element, say 66 to the queue. We cannot insert 66 to the queue as the rear
crossed the maximum size of the queue (i.e., 5). There will be queue full signal. The queue
status is as follows:
Now it is not possible to insert an element 66 even though there are two vacant positions in the
linear queue. To overcome this problem the elements of the queue are to be shifted towards
the beginning of the queue so that it creates vacant position at the rear end. Then the FRONT
and REAR are to be adjusted properly. The element 66 can be inserted at the rear end. After this
operation, the queue status is as follows:
This difficulty can overcome if we treat queue position with index 0 as a position that comes
after position with index 4 i.e., we treat the queue as a circular queue.
Applications of Queue:
1. It is used to schedule the jobs to be processed by the CPU.
2. When multiple users send print jobs to a printer, each printing job is kept in the printing
queue. Then the printer prints those jobs according to first in first out (FIFO) basis.
3. Breadth first search uses a queue data structure to find an element from a graph.
CIRCULAR QUEUE
A more efficient queue representation is obtained by regarding the array Q[MAX] as circular.
Any number of items could be placed on the queue. This implementation of a queue is called a
circular queue because it uses its storage array as if it were a circle instead of a linear list.
There are two problems associated with linear queue. They are:
● Time consuming: linear time to be spent in shifting the elements to the beginning of the
queue.
● Signaling queue full: even if the queue is having vacant position.
For example, let us consider a linear queue status as follows:
Next insert another element, say 66 to the queue. We cannot insert 66 to the queue as the rear
crossed the maximum size of the queue (i.e., 5). There will be queue full signal. The queue
status is as follows:
This difficulty can be overcome if we treat the queue position with index zero as a position that
comes after the position with index four, i.e., we treat the queue as a circular queue.
In circular queue if we reach the end for inserting elements to it, it is possible to insert new
elements if the slots at the beginning of the circular queue are empty.
Representation of Circular Queue:
Let us consider a circular queue, which can hold maximum (MAX) of six elements. Initially the
queue is empty.
Now, insert 11 to the circular queue. Then circular queue status will be:
Insert new elements 22, 33, 44 and 55 into the circular queue. The circular queue status is:
Now, delete an element. The element deleted is the element at the front of the circular queue.
So, 11 is deleted. The circular queue status is as follows:
Again, delete an element. The element to be deleted is always pointed to by the FRONT pointer.
So, 22 is deleted. The circular queue status is as follows:
Again, insert another element 66 to the circular queue. The status of the circular queue is:
Now, insert new elements 77 and 88 into the circular queue. The circular queue status is:
Now, if we insert an element to the circular queue, as COUNT = MAX we cannot add the
element to circular queue. So, the circular queue is full.
a.enqueue() or insertion():This function is used to insert an element into the circular queue. In
a circular queue, the new element is always inserted at Rear position.
void insertCQ()
{
    int data;
    if(count == MAX)
    {
        printf("\n Circular Queue is Full");
    }
    else
    {
        printf("\n Enter data: ");
        scanf("%d", &data);
        CQ[rear] = data;
        rear = (rear + 1) % MAX;
        count++;
        printf("\n Data Inserted in the Circular Queue ");
    }
}

Algorithm: procedure insertCQ():
Step 1: START
Step 2: if count==MAX then
        Write "Circular queue is full"
Step 3: otherwise
        3.1: read the data element
        3.2: CQ[rear]=data
        3.3: rear=(rear+1)%MAX
        3.4: count=count+1
Step 4: STOP
b.dequeue() or deletion():This function is used to delete an element from the circular
queue. In a circular queue, the element is always deleted from front position.
void deleteCQ()
{
    if(count == 0)
    {
        printf("\n\nCircular Queue is Empty..");
    }
    else
    {
        printf("\n Deleted element from Circular Queue is %d ", CQ[front]);
        front = (front + 1) % MAX;
        count--;
    }
}

Algorithm: procedure deleteCQ():
Step 1: START
Step 2: if count==0 then
        Write "Circular queue is empty"
Step 3: otherwise
        3.1: print the deleted element CQ[front]
        3.2: front=(front+1)%MAX
        3.3: count=count-1
Step 4: STOP
c.display(): This function is used to display the list of elements in the circular queue.
void displayCQ()
{
    int i, j;
    if(count == 0)
    {
        printf("\n\n\t Circular Queue is Empty ");
    }
    else
    {
        printf("\n Elements in Circular Queue are: ");
        j = count;
        for(i = front; j != 0; j--)
        {
            printf("%d\t", CQ[i]);
            i = (i + 1) % MAX;
        }
    }
}

Algorithm: procedure displayCQ():
Step 1: START
Step 2: if count==0 then
        Write "Circular queue is empty"
Step 3: otherwise
        3.1: print the list of elements
        3.2: for i = front, repeat count times
             print CQ[i]
             i=(i+1)%MAX
Step 4: STOP
Deque:
In the preceding section we saw a queue in which items are inserted at one end and
removed from the other end. In this section we examine an extension of the queue,
which provides a means to insert and remove items at both ends of the queue. This data
structure is a deque. The word deque is a contraction of double-ended queue. The below
figure shows the representation of a deque.
A deque provides four operations. The below figure shows the basic operations on a deque.
• enqueue_front: insert an element at front.
• dequeue_front: delete an element at front.
• enqueue_rear: insert element at rear.
• dequeue_rear: delete element at rear.
Priority Queue:
A prototype of a priority queue is a time-sharing system: programs of high priority are processed
first, and programs with the same priority form a standard queue. An efficient implementation
of a priority queue uses a heap, which in turn can be used for sorting, called heap
sort.
Linear Data Structures:
Linear data structures are those data structures in which data elements are accessed (read and
written) in sequential fashion (one by one). Ex: stacks, queues, lists, arrays
Non Linear Data Structures:
Non Linear Data Structures are those in which data elements are not accessed in sequential
fashion.
Ex: trees, graphs
Difference between Linear and Nonlinear Data Structures
The main difference between linear and nonlinear data structures lies in the way they organize data
elements. In linear data structures, data elements are organized sequentially and therefore they
are easy to implement in the computer's memory. In nonlinear data structures, a data element
can be attached to several other data elements to represent specific relationships that exist
among them. Due to this nonlinear structure, they can be more difficult to implement in the
computer's linear memory than linear data structures. Selecting one data
structure type over the other should be done carefully, by considering the relationships among
the data elements that need to be stored.
LINEAR LIST
A data structure is said to be linear if its elements form a sequence. A linear list is a list that
displays the relationship of adjacency between elements.
A linear list can be defined as a data object whose instances are of the form (e1, e2, e3, …, en),
where n is a finite natural number. The ei terms are the elements of the list and n is its length.
The elements may be viewed as atomic, as their individual structure is not relevant to the
structure of the list. When n=0, the list is empty. When n>0, e1 is the first element and en the last,
i.e., e1 comes before e2, e2 comes before e3, and so on.
Some examples of the Linear List are
● An alphabetized list of students in a class
● A list of exam scores in non decreasing order
● A list of gold medal winners in the Olympics
● An alphabetized list of members of Congress
The following are the operations that are performed on a linear list:
✔ Create a Linear List
This equation states that the ith element of the list is in position i-1 of the array. The below figure
shows a five element list represented in the array element using the mapping of equation.
To completely specify the list we need to know its current length or size; for this purpose we
use the variable length. Length is zero when the list is empty. The program gives the resulting class
definition. Since the data type of the list elements may vary from application to application, we
have defined a template class in which the user specifies the element data type T. The data
members length, MaxSize and element are private members, while the
remaining members are public. Insert and delete have been defined to return a reference to a
linear list.
Insertion and Deletion of a Linear List:
Suppose we want to remove an element ei from the list by moving each element to its right down by 1. For
example, to remove the element e1=2 from the list, we have to move the elements e2=4,
e3=8 and e4=1, which are to the right of e1, to positions 1, 2 and 3 of the array element. The
below figure shows this result. The shaded elements are moved.
To insert an element so that it becomes element i of a list, we must move the existing element
ei and all elements to its right one position to the right, and then put the new element into position i of
the array. For example, to insert 7 as the second element of the list, we first move elements e2
and e3 to the right by 1 and then put 7 into position 2 of the array. The below figure
shows this result. The shaded elements were moved.
Since each node in the linked representation of the above figure has exactly one link, the
structure of this figure is called a 'Single Linked List'. The nodes are ordered from left to right,
with each node (other than the last one) linking to the next. Since the last node has a NULL link, the
structure is also called a chain.
Insertion and Deletion of a Single Linked List:
Insertion: Let the list be a linked list with successive nodes A and B as shown in the below
figure. Suppose a node N is to be inserted into the list between nodes A and B.
In the new list, node A points to the new node N, and the new node N points to node B,
to which node A previously pointed.
Deletion:
Let the list be a linked list with node N between nodes A and B, as shown in the following figure.
In the new list the node N is to be deleted from the linked list. The deletion occurs as the link
field in node A is made to point to node B, thus excluding node N from the path.
In a double linked list, Lptr contains the address of the previous node, Rptr contains the
address of the next node, and Data contains the value stored in the node. The double linked
list is as follows.
In the above diagram, Last and Start are pointer variables which contain the addresses of the last
node and starting node respectively.
Insertion into the Double Linked List: Let list be a double linked list with successive nodes A
and B as shown in the following diagram. Suppose a node N is to be inserted into the list
between the nodes A and B, as shown in the following diagram.
In the new list, the right pointer of node A points to the new node N, the Lptr of node N
points to node A, the Rptr of node N points to node B, and the Lptr of node B points to the
new node N.
Deletion from the Double Linked List:- Let list be a double linked list containing node N between the nodes A
and B, as shown in the following diagram.
Suppose node N is to be deleted from the list. The deletion occurs as soon as the right pointer field
of node A is changed so that it points to node B, and the left pointer field of node B is changed so
that it points to node A.
Circular Linked List:- Circular Linked List is a special type of linked list in which all the nodes are
linked in continuous circle. Circular list can be singly or doubly linked list. Note that, there are no
Nulls in Circular Linked Lists. In these types of lists, elements can be added to the back of the list
and removed from the front in constant time.
Both types of circularly-linked lists benefit from the ability to traverse the full list beginning at
any given node. This avoids the necessity of storing first Node and last node, but we need a
special representation for the empty list, such as a last node variable which points to some node
in the list or is null if it's empty. This representation significantly simplifies adding and removing
nodes with a non-empty list, but empty lists are then a special case. Circular linked lists are
most useful for describing naturally circular structures, and have the advantage of being able to
traverse the list starting at any point. They also allow quick access to the first and last records
through a single pointer (the address of the last element)
A circular linked list is one type of linear linked list in which the link field of the last node of the list
contains the address of the first node of the list instead of containing a null pointer.
Advantages:- Circular lists are frequently used instead of ordinary linked lists because in a circular
list all nodes contain a valid address. The important features of circular lists are as follows:
(1) In a circular list every node is accessible from a given node.
(2) Certain operations like concatenation and splitting become more efficient in a circular
list.
Disadvantages: Without some care in processing, it is possible to get into an infinite loop.
Circular Double Linked List :- This is one type of double linked list, in which the Rptr field of
the last node of the list contains the address of the first node, and the Lptr of the first node
contains the address of the last node of the list, instead of containing null pointers.
Its advantages and disadvantages are the same as those of the singly linked circular list:
every node is accessible from any given node, and operations like concatenation and
splitting become more efficient, but without some care in processing it is possible to get
into an infinite loop.
Difference between arrays and linked lists:
3. In arrays, insertion and deletion are done by moving the elements either up or down;
in a linked list they are done by only changing the pointers.
4. Successive elements of an array occupy adjacent space in memory; successive elements
of a linked list need not occupy adjacent space.
5. In arrays each location contains only data; in a linked list each location contains data
and a pointer to denote where the next element is present in memory.
6. The linear relationship between the data elements of an array is reflected by the
physical relationship of the data in memory; in a linked list it is reflected by the link
field of the node.
7. In array declaration a block of memory space is required; in a linked list there is no
need of such a thing.
8. Arrays have no need of storage for pointers or links; linked lists require extra storage
for the link fields.
The efficiency of a sorting algorithm is measured in terms of:
● Programming time
● Execution time
● Number of comparisons
● Memory utilization
● Computational complexity
Complexity of sorting algorithms: The complexity of a sorting algorithm measures the running
time as a function of the number n of items to be sorted. Each sorting algorithm S will be made
up of the following operations, where A1, A2, A3, …, An contain the
items to be sorted and B is an auxiliary location.
Define sorting. What is the difference between internal and external sorting methods?
Ans:- Sorting is a technique of organizing data. It is a process of arranging the elements either in
ascending or descending order, i.e., bringing some order to the data.
TREES
INTRODUCTION
In linear data structure data is organized in sequential order and in non-linear data structure
data is organized in random order. A tree is a very popular non-linear data structure used in a
wide range of applications. Tree is a non-linear data structure which organizes data in
hierarchical structure and this is a recursive definition.
DEFINITION OF TREE:
A tree is a collection of nodes (or vertices) and their edges (or links). In the tree data structure, every
individual element is called a Node. A node in a tree data structure stores the actual data of that
particular element and links to the next elements in the hierarchical structure.
Note: 1. In a tree, if we have N nodes then we can have a maximum of N-1 links or edges.
2. A tree has no cycles.
TREE TERMINOLOGIES:
1. Root Node: In a Tree data structure, the first node is called as Root Node. Every tree
must have a root node. We can say that the root node is the origin of the tree data structure. In
any tree, there must be only one root node. We never have multiple root nodes in a tree.
2. Edge: In a Tree, the connecting link between any two nodes is called as EDGE. In a tree
with 'N' number of nodes there will be a maximum of 'N-1' number of edges.
3. Parent Node: In a Tree, the node which is a predecessor of any node is called as PARENT
NODE. In simple words, the node which has a branch from it to any other node is called a
parent node. Parent node can also be defined as "The node which has child / children".
4. Child Node: In a Tree, the node which is a descendant of any node is called as CHILD
Node. In simple words, the node which has a link from its parent node is called a child node.
In a tree, a parent node can have any number of child nodes.
5. Siblings: In a Tree data structure, nodes which belong to same Parent are called as
SIBLINGS. In simple words, the nodes with the same parent are called Sibling nodes.
6. Leaf Node: In a Tree data structure, the node which does not have a child is called as
LEAF Node. In simple words, a leaf is a node with no child. In a tree data structure, the leaf
nodes are also called as External Nodes. External node is also a node with no child. In a tree,
leaf node is also called as 'Terminal' node.
7. Internal Nodes: In a Tree data structure, the node which has atleast one child is called as
INTERNAL Node. In simple words, an internal node is a node with atleast one child.
In a Tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root
node is also said to be Internal Node if the tree has more than one node. Internal nodes are also
called as 'Non-Terminal' nodes.
8. Degree: In a Tree data structure, the total number of children of a node is called as
DEGREE of that Node. In simple words, the Degree of a node is total number of children it has.
The highest degree of a node among all the nodes in a tree is called as 'Degree of Tree'
9. Level: In a Tree data structure, the root node is said to be at Level 0, the children of
the root node are at Level 1, the children of the nodes which are at Level 1 will be at Level 2,
and so on... In simple words, in a tree each step from top to bottom is called as a Level
and the Level count starts with '0' and is incremented by one at each level (Step).
10. Height: In a Tree data structure, the total number of edges from a leaf node to a particular
node in the longest path is called as HEIGHT of that Node. In a tree, height of the root node is
said to be height of the tree. In a tree, height of all leaf nodes is '0'.
11. Depth: In a Tree data structure, the total number of edges from root node to a particular
node is called as DEPTH of that Node. In a tree, the total number of edges from root node to a
leaf node in the longest path is said to be Depth of the tree. In simple words, the highest depth
of any leaf node in a tree is said to be depth of that tree. In a tree, depth of the root node is '0'.
12. Path: In a Tree data structure, the sequence of Nodes and Edges from one node to
another node is called as PATH between those two Nodes. Length of a Path is the total number of
nodes in that path. In the below example the path A - B - E - J has length 4.
13. Sub Tree: In a Tree data structure, each child of a node forms a subtree recursively.
Every child node will form a subtree on its parent node.
TREE REPRESENTATIONS:
A tree data structure can be represented in two methods. Those methods are as follows...
1. List Representation
2. Left Child - Right Sibling Representation
1. List Representation
In this representation, we use two types of nodes: one for representing the node with data,
called a 'data node', and another for representing only references, called a 'reference node'. We start
with a 'data node' for the root node in the tree. It is then linked to its children
through a 'reference node', which is further linked to any other node directly. This process
repeats for all the nodes in the tree.
The above example tree can be represented using List representation as follows...
In this representation, every node's data field stores the actual value of that node. If the node has a left
child, then the left reference field stores the address of that left child node; otherwise it stores NULL. If the
node has a right sibling, then the right reference field stores the address of the right sibling node; otherwise
it stores NULL.
The above example tree can be represented using Left Child - Right Sibling representation as follows...
BINARY TREE:
In a normal tree, every node can have any number of children. A binary tree is a special type of
tree data structure in which every node can have a maximum of 2 children. One is known as a
left child and the other is known as right child.
A tree in which every node can have a maximum of two children is called Binary Tree.
In a binary tree, every node can have either 0 children or 1 child or 2 children but not more than
2 children.
In general, tree nodes can have any number of children. In a binary tree, each node can have at
most two children. A binary tree is either empty or consists of a node called the root together
with two binary trees called the left subtree and the right subtree. A tree with no nodes is
called as a null tree
Example:
Strictly binary tree is also called as Full Binary Tree or Proper Binary Tree or 2-Tree.
Strictly binary tree data structure is used to represent mathematical expressions.Example
In a left skewed tree, most of the nodes have the left child without corresponding right child.
In a right skewed tree, most of the nodes have the right child without corresponding left child.
Properties of binary trees:
Some of the important properties of a binary tree are as follows:
1. If h = height of a binary tree, then
a. Maximum number of leaves = 2^h
b. Maximum number of nodes = 2^(h+1) - 1
2. If a binary tree contains m nodes at level l, it contains at most 2m nodes at level l + 1.
3. Since a binary tree can contain at most one node at level 0 (the root), it can contain at
most 2^l nodes at level l.
4. The total number of edges in a full binary tree with n nodes is n - 1.
BINARY TREE REPRESENTATIONS:
A binary tree data structure is represented using two methods. Those methods are as
follows...
1. Array Representation
2. Linked List Representation
Consider the following binary tree...
1. Array Representation of Binary Tree
In array representation of a binary tree, we use one-dimensional array (1-D Array) to
represent a binary tree.
Consider the above example of a binary tree and it is represented as follows...
To represent a binary tree of depth 'n' using array representation, we need a one-dimensional
array with a maximum size of 2^(n+1) - 1.
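The index arithmetic behind the array representation can be sketched as follows (assuming the common 0-based convention in which the root is stored at index 0; the sample tree is an illustration, not the figure from the notes):

```python
# Array representation of a binary tree (0-based indexing).
# For a node stored at index i:
#   left child  -> index 2*i + 1
#   right child -> index 2*i + 2
#   parent      -> index (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

# The tree        A
#               /   \
#              B     C
#             / \
#            D   E
# is stored level by level; None marks an absent child.
tree = ['A', 'B', 'C', 'D', 'E', None, None]

print(tree[left(0)], tree[right(0)])   # children of A -> B C
print(tree[left(1)], tree[right(1)])   # children of B -> D E
```

Note that for a skewed tree most of the array cells stay None, which is why the worst-case array size grows exponentially with the depth.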
2. Linked List Representation of Binary Tree
We use a double linked list to represent a binary tree. In this representation, every node
consists of three fields: the first field stores the address of the left child, the second
stores the actual data, and the third stores the address of the right child.
In this linked list representation, a node has the following structure...
The above example of the binary tree represented using Linked list representation is shown as
follows...
1. In - Order Traversal ( leftChild - root - rightChild ):
In In-Order traversal, the left child is visited first, then the root node, and then the right
child. This in-order traversal is applicable for every root node of all subtrees in the tree.
Algorithm:
Step-1: Visit the left subtree, using inorder.
Step-2: Visit the root.
Step-3: Visit the right subtree, using inorder.
In the above example of a binary tree, first we try to visit the left child of root node 'A', but
A's left child 'B' is itself a root node for the left subtree. So we try to visit its (B's) left
child 'D', and again D is a root for the subtree with nodes D, I and J. So we try to visit its
left child 'I', and it is the leftmost child. So first we visit 'I', then go to its root node 'D',
and later we visit D's right child 'J'. With this we have completed the left part of node B. Then
we visit 'B', and next B's right child 'F' is visited. With this we have completed the left part
of node A. Then we visit root node 'A'. With this we have completed the left and root parts of
node A. Then we go for the right part of node A. On the right of A there is again a subtree with
root C. So we go for the left child of C, and again it is a subtree with root G. But G does not
have a left part, so we visit 'G' and then visit G's right child 'K'. With this we have completed
the left part of node C. Then we visit root node 'C' and next visit C's right child 'H', which is
the rightmost child in the tree. So we stop the process.
That means here we have visited in the order of I - D - J - B - F - A - G - K - C - H using
In-Order Traversal.
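The walkthrough above can be sketched directly in Python. The three-field node mirrors the linked representation described earlier, and the tree is the one from the worked example (nodes A, B, C, D, F, G, H, I, J, K):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.left = left      # address of left child
        self.data = data      # actual data
        self.right = right    # address of right child

def inorder(root, out):
    """leftChild - root - rightChild"""
    if root is not None:
        inorder(root.left, out)
        out.append(root.data)
        inorder(root.right, out)

# Tree from the worked example above.
root = Node('A',
            Node('B', Node('D', Node('I'), Node('J')), Node('F')),
            Node('C', Node('G', None, Node('K')), Node('H')))

result = []
inorder(root, result)
print('-'.join(result))   # I-D-J-B-F-A-G-K-C-H
```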
2. Pre - Order Traversal ( root - leftChild - rightChild ):
In Pre-Order traversal, the root node is visited before the left child and right child nodes. In this
traversal, the root node is visited first, then its left child and later its right child. This pre- order
traversal is applicable for every root node of all subtrees in the tree. Preorder search is also
called backtracking.
Algorithm:
Step-1: Visit the root.
Step-2: Visit the left subtree, using preorder.
Step-3: Visit the right subtree, using preorder.
In the above example of a binary tree, first we visit root node 'A', then visit its left child 'B',
which is a root for D and F. So we visit B's left child 'D', and again D is a root for I and J. So
we visit D's left child 'I', which is the leftmost child. Next we visit D's right child 'J'. With
this we have completed the root, left and right parts of node D and the root and left parts of
node B. Next we visit B's right child 'F'. With this we have completed the root and left parts of
node A. So we go for A's right child 'C', which is a root node for G and H. After visiting C, we
go for its left child 'G', which is a root for node K. Next we look at the left of G, but it does
not have a left child, so we go for G's right child 'K'. With this, we have completed node C's
root and left parts. Next we visit C's right child 'H', which is the rightmost child in the tree.
So we stop the process.
That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order Traversal.
3. Post - Order Traversal ( leftChild - rightChild - root ):
In Post-Order traversal, the root node is visited after left child and right child. In this
traversal, left child node is visited first, then its right child and then its root node. This is
recursively performed until the right most nodes are visited.
Algorithm:
Step-1: Visit the left subtree, using postorder.
Step-2: Visit the right subtree, using postorder
Step-3: Visit the root.
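The pre-order and post-order procedures can be sketched the same way as the in-order one, using the tree from the worked example above:

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.left, self.data, self.right = left, data, right

def preorder(root, out):
    """root - leftChild - rightChild"""
    if root is not None:
        out.append(root.data)
        preorder(root.left, out)
        preorder(root.right, out)

def postorder(root, out):
    """leftChild - rightChild - root"""
    if root is not None:
        postorder(root.left, out)
        postorder(root.right, out)
        out.append(root.data)

# Same tree as the worked example above.
root = Node('A',
            Node('B', Node('D', Node('I'), Node('J')), Node('F')),
            Node('C', Node('G', None, Node('K')), Node('H')))

pre, post = [], []
preorder(root, pre)
postorder(root, post)
print('-'.join(pre))    # A-B-D-I-J-F-C-G-K-H
print('-'.join(post))   # I-J-D-F-B-K-G-H-C-A
```

Note how the three traversals differ only in where the "visit the root" step is placed relative to the two recursive calls.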
Definition
A graph G can be defined as an ordered set G(V, E) where V(G) represents the set of vertices
and E(G) represents the set of edges which are used to connect these vertices.
A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges ((A,B), (B,C), (C,E), (E,D), (D,B), (D,A))
is shown in the following figure.
In a directed graph, edges form an ordered pair. Edges represent a specific path from
some vertex A to another vertex B. Node A is called initial node while node B is called
terminal node.
Graph Terminology
Path
A path can be defined as the sequence of nodes that are followed in order to reach
some terminal node V from the initial node U.
Closed Path
A path is called a closed path if the initial node is the same as the terminal node, that is,
if V0 = VN.
Simple Path
A path P is called a simple path if all of its nodes are distinct. If all nodes are distinct
except that V0 = VN, then the path is called a closed simple path.
Cycle
A cycle can be defined as the path which has no repeated edges or vertices except the
first and last vertices.
Connected Graph
A connected graph is the one in which some path exists between every two vertices (u,
v) in V. There are no isolated nodes in connected graph.
Complete Graph
A complete graph is one in which every node is connected with all other nodes. A
complete graph contains n(n-1)/2 edges, where n is the number of nodes in the graph.
Weighted Graph
In a weighted graph, each edge is assigned with some data such as length or weight.
The weight of an edge e can be given as w(e) which must be a positive (+) value
indicating the cost of traversing the edge.
Digraph
A digraph is a directed graph in which each edge of the graph is associated with some
direction and the traversing can be done only in the specified direction.
Loop
An edge whose two end points are the same vertex is called a Loop.
Adjacent Nodes
If two nodes u and v are connected via an edge e, then the nodes u and v are called
neighbours or adjacent nodes.
Graph representation
By Graph representation, we simply mean the technique to be used to store some graph
into the computer's memory.
A graph is a data structure that consists of a set of vertices (called nodes) and a set of edges.
There are two ways to store graphs in the computer's memory:
1. Adjacency matrix
2. Adjacency list
Adjacency matrix
If adj[i][j] = w, it means that an edge exists from vertex i to vertex j with weight w;
adj[i][j] = 0 otherwise.
If there is no self-loop present in the graph, the diagonal entries of the adjacency matrix
will be 0.
There exist different adjacency matrices for the directed and undirected graph. In a directed
graph, an entry Aij will be 1 only when there is an edge directed from Vi to Vj.
In the above figure, we can see that there is a linked list or adjacency list for every node of
the graph. From vertex A, there are paths to vertex B and vertex D. These nodes are linked to
node A in the given adjacency list.
An adjacency list is maintained for each node present in the graph, which stores the node
value and a pointer to the next adjacent node to the respective node. If all the adjacent
nodes are traversed, then store the NULL in the pointer field of the last node of the list.
The sum of the lengths of adjacency lists is equal to twice the number of edges present in an
undirected graph.
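Both representations can be sketched for the 5-vertex example graph from the definition above (vertices A-E with the six listed edges):

```python
# Undirected graph from the definition above: vertices A..E,
# edges (A,B), (B,C), (C,E), (E,D), (D,B), (D,A).
edges = [('A', 'B'), ('B', 'C'), ('C', 'E'), ('E', 'D'), ('D', 'B'), ('D', 'A')]
vertices = ['A', 'B', 'C', 'D', 'E']
index = {v: i for i, v in enumerate(vertices)}

# 1. Adjacency matrix: adj[i][j] = 1 if an edge exists, 0 otherwise.
n = len(vertices)
adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[index[u]][index[v]] = 1
    adj[index[v]][index[u]] = 1   # undirected: the matrix is symmetric

# 2. Adjacency list: each vertex maps to the list of its neighbours.
adj_list = {v: [] for v in vertices}
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

# Sum of adjacency-list lengths = 2 * number of edges (undirected graph).
print(sum(len(lst) for lst in adj_list.values()) == 2 * len(edges))  # True
```

The matrix costs O(V^2) space regardless of edge count, while the list costs O(V + E); this is the usual trade-off between the two representations.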
Graph Traversal
Graph traversal is a technique used for searching a vertex in a graph. Graph traversal also
decides the order in which the vertices are visited during the search. A graph traversal finds the
edges to be used in the search process without creating loops. That means using graph traversal we
visit all the vertices of the graph without getting into a looping path.
There are two graph traversal techniques:
1. Depth First Search (DFS)
2. Breadth First Search (BFS)
Depth First Traversal (or DFS) for a graph is similar to Depth First Traversal of a tree.
We traverse the adjacent vertices one by one; when we move to an adjacent vertex, we
first finish the traversal of all vertices reachable through it completely, then go to the
next adjacent vertex, and continue this way. This is similar to a tree, where we first
completely traverse the left subtree and then go to the right subtree. The only catch
here is that, unlike trees, graphs may contain cycles (a node may be reached twice).
To avoid processing a node more than once, we use a boolean visited array.
Example:
Input: V = 5, E = 5, edges = {{1, 2}, {1, 0}, {0, 2}, {2, 3}, {2, 4}}, s = 1
Output: 1 2 0 3 4
Explanation: The source vertex s is 1. We visit it first, then we visit an adjacent vertex. There
are two vertices adjacent to 1: 2 and 0. Here 2 is picked first (following the order in which the
edges are given), and the DFS continues from there.
Input: V = 5, E = 4, edges = {{0, 2}, {0, 3}, {0, 1}, {2, 4}}, s = 0
Output: 0 2 4 3 1
Explanation: DFS Steps:
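The DFS steps above can be sketched with a recursive function and a boolean visited array; running it on both example inputs reproduces the stated outputs:

```python
def dfs(v, adj, visited, order):
    """Visit v, then fully explore each unvisited neighbour in turn."""
    visited[v] = True
    order.append(v)
    for nxt in adj[v]:
        if not visited[nxt]:
            dfs(nxt, adj, visited, order)

def run_dfs(V, edges, s):
    adj = [[] for _ in range(V)]
    for u, v in edges:          # undirected: add both directions,
        adj[u].append(v)        # in the order the edges are given
        adj[v].append(u)
    visited = [False] * V
    order = []
    dfs(s, adj, visited, order)
    return order

print(run_dfs(5, [(1, 2), (1, 0), (0, 2), (2, 3), (2, 4)], 1))  # [1, 2, 0, 3, 4]
print(run_dfs(5, [(0, 2), (0, 3), (0, 1), (2, 4)], 0))          # [0, 2, 4, 3, 1]
```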
● For a connected graph having N vertices, the number of edges in the corresponding
spanning tree is N - 1.
The minimum spanning tree has all the properties of a spanning tree with an added
constraint of having the minimum possible weights among all possible spanning trees.
Like a spanning tree, there can also be many possible MSTs for a graph.
Kruskal's algorithm is one of the popular algorithms for finding the minimum spanning tree
of a connected, undirected graph. It is a greedy algorithm. The algorithm workflow is as
below:
● At each iteration, the algorithm adds the next lowest-weight edge, provided the
edges picked so far do not form a cycle.
Illustration:
The graph contains 9 vertices and 14 edges. So, the minimum spanning tree formed will
be having (9 – 1) = 8 edges.
Step 6: Pick edge 8-6. Since including this edge results in a cycle, discard it. Pick edge
2-3: no cycle is formed, so include it.
Step 8: Pick edge 1-2. Since including this edge results in a cycle, discard it. Pick edge
3-4: no cycle is formed, so include it.
Note: Since the number of edges included in the MST equals (V - 1), the algorithm
stops here.
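The edge-by-edge workflow above can be sketched with a union-find structure for the cycle check. The small 4-vertex graph and its weights below are assumptions for illustration, not the 9-vertex figure from the notes:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v). Returns (mst_edges, total_weight)."""
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):     # next lowest-weight edge first
        ru, rv = find(u), find(v)
        if ru != rv:                  # no cycle is formed: include the edge
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
        # otherwise the edge would close a cycle: discard it
        if len(mst) == n - 1:         # MST complete: V - 1 edges
            break
    return mst, total

# Illustrative 4-vertex graph: (weight, u, v) triples.
edges = [(1, 0, 1), (3, 0, 2), (4, 1, 2), (2, 1, 3), (5, 2, 3)]
mst, total = kruskal(4, edges)
print(mst, total)   # 3 edges (= V - 1), total weight 6
```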
Hashing in Data Structure
Hashing is a technique used in data structures that efficiently stores and retrieves data in a way that
allows for quick access. It involves mapping data to a specific index in a hash table using a hash
function that enables fast retrieval of information based on its key. This method is commonly used
in databases, caching systems, and various programming applications to optimize search and
retrieval operations. The great thing about hashing is, we can achieve all three operations (search,
insert and delete) in O(1) time on average.
Hash Table
A hash table is one of the most important data structures; it uses a special function known as a
hash function that maps a given key to an index, allowing the elements to be accessed faster.
A Hash table is a data structure that stores some information, and the information has basically two
main components, i.e., key and value. The hash table can be implemented with the help of an
associative array. The efficiency of mapping depends upon the efficiency of the hash function used
for mapping.
For example, suppose the key value is John and the value is the phone number, so when we pass
the key value in the hash function shown as below:
Hash(key)= index;
When we pass the key in the hash function, then it gives the index.
Hash(john) = 3;
A hash function maps each key to an index. Sometimes a hash table uses an imperfect
hash function that causes a collision, because the hash function generates the same index for two
different keys.
In Hashing technique, the hash table and hash function are used. Using the hash function, we can
calculate the address at which the value can be stored.
The main idea behind the hashing is to create the (key/value) pairs. If the key is given, then the
algorithm computes the index at which the value would be stored. It can be written as:
Index = hash(key)
There are three ways of calculating the hash function:
○ Division method
○ Folding method
○ Mid square method
Division Method
h(k) = k % m, where m is the size of the hash table.
For example, if the key value is 6 and the size of the hash table is 10. When we apply the hash
function to key 6 then the index would be:
h(6) = 6%10 = 6
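The division method is a one-line function; note how two different keys can land on the same index, which is exactly the collision problem discussed below:

```python
def division_hash(key, m):
    """Division method: h(k) = k % m, where m is the table size."""
    return key % m

print(division_hash(6, 10))    # 6
print(division_hash(26, 10))   # 6 -> same index as key 6: a collision
```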
Mid-Square Method
In the mid-square method, the key is squared, and the middle digits of the result are taken as
the hash value.
Steps:
1. Square the key.
2. Extract the middle digits of the squared value.
Advantages:
● Produces a good distribution of hash values.
Disadvantages:
● May require more computational effort.
Folding Method
The process involves two steps:
● The key value k is divided into a predetermined number of parts k1, k2, k3, ..., kn,
each having the same number of digits, except possibly the last part, which may have
fewer digits than the others.
● Add the parts together. The hash value is the sum, ignoring the final carry, if any.
Formula:
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
Advantages:
● Simple to compute: the key is split into equal-sized parts and the parts are added
together.
● Works for keys of any size, without regard to the distribution of keys in the hash table.
Disadvantages:
● When there are too many collisions, efficiency can occasionally suffer.
k = 678912, split into two-digit parts 67, 89, 12
s = 67 + 89 + 12
s = 168, so h(678912) = 168
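The folding method can be sketched as follows; the two-digit part size and the sample key are assumptions for illustration:

```python
def folding_hash(key, part_digits=2):
    """Folding method: split the key into groups of `part_digits` digits
    (the last group may be shorter) and add the groups together."""
    s = str(key)
    parts = [int(s[i:i + part_digits]) for i in range(0, len(s), part_digits)]
    return sum(parts)

print(folding_hash(678912))   # 67 + 89 + 12 = 168
```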
Mid square method
Formula:
h(k) = middle digits of (k x k)
Advantages:
● This technique works well because most or all of the digits in the key value affect the result.
All of the necessary digits participate in a process that results in the middle digits of the
squared result.
● The result is not dominated by the top or bottom digits of the initial key value.
Disadvantages:
● The size of the key is one of the limitations of this system; if the key is large, its square will
contain twice as many digits.
● Probability of collisions occurring repeatedly.
k = 60
k x k = 60 x 60 = 3600
The middle digits of 3600 are 60. Thus,
h(60) = 60
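The mid-square method can be sketched as follows; extracting the middle two digits is an assumption matching the k = 60 example above (textbooks vary in how many middle digits are kept):

```python
def mid_square_hash(key, r=2):
    """Mid-square method: square the key and take the middle r digits."""
    sq = str(key * key)
    start = len(sq) // 2 - r // 2
    return int(sq[start:start + r])

print(mid_square_hash(60))   # 60 * 60 = 3600 -> middle digits "60" -> 60
```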
Collision
When two different keys hash to the same index, the problem that occurs between the two
values is known as a collision. In the above example, the value is stored at index 6. If the key
value is 26, then the index would be:
h(26) = 26%10 = 6
Therefore, two values are stored at the same index, i.e., 6, and this leads to the collision
problem. To resolve these collisions, we have some techniques known as collision resolution
techniques.
Open Hashing
In Open Hashing, one of the methods used to resolve the collision is known as a chaining method.
The value 11 would be stored at the index 5. Now, we have two values (6, 11) stored at the same
index, i.e., 5. This leads to the collision problem, so we will use the chaining method to avoid the
collision. We will create one more list and add the value 11 to this list. After the creation of the new
list, the newly created list will be linked to the list having value 6.
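The chaining method can be sketched with a table of lists; the table size m = 10 and the keys 6 and 26 (which collide under the division method above) are used here for illustration:

```python
class ChainedHashTable:
    """Open hashing: each slot holds a list (chain) of colliding keys."""
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]

    def insert(self, key):
        self.table[key % self.m].append(key)

    def search(self, key):
        return key in self.table[key % self.m]

t = ChainedHashTable(10)
t.insert(6)
t.insert(26)          # 26 % 10 == 6: collides with 6, chained at slot 6
print(t.table[6])     # [6, 26]
```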
Closed Hashing (Open Addressing)
In closed hashing, all keys are stored in the hash table itself, and a collision is resolved by
probing for another empty cell. The probing techniques are:
1. Linear probing
2. Quadratic probing
3. Double Hashing technique
Linear Probing
Linear probing is one of the forms of open addressing. As we know, each cell in the hash table
contains a key-value pair, so when a collision occurs because a new key maps to a cell already
occupied by another key, the linear probing technique searches for the closest free location and
adds the new key to that empty cell. In this case, searching is performed sequentially, starting
from the position where the collision occurred, until an empty cell is found.
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively. The calculated index value
of 11 is 5 which is already occupied by another key value, i.e., 6. When linear probing is applied, the
nearest empty cell to the index 5 is 6; therefore, the value 11 will be added at the index 6.
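Linear probing can be sketched as follows. The worked example above takes its indexes from a figure not reproduced here, so this sketch assumes a table of size 10 with h(k) = k % 10 and uses the keys 3, 13, 23, which all collide at slot 3:

```python
def linear_probe_insert(table, key):
    """Open addressing with linear probing: from h(key), scan
    sequentially (wrapping around) until an empty cell is found."""
    m = len(table)
    home = key % m
    for step in range(m):
        probe = (home + step) % m
        if table[probe] is None:
            table[probe] = key
            return probe
    raise RuntimeError("hash table is full")

table = [None] * 10
for k in (3, 13, 23):
    print(k, '->', linear_probe_insert(table, k))
# 3 -> 3, then 13 -> 4 and 23 -> 5 (nearest free cells after slot 3)
```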
Quadratic Probing
In the case of linear probing, searching is performed linearly. In contrast, quadratic probing is
an open addressing technique that uses a quadratic polynomial for searching until an empty slot is
found.
It can also be defined as follows: the key ki is inserted at the first free location among
(u + i^2) % m for i = 0 to m-1, where u is the home index of ki.
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5, respectively. We do not need to apply
the quadratic probing technique on these key values as there is no occurrence of the collision.
The index value of 11 is 5, but this location is already occupied by the 6. So, we apply the quadratic
probing technique.
When i = 0:
Index = (5 + 0^2) % 10 = 5
When i = 1:
Index = (5 + 1^2) % 10 = 6
The next element is 13. When the hash function is applied on 13, then the index value comes out to
be 9, which we already discussed in the chaining method. At index 9, the cell is occupied by another
value, i.e., 3. So, we will apply the quadratic probing technique to calculate the free location.
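Quadratic probing can be sketched as follows. Since the worked example's index of 11 comes from a figure-based hash, this sketch assumes a table of size 10 with h(k) = k % 10 and uses key 15, so the collision at the occupied slot 5 reproduces the same probe sequence (5, then 6):

```python
def quadratic_probe_insert(table, key):
    """Open addressing with quadratic probing: try (h(key) + i*i) % m
    for i = 0, 1, 2, ... until a free slot is found."""
    m = len(table)
    home = key % m
    for i in range(m):
        probe = (home + i * i) % m
        if table[probe] is None:
            table[probe] = key
            return probe
    raise RuntimeError("no free slot found")

table = [None] * 10
table[5] = 6                              # slot 5 already occupied by key 6
print(quadratic_probe_insert(table, 15))  # i=0 -> 5 occupied, i=1 -> slot 6
```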