CD3291 - DSA - Book
COURSE OBJECTIVES:
● To understand the concepts of ADTs
● To design linear data structures – lists, stacks, and queues
● To understand sorting, searching and hashing algorithms
● To apply Tree and Graph structures
UNIT I ABSTRACT DATA TYPES 9
Abstract Data Types (ADTs) – ADTs and classes – introduction to OOP – classes in
Python – inheritance – namespaces – shallow and deep copying. Introduction to analysis of
algorithms – asymptotic notations – recursion – analyzing recursive algorithms
UNIT II LINEAR STRUCTURES 9
List ADT – array-based implementations – linked list implementations – singly linked lists – circularly linked lists – doubly linked lists – applications of lists – Stack ADT – Queue ADT – double ended queues
UNIT III SORTING AND SEARCHING 9
Bubble sort – selection sort – insertion sort – merge sort – quick sort – linear search – binary search – hashing – hash functions – collision handling – load factors, rehashing, and efficiency
UNIT IV TREE STRUCTURES 9
Tree ADT – Binary Tree ADT – tree traversals – binary search trees – AVL trees –
heaps – multiway search trees
UNIT V GRAPH STRUCTURES 9
Graph ADT – representations of graph – graph traversals – DAG – topological
ordering – shortest paths – minimum spanning trees
TOTAL: 45 HOURS
COURSE OUTCOMES:
At the end of the course, the student should be able to:
UNIT I - ABSTRACT DATA TYPES
Variables
A variable is a name that holds data, much like a placeholder in a mathematical equation.
Ex: x² + 2y − 2 = 1
Here x and y can take different values; in the same way, a variable in a program names a memory location whose contents can change.
Data Types
The number of bits allocated for each primitive data type depends on the
programming languages, the compiler and the operating system.
For the same primitive data type, different languages may use different sizes.
Depending on the size of the data types, the total available values (domain) will also change.
For example, “int” may take 2 bytes or 4 bytes. If it takes 2 bytes (16 bits), then the total possible values are -32,768 to +32,767 (-2^15 to 2^15 - 1). If it takes 4 bytes (32 bits), then the possible values are between -2,147,483,648 and +2,147,483,647 (-2^31 to 2^31 - 1). The same is the case with other data types.
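The two ranges above follow from the two's-complement formula; a quick sketch to verify them (the helper name is illustrative):

```python
def int_range(bits):
    """Range of an n-bit signed (two's complement) integer."""
    return -2 ** (bits - 1), 2 ** (bits - 1) - 1

print(int_range(16))   # (-32768, 32767)
print(int_range(32))   # (-2147483648, 2147483647)
```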
User defined data types
If the system-defined data types are not enough, then most programming languages allow the users to define their own data types, called user-defined data types.
Good examples of user-defined data types are structures in C/C++ and classes in Java.
For example, in the snippet below, we are combining many system-defined data
types and calling the user defined data type by the name “newType”.
This gives more flexibility and comfort in dealing with computer memory.
struct newType
{
    int data1;
    float data2;
    ...
    char datan;
};
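In Python, the closest analogue of such a struct is a class; a minimal sketch (the field names mirror the struct above, the class name is illustrative):

```python
class NewType:
    """Groups several system-defined types into one user-defined type."""
    def __init__(self, data1, data2, datan):
        self.data1 = data1   # an int
        self.data2 = data2   # a float
        self.datan = datan   # a one-character string

record = NewType(10, 2.5, 'x')
print(record.data1, record.data2, record.datan)   # 10 2.5 x
```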
Data Structures
Data structure is a particular way of storing and organizing data in a computer so that
it can be used efficiently.
A data structure is a special format for organizing and storing data. General data
structure types include arrays, files, linked lists, stacks, queues, trees, graphs and so on.
Depending on the organization of the elements, data structures are classified into two types:
1) Linear data structures: Elements are accessed in a sequential order, but it is not compulsory to store all elements sequentially. Examples: Linked Lists, Stacks and Queues.
2) Non – linear data structures: Elements of this data structure are stored/accessed
in a non-linear order. Examples: Trees and graphs.
Abstract Data Types (ADTs)
For user-defined data types we also need to define operations. The implementation
for these operations can be done when we want to actually use them. That means, in
general, user defined data types are defined along with their operations.
To simplify the process of solving problems, we combine the data structures with
their operations and we call this Abstract Data Types (ADTs). An ADT consists of two parts:
1. Declaration of data
2. Declaration of operations
Commonly used ADTs include: Linked Lists, Stacks, Queues, Priority Queues, Binary
Trees, Dictionaries, Disjoint Sets (Union and Find), Hash Tables, Graphs, and many others.
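As a sketch of the idea (not code from the text), a Stack ADT pairs its data declaration with its operation declarations:

```python
class Stack:
    """Stack ADT: the data and its operations packaged together."""
    def __init__(self):
        self._data = []                 # declaration of data

    def push(self, e):                  # declarations of operations
        self._data.append(e)

    def pop(self):
        return self._data.pop()

    def is_empty(self):
        return len(self._data) == 0

s = Stack()
s.push(1)
s.push(2)
print(s.pop())        # 2  (last in, first out)
```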
Robustness
A correct program produces the right output for all the anticipated inputs in the program’s application. In addition, we want software to be robust, that is, capable of handling unexpected inputs that are not explicitly defined for its application.
Adaptability
Software, therefore, needs to be able to evolve over time in response to changing
conditions in its environment. Thus, another important goal of quality software is that it
achieves adaptability (also called evolvability).
Related to this concept is portability, which is the ability of software to run with
minimal change on different hardware and operating system platforms. An advantage
of writing software in Python is the portability provided by the language itself.
Reusability
Developing quality software can be an expensive enterprise, and its cost can be
offset somewhat if the software is designed in a way that makes it easily reusable in future
applications.
Such reuse should be done with care, however, for one of the major sources of
software errors in the Therac-25 came from inappropriate reuse of Therac-20 software.
Object-Oriented Design Principles
The chief principles of the object-oriented approach are:
Modularity
Abstraction
Encapsulation
Modularity
Modularity refers to an organizing principle in which different components of a
software system are divided into separate functional units.
Python’s standard libraries include, for example, the math module, which provides
definitions for key mathematical constants and functions, and the os module, which provides
support for interacting with the operating system.
Abstraction
Applying the abstraction paradigm to the design of data structures gives rise to
abstract data types (ADTs). An ADT is a mathematical model of a data structure that
specifies the type of data stored, the operations supported on them, and the types of
parameters of the operations, the collective set of behaviours supported by an ADT
designed as public interface.
Python supports abstract data types using a mechanism known as an abstract base
class (ABC). An abstract base class cannot be instantiated (i.e., you cannot directly create
an instance of that class), but it defines one or more common methods that all
implementations of the abstraction must have.
An ABC is realized by one or more concrete classes that inherit from the abstract base class while providing implementations for those methods declared by the ABC.
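A minimal sketch of this mechanism using Python's abc module (the class names here are illustrative, not from the text):

```python
from abc import ABC, abstractmethod

class Shape(ABC):                 # abstract base class: cannot be instantiated
    @abstractmethod
    def area(self):
        """Every concrete shape must implement area."""

class Square(Shape):              # concrete class realizing the ABC
    def __init__(self, side):
        self._side = side

    def area(self):
        return self._side * self._side

print(Square(3).area())   # 9
# Shape() would raise TypeError: can't instantiate abstract class
```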
Encapsulation
Encapsulation yields robustness and adaptability, for it allows the implementation
details of parts of a program to change without adversely affecting other parts, thereby
making it easier to fix bugs or add new functionality with relatively local changes to a
component.
Class Definitions
A class provides a set of behaviours in the form of member functions (also known as
methods), with implementations that are common to all instances of that class.
A class also serves as a blueprint for its instances, effectively determining the way that state information for each instance is represented in the form of attributes (also known as fields, instance variables, or data members).
Defining a Class:
Just as function definitions begin with the def keyword in Python, class definitions begin with the class keyword.
The first string inside the class is called docstring and has a brief description of the
class. Although not mandatory, this is highly recommended.
class MyNewClass:
    '''This is a docstring. I have created a new class'''
    pass
A class creates a new local namespace where all its attributes are defined. Attributes
may be data or functions.
There are also special attributes that begin with double underscores __. For example, __doc__ gives us the docstring of that class.
As soon as we define a class, a new class object is created with the same name.
This class object allows us to access the different attributes as well as to instantiate new
objects of that class.
Example:
class Person:
    "This is a person class"
    age = 10

    def greet(self):
        print('Hello')

print(Person.age)
print(Person.greet)
print(Person.__doc__)
Output:
10
<function Person.greet at 0x7fc78c6e8160>
This is a person class
We saw that the class object could be used to access different attributes. It can also be used to create new object instances (instantiation) of that class. The procedure to create an object is similar to a function call:
harry = Person()
This will create a new object instance named harry. We can access the attributes of objects using the object name as prefix.
Example:
class CreditCard:
    def __init__(self, customer, bank, acnt, limit):
        self._customer = customer
        self._bank = bank
        self._account = acnt
        self._limit = limit
        self._balance = 0

    def get_customer(self):
        return self._customer

    def get_bank(self):
        return self._bank
The Constructor
The __init__ method serves as the constructor of the class. Its primary responsibility is to establish the state of a newly created credit card object with appropriate instance variables.
Encapsulation
A single leading underscore in the name of a data member, such as _balance, implies that it is intended as non-public. Users of a class should not directly access such members.
Additional Methods
The most interesting behaviours in our class are charge and make_payment. The charge method typically adds the given price to the credit card balance, to reflect a purchase of said price by the customer.
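A self-contained sketch of these two methods (the limit check and return values follow the usual convention; treat the details as illustrative rather than the text's exact code):

```python
class CreditCard:
    def __init__(self, customer, bank, acnt, limit):
        self._customer = customer
        self._bank = bank
        self._account = acnt
        self._limit = limit
        self._balance = 0

    def charge(self, price):
        """Add price to the balance; deny the charge if it would exceed the limit."""
        if price + self._balance > self._limit:
            return False          # charge denied
        self._balance += price
        return True               # charge processed

    def make_payment(self, amount):
        """Process a customer payment that reduces the balance."""
        self._balance -= amount

cc = CreditCard('John', 'Bank', '1234', 1000)
print(cc.charge(500))     # True
print(cc.charge(600))     # False (would exceed the limit)
```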
Inheritance:
There are two ways in which a subclass can differentiate itself from its superclass. A
subclass may specialize an existing behaviour by providing a new implementation that
overrides an existing method. A subclass may also extend its superclass by providing brand
new methods.
Syntax:
class BaseClass:
    # body of base class

class DerivedClass(BaseClass):
    # body of derived class
Derived class inherits features from the base class where new features can be added
to it. This results in re-usability of code.
Types of Inheritance
Depending upon the number of child and parent classes involved, there are four types of inheritance in Python.
Single Inheritance
When a child class inherits only a single parent class.
Example:
class Parent:
    def func1(self):
        print("this is function 1")

class Child(Parent):
    def func2(self):
        print("this is function 2")

ob = Child()
ob.func1()
ob.func2()
Multiple Inheritance
When a child class inherits from more than one parent class.
Example:
class Parent:
    def func1(self):
        print("this is function 1")

class Parent2:
    def func2(self):
        print("this is function 2")

class Child(Parent, Parent2):
    def func3(self):
        print("this is function 3")

ob = Child()
ob.func1()
ob.func2()
ob.func3()
Multilevel Inheritance
When a child class becomes a parent class for another child class.
Example:
class Parent:
    def func1(self):
        print("this is function 1")

class Child(Parent):
    def func2(self):
        print("this is function 2")

class Child2(Child):
    def func3(self):
        print("this is function 3")

ob = Child2()
ob.func1()
ob.func2()
ob.func3()
Hierarchical Inheritance
Hierarchical inheritance involves more than one child class inheriting from the same base or parent class.
Example:
class Parent:
    def func1(self):
        print("this is function 1")

class Child(Parent):
    def func2(self):
        print("this is function 2")

class Child1(Parent):
    def func3(self):
        print("this is function 3")

class Child3(Parent, Child1):
    def func4(self):
        print("this is function 4")

ob = Child3()
ob.func1()
Whenever an identifier is assigned a value, that definition is made with a specific scope. Top-level assignments are typically made in what is known as global scope. Assignments made within the body of a function typically have a scope that is local to that function call. Therefore, an assignment x = 5 within a function has no effect on the identifier x in the broader scope. Each distinct scope in Python is represented using an abstraction known as a namespace. A namespace manages all identifiers that are currently defined in a given scope.
The process of determining the value associated with an identifier is known as name
resolution.
In a Python program, there are three types of namespaces:
1. Built-In
2. Global
3. Local
a. Built-in Namespace in Python
This namespace gets created when the interpreter starts. It stores all the keywords or
the built-in names. This is the superset of all the Namespaces. This is the reason we can
use print, True, etc. from any part of the code.
b. Global Namespace in Python
This is the namespace that holds all the global objects. This namespace gets created when the program starts running and exists till the end of the execution.
c. Local Namespace in Python
This is the namespace that generally exists for some part of the time during the execution of the program. It stores the names of the objects defined inside a function. These namespaces exist as long as the functions exist. This is the reason we cannot globally access a variable created inside a function.
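A short sketch (the variable names are illustrative) showing a global name being visible inside a function while a local name is not visible outside it:

```python
var1 = "Python"           # lives in the global namespace

def show():
    var2 = "Geeks"        # lives in the local namespace of this call
    print(var1, var2)     # globals are visible inside the function

show()                    # prints: Python Geeks
# print(var2)             # would raise NameError: var2 existed only inside show()
```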
Copy an Object in Python
In Python, we use the = operator to create a copy of an object. However, it only creates a new variable that shares the reference of the original object.
Let's take an example where we create a list named old_list and pass an object
reference to new_list using = operator.
Example:
# Copy using = operator
old_list = [[1, 2, 3], [4, 5, 6], [7, 8, 'a']]
new_list = old_list

new_list[2][2] = 9

print('Old List:', old_list)
print('ID of Old List:', id(old_list))
print('New List:', new_list)
print('ID of New List:', id(new_list))

The output will be:

Old List: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ID of Old List: 140673303268168
New List: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ID of New List: 140673303268168
As the output shows, both variables old_list and new_list share the same id, i.e. 140673303268168. So changes made through either new_list or old_list are visible in both. Essentially, sometimes you may want to keep the original values unchanged and only modify the new values, or vice versa. For this, Python provides two ways to create copies:
Shallow Copy
Deep Copy
To make these copies work, the copy module is used.
For example:
import copy
copy.copy(x)
copy.deepcopy(x)
Here, copy() returns a shallow copy of x. Similarly, deepcopy() returns a deep copy of x.
Shallow Copy
A shallow copy creates a new object which stores the references of the original elements. So, a shallow copy doesn't create a copy of nested objects; instead it just copies the references of the nested objects. This means the copy process does not recurse, i.e. it does not create copies of the nested objects themselves.
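The code for the shallow-copy example discussed below is missing from this text; the following sketch is consistent with the surrounding description (the list values and the 'AA' assignment are taken from that discussion):

```python
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.copy(old_list)   # shallow copy: new outer list, shared sublists

old_list.append([4, 4, 4])       # new sublist appears only in old_list
old_list[1][1] = 'AA'            # shared sublist: change is visible in both

print("Old list:", old_list)     # [[1, 1, 1], [2, 'AA', 2], [3, 3, 3], [4, 4, 4]]
print("New list:", new_list)     # [[1, 1, 1], [2, 'AA', 2], [3, 3, 3]]
```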
In the above program, a shallow copy of old_list is created. The new_list contains references to the original nested objects stored in old_list. Then we append a new sublist, i.e. [4, 4, 4], to old_list. This new sublist is not copied into new_list.
New list: [[1, 1, 1], [2, 'AA', 2], [3, 3, 3]]
In the above program, the change to old_list, i.e. old_list[1][1] = 'AA', is visible in both old_list and new_list at index [1][1]. This is because both lists share the references of the same nested objects.
Deep Copy
A deep copy creates a new object and recursively adds the copies of nested objects
present in the original elements.
In the above program, changes to any nested objects in the original object old_list do not appear in the copy new_list, because the nested objects themselves were copied.
Example: Adding a new nested object in the list using Deep copy
import copy
old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.deepcopy(old_list)
old_list[1][0] = 'BB'
print("Old list:", old_list)
print("New list:", new_list)
Output:
Old list: [[1, 1, 1], ['BB', 2, 2], [3, 3, 3]]
New list: [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
In the above program, the change in old_list appears only in old_list. This means both old_list and new_list are independent. This is because old_list was recursively copied, which is true for all its nested objects.
Introduction to Analysis of Algorithms
Experimental running times of two algorithms are difficult to directly compare unless
the experiments are performed in the same hardware and software environments.
Experiments can be done only on a limited set of test inputs; hence, they leave out
the running times of inputs not included in the experiment (and these inputs may be
important).
An algorithm must be fully implemented in order to execute it and study its running time experimentally.
Asymptotic analysis overcomes these limitations:
1. It allows us to evaluate the relative efficiency of any two algorithms in a way that is independent of the hardware and software environment.
Types of Analysis
Algorithm analysis considers the inputs for which the algorithm takes less time (performs well) and the inputs for which the algorithm takes a long time.
Worst case
Defines the input for which the algorithm takes a long time (slowest time to
complete).
Input is the one for which the algorithm runs the slowest.
Best case
Defines the input for which the algorithm takes the least time (fastest time to
complete).
Input is the one for which the algorithm runs the fastest.
Average case
Provides a prediction about the running time of the algorithm.
Run the algorithm many times, using many different inputs, take the total running time and divide it by the number of trials.
Assumes that the input is random.
Asymptotic Notation
For the best, average and worst cases, we need to identify the upper and lower
bounds. To represent these upper and lower bounds, we need some kind of syntax,
represented in the form of function f(n).
Big-O Notation
This notation gives the tight upper bound of the given function. Generally, it is represented as f(n) = O(g(n)). That means, at larger values of n, the upper bound of f(n) is g(n).
For example, if f(n) = n^4 + 100n^2 + 10n + 50 is the given algorithm, then n^4 is g(n). That means g(n) gives the maximum rate of growth for f(n) at larger values of n.
Big-O Examples
For example, 100n + 5 = O(n), since 100n + 5 ≤ 101n for all n ≥ 5; similarly, 3n^2 + 2n = O(n^2).
Omega (Ω) Notation
This notation gives the tight lower bound of the given algorithm, and we represent it as f(n) = Ω(g(n)). That means, at larger values of n, the tight lower bound of f(n) is g(n).
For example, if f(n) = 100n^2 + 10n + 50, then f(n) = Ω(n^2).
Theta (Θ) Notation
There is a notation that allows us to say that two functions grow at the same rate, up to constant factors. f(n) is Θ(g(n)), pronounced “f(n) is big-Theta of g(n)”, if f(n) is O(g(n)) and f(n) is Ω(g(n)); that is, there are real constants c′ > 0 and c′′ > 0, and an integer constant n0 ≥ 1, such that c′·g(n) ≤ f(n) ≤ c′′·g(n), for n ≥ n0.
There are some general rules to help us determine the running time of an algorithm.
1) Loops: The running time of a loop is, at most, the running time of the statements inside
the loop (including tests) multiplied by the number of iterations.
2) Nested loops: Analyze from the inside out. Total running time is the product of the sizes
of all the loops.
Example:
// outer loop executed n times
for (i = 1; i <= n; i++)
    // inner loop executed n times
    for (j = 1; j <= n; j++)
        k = k + 1;  // constant time, c
Total time = c × n × n = c·n^2 = O(n^2).
3) Consecutive statements: Add the time complexities of each statement.
Example:
x = x + 1;  // constant time
// executed n times
for (i = 1; i <= n; i++)
    m = m + 2;  // constant time, c
// outer loop executed n times
for (i = 1; i <= n; i++)
    // inner loop executed n times
    for (j = 1; j <= n; j++)
        k = k + 1;  // constant time, c
Total time = c0 + c1·n + c2·n^2 = O(n^2).
4) If-then-else statements:
Worst-case running time: the test, plus either the then part or the else part (whichever is the
larger).
// test: constant
if (length() == 0)
    return false;
else
    // loop executed n times
    for (int n = 0; n < length(); n++)
        // test: constant
        if (!list[n].equals(otherList.list[n]))
            return false;
Total time = c0 + c1 + (c2 + c3) × n = O(n).
Recursion
Recursion is a technique by which a function makes one or more calls to itself during
execution, until the condition gets satisfied. Recursion provides a powerful alternative for
performing repetitive tasks.
1) The Factorial Function
The factorial of a positive integer n, denoted n!, is defined as the product of the integers from 1 to n. If n = 0, then n! is defined as 1 by convention. More formally, for any integer n ≥ 0,
n! = 1 if n = 0, and n! = n · (n−1) · (n−2) ··· 3 · 2 · 1 if n ≥ 1.
The factorial function is used to find the number of ways in which n distinct items can
be arranged into a sequence, that is, the number of permutations of n items. For example,
the three characters a, b, and c can be arranged in 3! = 3 · 2 · 1 = 6 ways: abc, acb, bac,
bca, cab, and cba.
Recursion is not just a mathematical notation; we can use recursion to design a Python implementation of a factorial function, as shown in Code Fragment 4.1.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
Trace for the factorial function is,
2) Drawing an English Ruler
This example draws the markings of a typical English ruler. For each inch, we place a tick with a numeric label. We denote the length of the tick designating a whole inch as the major tick length. Between the marks for whole inches, the ruler contains a series of minor ticks, placed at intervals of 1/2 inch, 1/4 inch, and so on.
Python Code:
def draw_line(tick_length, tick_label=''):
    """Draw one line with given tick length (followed by optional label)."""
    line = '-' * tick_length
    if tick_label:
        line += ' ' + tick_label
    print(line)

def draw_interval(center_length):
    """Draw tick interval based upon a central tick length."""
    if center_length > 0:
        draw_interval(center_length - 1)
        draw_line(center_length)
        draw_interval(center_length - 1)

def draw_ruler(num_inches, major_length):
    """Draw English ruler with given number of inches, major tick length."""
    draw_line(major_length, '0')
    for j in range(1, 1 + num_inches):
        draw_interval(major_length - 1)
        draw_line(major_length, str(j))

draw_ruler(2, 4)
Trace of the English Ruler Code:
3) Binary Search
Binary search, that is used to efficiently locate a target value within a sorted
sequence of n elements.
The algorithm maintains two parameters, low and high, such that all the candidate
entries have index at least low and at most high. Initially, low = 0 and high = n− 1. We then
compare the target value to the median candidate, that is, the item data[mid] with index
If the target equals data[mid], then we have found the item we are looking for,
and the search terminates successfully.
If target < data[mid], then we recur on the first half of the sequence, that is, on
the interval of indices from low to mid−1.
If target > data[mid], then we recur on the second half of the sequence, that
is, on the interval of indices from mid+1 to high.
An unsuccessful search occurs if low > high, as the interval [low,high] is
empty.
This algorithm is known as binary search. Whereas sequential search runs in O(n)
time, the more efficient binary search runs in O(logn) time.
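The steps above can be sketched as a recursive Python function (a standard formulation; the parameter names follow the description):

```python
def binary_search(data, target, low, high):
    """Return True if target is found in the sorted list data[low..high]."""
    if low > high:
        return False                  # interval is empty; unsuccessful search
    mid = (low + high) // 2
    if target == data[mid]:
        return True                   # found a match
    elif target < data[mid]:
        return binary_search(data, target, low, mid - 1)    # recur on left half
    else:
        return binary_search(data, target, mid + 1, high)   # recur on right half

data = [2, 4, 5, 7, 8, 9, 12, 14, 17, 19, 22, 25, 27, 28, 33, 37]
print(binary_search(data, 22, 0, len(data) - 1))   # True
print(binary_search(data, 23, 0, len(data) - 1))   # False
```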
4) File Systems
Modern operating systems define file-system directories (which are also sometimes
called “folders”) in a recursive way. Namely, a file system consists of a top-level directory,
and the contents of this directory consists of files and other directories, which in turn can
contain files and other directories, and so on.
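The recursive structure described above can be sketched with Python's os module (a standard formulation, matching the disk_usage function discussed later in the analysis):

```python
import os

def disk_usage(path):
    """Return number of bytes used by a file/folder and any descendants."""
    total = os.path.getsize(path)            # account for direct usage
    if os.path.isdir(path):                  # if this is a directory,
        for filename in os.listdir(path):    # then for each child:
            childpath = os.path.join(path, filename)
            total += disk_usage(childpath)   # add child's usage to total
    return total
```

For example, disk_usage('.') reports the cumulative size of the current directory and everything beneath it.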
Analyzing Recursive Algorithm
With a recursive algorithm, we will account for each operation that is performed
based upon the particular activation of the function that manages the flow of control at the
time it is executed. Stated another way, for each invocation of the function, we only account
for the number of operations that are performed within the body of that activation. We can
then account for the overall number of operations that are executed as part of the recursive
algorithm by taking the sum, over all activations, of the number of operations that take place
during each individual activation.
Computing Factorials
It is relatively easy to analyze the efficiency of our function for computing factorials. A sample recursion trace shows one activation per call: to compute factorial(n), there are a total of n + 1 activations, each of which accounts for O(1) operations; hence, the overall running time of factorial(n) is O(n).
In analyzing the English ruler application, the fundamental question is how many total lines of output are generated by an initial call to draw_interval(c), where c denotes the center length. This is a reasonable benchmark for the overall efficiency of the algorithm, as each line of output is based upon a call to the draw_line utility, and each recursive call to draw_interval with nonzero parameter makes exactly one direct call to draw_line. Some intuition may be gained by examining the source code and the recursion trace. We know that a call to draw_interval(c) for c > 0 spawns two calls to draw_interval(c−1) and a single call to draw_line. We will rely on this intuition to prove the following claim.
Proposition 4.1: For c ≥ 0, a call to draw_interval(c) results in precisely 2^c − 1 lines of output.
Justification:
In fact, induction is a natural mathematical technique for proving the correctness and efficiency of a recursive process. In the case of the ruler, we note that an application of draw_interval(0) generates no output, and that 2^0 − 1 = 1 − 1 = 0. This serves as a base case for our claim. More generally, the number of lines printed by draw_interval(c) is one more than twice the number generated by a call to draw_interval(c−1), as one center line is printed between two such recursive calls. By induction, we have that the number of lines is thus 1 + 2·(2^(c−1) − 1) = 1 + 2^c − 2 = 2^c − 1. This proof is indicative of a more mathematically rigorous tool, known as a recurrence equation, that can be used to analyze the running time of a recursive algorithm.
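The recurrence in the proof can be checked empirically with a small counter (a sketch; lines(c) mirrors the two recursive calls plus one center line of draw_interval):

```python
def lines(c):
    """Lines of output from draw_interval(c): two half-intervals plus one center line."""
    if c == 0:
        return 0
    return 2 * lines(c - 1) + 1

for c in range(6):
    assert lines(c) == 2 ** c - 1   # matches Proposition 4.1

print(lines(4))   # 15
```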
Considering the running time of the binary search algorithm, a constant number of primitive operations are executed at each recursive call of binary search. Hence, the running time is proportional to the number of recursive calls performed. At most ⌊log n⌋ + 1 recursive calls are made during a binary search of a sequence having n elements, leading to the following claim: the binary search algorithm runs in O(log n) time for a sorted sequence with n elements.
Justification:
With each recursive call, the number of candidate entries still to be searched is given by the value high − low + 1. Moreover, the number of remaining candidates is reduced by at least one half with each recursive call. Specifically, from the definition of mid, the number of remaining candidates is either
(mid − 1) − low + 1 = ⌊(low + high)/2⌋ − low ≤ (high − low + 1)/2, or
high − (mid + 1) + 1 = high − ⌊(low + high)/2⌋ ≤ (high − low + 1)/2.
Initially, the number of candidates is n; after the first call in a binary search, it is at most n/2; after the second call, it is at most n/4; and so on. In general, after the j-th call in a binary search, the number of candidate entries remaining is at most n/2^j. In the worst case (an unsuccessful search), the recursive calls stop when there are no more candidate entries. Hence, the maximum number of recursive calls performed is the smallest integer r such that n/2^r < 1. In other words (recalling that we omit a logarithm’s base when it is 2), r > log n. Thus, we have r = ⌊log n⌋ + 1, which implies that binary search runs in O(log n) time.
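This bound can be sanity-checked by counting recursive calls in an instrumented binary search (a sketch; the counting wrapper is illustrative, and it also counts the final call on an empty interval, so the bound checked is ⌊log n⌋ + 2):

```python
import math

def search_calls(data, target, low, high):
    """Return number of binary-search calls made, including the empty-interval base case."""
    if low > high:
        return 1
    mid = (low + high) // 2
    if target == data[mid]:
        return 1
    if target < data[mid]:
        return 1 + search_calls(data, target, low, mid - 1)
    return 1 + search_calls(data, target, mid + 1, high)

n = 1000
calls = search_calls(list(range(n)), -1, 0, n - 1)   # worst case: unsuccessful search
print(calls, math.floor(math.log2(n)) + 2)           # call count stays within the bound
```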
To characterize the “problem size” for our analysis, we let n denote the number of file-system entries in the portion of the file system that is considered. (For example, the file system portrayed in Figure 4.6 has n = 19 entries.) To characterize the cumulative time spent for an initial call to the disk_usage function, we must analyze the total number of recursive invocations that are made, as well as the number of operations that are executed within those invocations.
Intuitively, a call to disk_usage for a particular entry e of the file system is only made from the directory containing e, and that entry will only be explored once.
Each iteration of that loop makes a recursive call to disk_usage, and yet we have already concluded that there are a total of n calls to disk_usage (including the original call). We therefore conclude that there are O(n) recursive calls, each of which uses O(1) time outside the loop, and that the overall number of operations due to the loop is O(n). Summing all of these bounds, the overall number of operations is O(n).
Unit I – 2 Mark Questions with Answers
The number of bits allocated for each primitive data type depends on the
programming languages, the compiler and the operating system.
For the same primitive data type, different languages may use different sizes.
Depending on the size of the data types, the total available values (domain) will also change.
For example, “int” may take 2 bytes or 4 bytes. If it takes 2 bytes (16 bits), then the total possible values are -32,768 to +32,767 (-2^15 to 2^15 - 1). If it takes 4 bytes (32 bits), then the possible values are between -2,147,483,648 and +2,147,483,647 (-2^31 to 2^31 - 1). The same is the case with other data types.
5. What are Data Structures?
Data structure is a particular way of storing and organizing data in a computer so that
it can be used efficiently.
A data structure is a special format for organizing and storing data. General data
structure types include arrays, files, linked lists, stacks, queues, trees, graphs and so on.
Depending on the organization of the elements, data structures are classified into two types:
1) Linear data structures: Elements are accessed in a sequential order but it is not
compulsory to store all elements sequentially. Examples: Linked Lists, Stacks and Queues.
2) Non – linear data structures: Elements of this data structure are stored/accessed
in a non-linear order. Examples: Trees and graphs.
7. What are the characteristics of quality software?
Robustness
Adaptability
Reusability
Modularity
9. Is Python Adaptable?
Software needs to be able to evolve over time in response to changing conditions in its environment. Thus, another important goal of quality software is that it achieves adaptability (also called evolvability).
Related to this concept is portability, which is the ability of software to run with
minimal change on different hardware and operating system platforms. An advantage of
writing software in Python is the portability provided by the language itself.
Such reuse should be done with care, however, for one of the major sources of
software errors in the Therac-25 came from inappropriate reuse of Therac-20 software.
Python supports abstract data types using a mechanism known as an abstract base
class (ABC). An abstract base class cannot be instantiated (i.e., you cannot directly create
an instance of that class), but it defines one or more common methods that all
implementations of the abstraction must have.
An ABC is realized by one or more concrete classes that inherit from the abstract base class while providing implementations for those methods declared by the ABC.
14. What is Encapsulation?
Wrapping up of data and the methods that operate on it into a single unit is called encapsulation. Different components of a software system should not reveal the internal details of their respective implementations.
Encapsulation yields robustness and adaptability, for it allows the implementation details of parts of a program to change without adversely affecting other parts, making it easier to fix bugs or add new functionality with relatively local changes to a component.
15. What is a Class?
A class is a collection of objects. A class contains the blueprints or the prototype from which the objects are created. It is a logical entity that contains some attributes and methods.
A class provides a set of behaviors in the form of member functions (also known as
methods), with implementations that are common to all instances of that class.
A class determines the way that state information for each instance is represented, in the form of attributes (also known as fields, instance variables, or data members).
Example:
class Person:
    def __init__(self, a, b):
        print("Sum=", a + b)

obj = Person(2, 3)
Output:
Sum= 5
16. What is a Constructor?
__init__ is a special method that serves as the constructor of the class. Its primary responsibility is to establish the state of a newly created object with appropriate instance variables.
Example:
class Person:
    def __init__(self, a, b):
        print("Sum=", a + b)

obj = Person(2, 3)
Output:
Sum= 5
17. What is Inheritance?
This allows a new class to be defined based upon an existing class as the starting
point. In object-oriented terminology, the existing class is typically described as the base
class, parent class, or superclass, while the newly defined class is known as the subclass or
child class.
Syntax:
class BaseClass:
    # body of base class

class DerivedClass(BaseClass):
    # body of derived class
Derived class inherits features from the base class where new features can be added
to it. This results in re-usability of code.
20. What is Multiple Inheritance?
When a child class inherits from more than one parent class.
Example:
class Parent:
    def func1(self):
        print("this is function 1")

class Parent2:
    def func2(self):
        print("this is function 2")

class Child(Parent, Parent2):
    def func3(self):
        print("this is function 3")

ob = Child()
ob.func1()
ob.func2()
ob.func3()
22. What is Hierarchical Inheritance?
Hierarchical inheritance involves multiple inheritance from the same base or parent
class.
Example:
class Parent:
    def func1(self):
        print("this is function 1")

class Child(Parent):
    def func2(self):
        print("this is function 2")

class Child1(Parent):
    def func3(self):
        print("this is function 3")

class Child3(Parent, Child1):
    def func4(self):
        print("this is function 4")

ob = Child3()
ob.func1()
Python has three types of namespaces:
1. Built-In
2. Global
3. Local
25. What is Python Global Namespace?
This is the namespace that holds all the global objects. This namespace gets created
when the program starts running and exists till the end of the execution.
Built-In Namespace:
This namespace gets created when the interpreter starts. It stores all the keywords and
built-in names. It is the superset of all the namespaces; this is the reason we can
use print, True, etc. from any part of the code.
Local Namespace:
This is the namespace that generally exists for some part of the time during the
execution of the program. It stores the names of the objects created inside a function.
These namespaces exist as long as the function executes. This is the reason we cannot
globally access a variable created inside a function.
Output:
NameError: name 'var2' is not defined
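The code for this example is missing; a minimal sketch that reproduces the NameError above (the name var2 is taken from the output):

```python
# var2 lives only in the function's local namespace, so it cannot be
# reached from the global namespace.
def func():
    var2 = 10        # var2 exists only while func() is running
    return var2

print(func())        # the value is accessible through the function
try:
    print(var2)      # var2 is not in the global namespace
except NameError as e:
    print(e)         # name 'var2' is not defined
```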
o Shallow Copy
o Deep Copy
To make these copies work, the copy module is used.
For example:
import copy
copy.copy(x)          # shallow copy
copy.deepcopy(x)      # deep copy
31. Give the procedure to create a new object from original elements.
A deep copy creates a new object and recursively adds copies of the nested objects
present in the original elements.
print("Old list:", old_list)
print("New list:", new_list)
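The two print statements above belong to an example whose body was lost; a minimal reconstruction (the list values are assumed):

```python
import copy

# Deep copy recursively copies the nested lists, so mutating the copy
# leaves the original intact.
old_list = [[1, 2], [3, 4]]
new_list = copy.deepcopy(old_list)
new_list[0][0] = 99            # change only the copy

print("Old list:", old_list)   # Old list: [[1, 2], [3, 4]]
print("New list:", new_list)   # New list: [[99, 2], [3, 4]]
```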
32. What is the use of Algorithm analysis?
Algorithm analysis helps us to determine which algorithm is most efficient in terms of
time and space consumed.
Worst case
Defines the input for which the algorithm takes the longest time to complete; the
input is the one for which the algorithm runs the slowest.
Best case
Defines the input for which the algorithm takes the least time to complete; the
input is the one for which the algorithm runs the fastest.
Average case
Provides a prediction about the running time of the algorithm: run the algorithm
many times, using many different randomly chosen inputs, and divide the total time by
the number of trials.
Lower Bound <= Average Time <= Upper Bound
There are some general rules to help us determine the running time of an algorithm.
1) Loops: The running time of a loop is, at most, the running time of the statements inside
the loop (including tests) multiplied by the number of iterations.
38. How will you analyse the running time complexity of Nested Loops?
Nested loops: Analyze from the inside out. The total running time is the running time
of the innermost statement multiplied by the product of the sizes of all the loops.
Example:
for i in range(n):        # outer loop
    for j in range(n):    # inner loop
        k = k + 1         # constant time, c
Total time = c × n × n = cn² = O(n²).
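The same count can be confirmed empirically with a small Python sketch (the function name is illustrative):

```python
# Counting the executions of the inner statement confirms the O(n^2) analysis.
def count_ops(n):
    k = 0
    for i in range(n):          # outer loop: n iterations
        for j in range(n):      # inner loop: n iterations each
            k = k + 1           # constant-time statement
    return k

print(count_ops(10))   # 100 = 10 * 10
```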
39. How will you analyse the running time complexity of Consecutive Statements?
Consecutive statements: their running times simply add, so the total is dominated by
the largest term (the maximum is the one that counts).
40. How will you analyse the running time complexity of if-then-else statements?
If-then-else statements - Worst-case running time: the time of the test, plus the
larger of the running times of the then part and the else part.
if length() == 0:                              # test: constant time
    return False
else:
    for n in range(length()):                  # loop: n iterations
        if not list[n] == otherList.list[n]:   # constant-time body
            return False
Total time = c0 + c1 + (c2 + c3) × n = O(n).
Recursion is a technique by which a function makes one or more calls to itself during
execution, until a base condition is satisfied. Recursion provides a powerful
alternative for performing repetitive tasks.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
42. Give the procedure for drawing an English Ruler using Recursion.
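The answer is missing here; a standard recursive sketch of the English-ruler procedure (following the usual textbook formulation) is:

```python
def draw_line(tick_length, tick_label=''):
    # Draw one line with the given tick length, followed by an optional label.
    line = '-' * tick_length
    if tick_label:
        line += ' ' + tick_label
    print(line)

def draw_interval(center_length):
    # Draw the interval based on a central tick length.
    if center_length > 0:                    # stop when the length drops to 0
        draw_interval(center_length - 1)     # recursively draw the top ticks
        draw_line(center_length)             # draw the center tick
        draw_interval(center_length - 1)     # recursively draw the bottom ticks

def draw_ruler(num_inches, major_length):
    # Draw an English ruler with the given number of inches and major tick length.
    draw_line(major_length, '0')             # draw the inch-0 line
    for j in range(1, 1 + num_inches):
        draw_interval(major_length - 1)      # draw the interior ticks
        draw_line(major_length, str(j))      # draw the inch line and label

draw_ruler(2, 3)
```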
UNIT II LINEAR STRUCTURES
List ADT:
Array Implementation:
Basic structure for storing and accessing a collection of data is the array. A one-
dimensional array is a collection of contiguous elements in which individual elements are
identified by a unique integer subscript starting with zero. Once an array is created, its size
cannot be changed.
Python’s list structure is a mutable sequence container that can change size as
items are added or removed. It is an abstract data type that is implemented using an array
structure to store the items contained in the list.
Appending Items
pyList.append( 50 )
If there is room in the array, the item is stored in the next available slot of the array
and the length field is incremented by one.
pyList.append( 18 )
pyList.append( 64 )
pyList.append( 6)
After the second statement is executed, the array becomes full and there is no
available space to add more values.
By definition, a list can contain any number of items and never becomes full. Thus,
when the third statement is executed, the array will have to be expanded to make room for
value 6. Array cannot change size once it has been created. To allow for the expansion of
the list, the following steps have to be performed:
(1) A new array with a larger capacity is created,
(2) The items from the original array are copied to the new array,
(3) The new larger array is set as the data structure for the list,
(4) The original smaller array is destroyed.
After the array has been expanded, the value can be appended to the end of the list.
Extending a List
A list can be appended to a second list using the extend() method as shown in the
following example:
pyListA = [ 34, 12 ]
pyListB = [ 4, 6, 31, 9 ]
pyListA.extend( pyListB )
If the list being extended has the capacity to store all of the elements from the
second list, the elements are simply copied, element by element. If there is not enough
capacity for all of the elements, the underlying array has to be expanded as was done with
the append() method.
Inserting Items
An item can be inserted anywhere within the list using the insert() method. In the
following example pyList.insert( 3, 79 ) we insert the value 79 at index position 3. Since there
is already an item at that position, we must make room for the new item by shifting all of the
items down one position starting with the item at index position 3. After shifting the items, the
value 79 is then inserted at position 3.
Removing Items
An item can be removed from any position within the list using the pop() method.
Consider the following code segment, which removes both the first and last items from the
sample list:
The first statement removes the first item from the list. After the item is removed,
typically by setting the reference variable to None, the items following it within the array are
shifted down, from left to right, to close the gap. Finally, the length of the list is decremented
to reflect the smaller size.
The second pop() operation in the example code removes the last item from the list.
Since there are no items following the last one, the only operations required are to remove
the item and decrement the size of the list. After removing an item from the list, the size of
the array may be reduced using a technique similar to that for expansion. This reduction
occurs when the number of available slots in the internal array falls below a certain
threshold. For example, when more than half of the array elements are empty, the size of the
array may be cut in half.
List Slice
Slicing is an operation that creates a new list consisting of a contiguous subset of
elements from the original list. The original list is not modified by this operation.
Instead, references to the corresponding elements are copied and stored in the new list.
In Python, slicing is performed on a list using the colon operator, specifying the index
of the first element and the index one past the last element of the subset. Consider the
following example code segment, which creates a slice from our sample list: aSlice = theVector[2:3]
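A short sketch of this behaviour (the sample values are assumed):

```python
# theVector[2:3] copies the elements from index 2 up to (but not
# including) index 3 into a new list; the original is untouched.
theVector = [12, 4, 6, 31, 9, 18]   # sample values (assumed)
aSlice = theVector[2:3]

print(aSlice)        # [6] -- one element, the one at index 2
print(theVector)     # the original list is unchanged
```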
Python Code
import ctypes

class dy_array:
    def __init__(self):
        self.n = 0                      # number of elements stored
        self.capacity = 1               # current capacity of the array
        self.Arr = self.makearray(self.capacity)
    def makearray(self, c):
        # low-level array of c object references
        return (c * ctypes.py_object)()
    def findlength(self, obj):
        # count the items in an iterable
        count = 0
        for i in obj:
            count += 1
        return count
    def getitem(self, x):
        # return the index of value x, if present
        for i in range(self.n):
            if self.Arr[i] == x:
                return i
        print("Data Not Found")
    def append(self, obj):
        if self.n == self.capacity:     # grow when full
            self.resize(2 * self.capacity)
        self.Arr[self.n] = obj
        self.n += 1
    def resize(self, c):
        B = self.makearray(c)           # new, larger array
        for i in range(self.n):         # copy the existing items
            B[i] = self.Arr[i]
        self.Arr = B
        self.capacity = c
    def insert(self, pos, val):
        if self.n == self.capacity:
            self.resize(2 * self.capacity)
        for i in range(self.n, pos, -1):    # shift items to the right
            self.Arr[i] = self.Arr[i - 1]
        self.Arr[pos] = val
        self.n += 1
    def extend(self, val):
        length = self.findlength(val)
        for i in range(length):
            self.append(val[i])
    def remove(self, val):
        for i in range(self.n):
            if self.Arr[i] == val:
                for j in range(i, self.n - 1):   # shift items to the left
                    self.Arr[j] = self.Arr[j + 1]
                self.n -= 1
                break
    def disp(self):
        for i in range(self.n):
            print(self.Arr[i])
Linked List Implementation:
Algorithm remove first(L):
if L.head is None then
Indicate an error: the list is empty.
L.head = L.head.next {make head point to next node (or None)}
L.size = L.size−1 {decrement the node count}
class L:
    class Node:
        def __init__(self, data):
            self.data = data
            self.next = None
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0
    def len(self):
        return self.size
    def insert_first(self, data):
        newnode = L.Node(data)
        newnode.next = self.head
        self.head = newnode
        if self.tail == None:
            self.tail = newnode
        self.size += 1
    def insert_last(self, data):
        newnode = L.Node(data)
        if self.tail == None:          # empty list
            self.head = self.tail = newnode
        else:
            self.tail.next = newnode
            self.tail = newnode
        self.size += 1
    def remove_first(self):
        if self.head == None:
            print("Invalid")
        else:
            self.head = self.head.next
            if self.head == None:      # the list became empty
                self.tail = None
            self.size -= 1
    def display(self):
        n = self.head
        for i in range(self.size):
            print(n.data)
            n = n.next
    def length(self):
        return self.size
Circularly Linked List:
A circularly linked list is a collection of nodes that collectively form a linear
sequence, with the next of the tail node pointing back to the head of the list.
A circularly linked list provides a more general model than a standard linked list for
data sets that are cyclic, that is, which do not have any particular notion of a beginning and
end.
(figure: a circular linked list with nodes A1, A2, A3, where the next of A3 points back to A1)
def __init__(self):
self.head=None
self.tail=None
self.size=0
Insert at first:
def insert_first(self, data):
    newnode = L.Node(data)
    if self.head == None:            # empty list
        self.head = newnode
        self.tail = newnode
    else:
        newnode.next = self.head
        self.head = newnode
    self.tail.next = self.head       # keep the list circular
    self.size += 1
Insert at Last:
def insert_last(self, data):
    newnode = L.Node(data)
    if self.tail == None:            # empty list
        self.head = self.tail = newnode
    else:
        self.tail.next = newnode
        self.tail = newnode
    self.tail.next = self.head       # keep the list circular
    self.size += 1
Remove First:
def remove_first(self):
    if self.head == None:
        print("Invalid")
    elif self.head == self.tail:     # only one node in the list
        self.head = self.tail = None
        self.size -= 1
    else:
        self.tail.next = self.head.next
        self.head = self.head.next
        self.size -= 1
def length(self):
    return self.size
Doubly Linked Lists
A linked list in which each node keeps an explicit reference to the node before it
and a reference to the node after it is known as a doubly linked list.
Advantages:
Deletion operation is easier.
Finding the predecessor and successor of a node is easier.
Algorithm
1. Create a newnode
2. If there is no list already, make newnode as Head and Tail.
3. else Find the node predata (the node after which the new node is inserted).
4. Update,
newnode.next = predata.next
predata.next.prev = newnode
newnode.prev = predata
predata.next = newnode
5. Increase the size.
Deletion in Doubly Linked List
Algorithm:
class List:
    class Node:
        def __init__(self, data):
            self.data = data
            self.prev = None
            self.next = None
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0
    def len(self):
        return self.size
    def insert(self, predata, data):
        newnode = List.Node(data)
        if self.head == None:                  # empty list
            self.head = newnode
            self.tail = newnode
            self.size += 1
        else:
            temp = self.head
            while temp != None:
                if temp.data == predata:       # insert after this node
                    newnode.next = temp.next
                    if temp.next != None:
                        temp.next.prev = newnode
                    else:
                        self.tail = newnode    # inserted after the tail
                    temp.next = newnode
                    newnode.prev = temp
                    self.size += 1
                    break
                temp = temp.next
            else:
                print("data not found")
    def remove(self, x):
        temp = self.head
        if self.head == None:
            print("Empty List")
            return
        elif self.head.data == x and self.size == 1:
            self.head = None                   # removing the only node
            self.tail = None
            self.size -= 1
            return
        elif self.head.data == x:
            self.head = self.head.next         # removing the first node
            self.head.prev = None
            self.size -= 1
            return
        else:
            while temp != None and temp.data != x:
                temp = temp.next
            if temp == None:
                print("Data not found")
                return
            temp.prev.next = temp.next
            if temp.next != None:
                temp.next.prev = temp.prev
            else:
                self.tail = temp.prev          # removing the last node
            self.size -= 1
    def display(self):
        if self.head == None:
            print("List empty")
        else:
            n = self.head
            for i in range(self.size):
                print(n.data)
                n = n.next
Stack ADT
A stack is an ordered list in which all insertions and deletions are made at one
end, called the top.
Stack is a list with the restriction that insertions and deletions can be performed in
only one position, namely the end of the list called Top.
It follows LIFO approach. LIFO represents “Last In First Out”. The basic
operations are push and pop.
Push - equivalent to insert. Pop - deletes the most recently inserted element.
Stack Model:
(figure: a stack holding elements 3, 10, 6, 4, 5, with 3 at the Top (TOS); Push(x) and Pop both operate at the top)
Implementation of Stack:
There are two methods of implementing stack operations.
Array implementation
Linked List implementation
Push Operation:
The process of putting a new data element onto the stack is known as a Push
Operation.
Push operation involves a series of steps –
Step 1 − Checks if the stack is full.
Step 2 − If the stack is full, produces an error and exit.
Step 3 − If the stack is not full, increments top to point next empty space.
Step 4 − Adds data element to the stack location, where top is pointing.
Pop Operation:
POP operation is performed on the stack to remove items from the stack
Pop operation involves a series of steps –
Step 1 − Check if top == -1; if so, the stack is empty (underflow) and the operation exits with an error.
Step 2 − Otherwise, access the element that top is pointing to: num = stk[top]
Step 3 − Decrease top by 1: top = top - 1
isFull():
To check whether the stack is full or not before every push operation.
isEmpty():
To check whether the stack is Empty or not before every pop operation.
Linked List implementation of a Stack:
class Stack:
    class node:
        def __init__(self, data):
            self.data = data
            self.next = None
    def __init__(self):
        self.top = None
        self.size = 0
    def push(self, data):
        newnode = Stack.node(data)
        if self.top == None:
            self.top = newnode
        else:
            newnode.next = self.top
            self.top = newnode
        self.size += 1
    def pop(self):
        if self.isempty():
            print("Stack is empty")
        else:
            self.top = self.top.next
            self.size -= 1
    def isFull(self):
        # MaxSize must be defined if the stack is to be bounded; a
        # linked stack otherwise never becomes full
        return self.size == MaxSize
    def isempty(self):
        return self.size == 0
    def length(self):
        return self.size
    def display(self):
        temp = self.top
        for i in range(self.size):
            print(temp.data)
            temp = temp.next
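The same LIFO behaviour can be sketched with Python's built-in list, which supports push and pop at one end:

```python
stack = []            # a Python list used as a stack
stack.append(3)       # push 3
stack.append(10)      # push 10
stack.append(6)       # push 6

print(stack.pop())    # 6  -- last in, first out
print(stack.pop())    # 10
print(stack)          # [3]
```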
Queue ADT
Queue is an ordered collection of data items. It deletes items at the front of the
queue and inserts items at the rear. It has a FIFO structure, i.e. "First In First Out".
Queue Model:
Queue Operations:
1. Enqueue:
To add an item to the queue. If the queue is full, then it is said to be an Overflow
condition.
Code for Enqueue:
def Enqueue(self, data):
    newnode = Queue.Node(data)
    if self.size == 0:
        self.Front = self.Rear = newnode
    else:
        self.Rear.next = newnode
        self.Rear = newnode
    self.size += 1
2. Dequeue:
Dequeue: Removes an item from the queue. The items are popped in the same
order in which they are pushed. If the queue is empty, then it is said to be an Underflow
condition.
Code for Dequeue:
def Dequeue(self):
    if self.size == 0:
        print("Queue is Empty")
    else:
        self.Front = self.Front.next
        if self.Front == None:        # the queue became empty
            self.Rear = None
        self.size -= 1
3. isFull:
To check whether the queue is full before every Enqueue Operation.
Code for isFull:
def isFull(self):
    return self.size == MaxSize    # MaxSize applies to a bounded queue
class Queue:
    class Node:
        def __init__(self, data):
            self.data = data
            self.next = None
    def __init__(self):
        self.Front = None
        self.Rear = None
        self.size = 0
    def Enqueue(self, data):
        newnode = Queue.Node(data)
        if self.size == 0:
            self.Front = self.Rear = newnode
        else:
            self.Rear.next = newnode
            self.Rear = newnode
        self.size += 1
    def Dequeue(self):
        if self.size == 0:
            print("Queue is Empty")
        else:
            self.Front = self.Front.next
            if self.Front == None:    # the queue became empty
                self.Rear = None
            self.size -= 1
    def isempty(self):
        return self.size == 0
    def length(self):
        return self.size
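A short usage sketch of FIFO behaviour with collections.deque from the standard library:

```python
from collections import deque

queue = deque()        # collections.deque gives O(1) operations at both ends
queue.append(10)       # enqueue at the rear
queue.append(20)
queue.append(30)

print(queue.popleft()) # 10 -- first in, first out
print(queue.popleft()) # 20
print(list(queue))     # [30]
```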
Double Ended Queue – Deque:
Data structure that supports insertion and deletion at both the front and the back of
the queue is called a double ended queue, or deque.
class Deque:
    class Node:
        def __init__(self, data):
            self.data = data
            self.prev = None
            self.next = None
    def __init__(self):
        self.Front = None
        self.Rear = None
        self.size = 0
    def Enqueue_Front(self, data):
        newnode = Deque.Node(data)
        if self.isempty():
            self.Front = self.Rear = newnode
        else:
            newnode.next = self.Front
            self.Front.prev = newnode
            self.Front = newnode
        self.size += 1
    def Enqueue_Rear(self, data):
        newnode = Deque.Node(data)
        if self.isempty():
            self.Front = self.Rear = newnode
        else:
            self.Rear.next = newnode
            newnode.prev = self.Rear
            self.Rear = newnode
        self.size += 1
    def Dequeue_Front(self):
        if self.isempty():
            print("Deque is Empty")
        else:
            self.Front = self.Front.next
            if self.Front == None:         # the deque became empty
                self.Rear = None
            else:
                self.Front.prev = None
            self.size -= 1
    def Dequeue_Rear(self):
        if self.isempty():
            print("Deque is Empty")
        else:
            self.Rear = self.Rear.prev
            if self.Rear == None:          # the deque became empty
                self.Front = None
            else:
                self.Rear.next = None
            self.size -= 1
    def isempty(self):
        return self.size == 0
    def display(self):
        temp = self.Front
        for i in range(self.size):
            print(temp.data)
            temp = temp.next
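A short sketch of deque behaviour using the standard library's collections.deque:

```python
from collections import deque

d = deque()
d.append(1)        # enqueue at the rear
d.appendleft(2)    # enqueue at the front
d.append(3)        # rear again -> deque is [2, 1, 3]

print(d.pop())     # 3 -- removed from the rear
print(d.popleft()) # 2 -- removed from the front
print(list(d))     # [1]
```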
UNIT II – 2 Marks Questions with Answers
5. What is abstract data type? What are all not concerned in an ADT?
An abstract data type is a triple (D, F, A), where D is the set of domains, F is the
set of functions, and A is the set of axioms, in which only what is to be done is
mentioned but how it is to be done is not mentioned.
Thus an ADT is not concerned with implementation details.
6. List out the areas in which data structures are applied extensively.
Following are the areas in which data structures are applied extensively.
Operating system- the data structures like priority queues are used for
scheduling the jobs in the operating system.
Compiler design- the tree data structure is used in parsing the source program.
Stack data structure is used in handling recursive calls.
Database management system- The file data structure is used in database
management systems. Sorting and searching techniques can be applied on these
data in the file.
Numerical analysis package- the array is used to perform the numerical
analysis on the given set of data.
Graphics- the array and the linked list are useful in graphics applications.
Artificial intelligence- the graph and trees are used for the applications like
building expression trees, game playing.
12. State the properties of LIST abstract data type with suitable example.
Various properties of the LIST abstract data type are:
It is a linear data structure in which the elements are arranged adjacent to each other.
It allows storing a single-variable polynomial.
If the LIST is implemented using dynamic memory then it is called a linked list.
Examples of LIST-based structures are stacks, queues, and linked lists.
13. State the advantages of circular lists over doubly linked list.
In a circular list the next pointer of the last node points to the head node, whereas
in a doubly linked list each node has two pointers: one previous pointer and one next
pointer. The main advantage of the circular list over the doubly linked list is that with
the help of a single pointer field we can reach the head node quickly. Hence some amount
of memory gets saved, because in a circular list only one pointer field is reserved per node.
14. What are the advantages of doubly linked list over singly linked list?
The doubly linked list has two pointer fields: one is the previous link field and
the other is the next link field. Because of these two pointer fields we can access any
neighbouring node efficiently, whereas in a singly linked list there is only one pointer
field, which stores the forward pointer.
The linked list makes use of dynamic memory allocation. Hence the user can
allocate or deallocate memory as per his requirements. On the other hand, the array
makes use of static memory allocation. Hence there are chances of wastage of
memory, or shortage of memory for allocation.
17. What is the circular linked list?
The circular linked list is a kind of linked list in which the last node is connected to the
first node or head node of the linked list.
Singly circular linked list
(figure: nodes A1, A2, A3 where the next of A3 points back to A1)
The various operations that are performed on the stack are:
CREATE(S) – creates S as an empty stack.
PUSH(S,X) – adds the element X to the top of the stack.
POP(S) – deletes the topmost element from the stack.
TOP(S) – returns the value of the top element of the stack.
ISEMPTY(S) – returns true if the stack is empty, else false.
ISFULL(S) – returns true if the stack is full, else false.
28. Write down the function to insert an element into a queue, in which the queue is
implemented as an array.
void enqueue(int X, Queue Q)
{
    if (IsFull(Q))
        Error("Full queue");
    else
    {
        Q->Size++;
        Q->Rear = Q->Rear + 1;
        Q->Array[Q->Rear] = X;
    }
}
UNIT – III SORTING AND SEARCHING
Bubble sort – selection sort – insertion sort – merge sort – quick sort – linear
search – binary search – hashing – hash functions – collision handling – load factors,
rehashing, and efficiency.
Sorting:
1) Bubble Sort:
There is a simple, but inefficient algorithm, called bubble-sort, for sorting a list L of n
comparable elements. This algorithm scans the list n−1 times, where, in each scan, the
algorithm compares the current element with the next one and swaps them if they are out of
order.
This algorithm uses multiple passes and in each pass the first and second data items
are compared.
If the first data item is bigger than the second, then the two items are swapped.
Next the items in second and third position are compared and if the first one is
larger than the second, then they are swapped, otherwise no change in their
order.
This process continues for each successive pair of data items until all items are
sorted.
Algorithm:
#Bubble Sort
def BubbleSort(A):
    for i in range(len(A)):
        for j in range(len(A) - 1):
            if A[j] > A[j + 1]:
                A[j], A[j + 1] = A[j + 1], A[j]
    print(A)

BubbleSort([4, 2, 7, 3, 1, 8, 9])
Output:
[1, 2, 3, 4, 7, 8, 9]
Step-by-step example:
Let us take the array of numbers "6 2 5 3 9", and sort the array from lowest number
to greatest number using bubble sort.
In each step, elements written in bold are being compared. Three passes will be
required.
Time Complexity:
The efficiency of the bubble sort algorithm is independent of the initial arrangement
of the data items in the array. If the array contains n data items, then the outer loop
executes n-1 times, as the algorithm requires n-1 passes.
In the first pass, the inner loop is executed n-1 times; in the second pass, n-2 times;
in the third pass, n-3 times, and so on. The total number of iterations of the inner
loop is the sum of the first n-1 integers, which equals n(n-1)/2, resulting in a run
time of O(n²).
Worst Case Performance O(n²)
Best Case Performance O(n²)
Average Case Performance O(n²)
2) Selection Sort
Selection sort is one of the simplest sorting algorithms. It sorts the elements in an
array by finding the minimum element of the unsorted part in each pass and keeping it
at the beginning.
This sorting technique improves over bubble sort by making only one exchange in
each pass. It maintains two subarrays: one which is already sorted and one which is
unsorted. In each iteration, the minimum element (for ascending order) is picked from
the unsorted subarray and moved to the sorted subarray.
Python Code
# Selection Sort
def SelectionSort(A):
    for i in range(len(A)):
        min_index = i
        for j in range(i + 1, len(A)):
            if A[min_index] > A[j]:
                min_index = j
        A[i], A[min_index] = A[min_index], A[i]
    print(A)

SelectionSort([3, 20, 1, 4, 5, 2])
Step-by-step example:
Time Complexity:
Selection sort is not difficult to analyse compared to other sorting algorithms since
none of the loops depend on the data in the array. Selecting the lowest element requires
scanning all n elements (this takes n − 1 comparisons) and then swapping it into the first
position. Finding the next lowest element requires scanning the remaining n − 1 elements
and so on, for (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2 ∈ O(n²) comparisons. Each of
these scans requires one swap for n − 1 elements (the final element is already in place).
Worst Case Performance O(n²)
Best Case Performance O(n²)
Average Case Performance O(n²)
3) Insertion Sort:
We start with the first element in the array. One element by itself is already sorted.
Then we consider the next element in the array. If it is smaller than the first, we swap them.
Next we consider the third element in the array. We swap it leftward until it is in its
proper order with the first two elements. We then consider the fourth element, and swap it
leftward until it is in the proper order with the first three.
We continue in this manner with the fifth element, the sixth, and so on, until the whole
array is sorted.
Algorithm InsertionSort(A):
Input: An array A of n comparable elements
Output: The array A with elements rearranged in nondecreasing order
for k from 1 to n − 1 do
Insert A[k] at its proper location within A[0], A[1], ..., A[k].
Step-by-step example:
Algorithm:
def InsertionSort(A):
    for j in range(1, len(A)):
        i = j
        while i > 0 and A[i] < A[i - 1]:
            A[i], A[i - 1] = A[i - 1], A[i]
            i -= 1
    print(A)

InsertionSort([5, 14, 30, 2, 1])
Time Complexity:
Worst Case Performance O(n²)
Best Case Performance (nearly sorted input) O(n)
Average Case Performance O(n²)
4) Merge Sort:
Merge sort is based on the divide and conquer method. It takes the list to be sorted
and divides it in half to create two unsorted lists. The two unsorted lists are then
sorted and merged to get a sorted list. By continually calling the Partition routine on
the two unsorted lists, we eventually get lists of size 1, which are already sorted.
The two lists of size 1 are then merged.
1. Divide the input which we have to sort into two parts in the middle. Call it the left
part and right part.
2. Sort each of them separately. Note that here sort does not mean to sort it using
some other method. We use the same function recursively.
3. Then merge the two sorted parts.
Step-by-step Example:
Algorithm:
def merge(S1, S2, S):
    i = j = 0
    while i + j < len(S):
        if j == len(S2) or (i < len(S1) and S1[i] < S2[j]):
            S[i + j] = S1[i]
            i += 1
        else:
            S[i + j] = S2[j]
            j += 1

def Partition(S):
    n = len(S)
    if n < 2:
        return
    mid = n // 2
    S1 = S[0:mid]        # copy of first half
    S2 = S[mid:n]        # copy of second half
    Partition(S1)        # sort copy of first half
    Partition(S2)        # sort copy of second half
    merge(S1, S2, S)     # merge the sorted halves back into S

A = [85, 24, 63, 450, 170, 31, 96, 50]
Partition(A)
print("Sorted Array is")
for i in range(len(A)):
    print(A[i])
5) Quick Sort:
The quick sort algorithm also uses the divide and conquer strategy. But unlike
the merge sort, which splits the sequence of keys at the midpoint, the quick sort partitions
the sequence by dividing it into two segments based on a selected pivot key. In addition, the
quick sort can be implemented to work with virtual sub sequences without the need for
temporary storage.
Quick sort is a divide and conquer algorithm. Quick sort first divides a large list
into two smaller sublists: the low elements and the high elements. Quick sort can then
recursively sort the sub-lists.
Step-by-step Example 1:
Sort [15, 6, 8, 4, 11, 9, 2, 1, 5], taking the last element as the pivot. left starts
at the first position and right at the position just before the pivot. While left <= right:
advance left past elements smaller than the pivot, retreat right past elements larger
than the pivot, and if left < right, swap the two elements and move both inward
(left = left + 1, right = right - 1). When left and right cross, swap the element at
left with the pivot; the pivot is now locked in its final position. Quicksort is then
applied recursively to the elements on the left of the locked pivot and on the right
of it, eventually producing the sorted array [1, 2, 4, 5, 6, 8, 9, 11, 15].
Example 2:
Algorithm:
def Quicksort(S, l, r):
    if l >= r:
        return
    pivot = S[r]                 # the last element is the pivot
    left = l
    right = r - 1
    while left <= right:
        while left <= right and S[left] < pivot:
            left += 1            # scan for an element >= pivot
        while left <= right and pivot < S[right]:
            right -= 1           # scan for an element <= pivot
        if left <= right:
            S[left], S[right] = S[right], S[left]
            left = left + 1
            right = right - 1
    S[left], S[r] = S[r], S[left]   # put the pivot into its final place
    Quicksort(S, l, left - 1)
    Quicksort(S, left + 1, r)

A = [41, 21, 5, 10, 6, 3]
Quicksort(A, 0, 5)
print(A)
Searching:
1) Linear Search:
When the sequence is unsorted, the standard approach to search for a target value
is to use a loop to examine every element, until either the target is found or the data
set is exhausted. This is known as the linear or sequential search algorithm. It follows
a guess and check pattern: guess that the current item is the target, check it, and move
on to the next item on an incorrect guess. This algorithm runs in O(n) time (i.e., linear
time), since every element is inspected in the worst case.
Example:
List : 10,51,2,18,4,31,13,5,23,64,29 Element to be searched : 31
Algorithm:
#Linear Search
def LinearSearch(List, data):
    for i in range(len(List)):
        if List[i] == data:
            print(data, "present at position", i + 1)
            break
    else:                          # the loop finished without a break
        print("Element not found!")

List = [10, 51, 2, 18, 4, 31, 13, 5, 23, 64, 29]
LinearSearch(List, 31)
Output:
31 present at position 6
2) Binary Search:
Binary search requires the values to be stored in sorted order within an indexable
sequence, such as a Python list. When the sequence is sorted and indexable, there is a
much more efficient algorithm. Initially, low = 0 and high = n − 1. We then compare the
target value to the median candidate, that is, the item data[mid] with index
mid = (low + high) // 2.
Algorithm For Binary Search
def BinarySearch(List, data, low, high):
    if low <= high:
        mid = (low + high) // 2
        if data == List[mid]:
            print(data, "present at position", mid + 1)
        elif data < List[mid]:
            BinarySearch(List, data, low, mid - 1)   # search the left half
        else:
            BinarySearch(List, data, mid + 1, high)  # search the right half
    else:
        print("Element not found!")

List = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # the values must be in sorted order
BinarySearch(List, 1, 0, 9)
Output:
1 present at position 1
Hashing:
What is Hashing?
Hashing in the data structure is a technique of mapping a large chunk of data into
small tables using a hashing function. It is also known as the message digest function. It is a
technique that uniquely identifies a specific item from a collection of similar items. It uses
hash tables to store the data in an array format.
Each value in the array is assigned a unique index number. Hash tables use a
technique to generate these unique index numbers for each value stored in an array
format. This technique is called the hash technique.
Hash table:
The hash table data structure is merely an array of some fixed size, containing the
keys. A key is a string with an associated value.
Each key is mapped into some number in the range 0 to tablesize-1 and placed in
the appropriate cell. In the following example, tablesize is 5, i.e., indices 0 to 4:
21 % 5 = 1 → key 21 is placed in cell 1
32 % 5 = 2 → key 32 is placed in cell 2
18 % 5 = 3 → key 18 is placed in cell 3
Hash function:
A hash function is a key to address transformation which acts upon a given key to
compute the relative position of the key in an array.
The choice of hash function should be simple and it must distribute the data evenly.
Common methods of defining a hash function:
1. Division method: The hash function depends upon the remainder of division.
2. Mid square: In the mid square method, the key is squared and the middle or mid part of
the result is used as the index.
Consider placing a record with key 3111 in a hash table of size 1000:
3111² = 9678321
H(3111) = 783 (the middle 3 digits)
3. Digit Folding: The key is divided into separate parts, and using some simple operation
these parts are combined to produce the hash key.
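The three methods can be sketched as simple Python functions (the digit counts and table sizes are illustrative assumptions):

```python
def division_hash(key, table_size):
    # Division method: the remainder of the key
    return key % table_size

def mid_square_hash(key, digits=3):
    # Mid-square method: the middle digits of the key squared
    squared = str(key * key)
    mid = len(squared) // 2
    start = mid - digits // 2
    return int(squared[start:start + digits])

def folding_hash(key, part_size=2, table_size=100):
    # Digit-folding method: split the key into parts and add them
    s = str(key)
    parts = [int(s[i:i + part_size]) for i in range(0, len(s), part_size)]
    return sum(parts) % table_size

print(division_hash(21, 5))     # 1
print(mid_square_hash(3111))    # 783 (middle 3 digits of 9678321)
print(folding_hash(123456))     # (12 + 34 + 56) % 100 = 2
```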
Collision:
A collision occurs when two different keys hash to the same index in the table.
Separate Chaining:
Separate chaining is a collision resolution technique to keep the list of all elements that
hash to the same value. This is called separate chaining because each hash table element
is a separate chain (linked list). Each linked list contains all the elements whose keys hash
to the same index.
More number of elements can be inserted as it uses linked lists. For ex, insert
18,54,28,25,41,38,36,12,90.
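A minimal separate-chaining sketch using Python lists as the chains, with the keys listed above:

```python
# Each bucket of the table is a Python list acting as the chain.
TABLE_SIZE = 10
table = [[] for _ in range(TABLE_SIZE)]

def insert(key):
    table[key % TABLE_SIZE].append(key)   # colliding keys share a bucket

for k in [18, 54, 28, 25, 41, 38, 36, 12, 90]:
    insert(k)

print(table[8])   # [18, 28, 38] -- keys 18, 28 and 38 all hash to index 8
```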
In the worst case, operations on an individual bucket take time proportional to the
size of the bucket. Assuming we use a good hash function to index the n items of our map
in a bucket array of capacity N, the expected size of a bucket is n/N.
Therefore, if given a good hash function, the core map operations run in O(n/N) time.
The ratio λ = n/N, called the load factor of the hash table, should be bounded by a small
constant, preferably below 1. As long as λ is O(1), the core operations on the hash table
run in O(1) expected time.
Open addressing:
Open addressing requires that the load factor is always at most 1 and that items are
stored directly in the cells of the bucket array itself.
(i) Linear probing - With this approach, if we try to insert an item (k,v) into a bucket A[j]
that is already occupied, where j = h(k), then we next try A[(j+1) mod N]. If A[(j+1) mod N]
is also occupied, then we try A[(j+2) mod N], and so on, until we find an empty bucket that
can accept the new item. Once this bucket is located, we simply insert the item there.
Example: Insert 89, 18, 49, 58, 69 into a hash table of size 10. 89 and 18 go directly
to indices 9 and 8. 49 collides with 89 at index 9 and wraps around to index 0; 58
probes 9 and 0 and lands at index 1; 69 probes 0 and 1 and lands at index 2.
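The probing loop can be sketched as follows (the helper name and the sample keys
are illustrative, not from the text):

```python
def linear_probe_insert(table, key):
    # Probe A[j], A[(j+1) % N], ... until an empty (None) cell is found.
    n = len(table)
    j = key % n
    for i in range(n):
        slot = (j + i) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

table = [None] * 10
for key in [89, 18, 49, 58, 69]:
    linear_probe_insert(table, key)
print(table)
```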
(ii) Quadratic probing - Another open addressing strategy, known as quadratic probing,
iteratively tries the buckets A[(h(k) + f(i)) mod N], for i = 0, 1, 2, ..., where f(i) = i², until
finding an empty bucket. As with linear probing, the quadratic probing strategy complicates
the removal operation, but it does avoid the kinds of clustering patterns that occur with
linear probing.
Example: Insert 55, 22, 90, 37, 49, 17, 87 into a hash table of size 11. 22 collides with
55 at index 0, so quadratic probing tries (22 + 1²) % 11 = 1, which is empty.
index: 0  1  2  3  4  5  6  7  8  9  10
key:  55 22 90  -  37 49 17  -  -  -  87
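The same idea in code, changing only the probe step to f(i) = i² (the helper name is
an assumption for illustration):

```python
def quadratic_probe_insert(table, key):
    # Probe A[(h(k) + i*i) % N] for i = 0, 1, 2, ...
    n = len(table)
    h = key % n
    for i in range(n):
        slot = (h + i * i) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty bucket found")

table = [None] * 11
for key in [55, 22, 90, 37, 49, 17, 87]:
    quadratic_probe_insert(table, key)
print(table)
```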
(iii) Double hashing - in which f(i) = i · hash2(X). This formula says that we apply a second
hash function to X and probe at distances hash2(X), 2·hash2(X), and so on.
In this approach, we choose a secondary hash function, hash2, and if the primary hash
function maps some key k to a bucket A[h(k)] that is already occupied, then we iteratively
try the buckets A[(h(k) + f(i)) mod N] next, for i = 1, 2, 3, ..., where f(i) = i · hash2(k). In this
scheme, the secondary hash function is not allowed to evaluate to zero; a common choice
is hash2(k) = q − (k mod q), for some prime number q < N. Also, N should be a prime.
A function such as hash2(X) = R − (X mod R), with R a prime smaller than the table size,
works well.
Example:
Insert 37, 90, 55, 22, 14 into a hash table of size 7 using double hashing with
hash2(X) = 5 − (X mod 5). 37 and 90 hash to indices 2 and 6. 55 collides with 90 at
index 6; hash2(55) = 5, so it is placed at (6 + 5) mod 7 = 4. 22 and 14 go to indices 1 and 0.
index: 0  1  2  3  4  5  6
key:  14 22 37  -  55  -  90
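A sketch of double hashing with the secondary function hash2(k) = q − (k mod q)
and q = 5 (the helper name is an assumption for illustration):

```python
def double_hash_insert(table, key, q=5):
    # Secondary hash h2(k) = q - (k mod q); the probe step is i * h2(k).
    n = len(table)
    step = q - (key % q)
    for i in range(n):
        slot = (key + i * step) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no empty bucket found")

table = [None] * 7
for key in [37, 90, 55, 22, 14]:
    double_hash_insert(table, key)
print(table)
```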
Linear probing has the best cache performance, but suffers from clustering. One
more advantage of linear probing is that it is easy to compute.
Quadratic probing lies between the two in terms of cache performance and
clustering.
Double hashing has poor cache performance but no clustering. Double hashing
requires more computation time as two hash functions need to be computed.
2 Mark Questions with Answers
1. Define Sorting.
Sorting is the process of arranging the elements of a list in a particular order, either
ascending or descending.
2. What is searching?
It is a process of locating an element stored in a file or array.
Different searching methods are,
a. Linear Search(or Sequential Search)
b. Binary Search
3. What is the advantage of the linear search method?
It is simple and useful when the elements to be searched are not in any definite
order.
4. Specify the space complexity of different sorting algorithms.
Bubble sort, selection sort and insertion sort are in-place and require only O(1)
auxiliary space; merge sort requires O(n) auxiliary space; quick sort requires O(log n)
stack space on average.
9. Define Hashing.
Hashing is the process of mapping a large amount of data to a smaller table with
the help of a hash function. The modulo operator is commonly used to get the index
from the actual data/information.
The hash table data structure is merely an array of some fixed size, containing the
keys. A key is a string with an associated value.
Each key is mapped into some number in the range 0 to tablesize-1 and placed in
the appropriate cell. In the following example, tablesize is 5, i.e., indices 0 to 4.
index 0: empty
index 1: 21 (21 % 5 = 1)
index 2: 32 (32 % 5 = 2)
index 3: 18 (18 % 5 = 3)
index 4: empty
A hash function is a key to address transformation which acts upon a given key to
compute the relative position of the key in an array.
The choice of hash function should be simple and it must distribute the data evenly.
12. Write the importance of hashing.
Hashing supports lookups, insertions and deletions in O(1) expected time,
independent of the number of stored elements, whereas linear search and binary
search take O(n) and O(log n) time respectively.
13. What do you mean by collision in hashing? Name some collision resolution
techniques.
A collision occurs when two different keys hash to the same index of the hash table.
Collision resolution techniques include separate chaining and open addressing
(linear probing, quadratic probing and double hashing).
Separate chaining is a collision resolution technique to keep the list of all elements
that hash to the same value. This is called separate chaining because each hash table
element is a separate chain (linked list). Each linked list contains all the elements whose
keys hash to the same index.
More elements than the table size can be inserted, since each slot holds a linked list.
For example, insert 12, 17, 22, 24 into a table of size 5:
12 % 5 = 2, 17 % 5 = 2, 22 % 5 = 2, 24 % 5 = 4
index 0: empty
index 1: empty
index 2: 12 -> 17 -> 22
index 3: empty
index 4: 24
16. List some advantages and disadvantages of separate chaining.
Advantage:
1. Simple to implement.
2. Hash table never fills up, we can always add more elements to chain.
3. Less sensitive to the hash function or load factors.
4. It is mostly used when it is unknown how many and how frequently keys may be
inserted or deleted.
Disadvantages of separate chaining.
1. Cache performance of chaining is not good as keys are stored using linked list.
Open addressing provides better cache performance as everything is stored in same table.
2. Wastage of Space (Some Parts of hash table are never used)
3. If the chain becomes long, then search time can become O(n) in worst case.
4. Uses extra space for links.
In the linear probing collision resolution strategy, even if the table is relatively empty,
blocks of occupied cells start forming. This effect is known as primary clustering: any
key that hashes into the cluster will require several attempts to resolve the collision,
and will then add to the cluster.
19. What are the types of collision resolution strategies in open addressing?
The strategies are linear probing, quadratic probing and double hashing. If a collision
occurs, alternative cells are tried until an empty cell is found. In linear probing the
alternative cells are searched sequentially; in quadratic probing the alternative cells
are calculated using the formula F(i) = i².
Linear probing has the best cache performance, but suffers from clustering. One
more advantage of linear probing is that it is easy to compute.
Quadratic probing lies between the two in terms of cache performance and
clustering.
Double hashing has poor cache performance but no clustering. Double hashing
requires more computation time as two hash functions need to be computed.
Although quadratic probing eliminates primary clustering, elements that hash to the
same position will probe the same alternative cells. This is known as secondary clustering.
Rehashing builds another table that is about twice as big, with an associated new
hash function, and scans down the entire original hash table, computing the new hash
value for each element and inserting it in the new table.
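The rehashing operation just described can be sketched as follows (linear probing
and the small prime helper are assumptions for illustration):

```python
def next_prime(n):
    # Smallest prime >= n (adequate for small table sizes).
    def is_prime(m):
        return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))
    while not is_prime(n):
        n += 1
    return n

def rehash(old_table):
    # Build a table about twice as big and re-insert every key.
    new_size = next_prime(2 * len(old_table))
    new_table = [None] * new_size
    for key in old_table:
        if key is not None:
            j = key % new_size
            while new_table[j] is not None:   # linear probing
                j = (j + 1) % new_size
            new_table[j] = key
    return new_table

old = [None, 15, 23, 24, None, 13, 6]   # a size-7 table getting full
new = rehash(old)
print(len(new))   # 17, the first prime >= 14
```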
Advantage:
Disadvantage: rehashing slows the program down, since the entire table must be
scanned and rebuilt.
22. What is the need for extendible hashing?
If either open addressing hashing or separate chaining hashing is used, the major
problem is that collisions could cause several blocks to be examined during a Find, even for
a well-distributed hash table. Extendible hashing allows a find to be performed in two disk
accesses. Insertions also require few disk accesses.
26.List out the applications of hashing.
A linear search scans one item at a time, without jumping to any item. Its worst case
complexity is O(n), and the time taken to search keeps increasing as the number of
elements increases.
When storing a large amount of data, linear search and binary search perform
lookups with time complexity of O(n) and O(log n) respectively. As the size of the
dataset increases, these complexities become significantly high, which is not
acceptable.
We need a technique that does not depend on the size of data. Hashing allows
lookups to occur in constant time i.e. O(1).
UNIT IV TREE STRUCTURES
Tree ADT – Binary Tree ADT – tree traversals – binary search trees – AVL trees
– heaps – multiway search trees.
Tree ADT:
A tree is an abstract data type that stores elements hierarchically. With the exception
of the top element, each element in a tree has a parent element and zero or more children
elements.
A tree is usually visualized by placing elements inside ovals or rectangles, and by
drawing the connections between parents and children with straight lines.
Formal Tree Definition
Formally, we define a tree T as a set of nodes storing elements such that the nodes
have a parent-child relationship that satisfies the following properties:
If T is nonempty, it has a special node, called the root of T, which has no parent.
Each node v of T different from the root has a unique parent node w; every node with
parent w is a child of w.
Node Relationships
A node v is external, if v has no children. External nodes are also known as leaves.
A node v is internal if it has one or more children.
A node u is an ancestor of a node v, if u = v or u is an ancestor of the parent of v.
Conversely, we say that a node v is a descendant of a node u if u is an ancestor of
v.
A tree is ordered if there is a meaningful linear order among the children of each
node;
Path: Path refers to the sequence of nodes along the edges of a tree.
Root: The node at the top of the tree is called root. There is only one root per tree
and one path from the root node to any node.
Parent: Any node except the root node has one edge upward to a node called
parent.
Child: The node below a given node connected by its edge downward is called its
child node.
Sub tree: Sub tree represents the descendants of a node.
Traversing: Traversing means passing through nodes in a specific order.
Levels: The level of a node represents the generation of the node. If the root node is
at level 0, then its next child node is at level 1, its grandchild is at level 2, and so on.
Keys: Key represents a value of a node based on which a search operation is to be
carried out for a node.
Siblings: All the nodes that share the same parent are called siblings.
Depth: The depth of a node N is the length of the path from the root to the node N.
Height: The height of a node N is the length of the path from the node to the deepest
leaf.
Properties of Tree:
Every tree has a special node called the root node. The root node can be used to
traverse every node of the tree. It is called root because the tree originated from root
only.
If a tree has N vertices (nodes) then the number of edges is always one less than the
number of nodes (vertices), i.e., N-1. If it has more than N-1 edges it is called a graph,
not a tree.
Every child has only a single parent, but a parent can have multiple children.
Example
def depth(self, p):
    if self.is_root(p):
        return 0
    else:
        return 1 + self.depth(self.parent(p))
Height
The height of a position p in a tree T is also defined recursively:
If p is a leaf, then the height of p is 0.
Otherwise, the height of p is one more than the maximum of the heights of p’s
children. The height of a nonempty tree T is the height of the root of T.
def height(self, p):
    if self.is_leaf(p):
        return 0
    else:
        return 1 + max(self.height(c) for c in self.children(p))
Types of Tree:
1. Binary Tree
Binary tree is the type of tree in which each parent can have at most two children.
The children are referred to as left child or right child.
3. AVL Tree:
AVL tree is a self-balancing binary search tree. In AVL tree, the heights of children of
a node differ by at most 1. The valid balancing factor in AVL tree are 1, 0 and -1. When a
new node is added to the AVL tree and tree becomes unbalanced then rotation is done to
make sure that the tree remains balanced.
4. B-tree
B-tree is another self-balancing search tree that comprises many nodes to keep data
stored in a particular order. Each node has over two child nodes and each node comprises
multiple keys. B-trees are compatible with file systems and databases that can write and
read larger blocks of data.
5. N-ary Tree:
In an N-ary tree, the maximum number of children that a node can have is limited to
N. A binary tree is 2-ary tree as each node in binary tree has at most 2 children. Trie data
structure is one of the most commonly used implementation of N-ary tree. A full N-ary tree is
a tree in which children of a node is either 0 or N. A complete N-ary tree is the tree in which
all the leaf nodes are at the same level.
Advantages of Tree:
The tree reflects structural relationships in the data.
The tree is used to represent hierarchies.
It offers an efficient search and insertion procedure.
The trees are flexible. This allows subtrees to be relocated with minimal effort.
Tree Traversal:
Traversal is a process of visiting all the nodes of a tree and may print their values
too. Because all nodes are connected via edges (links), we always start from the root
(head) node; that is, we cannot randomly access a node in a tree. There are three
ways in which we traverse a tree −
• In-order Traversal
• Pre-order Traversal
• Post-order Traversal
Generally, we traverse a tree to search or locate a given item or key in the tree or to
print all the values it contains.
In-order Traversal
In this traversal method, the left sub tree is visited first, then the root and later the
right sub-tree. We should always remember that every node may represent a sub tree itself.
If a binary tree is traversed in-order, the output will produce sorted key values in an
ascending order.
We start from A, and following in-order traversal, we move to its left subtree B. B is
also traversed in-order. The process goes on until all the nodes are visited. The
output of in-order traversal of this tree will be −
D→B→E→A→F→C→G
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Visit root node.
Step 3 − Recursively traverse right subtree.
Python Code for inorder traversal:
def Inorder(self):
    if self.left:
        self.left.Inorder()
    print(self.data)
    if self.right:
        self.right.Inorder()
Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and finally
the right subtree.
We start from A, and following pre-order traversal, we first visit A itself and then move
to its left subtree B. B is also traversed pre-order. The process goes on until all the
nodes are visited. The output of pre-order traversal of this tree will be −
A → B → D → E → C → F →G
Algorithm
Until all nodes are traversed −
Step 1 − Visit root node.
Step 2 − Recursively traverse left subtree.
Step 3 − Recursively traverse right subtree.
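The three steps above can be sketched as a method on a minimal node class (the
class and the sample tree are assumed for illustration):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    def Preorder(self, out):
        # Visit the root first, then the left and right subtrees.
        out.append(self.data)
        if self.left:
            self.left.Preorder(out)
        if self.right:
            self.right.Preorder(out)

# Tree from the figure: A with children B (D, E) and C (F, G).
root = Node("A")
root.left, root.right = Node("B"), Node("C")
root.left.left, root.left.right = Node("D"), Node("E")
root.right.left, root.right.right = Node("F"), Node("G")
result = []
root.Preorder(result)
print(" -> ".join(result))   # A -> B -> D -> E -> C -> F -> G
```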
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we
traverse the left subtree, then the right subtree and finally the root node.
We start from A and, following post-order traversal, we first visit the left subtree B. B is
also traversed post-order. The process goes on until all the nodes are visited. The
output of post-order traversal of this tree will be −
D→E→B→F→G→C→A
Algorithm
Until all nodes are traversed −
Step 1 − Recursively traverse left subtree.
Step 2 − Recursively traverse right subtree.
Step 3 − Visit root node.
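The steps above can be sketched in the same way (the node class and the sample
tree are assumed for illustration):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    def Postorder(self, out):
        # Visit the left subtree, then the right subtree, then the root.
        if self.left:
            self.left.Postorder(out)
        if self.right:
            self.right.Postorder(out)
        out.append(self.data)

# Tree from the figure: A with children B (D, E) and C (F, G).
root = Node("A")
root.left, root.right = Node("B"), Node("C")
root.left.left, root.left.right = Node("D"), Node("E")
root.right.left, root.right.right = Node("F"), Node("G")
result = []
root.Postorder(result)
print(" -> ".join(result))   # D -> E -> B -> F -> G -> C -> A
```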
Binary Tree:
A binary tree is an ordered tree with the following properties:
1. Every node has at most two children.
2. Each child node is labeled as being either a left child or a right child.
3. A left child precedes a right child in the order of children of a node.
The subtree rooted at a left or right child of an internal node v is called a left subtree
or right subtree, respectively, of v.
Figures: Complete Binary Tree, Full Binary Tree, Perfect Binary Tree
Decision trees:
A binary tree can represent the different outcomes that result from answering a
series of yes-or-no questions. Each internal node is associated with a question. Starting at
the root, go to the left or right child of the current node, depending on whether the answer
to the question is “Yes” or “No.” Such binary trees are known as decision trees.
Decision Trees
Arithmetic expression tree:
An arithmetic expression can be represented by a binary tree whose leaves are
associated with variables or constants, and whose internal nodes are associated with one
of the operators +, −, ×, and /. Such a tree is called an arithmetic expression tree.
Properties of Binary Tree
Example:
Algorithm:
def height(self, root):
    if root is None:
        return 0
    l = self.height(root.left)
    r = self.height(root.right)
    return max(l, r) + 1
Parent of 2 is 1. Parent of 5 is 2.
Algorithm:
def parent(self, data):
    if self.data == data:
        print(data, "is the root")
    elif (self.left is not None and self.left.data == data) or \
         (self.right is not None and self.right.data == data):
        print(self.data)
    elif data < self.data and self.left is not None:
        self.left.parent(data)
    elif data > self.data and self.right is not None:
        self.right.parent(data)
    else:
        print("No such data")
Inserting a New node:
For inserting a node in a binary tree you will have to check the following conditions:
If a node in the binary tree does not have its left child, then insert the given node (the
one that we have to insert) as its left child.
If a node in the binary tree does not have its right child then insert the given node as
its right child.
If the above-given conditions do not apply then search for the node which does not
have a child at all and insert the given node there.
Example:
Algorithm:
def insert(self, data):
    if self.data:
        if data < self.data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        elif data > self.data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)
    else:
        self.data = data
Finding / Searching a node:
Procedure:
a) It checks whether the root is null, which means the tree is empty.
b) If the tree is not empty, it will compare root’s data with value. If they are equal, it
will set the flag to true and return.
c) Traverse left subtree by calling searchNode() recursively and check whether the
value is present in left subtree.
d) Traverse right subtree by calling searchNode() recursively and check whether the
value is present in the right subtree.
Algorithm:
def search(self, data):
    if self.data == data:
        print("Data found")
    elif data < self.data and self.left is not None:
        self.left.search(data)
    elif data > self.data and self.right is not None:
        self.right.search(data)
    else:
        print("Not Found")
Binary Search Tree:
A binary tree whose elements are stored in an ordered fashion is called a binary
search tree. A binary search tree for a set S is a binary tree T such that, for each position
p of T, the keys in the left subtree of p are less than the key at p, and the keys in the right
subtree of p are greater than or equal to the key at p.
Binary search tree hierarchically represents the sorted order of its keys. An inorder
traversal of a binary search tree visits positions in increasing order of their keys.
Algorithm for Searching:
def search(self, data):
    if self.data == data:
        print("Data found")
    elif data < self.data and self.left is not None:
        self.left.search(data)
    elif data > self.data and self.right is not None:
        self.right.search(data)
    else:
        print("Not Found")
Algorithm for Binary Search Tree Insertion
def insert(self, data):
    if self.data:
        if data < self.data:
            if self.left is None:
                self.left = BinarySearchTree(data)
            else:
                self.left.insert(data)
        elif data > self.data:
            if self.right is None:
                self.right = BinarySearchTree(data)
            else:
                self.right.insert(data)
    else:
        self.data = data
Figure: Before and After deleting 7
Algorithm for Deletion in BST:
def delete(root, value):
    # returns the new root of the subtree after removing 'value'
    if root is None:
        return root
    if value < root.value:
        root.left = delete(root.left, value)
    elif value > root.value:
        root.right = delete(root.right, value)
    else:
        if root.left is None:
            return root.right
        elif root.right is None:
            return root.left
        # two children: copy the in-order successor (smallest value
        # in the right subtree), then delete that successor
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.value = succ.value
        root.right = delete(root.right, succ.value)
    return root
class BinarySearchTree:
    def __init__(self, data):
        self.left = None
        self.data = data
        self.right = None
        self.root = self.data

    # 4,3,5,1,2
    def insert(self, data):
        if self.data:
            if data < self.data:
                if self.left is None:
                    self.left = BinarySearchTree(data)
                else:
                    self.left.insert(data)
            elif data > self.data:
                if self.right is None:
                    self.right = BinarySearchTree(data)
                else:
                    self.right.insert(data)
        else:
            self.data = data

    def findMin(self):
        if self.data:
            if self.left is not None:
                self.left.findMin()
            else:
                print(self.data)
        else:
            print("Tree Not Found")

    def findMax(self):
        if self.data:
            if self.right is not None:
                self.right.findMax()
            else:
                print(self.data)
        else:
            print("Tree Not Found")

    def parent(self, data):
        if self.data == data:
            print(data, "is the root")
        elif (self.left is not None and self.left.data == data) or \
             (self.right is not None and self.right.data == data):
            print(self.data)
        elif data < self.data and self.left is not None:
            self.left.parent(data)
        elif data > self.data and self.right is not None:
            self.right.parent(data)
        else:
            print("No such data")

    def inorder(self):
        if self.data is not None:
            if self.left is not None:
                self.left.inorder()
            print(self.data)
            if self.right is not None:
                self.right.inorder()
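A compact, self-contained sketch of the same insert-and-traverse logic, using the
key sequence 4, 3, 5, 1, 2 (the shortened class name and list-collecting traversal are
choices made for illustration):

```python
class BST:
    def __init__(self, data):
        self.data, self.left, self.right = data, None, None

    def insert(self, data):
        # smaller keys go left, larger keys go right
        if data < self.data:
            if self.left is None:
                self.left = BST(data)
            else:
                self.left.insert(data)
        elif data > self.data:
            if self.right is None:
                self.right = BST(data)
            else:
                self.right.insert(data)

    def inorder(self, out):
        # left subtree, node, right subtree: yields keys in sorted order
        if self.left:
            self.left.inorder(out)
        out.append(self.data)
        if self.right:
            self.right.inorder(out)

t = BST(4)
for key in (3, 5, 1, 2):
    t.insert(key)
result = []
t.inorder(result)
print(result)   # [1, 2, 3, 4, 5]
```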
AVL Tree
A tree is called an AVL tree if each node of the tree possesses one of the following
properties:
A node is called left heavy if the longest path in its left subtree is one longer than the
longest path of its right subtree
A node is called right heavy if the longest path in the right subtree is one longer than
the path in its left subtree
A node is called balanced if the longest path in both the right and left subtree are
equal.
AVL tree is a height-balanced tree where the difference between the heights of the
right subtree and left subtree of every node is either -1, 0 or 1. The difference between the
heights of the subtrees is maintained by a factor called the balance factor. Therefore, we
can define an AVL tree as a balanced binary search tree where the balance factor of every
node in the tree is either -1, 0, or +1. Here, the balance factor is calculated by the formula:
balance factor = height(left subtree) − height(right subtree)
As AVL is the height-balanced tree, it helps to control the height of the binary search
tree and further help the tree to prevent skewing. When the binary tree gets skewed, the
running time complexity becomes the worst-case scenario, i.e., O(n), but in the case of the
AVL tree, the time complexity remains O(log n). Therefore, it is always advisable to use an AVL
tree rather than a binary search tree.
Every AVL Tree is a binary search tree but every Binary Search Tree need not be
AVL Tree.
AVL Rotation
When certain operations like insertion and deletion are performed on the AVL tree,
the balance factor of the tree may get affected. If after the insertion or deletion of the
element, the balance factor of any node is affected then this problem is overcome by using
rotation. Therefore, rotation is used to restore the balance of the search tree. Rotation is
the method of moving the nodes of the tree either to the left or to the right to make the
tree a height-balanced tree.
There are two categories of rotation, each of which is further divided into two parts:
1) Single Rotation
Single rotation switches the roles of the parent and child while maintaining the search
order. We rotate the node and its child, the child becomes a parent.
#Python code for the two single rotations
def rRotate(self, z):
    # right rotation: the left child y becomes the new subtree root
    y = z.left
    z.left, y.right = y.right, z
    z.height = 1 + max(self.getHeight(z.left), self.getHeight(z.right))
    y.height = 1 + max(self.getHeight(y.left), self.getHeight(y.right))
    return y

def lRotate(self, z):
    # left rotation: the right child y becomes the new subtree root
    y = z.right
    z.right, y.left = y.left, z
    z.height = 1 + max(self.getHeight(z.left), self.getHeight(z.right))
    y.height = 1 + max(self.getHeight(y.left), self.getHeight(y.right))
    return y
2) Double Rotation
A single rotation does not fix the LR and RL cases. For these, we require a double
rotation involving three nodes. Therefore, a double rotation is equivalent to a sequence
of two single rotations.
LR(Left-Right) Rotation
The LR rotation is the process where we perform a single left rotation followed by a
single right rotation. Therefore, first, every node moves towards the left and then the node of
this new tree moves one position towards the right. Let us see the below example
RL (Right-Left) Rotation
The RL rotation is the process where we perform a single right rotation followed by a
single left rotation. Therefore, first, every node moves towards the right and then the node of
this new tree moves one position towards the left. Let us see the below example
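Both double rotations can be sketched as two single rotations in sequence (the
minimal node class, the helper names, and the sample keys below are assumptions
for illustration):

```python
class N:
    # hypothetical minimal AVL node; height is recomputed after rotations
    def __init__(self, value):
        self.value, self.left, self.right, self.height = value, None, None, 1

def h(n):
    return n.height if n else 0

def right_rotate(z):
    y = z.left
    z.left, y.right = y.right, z
    z.height = 1 + max(h(z.left), h(z.right))
    y.height = 1 + max(h(y.left), h(y.right))
    return y

def left_rotate(z):
    y = z.right
    z.right, y.left = y.left, z
    z.height = 1 + max(h(z.left), h(z.right))
    y.height = 1 + max(h(y.left), h(y.right))
    return y

def lr_rotate(z):
    # LR case: left-rotate the left child, then right-rotate the node
    z.left = left_rotate(z.left)
    return right_rotate(z)

def rl_rotate(z):
    # RL case: right-rotate the right child, then left-rotate the node
    z.right = right_rotate(z.right)
    return left_rotate(z)

# LR example: 30 with left child 10 whose right child is 20.
z = N(30); z.left = N(10); z.left.right = N(20)
root = lr_rotate(z)
print(root.value, root.left.value, root.right.value)   # 20 10 30
```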
Operations In AVL Tree
There are 2 major operations performed on the AVL tree
1. Insertion Operation
2. Deletion Operation
1. Find the appropriate empty subtree where the new value should be added by
comparing the values in the tree
2. Create a new node at the empty subtree
3. The new node is a leaf and thus will have a balance factor of zero
4. Return to the parent node and adjust the balance factor of each node through the
rotation process and continue it until we are back at the root. Remember that the
modification of the balance factor must happen in a bottom-up fashion
Example:
The root node is added as shown in the below figure
The next node is added to the root node as shown below. Here the tree is balanced
Then, The right child is added to the parent node. Here, the balance factor of the tree is
changed, therefore, the LL rotation is performed and the tree becomes a balanced tree
Later, one more right child is added to the new tree as shown below
Again further, one more right child is added and the balance factor of the tree is changed.
Therefore, again LL rotation is performed on the tree and the balance factor of the tree is
restored as shown in the below figure
if b > 1 and key < root.left.value:
    return self.rRotate(root)
if b < -1 and key > root.right.value:
    return self.lRotate(root)
if b > 1 and key > root.left.value:
    root.left = self.lRotate(root.left)
    return self.rRotate(root)
if b < -1 and key < root.right.value:
    root.right = self.rRotate(root.right)
    return self.lRotate(root)
return root
Deletion Operation In AVL
The deletion operation in the AVL tree is the same as the deletion operation in BST.
In the AVL tree, the node is always deleted as a leaf node and after the deletion of the node,
the balance factor of each node is modified accordingly. Rotation operations are used to
modify the balance factor of each node. The algorithm steps of deletion operation in an AVL
tree are:
Example:
Let us consider the below AVL tree with the given balance factor as shown in the
figure below
Here, we have to delete the node '25' from the tree. As the node to be deleted does
not have any child node, we will simply remove the node from the tree
After removal of the node, the balance factor of the tree is changed and, therefore,
rotation is performed to restore the balance factor of the tree and create a perfectly
balanced tree.
class treeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.height = 1

class AVLTree:
    # (rebalancing step at the end of the delete method)
    b = self.getBal(root)
    if b > 1 and key < root.left.value:
        return self.rRotate(root)
    if b < -1 and key > root.right.value:
        return self.lRotate(root)
    if b > 1 and key > root.left.value:
        root.left = self.lRotate(root.left)
        return self.rRotate(root)
    if b < -1 and key < root.right.value:
        root.right = self.rRotate(root.right)
        return self.lRotate(root)
    return root
def preOrder(self, root):
    if not root:
        return
    print("{0} ".format(root.value), end="")
    self.preOrder(root.left)
    self.preOrder(root.right)
Tree = AVLTree()
root = None
root = Tree.insert(root, 1)
root = Tree.insert(root, 2)
root = Tree.insert(root, 3)
root = Tree.insert(root, 4)
root = Tree.insert(root, 5)
root = Tree.insert(root, 6)
Heap:
A heap is a data structure that follows the complete binary tree property and satisfies
the heap property. Therefore, it is also known as a binary heap. As we all know, a
complete binary tree is a tree with every level filled and all the nodes as far left as
possible; only the last level may be partially filled.
In the heap data structure, we assign key-value or weight to every node of the tree.
Now, the root node key value is compared with the children’s nodes and then the tree is
arranged accordingly into two categories i.e., max-heap and min-heap.
Heapify:
The process of creating a heap data structure using the binary tree is called Heapify.
The heapify process is used to create the Max-Heap or the Min-Heap.
Min Heap
When the value of each internal node is smaller than the values of its children nodes,
the tree satisfies the min-heap property. Also, in the min-heap, the value of the root node
is the smallest among all the other nodes of the tree. Therefore, if “a” has a child node “b”,
then key(a) ≤ key(b).
#Python Code
def min_heapify(A, k):
    l = left(k)
    r = right(k)
    if l < len(A) and A[l] < A[k]:
        smallest = l
    else:
        smallest = k
    if r < len(A) and A[r] < A[smallest]:
        smallest = r
    if smallest != k:
        A[k], A[smallest] = A[smallest], A[k]
        min_heapify(A, smallest)

def left(k):
    return 2 * k + 1

def right(k):
    return 2 * k + 2

def build_min_heap(A):
    n = int((len(A) // 2) - 1)
    for k in range(n, -1, -1):
        min_heapify(A, k)

A = [3, 9, 2, 1, 4, 5]
build_min_heap(A)
print(A)
Max Heap
When the value of each internal node is greater than the values of its children nodes,
the tree satisfies the max-heap property. Also, in the max-heap, the value of the root node
is the greatest among all the other nodes of the tree. Therefore, if “a” has a child node “b”,
then key(a) ≥ key(b).
#Python Code
def max_heapify(A, k):
    l = left(k)
    r = right(k)
    if l < len(A) and A[l] > A[k]:
        largest = l
    else:
        largest = k
    if r < len(A) and A[r] > A[largest]:
        largest = r
    if largest != k:
        A[k], A[largest] = A[largest], A[k]
        max_heapify(A, largest)

def left(k):
    return 2 * k + 1

def right(k):
    return 2 * k + 2

def build_max_heap(A):
    n = int((len(A) // 2) - 1)
    for k in range(n, -1, -1):
        max_heapify(A, k)

A = [3, 9, 2, 1, 4, 5]
build_max_heap(A)
print(A)
Time complexity
Each call to heapify costs O(log n), and building the heap makes O(n) such calls,
giving a simple overall bound of O(n log n); a more careful analysis shows that building
a heap bottom-up in fact takes only O(n) time.
Applications of Heap
Multiway Search Tree:
A multiway search tree or (2-4) Search tree is one with nodes that have two or more
children. Each internal nodes may have more than two children. Root can have maximum of
two children.
Properties:
Each internal node of T has at least two children. That is, each internal node
is a d-node such that d ≥ 2.
Each internal d-node w of T with children c1, ..., cd stores an ordered set of d−1
key-value pairs (k1, v1), ..., (kd−1, vd−1), where k1 ≤ ··· ≤ kd−1.
Let us conventionally define k0 = −∞ and kd = +∞. For each item (k,v) stored
at a node in the subtree of w rooted at ci, i = 1, ..., d, we have that ki−1 ≤ k ≤ ki.
Example:
(2,4)-Tree Operations:
A multiway search tree that keeps the secondary data structures stored at each node
small and also keeps the primary multiway tree balanced is the (2,4) tree, which is
sometimes called a 2-4 tree or 2-3-4 tree. This data structure achieves these goals by
maintaining two simple properties,
Size Property: Every internal node has at most four children.
Depth Property: All the external nodes have the same depth
Insertion in (2-4) Tree:
Deletion in (2-4) Tree:
2 Mark Questions with Answers
1. What is tree?
A tree is an abstract data type that stores elements hierarchically. With the exception
of the top element, each element in a tree has a parent element and zero or more children
elements.
2. What is sibling?
Two nodes that are children of the same parent are siblings.
6. Give the array representation of the given binary tree?
12. What is a strictly binary tree?
The binary tree, in which every non-leaf node has nonempty left and right sub-trees,
is called a strictly binary tree.
16. Write the in-order, pre-order, post-order and Breadth-First or Level Order Traversal
for the given tree.
Inorder : 3 7 8 6 11 2 5 4 9
Preorder : 2 7 3 6 8 11 5 9 4
Postorder : 3 8 11 6 7 4 9 5 2
18. What is the Degree of a node in a tree? What is the Degree of A and B for the given tree?
20. What is Height of a node in a tree? What is the height of E in the given node?
If ‘p’ is a leaf, then the height of p is 0.
Otherwise, the height of p is one more than the maximum of the heights of p’s
children.
The height of a nonempty tree T is the height of the root of T.
Binary Tree
Let p be a pointer to a node and x be the information. Now, the basic operations are:
i) info(p)
ii) father(p) / parent(p)
iii) left(p)
iv) right(p)
v) brother(p)
vi) isleft(p)
vii) isright(p)
32. What is a binary search tree?
A binary tree in which all the elements in the left sub-tree of a node n are less than
the contents of n, and all the elements in the right sub-tree of n are greater than or equal to
the contents of n is called a binary search tree.
33. How can you say a recursive procedure is more efficient than a non-recursive one?
The automatic stacking and unstacking of activation records make it concise and
efficient, and there are no extraneous parameters and local variables used.
38. What are the applications of Heap?
Heap-implemented priority queues are used in graph algorithms like Prim's
algorithm and Dijkstra's algorithm.
Order statistics: The Heap data structure can be used to efficiently find the kth
smallest (or largest) element in an array.
Priority Queues: Priority queues can be efficiently implemented using Binary Heap
because it supports insert(), delete() and extractmax(), decreaseKey() operations in
O(logn) time.
39. In a binary max heap containing n numbers, the smallest element can be found in
O(n) time.
Time complexity: O(n). In a max heap, the smallest element is always present at a
leaf node, so we need to check all leaf nodes for the minimum value. The worst-case
complexity is therefore O(n).
heapify(iterable) :- This function is used to convert the iterable into a heap data
structure, i.e., into heap order.
heappush(heap, ele) :- This function is used to insert the element mentioned in its
arguments into the heap. The order is adjusted so that the heap structure is maintained.
heappop(heap) :- This function is used to remove and return the smallest element
from the heap. The order is adjusted so that the heap structure is maintained.
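A short example of these three functions from Python's built-in heapq module (a min-heap on a list):

```python
import heapq

data = [5, 1, 9, 3]
heapq.heapify(data)             # rearrange the list into heap order in place
heapq.heappush(data, 0)         # insert 0 while preserving the heap property
smallest = heapq.heappop(data)  # remove and return the smallest element (0)
```

After the pop, the next smallest element (1) sits at the root of the heap, i.e. at data[0].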
Max Heapify
UNIT V - GRAPH STRUCTURES
Graph ADT – representations of graph – graph traversals – DAG – topological
ordering – shortest paths – minimum spanning trees.
Graphs:
A graph is a way of representing relationships that exist between pairs of objects.
That is, a graph is a set of objects, called vertices, together with a collection of pairwise
connections between them, called edges. It can also be represented as G = (V, E).
Vertex − Each node of the graph is represented as a vertex. In the following example, the
labelled circles represent vertices. Thus, A to E are vertices.
Edge − An edge represents a path between two vertices, or a line between two vertices. In
the following example, the lines from A to B, B to E, and so on represent edges.
Adjacency − Two nodes or vertices are adjacent if they are connected to each other through
an edge. In the following example, B is adjacent to A, D is adjacent to B, and so on.
Path − A path represents a sequence of edges between two vertices. In the following
example,
A–B–D–E represents a path from A to E.
Directed path − a path such that all edges are directed and are traversed along their
direction.
Length − The number of edges in a path is called the length of the path in a graph. For
example, the length of the path from A to D in the above graph is 2 because it contains the
two edges (A,B) and (B,D).
Subgraph − A subgraph of a graph G is a graph H whose vertices and edges are subsets of
the vertices and edges of G, respectively.
Tree − A tree is a connected forest, that is, a connected graph without cycles.
Types of Graphs
1. Directed Graph
A directed graph is a graph in which the edges are directed; each edge is unidirectional. In a
directed graph, the edge (A,C) is not the same as (C,A). It is also called a digraph.
2. Undirected Graph
An undirected graph is a graph in which the edges are undirected; each edge is bidirectional.
In an undirected graph, (A,C) = (C,A).
3. Weighted Graph
A weighted graph is a graph in which each edge is assigned a weight or value.
This value is considered the cost/distance of traversing from one vertex to another.
A weighted graph can be either directed or undirected.
4. Complete Graph
A complete graph is a graph in which there is an edge between each pair of vertices, so
there is a path from each vertex to every other vertex. A complete graph with n vertices
has n(n−1)/2 edges.
5. Cyclic Graph
A cyclic graph is a graph that has cycles. A cycle is a path that starts and ends at the
same vertex.
6. Acyclic Graph
An acyclic graph is a graph that does not have any cycles. A directed graph without cycles
is called a Directed Acyclic Graph (DAG).
1. Edge List:
It maintains an unordered list of all edges, but there is no efficient way to locate a
particular edge (u,v), or the set of all edges incident to a vertex v.
Performance of the Adjacency List Structure:
It uses O(n + m) space to represent a graph with n vertices and m edges.
Each individual vertex or edge instance uses O(1) space.
The vertex count and edge count methods run in O(1) time.
The methods vertices and edges run in O(n) and O(m) time, respectively.
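A minimal adjacency-list graph along these lines might look as follows. The class is an illustrative sketch, not the book's full implementation; here edge_count recomputes the total, although an O(1) counter could be kept as the performance summary assumes:

```python
class Graph:
    # minimal adjacency-list representation of an undirected graph
    def __init__(self):
        self.adj = {}               # vertex -> list of neighbouring vertices

    def add_vertex(self, v):
        self.adj.setdefault(v, [])

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u].append(v)       # store the edge at both endpoints
        self.adj[v].append(u)

    def vertex_count(self):         # O(1)
        return len(self.adj)

    def edge_count(self):           # O(n) here; O(1) if a counter is kept
        return sum(len(ns) for ns in self.adj.values()) // 2
```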
4. Adjacency Matrix Structure:
It provides worst-case O(1) access to a specific edge (u,v) by maintaining an n × n matrix
for a graph with n vertices. Each entry is dedicated to storing a reference to the edge (u,v)
for a particular pair of vertices u and v; if no such edge exists, the entry will be None.
Graph Traversals:
A traversal is a systematic procedure for exploring a graph by examining all of its
vertices and edges. A traversal is efficient if it visits all the vertices and edges in time
proportional to their number, that is, in linear time.
Graph traversal shows the notion of reachability. Reachability in an undirected graph
G includes the following:
• Computing a path from vertex u to vertex v, or reporting that no such path exists.
• Given a start vertex s of G, computing, for every vertex v of G, a path with the
minimum number of edges between s and v, or reporting that no such path exists.
• Testing whether G is connected.
• Computing a spanning tree of G, if G is connected.
• Computing a cycle in G, or reporting that G has no cycles.
Algorithm DFS(G,u): {We assume u has already been marked as visited}
Input: A graph G and a vertex u of G
Output: A collection of vertices reachable from u, with their discovery edges
for each outgoing edge e = (u,v) of u do
    if vertex v has not been visited then
        Mark vertex v as visited (via edge e).
        Recursively call DFS(G,v).
Example:
Running Time of Depth-First Search:
incident_edges(v) takes O(deg(v)) time.
The e.opposite(v) method takes O(1) time.
Checking whether an edge has been explored takes O(1) time.
Hence a DFS traversal of a graph with n vertices and m edges runs in O(n + m) time.
Python Code:
def dfs(visited, graph, node):
    if node not in visited:
        print(node)
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)
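The dfs function above can be exercised on a small sample graph stored as a dict of adjacency lists. The graph below is hypothetical, and this variant collects vertices into a set instead of printing them:

```python
def dfs(visited, graph, node):
    # recursive depth-first search; 'visited' is a set of seen vertices
    if node not in visited:
        visited.add(node)
        for neighbour in graph[node]:
            dfs(visited, graph, neighbour)

# hypothetical sample graph as a dict of adjacency lists
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}
visited = set()
dfs(visited, graph, 'A')
```

Starting from 'A', every vertex of this graph is reachable, so visited ends up containing all four vertices.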
Breadth-First Search
Traversing a connected component of a graph level by level is known as a breadth-first
search (BFS).
Procedure:
A BFS proceeds in rounds and subdivides the vertices into levels.
BFS starts at vertex s, which is at level 0.
In the first round, we paint as “visited” all vertices adjacent to the start vertex s;
these vertices are one step away from the beginning and are placed into level 1.
In the second round, we allow all explorers to go two steps (i.e., edges) away from
the starting vertex. These new vertices, which are adjacent to level 1 vertices and not
previously assigned to a level, are placed into level 2 and marked as “visited.”
This process continues in similar fashion, terminating when no new vertices are
found in a level.
Python Code:
visited = []   # list of visited vertices
queue = []     # FIFO queue of vertices to explore

def bfs(visited, graph, node):
    visited.append(node)
    queue.append(node)
    while queue:
        s = queue.pop(0)
        print(s, end=" ")
        for neighbour in graph[s]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)

bfs(visited, graph, 'A')
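A self-contained variant of the BFS code, using collections.deque for an efficient queue and returning the visit order instead of printing. The sample graph is hypothetical:

```python
from collections import deque

def bfs(graph, start):
    # iterative breadth-first search returning vertices in visit order
    visited = [start]
    queue = deque([start])
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for neighbour in graph[s]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)
    return order

# hypothetical sample graph as a dict of adjacency lists
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
```

Here bfs(graph, 'A') visits A first (level 0), then its neighbours B and C (level 1), and finally D (level 2).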
Example:
Running time of Breadth First Search:
A BFS traversal of G takes O(n+m) time.
Directed Acyclic Graphs:
Directed graphs without cycles are referred to as Directed Acyclic Graphs (DAGs).
Topological Ordering:
A topological ordering is an ordering such that any directed path in G traverses
vertices in increasing order. Note that a directed graph may have more than one topological
ordering.
Python Code
def topsort(graph):
    # graph: dict mapping each vertex to a list of adjacent vertices
    indegree = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indegree[w] += 1
    queue = [v for v in graph if indegree[v] == 0]
    topnum = []
    while queue:
        v = queue.pop(0)          # a new vertex of indegree zero
        topnum.append(v)
        for w in graph[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(topnum) != len(graph):
        raise ValueError("Graph has a cycle")
    return topnum
Example:
Ordering of vertices using Topological Sorting:
Indegree of each vertex before each dequeue:
Vertex  1  2  3  4  5  6  7  8
A       0  0  0  0  0  0  0  0
B       0  0  0  0  0  0  0  0
C       1  0  0  0  0  0  0  0
D       3  2  1  1  0  0  0  0
E       1  1  0  0  0  0  0  0
F       2  2  2  2  1  0  0  0
G       2  2  2  1  1  1  0  0
H       3  3  2  2  2  2  1  0
Resulting topological order: A C E B D F G H
Shortest Paths:
Breadth-first search strategy can be used to find a shortest path from some starting
vertex to every other vertex in a connected graph.
Dijkstra’s Algorithm:
The single-source shortest-path problem is solved by performing a “weighted”
breadth-first search starting at the source vertex s.
In each iteration, the next vertex chosen is the vertex outside the cloud that is closest
to s. The algorithm terminates when no more vertices are outside the cloud.
Applying the greedy method to the single-source shortest-path problem results in an
algorithm known as Dijkstra’s algorithm.
Edge Relaxation:
Procedure:
• Assign the source node as S and Enqueue S.
• Dequeue the vertex S from queue and assign the value of that vertex to be known
and then find its adjacency vertices.
• If the distance of an adjacent vertex is infinity, change its distance to the distance of
its source vertex incremented by 1, and enqueue the vertex.
• Repeat the previous steps until the queue becomes empty.
Algorithm ShortestPath(G,s):
# Input: A weighted graph G with nonnegative edge weights, and a distinguished vertex s of G.
# Output: The length of a shortest path from s to v for each vertex v of G.
# Initialize D[s] = 0 and D[v] = ∞ for each vertex v ≠ s.
# Let a priority queue Q contain all the vertices of G using the D labels as keys.
while Q is not empty do
    u = value returned by Q.remove_min()        # pull a new vertex u into the cloud
    for each vertex v adjacent to u such that v is in Q do
        if D[u] + w(u,v) < D[v] then            # perform the relaxation procedure on edge (u,v)
            D[v] = D[u] + w(u,v)
            Change to D[v] the key of vertex v in Q.
return the label D[v] of each vertex v

Example: Find the shortest path using Dijkstra’s Algorithm.
Solution:
1. v1 is taken as source.
2. Now v1 is a known vertex, marked as known. Its adjacent vertices are v2 and v4; the pv
and dv values are updated:
T[v2].dist = Min(T[v2].dist, T[v1].dist + C(v1,v2)) = Min(∞, 0+2) = 2
T[v4].dist = Min(T[v4].dist, T[v1].dist + C(v1,v4)) = Min(∞, 0+1) = 1
3. Select the vertex with minimum distance among v2 and v4; v4 is marked as a known
vertex. Its adjacent vertices are v3, v5, v6 and v7:
T[v3].dist = Min(T[v3].dist, T[v4].dist + C(v4,v3)) = Min(∞, 1+2) = 3
T[v5].dist = Min(T[v5].dist, T[v4].dist + C(v4,v5)) = Min(∞, 1+2) = 3
T[v6].dist = Min(T[v6].dist, T[v4].dist + C(v4,v6)) = Min(∞, 1+8) = 9
T[v7].dist = Min(T[v7].dist, T[v4].dist + C(v4,v7)) = Min(∞, 1+4) = 5
4. Select the vertex with the shortest distance from the source v1; v2 is the smallest. v2 is
marked as a known vertex. Its adjacent vertices are v4 and v5. The distances from v1 to v4
and v5 through v2 are larger than the previous dv values, so there is no change in the dv
and pv values.
5. Select the next smallest vertex from the source; v3 and v5 are the smallest. The adjacent
vertices of v3 are v1 and v6; v1 is the source, so there is no change in its dv and pv.
T[v6].dist = Min(T[v6].dist, T[v3].dist + C(v3,v6)) = Min(9, 3+5) = 8
The dv and pv values of v6 are updated. The adjacent vertex of v5 is v7; no change in its dv
and pv values.
7. The last vertex v6 is declared as known.
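The pseudocode above can be sketched in Python using the heapq module as the priority queue. The function and variable names are assumed for illustration, and the edge weights below are taken from the computations in the worked example (the example's figure itself is not reproduced here):

```python
import heapq

def dijkstra(graph, s):
    # graph: dict mapping each vertex to a list of (neighbour, weight) pairs
    D = {v: float('inf') for v in graph}
    D[s] = 0
    pq = [(0, s)]                    # priority queue keyed by distance
    while pq:
        d, u = heapq.heappop(pq)
        if d > D[u]:
            continue                 # stale queue entry, skip it
        for v, w in graph[u]:
            if D[u] + w < D[v]:      # edge relaxation
                D[v] = D[u] + w
                heapq.heappush(pq, (D[v], v))
    return D

# weights reconstructed from the worked example's steps
graph = {
    'v1': [('v2', 2), ('v4', 1)],
    'v2': [('v4', 3), ('v5', 10)],
    'v3': [('v1', 4), ('v6', 5)],
    'v4': [('v3', 2), ('v5', 2), ('v6', 8), ('v7', 4)],
    'v5': [('v7', 6)],
    'v6': [],
    'v7': [],
}
```

With this graph, dijkstra(graph, 'v1') reproduces the distances computed above: v2 = 2, v4 = 1, v3 = 3, v5 = 3, v7 = 5 and v6 = 8.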
Minimum Spanning Tree (Prim–Jarnik Algorithm)
Procedure:
o We begin with some vertex s, defining the initial “cloud” of vertices C.
o Then, in each iteration, we choose a minimum-weight edge e = (u,v), connecting
a vertex u in the cloud C to a vertex v outside of C.
o The vertex v is then brought into the cloud C and the process is repeated until
a spanning tree is formed.
Algorithm PrimJarnik(G):
Input: An undirected, weighted, connected graph G with n vertices and m edges
Output: A minimum spanning tree T for G
Pick any vertex s of G

Python Code (G is assumed to be a V × V adjacency matrix, where 0 means “no edge”):
INF = 9999999
V = 5
selected = [False] * V      # vertices already in the tree
no_edge = 0
selected[0] = True          # start from vertex 0
print("Edge : Weight")
while no_edge < V - 1:
    minimum = INF
    x = 0
    y = 0
    for i in range(V):
        if selected[i]:
            for j in range(V):
                if (not selected[j]) and G[i][j]:
                    # j is not yet selected and there is an edge (i, j)
                    if minimum > G[i][j]:
                        minimum = G[i][j]
                        x = i
                        y = j
    print(str(x) + "-" + str(y) + " : " + str(G[x][y]))
    selected[y] = True
    no_edge += 1
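Wrapped as a function, the same matrix-based algorithm can be tried on a small sample graph. The weight matrix below is purely illustrative:

```python
def prim_mst(G):
    # G: V x V adjacency matrix; 0 means "no edge"
    INF = 9999999
    V = len(G)
    selected = [False] * V
    selected[0] = True              # start from vertex 0
    edges = []
    for _ in range(V - 1):
        minimum, x, y = INF, 0, 0
        # scan every edge leaving the current tree
        for i in range(V):
            if selected[i]:
                for j in range(V):
                    if not selected[j] and G[i][j] and G[i][j] < minimum:
                        minimum, x, y = G[i][j], i, j
        edges.append((x, y, G[x][y]))
        selected[y] = True
    return edges

# illustrative 5-vertex weighted graph (symmetric matrix, 0 = no edge)
G = [[0,  9, 75,  0,  0],
     [9,  0, 95, 19, 42],
     [75, 95, 0, 51, 66],
     [0, 19, 51,  0, 31],
     [0, 42, 66, 31,  0]]
```

For this matrix, prim_mst(G) selects the edges 0-1 (9), 1-3 (19), 3-4 (31) and 3-2 (51), for a total weight of 110.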
Example
2 Mark Questions with Answers
1. Define Graph.
A graph is a way of representing relationships that exist between pairs of objects.
That is, a graph is a set of objects, called vertices ‘V’, together with a collection of pairwise
connections between them, called edges ‘E’. It can also be represented as G=(V, E).
5. What is a loop?
An edge of a graph which connects a vertex to itself is called a loop or sling.
6. What is a simple graph?
A simple graph is a graph that has no more than one edge between any pair of
nodes.
11. What is a simple path?
A path in a graph in which the edges are distinct is called a simple path. It is also
called an edge-simple path.
3. Adjacency Map - is very similar to an adjacency list, but the secondary container
of all edges incident to a vertex is organized as a map.
4. Adjacency Matrix - Each entry is dedicated to storing a reference to the edge
(u,v) for a particular pair of vertices u and v; if no such edge exists, the entry will
be None.
21. List the two important key points of depth first search.
i) If a path exists from one node to another node, walk across the edge – exploring the edge.
ii) If no path exists from one specific node to any other node, return to the previous
node where we have been before – backtracking.
23. Differentiate BFS and DFS.
SNo  Concept                BFS                                        DFS
1    Stands for             Breadth First Search.                      Depth First Search.
2    Approach used          Works on the concept of FIFO               Works on the concept of LIFO
                            (First In First Out).                      (Last In First Out).
3    Suitable for           More suitable for searching vertices       More suitable when there are
                            closer to the given source.                solutions away from the source.
4    Time complexity        O(V + E) when an adjacency list is         Also O(V + E) with an adjacency
                            used, O(V^2) with an adjacency matrix.     list, O(V^2) with an adjacency matrix.
5    Visiting of siblings/  Siblings are visited before the            Children are visited before the
     children               children.                                  siblings.
6    Applications           Used in applications such as bipartite     Used in applications such as acyclic
                            graphs and shortest paths.                 graphs and topological ordering.
7    Memory                 Requires more memory.                      Requires less memory.
26. Define biconnectivity.
A connected graph G is said to be biconnected if it remains connected after the removal
of any one vertex and the edges that are incident upon that vertex. Equivalently, a
connected graph is biconnected if it has no articulation points.