Unit-I
Introduction To Data Structures: Basic Terminologies, Elementary Data Organization
DATA STRUCTURE
Data may be organized in many different ways; the logical or mathematical model of a
particular organization of data is called a data structure. The choice of a particular data model
depends on two considerations.
1. It must be rich enough in structure to mirror the actual relationships of the data in the real world.
2. The structure should be simple enough that the data can be processed effectively when necessary.
Classification of data structures
Data structures are generally classified into primitive and non-primitive data structures. Basic
data types such as integer, real, character and Boolean are known as primitive data structures.
A value of one of these types cannot be divided into smaller parts, and hence they are called simple
data types.
Based on the structure and arrangement of data, non-primitive data structures are further
classified into linear and non-linear.
A data structure is said to be linear if its elements form a sequence or a linear list. In linear
data structures, the data is arranged in a linear fashion, although the way the elements are stored in memory
need not be sequential. Arrays, linked lists, stacks and queues are examples of linear data
structures.
A data structure is said to be non-linear if the data are not arranged in a sequence. The
insertion and deletion of data is therefore not possible in a linear fashion. Trees and graphs
are examples of non-linear data structures.
Arrays:
The simplest type of data structure is a linear (or one-dimensional) array. A linear array is a list
of a finite number n of similar data elements referenced respectively by a set of n consecutive
numbers, usually 1, 2, 3, ....., n. If we choose the name A for the array, then the elements of A
are denoted by the subscript notation a_1, a_2, a_3, ....., a_n, or by the parenthesis notation A(1),
A(2), A(3), ....., A(N), or by the bracket notation A[1], A[2], A[3], ....., A[N].
Linked List:
An array may waste memory: all of its cells are allocated even when only a few of them are
actually used, and the rest remain unused. Memory is an important resource which has to be
handled efficiently. In an array, the memory is allocated before the execution of the program;
its size is fixed and cannot be changed. This problem can be overcome by using a linked list.
A linked list is a non-sequential collection of data items. For every data item in the linked list,
there is an associated address (pointer) that gives the memory location of the next data item in
the list. The data items in the linked list need not be in consecutive memory locations. They may
be anywhere in the memory. Each item is reached by following the pointers that link the
items together.
[Figure: a linked list]
Advantage
➢ Linked lists are dynamic data structures: They can grow or shrink during the execution
of a program.
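As a rough illustration of the description above, the following Python sketch links each data item to the next one through a reference and lets the list grow and shrink while the program runs. The names Node, LinkedList, push_front, pop_front and to_list are assumptions made for this example; they do not come from the notes.

```python
class Node:
    """One data item together with the link to the next item."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node  # None marks the end of the list


class LinkedList:
    """Singly linked list that can grow and shrink at run time."""

    def __init__(self):
        self.head = None  # an empty list has no first node

    def push_front(self, data):
        # The new node points at the old first node.
        self.head = Node(data, self.head)

    def pop_front(self):
        # Unlink and return the first item; the list shrinks by one.
        if self.head is None:
            raise IndexError("pop from empty list")
        node = self.head
        self.head = node.next
        return node.data

    def to_list(self):
        # Follow the links from the head until the end marker.
        items = []
        node = self.head
        while node is not None:
            items.append(node.data)
            node = node.next
        return items


numbers = LinkedList()
for value in (30, 20, 10):
    numbers.push_front(value)
print(numbers.to_list())
```

No size is declared in advance: each call to push_front allocates exactly one more node.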
Trees:
Data frequently contain a hierarchical relationship between various elements. The data
structure which reflects this relationship is called a tree.
Stack:
A stack, also called a last-in-first-out (LIFO) system, is a linear list in which insertions and
deletions can take place only at one end, called the top.
Queue:
A queue, also called a first-in-first-out (FIFO) system, is a linear list in which deletions can take
place only at one end of the list, the “front” of the list, and insertions can take place only at
the other end of the list, the “rear” of the list.
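The difference between the LIFO and FIFO disciplines can be seen in a short Python sketch. It assumes a plain Python list as the stack and collections.deque from the standard library as the queue; these are choices made for the illustration, not part of the notes.

```python
from collections import deque

# Stack: insertions and deletions happen at the same end (the top).
stack = []
for item in ("A", "B", "C"):
    stack.append(item)       # push onto the top
popped = stack.pop()         # removes the item pushed last

# Queue: insertions at the rear, deletions at the front.
queue = deque()
for item in ("A", "B", "C"):
    queue.append(item)       # insert at the rear
served = queue.popleft()     # removes the item inserted first

print(popped, served)
```

The same three items go in, but the stack gives back C first while the queue gives back A first.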
Graph:
Data sometimes contain a relationship between pairs of elements which is not necessarily
hierarchical in nature. The data structure which reflects this type of relationship is called a graph.
Preliminaries
INTRODUCTION TO ALGORITHMS:
An algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a
certain order to get the desired output. Algorithms are generally created independent of
underlying languages, i.e., an algorithm can be implemented in more than one programming
language.
An algorithm should have the following characteristics:
1. Unambiguous − An algorithm should be clear and unambiguous. Each of its steps (or phases),
and their inputs/outputs, should be clear and must lead to only one meaning.
2. Input − An algorithm should have 0 or more well-defined inputs.
3. Output − An algorithm should have 1 or more well-defined outputs, and should match the
desired output.
4. Finiteness − Algorithms must terminate after a finite number of steps.
5. Feasibility − Should be feasible with the available resources.
6. Independent − An algorithm should have step-by-step directions, which should be
independent of any programming code.
ALGORITHMIC NOTATIONS:
An algorithm is a finite step-by-step list of well-defined instructions for solving a particular
problem. The following points summarize the basic format conventions used in the formulation of
algorithms.
⬧ Name of Algorithm: Every algorithm is given an identifying name written in capital
letters.
⬧ Introductory comments: The algorithm name is followed by a brief description of tasks
the algorithm performs and any assumptions that have been made. The description gives
the names and types of the variables used in the algorithm.
⬧ Comments: An algorithm step may terminate with a comment enclosed in round
parentheses intended to help the reader to understand the step better. Comments specify
no action and are included only for clarity.
⬧ Identifying number: Each algorithm is assigned an identifying number as follows:
Algorithm 4.3 refers to the third algorithm in chapter 4.
⬧ Steps, Control, Exit
An algorithm is made up of a sequence of numbered steps, each beginning with a phrase
enclosed in square brackets which gives an abbreviated description of that step.
Following this phrase is an ordered sequence of statements which describe the actions to
be performed. The steps of the algorithm are executed one after the other, beginning with
Step 1, unless indicated otherwise.
Control may be transferred to Step n of the algorithm by the statement “Go to Step n.”
The algorithm is completed when the statement
Exit
is encountered.
⬧ Assignment Statement: The assignment statement is indicated by placing an arrow (←)
between the variable receiving the value (on the left) and the expression supplying it (on the right).
Ex:
MAX←A[I]
which means that the value of the vector element A[I] replaces the contents of the variable
MAX. An exchange of the values of two variables (accomplished by the sequence of statements
TEMP←A, A←B, B←TEMP) is written as A↔B. Many variables can be set to the same
value by using a multiple assignment. Ex:
I←0, J←0, K←0
could be written as
I←J←K←0
Sometimes the := notation is used instead of the arrow. For example,
MAX := DATA[1]
assigns the value in DATA[1] to MAX.
⬧ Input and Output: Data may be input and assigned to variables by means of a Read
statement with the following form,
Read: Variable names
Similarly, messages, placed in quotation marks, and data in variables may be output
by means of a Write or Print statement with the following form,
Write: Messages and / or variable names
⬧ Variable names: A variable is an entity that possesses a value, and its name is chosen to
reflect the meaning of the value it holds (Ex: MAX holds the largest element). A variable
name always begins with a letter, followed by characters including letters, numeric digits
and some special characters. Blanks are not permitted within a name, and all letters are
capitalized. Examples of valid variable names are BLACK_BOX and X_SQUARED. The most
useful of the special characters is “_” (called the break character), which may be used as a
separator in names made up of several words.
For example:
Algorithm: (Largest element in array) LARGEST(DATA, N)
A nonempty array DATA with N numerical values is given. This algorithm finds the location
LOC and the value MAX of the largest element of DATA. The variable K is used as a counter.
Step 1. [Initialize] Set K := 1, LOC := 1 and MAX := DATA[1]
Step 2. [Increment counter] Set K := K + 1
Step 3. [Test counter] If K > N, then:
            Write: LOC, MAX and Exit
Step 4. [Compare and update] If MAX < DATA[K], then:
            Set LOC := K and MAX := DATA[K]
Step 5. [Repeat loop] Go to Step 2
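The steps of LARGEST can be sketched in Python as follows. The function name largest is an assumption made for this illustration. Python lists are indexed from 0, so DATA[K] of the algorithm corresponds to data[k - 1] here, while the returned location is still counted from 1 as in the algorithm.

```python
def largest(data):
    """Return (loc, max_value) of the largest element of a nonempty list.

    loc is counted from 1, as in the LARGEST algorithm.
    """
    if not data:
        raise ValueError("data must be nonempty")
    loc, max_value = 1, data[0]            # Step 1: initialize with DATA[1]
    for k in range(2, len(data) + 1):      # Steps 2, 3 and 5: K = 2, ..., N
        if max_value < data[k - 1]:        # Step 4: compare and update
            loc, max_value = k, data[k - 1]
    return loc, max_value


print(largest([3, 5, 9, 2]))
```

The Go to Step 2 of the algorithm becomes an ordinary for loop, which is the usual way to express such a counter-controlled repetition.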
CONTROL STRUCTURES:
The three types of logic or flow of control are
1. Sequential logic or sequential flow
2. Selection logic or conditional flow
3. Iteration logic or repetitive flow
Sequential logic or sequential flow:
Instructions or modules are executed in sequence. The sequence may be presented
explicitly, by means of numbered steps, or implicitly, by the order in which the modules are
written.
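Selection logic or conditional flow:
Selection logic uses conditions to select one out of several alternative modules. These
structures fall into three types.
1. Single Alternative:
This structure has the form
If condition, then:
    [Module A]
[End of If structure.]
If the condition holds, then Module A, which may consist of one or more statements, is
executed; otherwise Module A is skipped and control transfers to the next step of the
algorithm.
2. Double Alternative:
This structure has the form
If condition, then:
    [Module A]
Else:
    [Module B]
[End of If structure.]
If the condition holds, then Module A is executed; otherwise Module B is executed.
3. Multiple Alternatives:
This structure has the form
If condition(1), then:
    [Module A1]
Else if condition(2), then:
    [Module A2]
.....
Else if condition(M), then:
    [Module AM]
Else:
    [Module B]
[End of If structure.]
The logic of this structure allows only one of the modules to be executed. Either the
module which follows the first condition which holds is executed, or the module which
follows the final Else statement is executed.
Iteration logic or repetitive flow:
Repeat-for:
The Repeat-for loop uses an index variable to control the loop. The loop has the form
Repeat for K = R to S by T:
    [Module]
[End of loop.]
Here R is called the initial value, S the end value or test value, and T the increment.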
Repeat-while:
The repeat-while uses a condition to control the loop. The loop has the form
Repeat while condition:
[Module]
[End of loop.]
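As a small illustration of selection and iteration logic, the Python sketch below shows a multiple-alternative selection and a Repeat-for loop with initial value R, end value S and increment T. The function names classify and repeat_for are assumptions made for this example, and the loop sketch assumes a positive increment T.

```python
def classify(n):
    """Multiple alternatives: exactly one of the branches is executed."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


def repeat_for(r, s, t):
    """Return the values taken by the index of a Repeat-for loop (T > 0)."""
    visited = []
    k = r                    # the index starts at the initial value R
    while k <= s:            # repeat while the index has not passed S
        visited.append(k)    # the module of the loop would run here
        k += t               # increase the index by the increment T
    return visited


print(classify(-5), repeat_for(1, 10, 3))
```

Writing the Repeat-for loop as a while loop shows that it is a special case of Repeat-while in which the condition tests the index variable.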
Recursion
INTRODUCTION:
The process in which a function calls itself directly or indirectly is called recursion, and the
corresponding function is called a recursive function. Using a recursive algorithm, certain
problems can be solved easily. Examples of such problems are the factorial function, the Fibonacci sequence
and the Towers of Hanoi (TOH).
FACTORIAL FUNCTION:
The factorial of n is the product of all positive integers from n down to 1. The factorial of n is denoted by
n!.
If n = 0, then n! = 1.
If n > 0, then n! = n x (n – 1)!
For example:
If n = 5 then 5! = 5 x 4 x 3 x 2 x 1 = 120
If n = 3 then 3! = 3 x 2 x 1 = 6
FACTORIAL(FACT, N)
This algorithm calculates N! and returns the value in the variable FACT.
1. If N =0, then: Set FACT := 1, and Return.
2. Call FACTORIAL(FACT, N-1).
3. Set FACT := N * FACT.
4. Return.
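The FACTORIAL procedure can be sketched in Python as follows. Instead of passing the result back through a variable FACT, this sketch returns the value directly; that change, and the function name factorial, are assumptions made for the illustration.

```python
def factorial(n):
    """Return n! for a non-negative integer n, computed recursively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:                        # Step 1: base case, 0! = 1
        return 1
    return n * factorial(n - 1)       # Steps 2 and 3: N * (N - 1)!


print(factorial(5))
```

Each call waits for the call on n - 1 to return, so computing 5! makes six calls in total, for n = 5, 4, 3, 2, 1 and 0.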
FIBONACCI SEQUENCE:
The Fibonacci series generates each subsequent number by adding the two previous numbers.
The series starts from two numbers, F_0 and F_1. The initial values of F_0 and F_1 can be taken as
0, 1 or 1, 1 respectively; here we take F_0 = 0 and F_1 = 1.
If n = 0 or n = 1, then F_n = n.
If n > 1, then F_n = F_{n-2} + F_{n-1}.
For example:
The first 8 terms (F_0 to F_7) are 0 1 1 2 3 5 8 13
The first 5 terms (F_0 to F_4) are 0 1 1 2 3
FIBONACCI(FIB, N)
This algorithm calculates F_N and returns the value in the variable FIB.
1. If N =0 or N = 1, then: Set FIB := N, and Return.
2. Call FIBONACCI (FIBA, N-2).
3. Call FIBONACCI (FIBB, N-1).
4. Set FIB := FIBA + FIBB.
5. Return.
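A Python sketch of the FIBONACCI procedure is given below, taking F_0 = 0 and F_1 = 1. As an assumption made for this illustration, the result is returned directly rather than through the variable FIB; the local names fib_a and fib_b mirror FIBA and FIBB.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, with F_0 = 0 and F_1 = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0 or n == 1:           # Step 1: base cases
        return n
    fib_a = fibonacci(n - 2)       # Step 2
    fib_b = fibonacci(n - 1)       # Step 3
    return fib_a + fib_b           # Step 4


print([fibonacci(i) for i in range(8)])
```

This direct translation recomputes the same terms many times, so it is only practical for small n.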
TOWERS OF HANOI:
The Tower of Hanoi is a mathematical puzzle which consists of three towers (pegs) and more
than one ring.
[Figure: the Tower of Hanoi puzzle]
These rings are of different sizes and are stacked in ascending order, i.e., the smaller one
sits over the larger one. There are other variations of the puzzle where the number of disks
increases, but the tower count remains the same.
Rules: The goal is to move all the disks to another tower without violating the sequence of
arrangement. A few rules to be followed for the Tower of Hanoi are −
• Only one disk can be moved among the towers at any given time.
• Only the "top" disk can be removed.
• No large disk can sit over a small disk.
Algorithm
To write an algorithm for the Tower of Hanoi, first we need to learn how to solve this problem
with a smaller number of disks, say 1 or 2. We mark the three towers with the names source,
destination and aux (the last only to help in moving the disks). If we have only one disk, then it can
easily be moved from the source to the destination peg. If we have 2 disks −
• First, we move the smaller (top) disk to aux peg.
• Then, we move the larger (bottom) disk to destination peg.
• And finally, we move the smaller disk from aux to destination peg.
TOWER(N,BEG,AUX,END)
This algorithm gives a recursive solution to the towers of Hanoi problem for N disks.
1. If N = 1, then:
(a) Write BEG → END.
(b) Return.
[End of If Structure]
2. [Move N – 1 disks from peg BEG to Peg AUX.]
Call TOWER(N – 1, BEG, END, AUX)
3. Write BEG → END
4. [Move N – 1 disks from peg AUX to Peg END]
Call TOWER(N – 1, AUX, BEG, END)
5. Return
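The TOWER procedure can be sketched in Python as follows. Here BEG plays the role of the source peg, END the destination and AUX the helper peg. Instead of writing each move, this sketch collects the moves in a list so that they can be inspected; that choice and the names tower and moves are assumptions made for the illustration.

```python
def tower(n, beg, aux, end, moves=None):
    """Return the list of moves (from_peg, to_peg) that transfers n disks
    from peg beg to peg end, using peg aux as the helper."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if moves is None:
        moves = []
    if n == 1:                            # Step 1: a single disk moves directly
        moves.append((beg, end))
        return moves
    tower(n - 1, beg, end, aux, moves)    # Step 2: N-1 disks from BEG to AUX
    moves.append((beg, end))              # Step 3: largest disk from BEG to END
    tower(n - 1, aux, beg, end, moves)    # Step 4: N-1 disks from AUX to END
    return moves


for source, target in tower(2, "A", "B", "C"):
    print(source, "->", target)
```

For N disks the procedure produces 2^N - 1 moves, for example 7 moves for 3 disks.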
Arrays
Data structures are classified as either linear or non-linear. A data structure is said to be
linear if its elements form a sequence. There are two basic ways of representing such linear
structures in memory. One way is to have the linear relationship between the elements
represented by means of sequential memory locations. These linear structures are called
arrays. The other way is to have the linear relationship between the elements represented by
means of pointers or links. These linear structures are called linked lists. Arrays are frequently
used to store relatively permanent collections of data.
The operations one normally performs on any linear structure, whether it be an array or a
linked list, include the following:
a) Traversal: Processing each element in the list.
b) Search: Finding the location of the element with the given value, or the record with a
given key.
c) Insertion: Adding a new element to the list.
d) Deletion: Removing an element from the list.
e) Sorting: Arranging the elements in some type of order.
f) Merging: Combining two lists into a single list.
LINEAR ARRAYS
A linear array is a list of a finite number n of homogeneous data elements (i.e., data elements
of the same type) such that
a) The elements of the array are referenced respectively by an index set consisting of n
consecutive numbers.
b) The elements of the array are stored respectively in successive memory locations.
The number n of elements is called the length or size of the array. If not explicitly stated,
assume the index set consists of the integers 1,2,…..,n . In general, the length or the number
of data elements of the array can be obtained from the index set by the formula
Length = UB – LB + 1
Where UB is the largest index , called the Upper Bound and LB is the smallest index, called
the Lower Bound, of the array.
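Ex: An array whose index set is 1932, 1933, ....., 1984 has LB = 1932 and UB = 1984, so
Length = 1984 – 1932 + 1 = 53.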
Each programming language has its own rules for declaring arrays. Each such declaration
must give, implicitly or explicitly, three items of information:
1. The name of the array
2. The data type of the array
3. The index set of the array
Ex: Suppose DATA is a 6-element linear array containing real values. The C language declares
such an array as follows (in C the index set is 0, 1, ....., 5):
float DATA[6];
Computer Memory
Let LA be a linear array in the memory of the computer. The elements of LA are stored in
successive memory cells. The computer keeps track of the
address of the first element of LA, denoted by Base(LA) and called the base address of LA.
Using this address Base(LA), the computer calculates the address of any element of LA by
the following formula:
LOC(LA[K]) = Base(LA) + w(K – lower bound)
where w is the number of words per memory cell for the array LA.
Ex: Consider the array AUTO, which records the number of automobiles sold each year from
1932 through 1984. Suppose Base(AUTO) = 200 in memory and w = 4 words per memory cell
for AUTO. Then
LOC(AUTO[1932]) = 200, LOC(AUTO[1933]) = 204, LOC(AUTO[1934]) = 208, .....
The address of the array element for the year K = 1965 can be obtained by using the equation
LOC(AUTO[1965]) = Base(AUTO) + w(1965 – lower bound) = 200 + 4(1965 – 1932) = 332
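The address formula and the length formula can be checked with a short Python sketch. The function names loc and array_length are assumptions made for this illustration; the numbers are those of the AUTO example.

```python
def array_length(lb, ub):
    """Length = UB - LB + 1."""
    return ub - lb + 1


def loc(base, w, k, lower_bound):
    """LOC(LA[K]) = Base(LA) + w(K - lower bound)."""
    return base + w * (k - lower_bound)


# AUTO: index set 1932..1984, Base(AUTO) = 200, w = 4 words per cell.
print(array_length(1932, 1984))
print(loc(200, 4, 1965, 1932))
```

Because the address is obtained by one multiplication and one addition, the time to locate any element is the same regardless of K.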
Algorithm: (Traversing a Linear Array) Here LA is a Linear Array with lower bound LB
and upper bound UB. This algorithm traverses LA applying an operation PROCESS to each
element of LA.
1. [ Initialize counter ] Set K := LB.
2. Repeat Steps 3 and 4 while K ≤ UB.
3. [Visit element] Apply PROCESS to LA[K].
4. [Increase counter] Set K := K + 1.
[ End of Step 2 loop.]
5. Exit.
The operation PROCESS in the traversal algorithm may use certain variables which must be
initialized before PROCESS is applied to any of the elements in the array.
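A Python sketch of the traversal algorithm follows. It assumes a Python list, whose lower bound is 0 and whose upper bound is len(la) - 1. The names traverse, totals and add_to_sum are assumptions made for this illustration. The dictionary totals is an example of a variable that PROCESS uses and that must be initialized before the traversal starts.

```python
def traverse(la, process):
    """Apply process to every element of la, from the lower to the upper bound."""
    k = 0                        # Step 1: K := LB (0 for a Python list)
    while k <= len(la) - 1:      # Step 2: repeat while K <= UB
        process(la[k])           # Step 3: visit the element
        k += 1                   # Step 4: increase the counter


totals = {"sum": 0}              # initialized before PROCESS is applied


def add_to_sum(value):
    # The PROCESS operation of this example: accumulate the element.
    totals["sum"] += value


traverse([4, 8, 15], add_to_sum)
print(totals["sum"])
```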
MULTIDIMENSIONAL ARRAYS
Two Dimensional Arrays:
A two-dimensional m x n array A is a collection of m · n data elements such that each element is
specified by a pair of integers (such as J, K), called subscripts, with the property that 1 ≤ J ≤ m
and 1 ≤ K ≤ n. The element of A with first subscript J and second subscript K will be denoted
by A_{J,K} or A[J, K].
Two-dimensional arrays are called matrices in mathematics and tables in business
applications; hence two-dimensional arrays are sometimes called matrix arrays.
There is a standard way of drawing a two-dimensional m x n array A where the elements of A
form a rectangular array with m rows and n columns and where the element A[J, K] appears
in row J and column K. For example, a two-dimensional 3 x 4 array A is represented as:
            Column 1   Column 2   Column 3   Column 4
Row 1       A[1, 1]    A[1, 2]    A[1, 3]    A[1, 4]
Row 2       A[2, 1]    A[2, 2]    A[2, 3]    A[2, 4]
Row 3       A[3, 1]    A[3, 2]    A[3, 3]    A[3, 4]
Suppose A is a two-dimensional m x n array. The first dimension of A contains the index set
1, ….., m with lower bound 1 and upper bound m ; and the second dimension of A contains
the index set 1, ….., n with lower bound 1 and upper bound n. The length of the dimension
is the number of integers in its index set. The pair of lengths m x n is called the size of the
array.
The length of given dimension can be obtained from the formula :
Length = upper bound – lower bound + 1
Storage representations
Row-major Representation:
Computer memory is one-dimensional, so the elements of a two-dimensional array must be
mapped onto a linear sequence of memory cells. In row-major representation the array is
stored row by row: all the elements of row 1 come first, followed by all the elements of row 2,
and so on.
Column-major Representation:
In column-major representation the array is stored column by column: all the elements of
column 1 come first, followed by all the elements of column 2, and so on.
The computer keeps track of Base(A), the address of the first element A[1, 1] of A, and
computes the address LOC(A[J, K]) of A[J, K] using the formula
Column-major order: LOC(A[J, K]) = Base(A) + w[M(K – 1) + (J – 1)]
Row-major order: LOC(A[J, K]) = Base(A) + w[N(J – 1) + (K – 1)]
Here M is the number of rows and N is the number of columns of A, and w denotes the number of
words per memory location for the array A.
Ex: Consider the 25 x 4 matrix array SCORE. Suppose Base(SCORE) = 200 and there are w = 4
words per memory cell. Furthermore, suppose the array is stored in row-major order. Then
the address of SCORE[12, 3], the third test of the twelfth student, follows:
LOC(SCORE[12, 3]) = 200 + 4[4(12 – 1) + (3 – 1)] = 200 + 4[46] = 384
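The two address formulas can be tried out with a short Python sketch. The function names loc_row_major and loc_column_major are assumptions made for this illustration; subscripts start at 1, as in the formulas, and the numbers are those of the SCORE example.

```python
def loc_row_major(base, w, n_cols, j, k):
    """LOC(A[J, K]) = Base(A) + w[N(J - 1) + (K - 1)]."""
    return base + w * (n_cols * (j - 1) + (k - 1))


def loc_column_major(base, w, m_rows, j, k):
    """LOC(A[J, K]) = Base(A) + w[M(K - 1) + (J - 1)]."""
    return base + w * (m_rows * (k - 1) + (j - 1))


# SCORE is a 25 x 4 array with Base(SCORE) = 200 and w = 4.
print(loc_row_major(200, 4, 4, 12, 3))
print(loc_column_major(200, 4, 25, 12, 3))
```

The same element lands at different addresses under the two orders; only A[1, 1] and the last element of the array are at the same place in both.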
SPARSE MATRICES
Matrices with a relatively high proportion of zero entries are called sparse matrices.
[Figure: an example matrix in which 2/3 of the total elements are zeros]
Two general types of n-square sparse matrices occur in various applications. The first type,
where all entries above the main diagonal are zero, or equivalently where nonzero entries can
only occur on or below the main diagonal,
is called a (lower) triangular matrix. Similarly, a square matrix is called upper triangular
if all the entries below the main diagonal are zero.
The second type, where nonzero entries can only occur on the diagonal or on elements
immediately above or below the diagonal, is called a tridiagonal matrix.
Sorting
Sorting refers to the operation of arranging data in some given order, such as increasing or
decreasing with numerical data, or alphabetically with character data. Let A be a list of n
elements A[1], A[2], A[3], ....., A[n] in memory. Sorting A refers to the operation of rearranging
the elements of A so that they are in increasing order, i.e.,
A[1] ≤ A[2] ≤ A[3] ≤ ..... ≤ A[n]
For example, suppose A contains
8, 4,19, 2, 7, 13, 5, 16
After sorting, A is: 2, 4, 5, 7, 8, 13, 16, 19
SELECTION SORT
The algorithm gets its name from the fact that in each iteration the smallest element
for a key position is selected from the list of remaining elements and put in the required
position of the array, i.e., we start the search assuming that the current element is the smallest;
whenever we find an element smaller than it, we interchange the two elements. The algorithm is
not efficient for large arrays, since sorting N elements takes N(N – 1)/2 comparisons. The method
of selection sort relies on a comparison mechanism
to achieve its goals.
Algorithm: SELECTION_SORT(A, N)
Given a vector A of N elements A[0], A[1], ....., A[N-1], this procedure rearranges the array in
ascending order. The variable SMALL stores the smallest element found so far in the current pass.
1. [Examine all the elements of the array]
   Repeat Steps 2 and 3 for I = 0, 1, ....., N-2
2. [Assume Ith element as smallest]
   SMALL←A[I]
3. [Find the smallest element in the rest of the array]
   Repeat for J = I + 1, ....., N-1
      [Compare and exchange]
      If A[J] < SMALL then
         A[I]↔A[J]
         SMALL←A[I]
      [End of If structure.]
   [End of Step 3 loop.]
   [End of Step 1 loop.]
4. [Finished]
   Exit
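Ex: Suppose N = 4 and that after Pass 1 the list is 9, 14, 42, 21, so the smallest element 9 is
already in A[0].
Pass 2: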
Now SMALL= A[1] = 14
Compare A[2] with SMALL. 42 is not less than 14
Compare A[3] with SMALL. 21 is not less than 14
List is not altered. Observe that Pass 2 involves N-2 comparisons.
Pass 3:
SMALL = A[2] = 42
Compare A[3] with SMALL. Since 21 < 42, interchange 42 and 21; now SMALL = 21.
When Pass 3 is completed, A[2] contains the next smallest element of the array,
i.e., the list is 9, 14, 21, 42
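A Python sketch of selection sort is given below. It uses a common variant that remembers the position of the smallest remaining element and performs a single interchange at the end of each pass, rather than interchanging during the pass; the final order is the same. The function name selection_sort and the variable pos are assumptions made for this illustration.

```python
def selection_sort(a):
    """Sort the list a in place into ascending order and return it."""
    n = len(a)
    for i in range(n - 1):              # passes for I = 0, ..., N-2
        pos = i                         # assume the Ith element is the smallest
        for j in range(i + 1, n):       # scan J = I+1, ..., N-1
            if a[j] < a[pos]:
                pos = j                 # remember the smaller element
        a[i], a[pos] = a[pos], a[i]     # put the smallest into position I
    return a


print(selection_sort([8, 4, 19, 2, 7, 13, 5, 16]))
```

Every pass scans the whole unsorted part, so the number of comparisons does not depend on the initial order of the data.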
BUBBLE SORT
This is one of the best-known sorting algorithms because it is very simple to understand and
implement. The algorithm gets its name from the fact that in each iteration
a number moves like a bubble to its appropriate position. However, the algorithm is not
efficient for large arrays, since sorting n elements takes n(n – 1)/2 comparisons. The method of
bubble sort relies heavily on an exchange mechanism
to achieve its goals. The method is also called “sorting by exchange”.
The algorithm of bubble sort functions as follows:
The algorithm begins by comparing the element at the bottom of the array with the next
element. If the first element is larger than the second element, then they are swapped or
interchanged. The process is then repeated for the next two elements. After n-1 comparisons
the largest of all the items has ascended to the top of the array. The entire process till now
forms one pass of comparisons. During the next pass the same steps are repeated from the
beginning of the array; however, this time the comparisons are only for the first n-1 elements. The
second pass results in the second largest element ascending to its position. The process is
repeated again and again until only two elements are left for comparison. The last iteration
ensures that the first two elements of the array are placed in the correct order.
Algorithm: ( Bubble Sort ) BUBBLE(DATA, N)
Here DATA is an array with N elements. This algorithm sorts the elements in DATA.
1. Repeat Steps 2 and 3 for K = 1 to N – 1
2. Set PTR := 1 [Initialize pass pointer PTR.]
3. Repeat while PTR ≤ N – K:
a) If DATA [ PTR ] > DATA [ PTR + 1 ], then:
Interchange DATA [ PTR ] and DATA [ PTR + 1 ].
[ End of If structure.]
b) Set PTR := PTR + 1
[ End of inner loop. ]
[ End of Step 1 outer loop ]
4. Exit
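Ex: (This trace numbers the elements from A[0].) Suppose N = 4 and that after pass 1 the list is
22, 33, 11, 44.
PASS 2:
Compare A[0] with A[1]. Since 22 < 33, no interchange.
Compare A[1] with A[2]. Since 33 > 11, interchange 33 & 11 as: 22, 11, 33, 44
When pass 2 is completed, A[N-2] will contain the second largest element.
PASS 3:
Compare A[0] with A[1]. Since 22 > 11, interchange 22 & 11 as: 11, 22, 33, 44
After N-1 passes, the list will be sorted in increasing order.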
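The BUBBLE algorithm can be sketched in Python as follows. Python lists are indexed from 0, so the pass pointer PTR is shifted down by one compared with the algorithm. The function name bubble_sort is an assumption made for this illustration.

```python
def bubble_sort(data):
    """Sort the list data in place into ascending order and return it."""
    n = len(data)
    for k in range(1, n):                  # Step 1: passes K = 1 to N-1
        ptr = 0                            # Step 2: PTR := 1 (0 in Python)
        while ptr <= n - k - 1:            # Step 3: PTR <= N-K, shifted by one
            if data[ptr] > data[ptr + 1]:
                # Interchange the two neighbours that are out of order.
                data[ptr], data[ptr + 1] = data[ptr + 1], data[ptr]
            ptr += 1
    return data


print(bubble_sort([22, 33, 11, 44]))
```

Pass K leaves the K largest elements in their final positions, which is why each pass can stop one position earlier than the previous one.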
QUICK SORT
Quicksort is a divide-and-conquer algorithm. It first selects a value, which is called the pivot
value, and partitions the given array around the picked pivot. The role of the pivot value is to
assist with splitting the list. The actual position where the pivot value belongs in the final
sorted list is commonly called the split point. Quick sort follows the below steps:
• Choose any element as the pivot
• Partition the array on the basis of the pivot
• Apply quick sort on the left partition recursively
• Apply quick sort on the right partition recursively
For example, choose any value from the list to be the pivot. In the example list, 54 will serve as
our first pivot value. 54 will eventually end up in the position currently holding 31. The
partition process will happen next. It will find the split point and at the same time move
other items to the appropriate side of the list, either less than or greater than the pivot value.
Partitioning begins by locating two position markers (let’s call them leftmark and
rightmark) at the beginning and end of the remaining items in the list (positions 1 and 8 in Fig.).
The goal of the partition process is to move items that are on the wrong side with respect to
the pivot value while also converging on the split point. The figure shows this process as
we locate the position of 54.
We begin by incrementing leftmark until we locate a value that is greater than the pivot
value. We then decrement rightmark until we find a value that is less than the pivot value. At
this point we have discovered two items that are out of place with respect to the eventual split
point. For our example, this occurs at 93 and 20. Now we can exchange these two items and
then repeat the process again.
At the point where rightmark becomes less than leftmark, we stop. The position of rightmark
is now the split point. The pivot value can be exchanged with the contents of the split point
and the pivot value is now in correct place. In addition, all the items to the left of the split
point are less than the pivot value, and all the items to the right of the split point are greater
than the pivot value. The list can now be divided at the split point and the quick sort can be
invoked recursively on the two halves.
QUICKSORT(A,FIRST,LAST):
This algorithm sorts the part of the array A from position FIRST to position LAST.
1. If FIRST < LAST, then:
      SPLITPOINT := PARTITION(A,FIRST,LAST)
      QUICKSORT(A,FIRST,SPLITPOINT-1)
      QUICKSORT(A,SPLITPOINT+1,LAST)
   [End of If structure]
2. Return
PARTITION(A,FIRST,LAST):
This algorithm uses A[FIRST] as the pivot value and returns the split point.
1. [Initialize]
   PIVOTVALUE := A[FIRST]
   LEFTMARK := FIRST+1
   RIGHTMARK := LAST
2. Repeat while LEFTMARK ≤ RIGHTMARK:
      Repeat while LEFTMARK ≤ RIGHTMARK and A[LEFTMARK] ≤ PIVOTVALUE:
         LEFTMARK := LEFTMARK + 1
      [End of while loop]
      Repeat while RIGHTMARK ≥ LEFTMARK and A[RIGHTMARK] ≥ PIVOTVALUE:
         RIGHTMARK := RIGHTMARK - 1
      [End of while loop]
      If LEFTMARK < RIGHTMARK:
         A[LEFTMARK] ↔ A[RIGHTMARK]
      [End of If structure]
   [End of while loop]
3. A[FIRST] ↔ A[RIGHTMARK]
4. Return RIGHTMARK
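A Python sketch of quick sort with the first element as the pivot is given below. The two markers are interchanged only while they have not yet crossed, and the outer loop runs until they cross. The function names and the sample list are assumptions made for this illustration; the sample list was chosen to be consistent with the values 54, 93, 20 and 31 mentioned in the text.

```python
def partition(a, first, last):
    """Partition a[first..last] around a[first]; return the split point."""
    pivot_value = a[first]
    left_mark = first + 1
    right_mark = last
    while left_mark <= right_mark:
        # Move leftmark right past items that belong on the left side.
        while left_mark <= right_mark and a[left_mark] <= pivot_value:
            left_mark += 1
        # Move rightmark left past items that belong on the right side.
        while right_mark >= left_mark and a[right_mark] >= pivot_value:
            right_mark -= 1
        if left_mark < right_mark:
            # Both markers stopped at items on the wrong side: exchange them.
            a[left_mark], a[right_mark] = a[right_mark], a[left_mark]
    # rightmark is now the split point; put the pivot there.
    a[first], a[right_mark] = a[right_mark], a[first]
    return right_mark


def quick_sort(a, first=0, last=None):
    """Sort a[first..last] in place (the whole list by default) and return a."""
    if last is None:
        last = len(a) - 1
    if first < last:
        split_point = partition(a, first, last)
        quick_sort(a, first, split_point - 1)
        quick_sort(a, split_point + 1, last)
    return a


sample = [54, 26, 93, 17, 77, 31, 44, 55, 20]   # assumed example list
print(quick_sort(sample))
```

Choosing the first element as the pivot keeps the sketch short, but on an already sorted list every partition is as unbalanced as possible, which is the worst case for quick sort.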
INSERTION SORT
Suppose an array A with n elements A[1],A[2],…………A[n] is in memory. The insertion
sort algorithm scans A from A[1] to A[n], inserting each element A[k] into its proper
position in the previously sorted subarray A[1],A[2],…………A[k-1].
Pass 1: A[1] by itself is trivially sorted.
Pass 2: A[2] is inserted either before or after A[1] so that A[1],A[2] are sorted.
Pass 3: A[3] is inserted into its proper place in A[1], A[2], that is, before A[1], between A[1]
and A[2], or after A[2], so that A[1], A[2], A[3] is sorted.
Pass 4: A[4] is inserted into its proper place in A[1], A[2], A[3] so that A[1], A[2], A[3],
A[4] is sorted.
.....
Pass n: A[n] is inserted into its proper place in A[1], A[2], ....., A[n-1] so that A[1],
A[2], ....., A[n] is sorted.
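The passes described above can be sketched in Python as follows. The notes give no formal algorithm for insertion sort, so the function name insertion_sort and the variables temp and ptr are assumptions made for this illustration. Python lists are indexed from 0, so pass K of the description handles a[k - 1].

```python
def insertion_sort(a):
    """Sort the list a in place into ascending order and return it."""
    for k in range(1, len(a)):          # a[0] by itself is trivially sorted
        temp = a[k]                     # the element to be inserted
        ptr = k - 1
        # Shift the larger elements of the sorted part one place to the right.
        while ptr >= 0 and a[ptr] > temp:
            a[ptr + 1] = a[ptr]
            ptr -= 1
        a[ptr + 1] = temp               # insert into its proper place
    return a


print(insertion_sort([8, 4, 19, 2, 7, 13, 5, 16]))
```

When the list is already nearly sorted, the inner loop stops almost immediately, so insertion sort does very little work in that case.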
MERGE SORT
This sorting method follows the technique of divide and conquer. The technique of merge
sort works as follows. Given a sequence of ‘N’ elements the idea is to split them into two sets.
Each set is individually sorted and the resulting sequence is then combined to produce a single
sorted sequence of N elements. If each subset is of the same type as the original set, then the
subset is further recursively divided into smaller subsets until each subset is small enough to
be solved independently without splitting. The sorted subsets are then combined to obtain a
single solution to the entire problem.
The process of dividing a problem into sub problems can be clearly understood by the example
shown in the following figure.
[Figure: merge sort flowchart. Is l < r? If yes, compute m = (l + r)/2.]
The following diagram shows the complete merge sort process for an example array
{38, 27, 43, 3, 9, 82, 10}.
If we take a closer look at the diagram, we can see that the array is recursively divided
into two halves till the size becomes 1. Once the size becomes 1, the merge processes
come into action and start merging arrays back till the complete array is merged.
The process of comparison starts by comparing the first elements of both subarrays, and the
smaller element is placed in the resultant vector. The process is then
continued with the next element. The algorithm of merging may have the following steps.
MERGE(A, LOW, MID, HIGH): Given a vector A whose two partitions are each already sorted, this
procedure merges them so that the elements from LOW to HIGH are in ascending order. C is a
vector to store the result. LOW, MID and HIGH
are the variables used to identify the low, mid and high positions of the elements in each
partition. The first partition is from the position LOW to the position MID and the next
partition is from the position MID+1 to the position HIGH. I, J and K are temporary
variables.
1.[ Initialize ]
I←LOW
J←MID + 1
K←LOW
2.[ Examine the elements of the vector ]
Repeat thru step 3 while I ≤ MID and J≤HIGH
3. [ Compare and store the Ith element in the resultant vector C ]
If A[I] < A[J] then
C[K]←A[I]
K←K + 1
I←I + 1
[ Otherwise store the Jth element in the resultant vector C]
else
C[K]←A[J]
K←K + 1
J←J + 1
[End of If structure.]
[End of Loop.]
4.[ Examine the elements of the vector till I is less than or equal to MID]
Repeat while I ≤ MID
C[K]←A[I]
K←K + 1
I←I + 1
[End of Loop.]
5.[ Examine the elements of the vector till J is less than or equal to HIGH]
Repeat while J ≤ HIGH
C[K]←A[J]
K←K + 1
J←J + 1
[End of Loop.]
6.[ Assign the elements of C into A vector]
Repeat for I = LOW,…., HIGH
A[I]←C[I]
[End of Loop.]
7.[ Finished]
Exit
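A Python sketch of MERGE, together with a recursive driver, is given below. The driver merge_sort is not spelled out as an algorithm in the notes; it is an assumption based on the description of splitting at m = (l + r)/2. The sketch collects the merged elements in a temporary list c and copies them back, instead of using a vector C with the same indices as A.

```python
def merge(a, low, mid, high):
    """Merge the sorted runs a[low..mid] and a[mid+1..high] in place."""
    c = []                                # temporary result vector
    i, j = low, mid + 1                   # Step 1: initialize
    while i <= mid and j <= high:         # Steps 2 and 3: compare and store
        if a[i] < a[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(a[j])
            j += 1
    c.extend(a[i:mid + 1])                # Step 4: rest of the first run
    c.extend(a[j:high + 1])               # Step 5: rest of the second run
    a[low:high + 1] = c                   # Step 6: copy the result back into a


def merge_sort(a, low=0, high=None):
    """Sort a[low..high] in place (the whole list by default) and return a."""
    if high is None:
        high = len(a) - 1
    if low < high:
        mid = (low + high) // 2           # split point
        merge_sort(a, low, mid)           # sort the left half
        merge_sort(a, mid + 1, high)      # sort the right half
        merge(a, low, mid, high)          # combine the two sorted halves
    return a


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))
```

Only one of the two extend calls ever adds elements, because the main loop ends as soon as one of the runs is used up.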