Practical Consideration of Internal Sorting and External

Internal sorting refers to sorting data that fits entirely in main memory, while external sorting handles data too large for main memory by writing it to disk in chunks. Practical considerations for external sorting include using parallelism across disks/machines, increasing hardware speeds like more RAM or SSDs, and optimizing software for multicore and asynchronous I/O. The external sorting benchmark compares real-world implementations of external sorting algorithms using these techniques.

Uploaded by

rishi srivastava

PRACTICAL CONSIDERATION OF INTERNAL AND EXTERNAL SORTING

Dr B R Ambedkar National Institute of Technology, Jalandhar


Contents for Today’s Lecture

• Concept of Sorting
• Concept of Internal Sorting
• Practical consideration of Internal Sorting
• Concept of External Sorting
• Practical consideration of External Sorting
• Conclusion
SORTING

In computer science, arranging items in an ordered
sequence is called "sorting".
Sorting is a common operation in many
applications, and efficient algorithms to perform it
have been developed.
The most common uses of sorted sequences are:
•making merging of sequences efficient;
•making lookup or search efficient;
•enabling processing of data in a defined order.
INTERNAL SORTING

•Internal sorting is a sorting technique in which
the entire sort takes place inside the main
memory of the computer.
•No external memory is needed to run the
sorting program.
•It is used when the input is small enough to fit
in main memory.
•Internal sorting algorithms include bubble sort,
insertion sort, selection sort, etc.
INTERNAL SORTING(continued)
Practical consideration
Bubble sort
•Bubble sort is one of the simplest sorting algorithms;
it works by repeatedly swapping adjacent
elements that are in the wrong order.
•Several passes are performed, and with each pass
the smallest remaining element moves to the front of the array.
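•Function for bubble sort:

// ELEM, key() and swap() are assumed to be defined elsewhere.
void bubsort(ELEM * array, int n) {
  for (int i=0; i<n-1; i++)      // after pass i, array[0..i] is in final order
    for (int j=n-1; j>i; j--)    // move the smallest remaining element down to position i
      if (key(array[j]) < key(array[j-1]))
        swap(array[j], array[j-1]);
}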
INTERNAL SORTING(continued)
Practical consideration
Insertion sort
•Insertion sort is a simple sorting algorithm that
builds the final sorted array (or list) one item at a
time.
•At each step, the Nth element is inserted into its
correct position in the sorted list formed by the
first (N-1) elements.
•It is more convenient on a linked list than on an
array, because inserting a node does not require
shifting the elements that follow it.
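The insertion step described above can be sketched for the array case as follows. This is a minimal illustration, assuming integer keys stored in a std::vector; the function name insertion_sort is chosen here and does not come from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insertion sort on an array: the element at position i is inserted into the
// already sorted prefix a[0..i-1] by shifting larger elements one slot right.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int value = a[i];
        std::size_t j = i;
        // Shift every element greater than `value` one position to the right.
        while (j > 0 && a[j - 1] > value) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = value;  // drop the element into the gap that was opened
    }
    assert(std::is_sorted(a.begin(), a.end()));  // postcondition, checked in debug builds
}
```

The shifting loop is the cost that a linked list avoids: there, the new node is linked in directly once its position is found.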
EXTERNAL SORTING

•External sorting is a technique in which the data is stored
in secondary memory and loaded into main memory part by
part, where each part is sorted.
•Each sorted part is then stored in an intermediate
file. Finally, these files are merged to produce the sorted data.
•In this way, a very large amount of data can be sorted
even though it does not fit in main memory.
•In external sorting, the data cannot all be held in main
memory at once, so part of it must be kept on secondary
storage such as a hard disk or compact disc.
EXTERNAL SORTING(continued)

External sorting is required when the data to be
sorted does not fit into main memory. It consists of
two phases:
•Sorting phase: chunks of data small enough to fit
in main memory are read, sorted, and written out to
intermediate files.
•Merge phase: the sorted intermediate files are
combined into a single larger sorted file.
One of the best-known examples of external sorting is
external merge sort.
EXTERNAL SORTING(continued)
Practical consideration
•External merge sort is a technique in which the
data is split into intermediate files, each
intermediate file is sorted independently, and the
sorted files are then merged to produce the sorted output.

•EXAMPLE:
Suppose 10,000 records have to be sorted using
external merge sort, and main memory can hold
500 records at a time, organized as 5 blocks of
100 records each.
External Merge Sort Practical Example

In this example, 5 blocks (500 records) are loaded,
sorted in main memory, and written to an
intermediate file. This step is repeated 20 times to
cover all 10,000 records (10,000 / 500 = 20). The
intermediate files are then merged in pairs in main
memory until a single sorted output remains.
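The sort phase of this example can be sketched as below. It is only an in-memory simulation: the input vector stands in for the input file and each returned vector stands in for one sorted intermediate file. The name make_sorted_runs and the use of integer records are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sort phase of an external merge sort, simulated in memory.
// `records` stands in for the input file; each returned vector stands in for
// one sorted intermediate file (a "run"). Real code would use disk files.
std::vector<std::vector<int>> make_sorted_runs(const std::vector<int>& records,
                                               std::size_t memory_capacity) {
    assert(memory_capacity > 0);
    std::vector<std::vector<int>> runs;
    for (std::size_t start = 0; start < records.size(); start += memory_capacity) {
        std::size_t end = std::min(start + memory_capacity, records.size());
        // Load at most `memory_capacity` records into "main memory".
        std::vector<int> run(records.begin() + static_cast<std::ptrdiff_t>(start),
                             records.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(run.begin(), run.end());  // internal sort of this chunk
        runs.push_back(std::move(run));     // "write" the run as an intermediate file
    }
    return runs;
}
```

Called with 10,000 records and a capacity of 500, this produces 20 sorted runs of 500 records each, matching the count in the example.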
TWO WAY MERGE SORT

•Like QuickSort, Merge Sort is a Divide and Conquer
algorithm.
•It divides the input array into two halves, calls itself
for the two halves, and then merges the two sorted halves.
•The merge() function is used for merging the two
halves. merge(arr, l, m, r) is the key step: it
assumes that arr[l..m] and arr[m+1..r] are sorted and
merges the two sorted sub-arrays into one.
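A possible version of the merge(arr, l, m, r) step is sketched here. The function is named merge_sorted_halves (a name chosen for this sketch, to avoid confusion with std::merge), and the integer element type and temporary buffers are likewise assumptions.

```cpp
#include <cassert>
#include <vector>

// Merges the sorted ranges arr[l..m] and arr[m+1..r] (inclusive bounds, as in
// the text) into one sorted range arr[l..r], using two temporary buffers.
void merge_sorted_halves(std::vector<int>& arr, int l, int m, int r) {
    assert(0 <= l && l <= m && m < r && r < static_cast<int>(arr.size()));
    std::vector<int> left(arr.begin() + l, arr.begin() + m + 1);
    std::vector<int> right(arr.begin() + m + 1, arr.begin() + r + 1);
    std::size_t i = 0, j = 0;
    int k = l;
    while (i < left.size() && j < right.size()) {
        // "<=" takes from the left half on ties, which keeps the merge stable.
        arr[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    }
    while (i < left.size()) arr[k++] = left[i++];    // copy what remains on the left
    while (j < right.size()) arr[k++] = right[j++];  // copy what remains on the right
}
```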
TWO WAY MERGE SORT(cont.)

ALGORITHM
MergeSort(arr[], l, r)
If r > l
1. Find the middle point to divide the array into two halves: m = (l+r)/2
2. Call mergeSort for the first half: mergeSort(arr, l, m)
3. Call mergeSort for the second half: mergeSort(arr, m+1, r)
4. Merge the two halves sorted in steps 2 and 3: merge(arr, l, m, r)
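The four steps can be written out as a short recursive function. In this sketch the standard library's std::inplace_merge stands in for merge(arr, l, m, r), which keeps the example self-contained; the integer element type and the name merge_sort are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Recursive two-way merge sort on arr[l..r] (inclusive bounds).
void merge_sort(std::vector<int>& arr, int l, int r) {
    assert(l >= 0 && r < static_cast<int>(arr.size()));
    if (r > l) {
        int m = l + (r - l) / 2;    // step 1: same value as (l+r)/2, without overflow
        merge_sort(arr, l, m);      // step 2: sort the first half
        merge_sort(arr, m + 1, r);  // step 3: sort the second half
        // Step 4: merge arr[l..m] and arr[m+1..r]. std::inplace_merge takes
        // half-open iterator ranges, hence the "+ 1" on the upper bounds.
        std::inplace_merge(arr.begin() + l, arr.begin() + m + 1, arr.begin() + r + 1);
    }
}
```

A whole vector v would be sorted with merge_sort(v, 0, static_cast<int>(v.size()) - 1).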
Multi-way Mergesort

Idea: Do a K-way merge instead of a 2-way merge.

Find the smallest of K elements at each merge step.
Multi-way Mergesort Algorithm

Algorithm:
1. As before, read M values at a time into internal
memory, sort them, and write them to disk as runs.
2. Merge K runs:
• Read the first value of each of the K runs into an
internal array and build a min heap.
• Remove the minimum from the heap and write it to disk.
• Read the next value from the run that the minimum
came from and insert it into the heap.
• Repeat the previous two steps until these K runs are
exhausted, then continue with the next K runs.
3. Repeat the merge on larger and larger runs until
just one large run remains: the sorted list.
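Step 2, the K-way merge with a min heap, might look like the sketch below. The runs are held in memory as vectors that stand in for sorted runs on disk, and the output vector stands in for the merged run written back to disk. These stand-ins and the name k_way_merge are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// K-way merge of sorted runs using a min heap.
std::vector<int> k_way_merge(const std::vector<std::vector<int>>& runs) {
    // Heap entry: (value, index of the run the value came from).
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> next(runs.size(), 0);  // next unread position per run

    // Read the first value of every run and build the min heap.
    for (std::size_t k = 0; k < runs.size(); ++k) {
        assert(std::is_sorted(runs[k].begin(), runs[k].end()));
        if (!runs[k].empty()) {
            heap.push({runs[k][0], k});
            next[k] = 1;
        }
    }

    std::vector<int> output;
    while (!heap.empty()) {
        Entry smallest = heap.top();  // remove the minimum from the heap
        heap.pop();
        output.push_back(smallest.first);  // "write" it to the output run
        std::size_t k = smallest.second;
        if (next[k] < runs[k].size()) {    // refill from the same run
            heap.push({runs[k][next[k]], k});
            ++next[k];
        }
    }
    return output;
}
```

The heap never holds more than K entries, so each output value costs O(log K) comparisons regardless of how long the runs are.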
Multi-way Mergesort (cont.)

Let N = Number of records
B = Size of a Block (in records)
M = Size of internal memory (in records)
K = Number of runs to merge at once
Specific Example:
M = 80 records
B = 10 records
N = 16,000,000 records
So, K = ½ (M/B) = ½ (80/10) = 4
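The arithmetic of this example can be carried one step further to estimate how many merge passes are needed. The sketch below applies the formula K = ½ (M/B) exactly as given (the slides do not explain the factor ½) and assumes that the sort phase produces ceil(N/M) initial runs and that every pass merges groups of up to K runs. The function names are chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of initial runs when N records are sorted M at a time: ceil(N / M).
std::uint64_t initial_runs(std::uint64_t n, std::uint64_t m) {
    assert(m > 0);
    return (n + m - 1) / m;
}

// Merge order from the text: K = 1/2 * (M / B).
std::uint64_t merge_order(std::uint64_t m, std::uint64_t b) {
    assert(b > 0);
    return (m / b) / 2;
}

// Number of K-way merge passes needed to reduce `runs` runs to a single run.
unsigned merge_passes(std::uint64_t runs, std::uint64_t k) {
    assert(k >= 2);
    unsigned passes = 0;
    while (runs > 1) {
        runs = (runs + k - 1) / k;  // each pass merges groups of up to K runs
        ++passes;
    }
    return passes;
}
```

For M = 80, B = 10 and N = 16,000,000, this gives 200,000 initial runs, K = 4, and 9 merge passes. These run and pass counts are derived under the assumptions above and are not stated in the slides.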
EXTERNAL SORTING
(PRACTICAL CONSIDERATION)

The Sort Benchmark, created by computer scientist
Jim Gray, compares external sorting algorithms
implemented using finely tuned hardware and
software.
Winning implementations use several techniques:
•Using parallelism
•Increasing hardware speed
•Increasing software speed
EXTERNAL SORTING
PRACTICAL CONSIDERATION(cont.)
Using parallelism
•Multiple disk drives can be used in parallel to
improve sequential read and write speed.
•Multiple machines connected by fast network links can
each sort part of a huge dataset in parallel.
•This can be a very cost-efficient improvement.
•Sorting software can use multiple threads to speed up
the process on modern multicore computers.
•Software can use asynchronous I/O so that one run of
data can be sorted or merged while other runs are being
read from or written to disk.
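The multithreading point can be illustrated with a small sketch that sorts separate chunks of the data on separate threads and then merges them. It covers only the threading idea, not asynchronous I/O or multiple disks. The name parallel_chunk_sort, the in-memory vector, and the use of std::async are assumptions made for this sketch; on some toolchains the program must be built with thread support enabled (for example -pthread).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Splits `data` into `parts` chunks, sorts each chunk on its own thread, and
// then merges the sorted chunks from left to right.
void parallel_chunk_sort(std::vector<int>& data, std::size_t parts) {
    assert(parts > 0);
    const std::size_t n = data.size();
    // Chunk i covers positions [bounds[i], bounds[i + 1]).
    std::vector<std::ptrdiff_t> bounds;
    for (std::size_t i = 0; i <= parts; ++i) {
        bounds.push_back(static_cast<std::ptrdiff_t>(n * i / parts));
    }

    // The chunks do not overlap, so they can be sorted concurrently.
    std::vector<std::future<void>> tasks;
    for (std::size_t i = 0; i < parts; ++i) {
        auto first = data.begin() + bounds[i];
        auto last = data.begin() + bounds[i + 1];
        tasks.push_back(std::async(std::launch::async,
                                   [first, last] { std::sort(first, last); }));
    }
    for (auto& task : tasks) task.get();  // wait until every chunk is sorted

    // Merge chunk i into the already merged prefix [0, bounds[i]).
    for (std::size_t i = 1; i < parts; ++i) {
        std::inplace_merge(data.begin(), data.begin() + bounds[i],
                           data.begin() + bounds[i + 1]);
    }
}
```

The final merge here is sequential and is kept simple on purpose; a tuned implementation would merge in parallel as well.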
EXTERNAL SORTING
PRACTICAL CONSIDERATION(cont.)

Increasing hardware speed


•Using more RAM for sorting can reduce the number of disk
seeks and avoid the need for more passes.
•Fast external memory like solid-state drives can speed sorts,
either if the data is small enough to fit entirely on SSDs or, more
rarely, to accelerate sorting SSD-sized chunks in a three-pass
sort.
•Many other factors can affect hardware's maximum sorting
speed: CPU speed and number of cores, RAM access latency,
input/output bandwidth, disk read/write speed, disk seek time,
and others.
Conclusion

Internal sorting is used when the data is small
enough to fit in main memory; external sorting is
used when the data does not fit, so the sort must
use a hard disk or other secondary storage in
addition to main memory.
