External Sorting
• Used when the data to be sorted is so
large that we cannot use the computer’s
internal storage (main memory) to store
it
• We use secondary storage devices to
store the data
• The secondary storage devices we
discuss here are tape drives. Any other
storage device such as disk arrays, etc.
can be used
Two-way Sorting
• Assumptions:
– Computer’s internal storage can hold three
records at a time
– We denote the internal storage capacity by M.
Here we have M=3
– We show only the integer key of each
record
– We use four tape drives. One pair of tape
drives is denoted by Ta1 and Ta2 and the other
pair is denoted by Tb1 and Tb2
– Initially, all the records that have to be sorted
are on Ta1. Ta2 , Tb1 and Tb2 are empty
Example (RAM = 128 MB)
Two-way Sorting Algorithm:
Sort Phase
Algorithm:
I. Sort Phase
1. Read M records from one pair of tape drives.
Initially, all the records are present only on one tape
drive
2. Sort the M records in the computer’s internal
storage. If M is small (< 10) use insertion sort. For
larger values of M use quick sort.
3. Write the M sorted records into the other pair of tape
drives (i.e., the pair which does not contain the input
records). While writing the records, alternate
between the two tape drives of that pair.
4. Repeat steps 1-3 until the end of input
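The sort phase above can be sketched in a few lines. This is only an illustration, assuming each tape drive is modeled as an in-memory Python list of runs; the function name `sort_phase` is hypothetical, and the built-in `sorted` stands in for the insertion sort or quick sort mentioned in step 2.

```python
def sort_phase(records, m):
    """Cut the input into runs of at most m records, sort each run, and
    write the runs alternately to the two tapes of the output pair."""
    out_pair = ([], [])  # stand-ins for the two output tape drives
    for start in range(0, len(records), m):
        run = sorted(records[start:start + m])   # step 2: internal sort
        out_pair[(start // m) % 2].append(run)   # step 3: alternate tapes
    return out_pair


tb1, tb2 = sort_phase([81, 94, 11, 96, 12, 35, 17, 99, 28, 58, 41, 75, 15], 3)
print(tb1)  # runs written to the first output tape
print(tb2)  # runs written to the second output tape
```

Keeping each run as its own list makes the run boundaries explicit; on a real tape the runs would simply follow one another.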
Initially, Ta2 , Tb1 and Tb2 are empty and
Ta1 : 81 94 11 96 12 35 17 99 28 58 41 75 15
• Read in 81 94 11 into computer’s internal storage and sort
them. The output is 11 81 94 which gets written onto Tb1
• Read in 96 12 35 into computer’s internal storage and sort
them. The output is 12 35 96 which gets written onto Tb2
• At the end of the sort phase the contents of the tape drives
are:
Ta1: 81 94 11 96 12 35 17 99 28 58 41 75 15
Ta2:
Tb1: 11 81 94 17 28 99 15
Tb2: 12 35 96 41 58 75
Although Ta1 contains data, we have sorted and copied the data on
the other pair of tape drives. Therefore, Ta1 is ready to be
overwritten
Two-Way Sort Phase Example
Two-way Sorting Algorithm:
Merge Phase
Algorithm
II. Merge Phase
1. Perform a merge sort reading the data from
the input pair of tape drives and writing the
data to the output pair of tape drives
2. While writing the data alternate between
the two tape drives of the output pair
3. Repeat steps 1 and 2 until nothing is
written into one of the output pair of tape
drives
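A minimal sketch of the merge phase, under the same assumption that a tape is a Python list of sorted runs. The names `merge_pass` and `merge_phase` are hypothetical; `heapq.merge` from the standard library performs the two-way merge of a pair of runs.

```python
import heapq


def merge_pass(in_a, in_b):
    """Merge the i-th run of each input tape and write the merged runs
    alternately to the two tapes of the output pair."""
    out_pair = ([], [])
    for i in range(max(len(in_a), len(in_b))):
        run_a = in_a[i] if i < len(in_a) else []  # one tape may run out first
        run_b = in_b[i] if i < len(in_b) else []
        out_pair[i % 2].append(list(heapq.merge(run_a, run_b)))
    return out_pair


def merge_phase(in_a, in_b):
    """Repeat merge passes until nothing is written to the second output
    tape. Returns the sorted records and the number of passes."""
    passes = 0
    while True:
        out_a, out_b = merge_pass(in_a, in_b)
        passes += 1
        if not out_b:  # stopping condition from step 3
            return (out_a[0] if out_a else []), passes
        in_a, in_b = out_a, out_b  # the pairs swap roles for the next pass
```

Swapping `in_a, in_b` with `out_a, out_b` mirrors how the a-pair and b-pair alternate between input and output roles.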
Merge Phase Example
• At the end of the sort phase the contents of the tape drives are:
Ta1: 81 94 11 96 12 35 17 99 28 58 41 75 15
Ta2:
Tb1: 11 81 94 17 28 99 15
Tb2: 12 35 96 41 58 75
• Pass 1 Merge Phase:
– The input pair of drives for this pass is the b-pair and the output
pair is the a-pair
• After pass 1 of the merge phase we get:
Ta1: 11 12 35 81 94 96 15
Ta2: 17 28 41 58 75 99
Tb1: 11 81 94 17 28 99 15
Tb2: 12 35 96 41 58 75
• The a-pair now contains the latest merged data and the data in b-
pair can be overwritten in the next pass
Merge Phase Example (contd.)
Pass 2 Merge Phase:
• The input pair of drives for this pass is the a-pair and the
output pair is the b-pair
• After pass 2 of the merge phase we get:
Ta1: 11 12 35 81 94 96 15
Ta2: 17 28 41 58 75 99
Tb1: 11 12 17 28 35 41 58 75 81 94 96 99
Tb2: 15
• The b-pair now contains the latest merged data and the
data in a-pair can be overwritten
Merge Phase Example (contd.)
Pass 3 Merge Phase:
• The input pair of drives for this pass is the b-pair and the
output pair is the a-pair
• After pass 3 of the merge phase we get:
Ta1: 11 12 15 17 28 35 41 58 75 81 94 96 99
Ta2:
Tb1: 11 12 17 28 35 41 58 75 81 94 96 99
Tb2: 15
• The a-pair now contains the latest merged data and Ta2 is
empty
• The stopping condition for the merge phase is reached
• No. of merge passes in Two-way Sorting = ceil(log2(ceil(N/M)))
– N = # input records
– M = # records that can fit inside internal storage of computer
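As a check against the example traced above, here is the formula worked out for N = 13 and M = 3. The logarithm is taken as base 2 here, on the assumption that each two-way merge pass roughly halves the number of runs.

```latex
\left\lceil \log_2 \left\lceil \frac{N}{M} \right\rceil \right\rceil
  = \left\lceil \log_2 \left\lceil \frac{13}{3} \right\rceil \right\rceil
  = \left\lceil \log_2 5 \right\rceil
  = \lceil 2.32\ldots \rceil
  = 3
```

This agrees with the three merge passes in the example; the sort phase is not counted.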
Multi-way Merge Sorting
• Sort phase remains the same as in two-
way sorting
• In two-way sorting we did a 2-way merge
• In multi-way sorting we make a k-way
merge
• For this we need two groups of tape
drives
• Each group contains k tape drives giving
2*k tape drives in all
Multi-way Merge Sorting Example
• Problem: Finding the smallest element in the merge
phase requires (k-1) comparisons
• Solution: Use a heap to store the elements currently
pointed to in each tape drive
• Example: Same data as last example.
• We use 3-way merge that requires 2 groups, each of
three tape drives
• At the end of the sort phase we get
Ta1 , Ta2 , Ta3 : can be overwritten
Tb1 : 11 81 94 41 58 75
Tb2 : 12 35 96 15
Tb3 : 17 28 99
Initial heap: 11 at the root, with children 12 and 17
Multi-way Merge Sorting Example
• For the first merge pass
– The b-tape drives are the input drives
– The a-tape drives are the output drives
• We store the data pointed to currently on each input
tape drive as a heap in the computer’s internal
storage
• Initially, the heap would contain 11 12 17
• We do a deletemin() on the heap and write the
record returned by the deletemin() into Ta1
• The cur-pointer to Tb1 advances by one to point to 81
• We now have to insert 81 inside the heap
• The heap then becomes 12 17 81
• The next deletemin() yields 12 which is written on
Ta1
• We continue to write on Ta1 until we have written 9
records in it and then we switch to Ta2 for output
• In this phase we have combined three 3-element
data sets from the input b-tape drives into 9-
element data sets on the output a-tape drive
Ta1 : 11 12 17 28 35 81 94 96 99
Ta2 : 15 41 58 75
Ta3 :
Tb1 , Tb2 , Tb3 : can be overwritten
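The heap-driven merge walked through above can be sketched as follows. This is an illustration only: the function name `k_way_merge` is hypothetical, each input run is a Python list, and the heap holds `(key, run index, position)` tuples so that the record after each deleted minimum can be fetched from the same run.

```python
import heapq


def k_way_merge(runs):
    """Merge k sorted runs using a min-heap of the records currently
    pointed to in each run."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    merged = []
    while heap:
        key, i, pos = heapq.heappop(heap)  # deletemin()
        merged.append(key)                 # write the record to the output
        if pos + 1 < len(runs[i]):         # advance the pointer of run i
            heapq.heappush(heap, (runs[i][pos + 1], i, pos + 1))
    return merged


print(k_way_merge([[11, 81, 94], [12, 35, 96], [17, 28, 99]]))
```

Each output record costs one deletemin and at most one insert, both logarithmic in k, instead of a linear scan over the k current records.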
Multi-way Merge Sorting Example
• In the second merge pass we will combine the contents of
Ta1, Ta2 , and Ta3 and write the merged data on the b-tape
drives
• Here, we will be combining up to three data sets of up to 9
elements each from the input a-tape drives into one data set of
up to 27 elements on the output b-tape drives
• We stop when, after a merge pass, (k-1) of the output tape
drives are empty
• After the second merge pass Tb1 contains all the 13
elements of the input data set and Tb2 and Tb3 are empty
• The stopping condition is reached
• No. of passes in k-way merge = ceil(log_k(ceil(N/M)))
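The pass count can also be computed by repeated ceiling division, which avoids floating-point logarithms. This is a small sketch; the function name `merge_passes` is hypothetical, and it counts merge passes only, not the sort phase.

```python
def merge_passes(n, m, k=2):
    """Number of k-way merge passes needed for n records when m records
    fit in internal storage."""
    runs = -(-n // m)            # ceil(n / m) initial runs
    passes = 0
    while runs > 1:
        runs = -(-runs // k)     # each pass merges up to k runs into one
        passes += 1
    return passes


print(merge_passes(13, 3, k=2))  # two-way example
print(merge_passes(13, 3, k=3))  # three-way example
```

For the 13-record example with M = 3 this gives three passes for the two-way merge and two passes for the three-way merge, matching the traces above.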
Example 5-Way
Input: 50 110 95 | 10 100 36 | 153 40 120 | 60 70 130 | 22 140 80
Tape drives: Ta1 to Ta5 and Tb1 to Tb5
Pass 1 (sort phase): five sorted runs, one per tape drive of a group
50 95 110
10 36 100
40 120 153
60 70 130
22 80 140
Heap of the first record of each run (level order): 10 | 22 40 | 60 50
Pass 2 (5-way merge): one sorted run
10 22 36 40 50 60 70 80 95 100 110 120 130 140 153
Poly-Phase Merge
• K-way merge requires 2*k tape drives
• We can reduce the number of tape drives if we
unevenly split the input runs (each run holds up to
M records) for each merge pass
• As soon as an input tape drive becomes empty, stop the
merge pass and begin a new merge pass that uses it for output
• For a 2-way merge the ratio of splitting input data
(runs) is guided by the Fibonacci number series:
a_{i+1} = a_i + a_{i-1}, with a_1 = a_2 = 1
1 1 2 3 5 8 13 21 34…
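To see why a Fibonacci split works, the sketch below tracks only the number of runs on each of three tape drives during a 2-way poly-phase merge. It is a simplified model under stated assumptions: the function name `polyphase_run_counts` is hypothetical, record contents are ignored, and every phase merges runs pairwise from the two non-empty tapes onto the empty one until one input tape is exhausted.

```python
def polyphase_run_counts(a, b):
    """Return the run counts on three tapes after each phase, starting
    with a runs on one tape, b runs on another, and one empty tape."""
    tapes = [a, b, 0]
    history = [tuple(tapes)]
    while sum(tapes) > 1:
        out = tapes.index(0)                       # the empty tape is the output
        inputs = [i for i in range(3) if i != out]
        merged = min(tapes[i] for i in inputs)     # pairs that can be merged
        if merged == 0:
            # Two tapes are empty but several runs remain: the runs must be
            # redistributed before merging can continue.
            raise ValueError("merge is stuck; a distribute pass is needed")
        for i in inputs:
            tapes[i] -= merged
        tapes[out] += merged
        history.append(tuple(tapes))
    return history


print(polyphase_run_counts(3, 2))
```

With a Fibonacci split such as 3 and 2, every phase leaves exactly one tape empty, so the merge never stalls. An even split such as 2 and 2 empties both input tapes at once and raises the error in this model.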
Poly-Phase Merge
• Example:
– 13 runs split as 8 and 5
– 34 runs split as 21 and 13
• For non-Fibonacci numbers add dummy runs to
reach the nearest Fibonacci number
• k-way poly-phase merge uses (k+1) tape drives
instead of 2*k tape drives
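The split and the number of dummy runs can be computed as sketched below. The function name `fibonacci_split` is hypothetical, and "nearest Fibonacci number" is read here as the smallest Fibonacci number that is not less than the number of runs, which is an assumption.

```python
def fibonacci_split(num_runs):
    """Return (runs on first tape, runs on second tape, dummy runs) for a
    2-way poly-phase merge of num_runs initial runs."""
    if num_runs < 1:
        raise ValueError("need at least one run")
    prev, cur = 1, 1                     # consecutive Fibonacci numbers
    while cur < num_runs:
        prev, cur = cur, prev + cur
    # cur is the smallest Fibonacci number >= num_runs, and it is the sum
    # of the two Fibonacci numbers prev and cur - prev.
    return prev, cur - prev, cur - num_runs


print(fibonacci_split(13))
print(fibonacci_split(4))
```

The counts returned include the dummy runs, so four real runs are treated as five runs split 3 and 2, one of which is empty.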
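Example 1
Input: 50 110 95 | 10 100 36 | 153 40 120 | 60 70 130 | 22 140 80
Tape drives: Ta1, Ta2, Ta3
Pass 1 (sort phase): five sorted runs, split 3 and 2 over two tape drives
50 95 110
10 36 100
40 120 153
60 70 130
22 80 140
Pass 2 (merge): two merged runs on one tape drive, one run left unmerged
10 36 50 95 100 110 | 40 60 70 120 130 153
22 80 140
Pass 3 (merge): one merged run, one run left unmerged
10 22 36 50 80 95 100 110 140
40 60 70 120 130 153
Pass 4 (merge): one sorted run
10 22 36 40 50 60 70 80 95 100 110 120 130 140 153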
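Example 2
Input: 50 110 95 | 10 100 36 | 153 40 120 | 60 70 130
Tape drives: Ta1, Ta2, Ta3
Pass 1 Internal Sort: four sorted runs, split evenly as 2 and 2
50 95 110
10 36 100
40 120 153
60 70 130
Pass 2 Merge: both merged runs end up on the same tape drive
10 36 50 95 100 110 | 40 60 70 120 130 153
Pass 3 Distribute: the two runs are moved onto separate tape drives
10 36 50 95 100 110
40 60 70 120 130 153
Pass 4 Merge: one sorted run
10 36 40 50 60 70 95 100 110 120 130 153
Thus, an even split of the runs does not speed up
sorting: an extra distribute pass is needed before the final merge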
Replacement Selection
• Replacement Selection allows for initial
runs to contain more records than can fit
in memory
• When a record is written to the tape drive, the
internal memory it occupied becomes available
• If the next record on the input tape is larger
than the record we have just output, then it
can be included in the run
Replacement Selection
Algorithm:
1. Read M records into a heap in the computer’s
internal storage
2. while (heap is not empty){
a. Perform deletemin() and send to output
b. if (next input record >= last deletemin)
insert input record into heap
c. else
store the record outside the heap
/* the region outside the heap in the
computer’s internal storage is called
dead space */
}
3. If there are more records (in the dead space or on the input),
create a new heap of up to M records and repeat step 2
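The algorithm above can be sketched in Python as follows. This is one possible reading rather than a definitive implementation: the function name `replacement_selection` is hypothetical, the dead space is modeled as a separate list, records equal to the last output are allowed to join the current run (an assumption), and the next run's heap is built from the dead space.

```python
import heapq

_END = object()  # sentinel marking the end of the input tape


def replacement_selection(records, m):
    """Split the input into sorted runs using a heap of at most m records.
    Runs can be longer than m when the input is partially ordered."""
    source = iter(records)
    heap = [rec for _, rec in zip(range(m), source)]  # step 1: read M records
    heapq.heapify(heap)
    runs = []
    while heap:
        run, dead_space = [], []
        while heap:                                   # step 2
            smallest = heapq.heappop(heap)            # deletemin()
            run.append(smallest)                      # send to output
            nxt = next(source, _END)
            if nxt is _END:
                continue                              # input exhausted
            if nxt >= smallest:
                heapq.heappush(heap, nxt)             # fits in the current run
            else:
                dead_space.append(nxt)                # must wait for the next run
        runs.append(run)
        heap = dead_space                             # step 3: start a new run
        heapq.heapify(heap)
    return runs


print(replacement_selection([81, 94, 11, 96, 12, 35, 17, 99, 28, 58, 41, 75, 15], 3))
```

At most one record is read for every record written, so the heap and the dead space together never hold more than m records.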
1. Perform an external sort using the replacement
selection technique on the following data. Assume that
the memory can hold 4 records (M = 4) at a time and
there are 4 tape drives (Ta1, Ta2, Tb1, and Tb2). Initially
all data are stored in tape drive Ta1.
Tape drive   Data
Ta1          55 94 11 6 12 35 17 99 28 58 41 75 15 38 19 100 8 80
Ta2
Tb1
Tb2
2. Perform the 2-way poly-phase merge sort on the following sequence of
data, which is stored in tape drive Ta1.
55 94 11 6 12 35 17 99 28 58 41 75 15 38 19 100 8 80
Initially, Ta2 and Ta3 are empty and M = 3.
Tape drive   Contents
Ta1          55 94 11 6 12 35 17 99 28 58 41 75 15 38 19 100 8 80
Ta2
Ta3
Write the table at each pass (including the sort phase and the merge phase).
3. Sort the following data using a merge sort. You should use the divide and
conquer method described in class.
5 7 1 12 10 8 620 9

3.9 external sorting
