
Parallel Sorting Algorithms

Parallel sorting algorithms can at best reach a parallel time complexity of O(log n) using n processors, compared to the optimal sequential time of O(n log n). Popular parallel sorting algorithms include odd-even transposition sort and bitonic sort. Bitonic sort first merges elements into progressively larger bitonic sequences, then repeatedly applies a binary split operation to sort the list. For n elements on n processors, bitonic sort runs in O(log^2 n) time. When n is much larger than the number of processors p, the time is the local sorting time plus the time for the parallel bitonic merges, which is O((n/p)(log n + log^2 p)).

Uploaded by

accang mubariz
Copyright
© © All Rights Reserved
Potential Speedup

Optimal sequential sorting algorithm: O(n log n).

The best we can expect, based upon a sequential sorting algorithm and using n processors, is:

$$\text{optimal parallel time complexity} = \frac{O(n \log n)}{n} = O(\log n)$$
Compare-and-Exchange
Sorting Algorithms
Form the basis of several, if not most, classical sequential sorting
algorithms.

Two numbers, say A and B, are compared between P0 and P1.

[Figure: P0 holds A and P1 holds B; the compare-and-exchange produces MIN and MAX.]
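The compare-and-exchange step can be sketched in C as follows. This is only an illustration: the name `compare_exchange` is not from the slides, and the convention that P0 ends up with the smaller value and P1 with the larger one is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Compare-and-exchange between two values.
 * After the call, *lo holds min(A, B) and *hi holds max(A, B).
 * Assumed convention: *lo is the value kept by P0, *hi the one kept by P1. */
void compare_exchange(int *lo, int *hi)
{
    assert(lo != NULL && hi != NULL);
    if (*lo > *hi) {
        int tmp = *lo;
        *lo = *hi;
        *hi = tmp;
    }
}
```

On a real message-passing machine each processor would send its value to the other and keep either the minimum or the maximum; here both values simply live in one address space.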
Compare-and-Exchange Two Sublists

Odd-Even Transposition Sort - example

Parallel time complexity: Tpar = O(n) (for P=n)
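A sequential simulation of odd-even transposition sort with one number per processor might look like the sketch below. The function name is illustrative; the inner loop stands in for compare-and-exchange operations that would all run at the same time on P = n processors, which is where the O(n) parallel time comes from (n phases of constant work each).

```c
#include <assert.h>
#include <stddef.h>

/* Sequential simulation of odd-even transposition sort, one element
 * per simulated processor. Runs n phases; within a phase, the pairs
 * are disjoint, so a parallel machine could handle them all at once. */
void odd_even_transposition_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t phase = 0; phase < n; phase++) {
        /* Phases alternate between pairs (0,1), (2,3), ...
         * and pairs (1,2), (3,4), ... */
        for (size_t i = phase % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```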


Odd-Even Transposition Sort Example (N >> P)

Each PE gets n/p numbers. First, each PE sorts its n/p numbers locally; then the PEs run the odd-even transposition algorithm, with every step doing a merge-split on 2n/p numbers.

Step        P0          P1          P2          P3
Initial     13 7 12     8 5 4       6 1 3       9 2 10
Local sort  7 12 13     4 5 8       1 3 6       2 9 10
O-E         4 5 7       8 12 13     1 2 3       6 9 10
E-O         4 5 7       1 2 3       8 12 13     6 9 10
O-E         1 2 3       4 5 7       6 8 9       10 12 13
E-O         1 2 3       4 5 6       7 8 9       10 12 13

SORTED: 1 2 3 4 5 6 7 8 9 10 12 13

Time complexity: Tpar = (Local Sort) + (p merge-splits) +(p exchanges)

Tpar = (n/p)log(n/p) + p*(n/p) + p*(n/p) = (n/p)log(n/p) + 2n
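The merge-split used in each step can be sketched as below, assuming both blocks are already sorted and have the same length. The name `merge_split` and the caller-provided scratch buffer are choices made for this illustration, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge-split for two sorted blocks of equal length m.
 * After the call, lo holds the m smallest of the 2m values and hi
 * holds the m largest, each block still sorted ascending.
 * tmp is scratch space for 2m ints supplied by the caller. */
void merge_split(int *lo, int *hi, size_t m, int *tmp)
{
    assert(lo != NULL && hi != NULL && tmp != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < m && j < m)
        tmp[k++] = (lo[i] <= hi[j]) ? lo[i++] : hi[j++];
    while (i < m) tmp[k++] = lo[i++];
    while (j < m) tmp[k++] = hi[j++];
    memcpy(lo, tmp, m * sizeof *lo);
    memcpy(hi, tmp + m, m * sizeof *hi);
}
```

Applied to the locally sorted blocks of P0 and P1 in the example (7 12 13 and 4 5 8), this gives 4 5 7 and 8 12 13, matching the first O-E step.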


Parallelizing Mergesort

Mergesort - Time complexity
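Sequential:

$$T_{seq} = 1 \cdot n + 2 \cdot \frac{n}{2} + 2^2 \cdot \frac{n}{2^2} + \cdots + 2^{\log n} \cdot \frac{n}{2^{\log n}}$$

$$T_{seq} = O(n \log n)$$

Parallel:

$$T_{par} = 2\left(\frac{n}{2^0} + \frac{n}{2^1} + \frac{n}{2^2} + \cdots + \frac{n}{2^k} + \cdots + 2 + 1\right)$$

$$= 2n\left(2^0 + 2^{-1} + 2^{-2} + \cdots + 2^{-\log n}\right)$$

$$T_{par} = O(4n)$$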
Bitonic Mergesort

Bitonic Sequence
A bitonic sequence is defined as a list with no more than one
LOCAL MAXIMUM and no more than one LOCAL MINIMUM.
(Endpoints must be considered - wraparound )

[Figure: a list with 1 local MAX and 1 local MIN.]
This is ok! The list is bitonic!

[Figure: a list with 1 local MAX and 2 local MINs.]
This is NOT bitonic! Why? It has 1 local MAX but 2 local MINs.
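One way to test the definition in code is to walk around the list as a circle (so the endpoints wrap around) and count how often the direction changes between rising and falling. Each change corresponds to one local maximum or minimum, so at most two changes means at most one of each. The sketch below uses that idea; the function names are illustrative, and skipping over runs of equal neighbours is an assumption about how ties should be treated.

```c
#include <assert.h>
#include <stddef.h>

/* Sign of the step from x to y: +1 rising, -1 falling, 0 equal. */
static int step_sign(int x, int y)
{
    return (y > x) - (y < x);
}

/* Returns 1 if a[0..n-1], viewed as a circle, has at most one local
 * maximum and at most one local minimum, and 0 otherwise. */
int is_bitonic(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 3) return 1;               /* too short to violate the rule */
    int first = 0, prev = 0, changes = 0;
    for (size_t i = 0; i < n; i++) {
        /* The last step wraps around from a[n-1] to a[0]. */
        int s = step_sign(a[i], a[(i + 1) % n]);
        if (s == 0) continue;          /* equal neighbours: no direction */
        if (prev == 0) first = s;
        else if (s != prev) changes++;
        prev = s;
    }
    if (prev != 0 && prev != first) changes++;   /* close the circle */
    return changes <= 2;
}
```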
Binary Split
1. Divide the bitonic list into two equal halves.
2. Compare-Exchange each item on the first half
with the corresponding item in the second half.

Result:
Two bitonic sequences where the numbers in one sequence are all less
than the numbers in the other sequence.
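A single binary split, written as a short C sketch (the name `binary_split` is illustrative). It assumes the list has even length and that the smaller value of each pair goes to the first half.

```c
#include <assert.h>
#include <stddef.h>

/* One binary split on a list of even length n: element i of the first
 * half is compare-exchanged with element i of the second half, so the
 * smaller value of each pair ends up in the first half. */
void binary_split(int *a, size_t n)
{
    assert(a != NULL && n % 2 == 0);
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {
        if (a[i] > a[i + half]) {
            int tmp = a[i];
            a[i] = a[i + half];
            a[i + half] = tmp;
        }
    }
}
```

The n/2 compare-exchanges are independent of each other, so with enough processors they can all happen in one parallel step.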
Repeated application of binary split

Bitonic list:
24 20 15 9 4 2 5 8 | 10 11 12 13 22 30 32 45

Result after binary split:
10 11 12 9 4 2 5 8 | 24 20 15 13 22 30 32 45

If you keep applying the binary split to each half repeatedly, you will get a sorted list. The dots mark where the next split compares across:

10 11 12 9 . 4 2 5 8 | 24 20 15 13 . 22 30 32 45
4 2 . 5 8 | 10 11 . 12 9 | 22 20 . 15 13 | 24 30 . 32 45
4 . 2 | 5 . 8 | 10 . 9 | 12 . 11 | 15 . 13 | 22 . 20 | 24 . 30 | 32 . 45
2 4 5 8 9 10 11 12 13 15 20 22 24 30 32 45
Sorting a bitonic sequence
Compare-and-exchange moves smaller numbers of each pair to left
and larger numbers of pair to right.
Given a bitonic sequence,
recursively performing binary split will sort the list.
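The recursion can be sketched as follows, assuming the input is already bitonic and its length is a power of two. The name `sort_bitonic` is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Sorts a bitonic list of length n (n a power of two) into ascending
 * order by a binary split followed by recursion on both halves. */
void sort_bitonic(int *a, size_t n)
{
    assert(a != NULL && n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return;
    size_t half = n / 2;
    /* Binary split: smaller value of each pair moves to the left half. */
    for (size_t i = 0; i < half; i++) {
        if (a[i] > a[i + half]) {
            int tmp = a[i];
            a[i] = a[i + half];
            a[i + half] = tmp;
        }
    }
    sort_bitonic(a, half);           /* both halves are bitonic again */
    sort_bitonic(a + half, half);
}
```

The recursion depth is log n, and every level consists of independent compare-exchanges, which is why a bitonic list of n numbers can be sorted in log n parallel steps on n processors.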

Sorting an arbitrary sequence

To sort an unordered sequence, sequences are merged into larger bitonic sequences, starting with pairs of adjacent numbers.

By a compare-and-exchange operation, pairs of adjacent numbers are formed into increasing sequences and decreasing sequences. An increasing sequence followed by a decreasing one forms a bitonic sequence of twice the size of each original sequence.

By repeating this process, bitonic sequences of larger and larger lengths are obtained.

In the final step, a single bitonic sequence is sorted into a single increasing sequence.
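Put together, the whole procedure is commonly written as three nested loops. The sketch below is the usual iterative formulation of bitonic sort rather than code from the slides; it assumes the length is a power of two, and the variable names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Bitonic sort of a[0..n-1], n a power of two, ascending result. */
void bitonic_sort(int *a, size_t n)
{
    assert(a != NULL && n > 0 && (n & (n - 1)) == 0);
    /* k: length of the sorted runs produced by the current phase. */
    for (size_t k = 2; k <= n; k <<= 1) {
        /* j: distance between compare-exchange partners in this step. */
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner <= i) continue;       /* handle each pair once */
                /* Runs alternate between ascending and descending so that
                 * neighbouring runs together form a bitonic sequence. */
                int ascending = ((i & k) == 0);
                int out_of_order = ascending ? (a[i] > a[partner])
                                             : (a[i] < a[partner]);
                if (out_of_order) {
                    int tmp = a[i];
                    a[i] = a[partner];
                    a[partner] = tmp;
                }
            }
        }
    }
}
```

With n processors, the innermost loop is one parallel step, so the number of parallel steps is the number of (k, j) combinations.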

Bitonic Sort

Step No.   Processor No.
           000  001  010  011  100  101  110  111
1          L    H    H    L    L    H    H    L
2          L    L    H    H    H    H    L    L
3          L    H    L    H    H    L    H    L
4          L    L    L    L    H    H    H    H
5          L    L    H    H    L    L    H    H
6          L    H    L    H    L    H    L    H

(L = the processor keeps the low value of the pair, H = it keeps the high value.)

Figure 2: Six phases of Bitonic Sort on a hypercube of dimension 3
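The L/H pattern in Figure 2 appears to follow a simple bit rule, which the sketch below reproduces. The rule is inferred from the table rather than stated in the slides, so treat it as an assumption: in phase i (1 to 3), the steps compare across bit i-1 down to bit 0, and a processor keeps the low value when its bit at the compared position equals its bit at position i. The function name `bitonic_roles` is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Fills row[0..2^dim - 1] with 'L' or 'H' for one step of bitonic sort
 * on a hypercube of dimension dim, and terminates it with '\0'.
 * phase: 1..dim. bit: the bit across which partners are paired,
 * running from phase-1 down to 0 within a phase.
 * Inferred rule: keep the low value when the processor's bit `bit`
 * equals its bit `phase` (which is 0 for every id in the last phase). */
void bitonic_roles(unsigned dim, unsigned phase, unsigned bit, char *row)
{
    assert(row != NULL && phase >= 1 && phase <= dim && bit < phase);
    unsigned p = 1u << dim;
    for (unsigned id = 0; id < p; id++) {
        unsigned own = (id >> bit) & 1u;
        unsigned dir = (id >> phase) & 1u;
        row[id] = (own == dir) ? 'L' : 'H';
    }
    row[p] = '\0';
}
```

For dimension 3, calling it with (phase, bit) = (1,0), (2,1), (2,0), (3,2), (3,1), (3,0) yields the six rows of the figure in order.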


Bitonic sort (for N = P)

Step     P0   P1   P2   P3   P4   P5   P6   P7
         000  001  010  011  100  101  110  111

Input    K    G    J    M    C    A    N    F
1        L    H    H    L    L    H    H    L
         G    K    M    J    A    C    N    F
2        L    L    H    H    H    H    L    L
         G    J    M    K    N    F    A    C
3        L    H    L    H    H    L    H    L
         G    J    K    M    N    F    C    A
4        L    L    L    L    H    H    H    H
         G    F    C    A    N    J    K    M
5        L    L    H    H    L    L    H    H
         C    A    G    F    K    J    N    M
6        L    H    L    H    L    H    L    H
         A    C    F    G    J    K    M    N
Number of steps (P=n)

In general, with n = 2^k, there are k phases, of 1, 2, 3, ..., k steps respectively.

Hence the total number of steps is:

$$T_{par}^{bitonic} = \sum_{i=1}^{\log n} i = \frac{\log n\,(\log n + 1)}{2} = O(\log^2 n)$$
Bitonic sort (for N >> P)

[Figure: the N numbers distributed in equal blocks across the processors.]

Step        P0        P1        P2        P3        P4        P5        P6        P7
            000       001       010       011       100       101       110       111
Input       2 7 4     13 6 9    4 18 5    12 1 7    6 3 14    11 6 8    4 10 5    2 15 17
Local sort  2 4 7     6 9 13    4 5 18    1 7 12    3 6 14    6 8 11    4 5 10    2 15 17
1           L         H         H         L         L         H         H         L
            2 4 6     7 9 13    7 12 18   1 4 5     3 6 6     8 11 14   10 15 17  2 4 5
2           L         L         H         H         H         H         L         L
            2 4 6     1 4 5     7 12 18   7 9 13    10 15 17  8 11 14   3 6 6     2 4 5
3           L         H         L         H         H         L         H         L
            1 2 4     4 5 6     7 7 9     12 13 18  14 15 17  8 10 11   5 6 6     2 3 4
4           L         L         L         L         H         H         H         H
            1 2 4     4 5 6     5 6 6     2 3 4     14 15 17  8 10 11   7 7 9     12 13 18
5           L         L         H         H         L         L         H         H
            1 2 4     2 3 4     5 6 6     4 5 6     7 7 9     8 10 11   14 15 17  12 13 18
6           L         H         L         H         L         H         L         H
            1 2 2     3 4 4     4 5 5     6 6 6     7 7 8     9 10 11   12 13 14  15 17 18

The local sort is ascending on every processor. In each later step, a processor marked L keeps the lower half of the merged pair of blocks and a processor marked H keeps the upper half.
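The block version can be simulated in one address space as sketched below. This is an illustration under stated assumptions, not the slides' code: the array holds 2^dim blocks of m numbers each, block q plays the role of processor q, each compare-exchange of the N = P algorithm is replaced by a merge-split, and the L/H choice uses a bit rule inferred from the step tables. All names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* a holds 2^dim blocks of m ints; block q is a[q*m .. q*m + m - 1]. */
void block_bitonic_sort(int *a, unsigned dim, size_t m)
{
    size_t p = (size_t)1 << dim;
    assert(a != NULL && m > 0);
    int *tmp = malloc(2 * m * sizeof *tmp);
    assert(tmp != NULL);

    for (size_t q = 0; q < p; q++)               /* local sort, ascending */
        qsort(a + q * m, m, sizeof *a, cmp_int);

    for (unsigned phase = 1; phase <= dim; phase++) {
        for (unsigned bit = phase; bit-- > 0; ) { /* bit = phase-1 .. 0 */
            for (size_t q = 0; q < p; q++) {
                size_t partner = q ^ ((size_t)1 << bit);
                if (partner < q) continue;        /* each pair once */
                /* q has a 0 at position `bit`, so it keeps the low half
                 * exactly when its bit at position `phase` is 0. */
                int q_low = (((q >> phase) & 1u) == 0);
                int *lo = a + (q_low ? q : partner) * m;
                int *hi = a + (q_low ? partner : q) * m;
                size_t i = 0, j = 0, k = 0;       /* merge-split */
                while (i < m && j < m)
                    tmp[k++] = (lo[i] <= hi[j]) ? lo[i++] : hi[j++];
                while (i < m) tmp[k++] = lo[i++];
                while (j < m) tmp[k++] = hi[j++];
                memcpy(lo, tmp, m * sizeof *lo);
                memcpy(hi, tmp + m, m * sizeof *hi);
            }
        }
    }
    free(tmp);
}
```

Called with dim = 3 and m = 3 on the 24 numbers of the example, it leaves the array in ascending order across P0 to P7.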
Number of steps (for N >> P)
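$$T_{par}^{bitonic} = \text{Local Sort} + \text{Parallel Bitonic Merge}$$

$$= \frac{N}{P}\log\frac{N}{P} + 2\,\frac{N}{P}\,(1 + 2 + 3 + \cdots + \log P)$$

$$= \frac{N}{P}\left\{\log\frac{N}{P} + 2\left(\frac{\log P\,(1 + \log P)}{2}\right)\right\}$$

$$= \frac{N}{P}\left(\log N - \log P + \log P + \log^2 P\right)$$

$$T_{par}^{bitonic} = \frac{N}{P}\left(\log N + \log^2 P\right)$$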
Parallel sorting - summary

Computational time complexity using P=n processors

Odd-even transposition sort - O(n)

Parallel mergesort - O(n)
    (unbalanced processor load and communication)

Bitonic mergesort - O(log^2 n) (** BEST! **)

Parallel shearsort - O(n log n) on an n x n mesh (* covered later *)

Parallel rank sort - O(n) (for P=n) (* covered later *)
Sorting on Specific Networks

Two network structures have received special attention: mesh and hypercube. Parallel computers have been built with these networks.

However, this is of less interest nowadays because networks got faster and clusters became a viable option.

Besides, network architecture is often hidden from the user.

MPI provides libraries for mapping algorithms onto meshes, and one can always use a mesh or hypercube algorithm even if the underlying architecture is not one of them.
Two-Dimensional Sorting on a Mesh

The layout of a sorted sequence on a mesh could be row by row or snakelike:
Shearsort
Alternate row and column sorting until list is fully sorted.
Alternate row directions to get snake-like sorting:
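A sequential simulation of shearsort on an n x n matrix is sketched below. It is an illustration, not the slides' code: the names are made up, the row and column sorts are plain insertion sorts standing in for whatever parallel sort a mesh would use, and the number of rounds (ceil(log2 n) + 1 row/column rounds plus a final row pass) is a deliberately safe choice, since extra rounds leave an already sorted matrix unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Insertion sort over len elements spaced stride apart, starting at a.
 * Sorts descending when desc is nonzero, ascending otherwise. */
static void sort_line(int *a, size_t len, size_t stride, int desc)
{
    for (size_t i = 1; i < len; i++) {
        int key = a[i * stride];
        size_t j = i;
        while (j > 0 && (desc ? a[(j - 1) * stride] < key
                              : a[(j - 1) * stride] > key)) {
            a[j * stride] = a[(j - 1) * stride];
            j--;
        }
        a[j * stride] = key;
    }
}

/* Even rows ascending, odd rows descending: snake-like row order. */
static void sort_rows_snake(int *m, size_t n)
{
    for (size_t r = 0; r < n; r++)
        sort_line(m + r * n, n, 1, (int)(r % 2));
}

/* Shearsort on an n x n matrix stored row-major in m.
 * The result is sorted in snake-like order. */
void shearsort(int *m, size_t n)
{
    assert(m != NULL && n > 0);
    size_t rounds = 1;                            /* ceil(log2 n) + 1 */
    for (size_t s = 1; s < n; s <<= 1) rounds++;
    for (size_t t = 0; t < rounds; t++) {
        sort_rows_snake(m, n);
        for (size_t c = 0; c < n; c++)
            sort_line(m + c, n, n, 0);            /* columns: top to bottom */
    }
    sort_rows_snake(m, n);                        /* final row pass */
}
```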

Shearsort Time complexity

On an n x n mesh, it takes 2 log n phases to sort n^2 numbers.

Therefore:

$$T_{par}^{shearsort} = O(n \log n) \quad \text{on an } n \times n \text{ mesh}$$

Since sorting n^2 numbers sequentially takes Tseq = O(n^2 log n):

$$\text{Speedup}_{shearsort} = \frac{T_{seq}}{T_{par}} = O(n) \quad (\text{for } P = n^2)$$

However, efficiency = 1/n.
Rank Sort

The number of elements that are smaller than each selected element is counted. This count provides the position of the selected number, its rank, in the sorted list.

First a[0] is read and compared with each of the other numbers, a[1] ... a[n-1], recording the number of elements less than a[0].

Suppose this number is x. This is the index of a[0] in the final sorted list.

The number a[0] is copied into the final sorted list b[0] ... b[n-1], at location b[x]. The same actions are repeated with the other numbers.

Overall sequential time complexity of rank sort: Tseq = O(n^2)
(not a good sequential sorting algorithm!)
Sequential code
for (i = 0; i < n; i++) {           /* for each number */
    x = 0;
    for (j = 0; j < n; j++)         /* count numbers less than it */
        if (a[i] > a[j]) x++;
    b[x] = a[i];                    /* copy number into correct place */
}

*This code needs to be fixed if duplicates exist in the sequence.

Sequential time complexity of rank sort: Tseq = O(n^2)
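One possible fix for duplicates is to break ties by position: an equal element counts as "smaller" only if it appears earlier in the input. The slides do not say which fix they have in mind, so the sketch below is just one option, and the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Rank sort that tolerates duplicates. a is the input, b receives the
 * sorted output; both have n elements and must not overlap. */
void rank_sort(const int *a, int *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++) {
        size_t x = 0;
        for (size_t j = 0; j < n; j++) {
            /* Count smaller elements, plus equal elements that come
             * earlier, so equal values get distinct ranks. */
            if (a[j] < a[i] || (a[j] == a[i] && j < i))
                x++;
        }
        b[x] = a[i];
    }
}
```

With this tie-break every element gets a unique rank, so no slot of b is written twice or left empty.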

Parallel Rank Sort (P=n)
One number is assigned to each processor.
Pi finds the final index of a[i] in O(n) steps.

forall (i = 0; i < n; i++) {        /* for each number, in parallel */
    x = 0;
    for (j = 0; j < n; j++)         /* count numbers less than it */
        if (a[i] > a[j]) x++;
    b[x] = a[i];                    /* copy number into correct place */
}

The parallel time complexity is O(n), the same as odd-even transposition sort and parallel mergesort. We can do even better if we have more processors.

Parallel time complexity: Tpar = O(n) (for P=n)
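The forall loop is pseudocode. On a shared-memory machine, one way to approximate it is an OpenMP parallel loop, as sketched below. This swaps in OpenMP threads for the slides' one-processor-per-number model and adds a tie-break on index so that duplicate values do not make two iterations write the same slot; both are assumptions of this sketch. If the compiler is run without OpenMP support, the pragma is ignored and the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Rank sort with the outer loop marked for parallel execution.
 * a is the input, b the output; both have n elements, no overlap. */
void parallel_rank_sort(const int *a, int *b, long n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        long x = 0;                  /* private to each iteration */
        for (long j = 0; j < n; j++)
            if (a[j] < a[i] || (a[j] == a[i] && j < i))
                x++;
        b[x] = a[i];                 /* ranks are unique: no write conflict */
    }
}
```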

Parallel Rank Sort with P = n^2
Use n processors to find the rank of one element. Each processor does one comparison, and the final count, i.e. the rank of a[i], can be obtained using a binary addition operation (a global sum, MPI_Reduce()).

Time complexity (for P=n^2):

Tpar = O(log n)
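The O(log n) bound comes from the shape of the global sum. The sketch below simulates it in plain C rather than calling MPI: each of the n simulated processors contributes a 0/1 flag, and the flags are added pairwise in a tree. The function name, the scratch array and the round counter are all illustrative and exist only to make the number of addition rounds visible.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the rank of a[i] the way n cooperating processors would:
 * one comparison each (a single parallel step), then a pairwise tree
 * sum. flags is scratch space for n values. *rounds receives the
 * number of addition rounds, which is ceil(log2 n). */
size_t rank_by_tree_sum(const int *a, size_t n, size_t i,
                        size_t *flags, size_t *rounds)
{
    assert(a != NULL && flags != NULL && rounds != NULL && i < n);
    for (size_t j = 0; j < n; j++)            /* all at once on n processors */
        flags[j] = (a[j] < a[i]) ? 1 : 0;

    *rounds = 0;
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* Within a round, the additions touch disjoint pairs. */
        for (size_t j = 0; j + stride < n; j += 2 * stride)
            flags[j] += flags[j + stride];
        (*rounds)++;
    }
    return flags[0];
}
```

For n = 8 the sum finishes in 3 rounds. Running this for all n elements at once is what needs n^2 processors.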

Can we do it in O(1) ?
