Parallel Algorithms
Batcher
Professor, Kent State University
http://www.cs.kent.edu/~batcher
“Sorting networks and their applications”, AFIPS Proc. of 1968
Spring Joint Computer Conference, Vol. 32, pp 307-314.
Background
Sorting is fundamental
The lower bound for any comparison-based sequential sorting algorithm is Ω(n log n)
Can we improve the time complexity further?
– Parallel algorithms
– Circuit/Network Design
– Parallel Computing Models
① Bitonic Sequence
A sequence of elements {a0, a1, …, an-1} where
either
– (1) there exists an index i, 0 ≤ i ≤ n-1, such that {a0, …, ai} is monotonically increasing and {ai+1, …, an-1} is monotonically decreasing,
– e.g. {1, 2, 4, 7, 6, 0}
or
– (2) there exists a cyclic shift of indices so that (1) is satisfied
– e.g. {8, 9, 2, 1, 0, 4} → {0, 4, 8, 9, 2, 1}
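The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the names `rises_then_falls` and `is_bitonic` are hypothetical, "monotonically" is assumed to allow equal neighbours, and every cyclic shift is tried, which is simple but takes O(n²) time.

```python
def rises_then_falls(seq):
    """Condition (1): a non-decreasing run followed by a non-increasing run."""
    n = len(seq)
    i = 0
    # Climb while the sequence keeps rising (equal neighbours allowed: an assumption).
    while i + 1 < n and seq[i] <= seq[i + 1]:
        i += 1
    # Descend while the sequence keeps falling.
    while i + 1 < n and seq[i] >= seq[i + 1]:
        i += 1
    # Bitonic in the sense of (1) only if the two runs cover the whole sequence.
    return i >= n - 1


def is_bitonic(seq):
    """Condition (1) or (2): some cyclic shift rises and then falls."""
    seq = list(seq)
    if not seq:
        return True
    return any(rises_then_falls(seq[k:] + seq[:k]) for k in range(len(seq)))


print(is_bitonic([1, 2, 4, 7, 6, 0]))   # example for (1)
print(is_bitonic([8, 9, 2, 1, 0, 4]))   # example for (2)
print(is_bitonic([1, 3, 2, 4]))         # two peaks, so not bitonic
```

Both examples from the definition are accepted, while a sequence with two separate peaks is rejected.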
① Bitonic Sequence: Examples
{ 3, 5, 7, 9, 8, 6, 4, 2 }
{ 8, 6, 4, 2, 3, 5, 7, 9 }
[Figure: for each sequence, the value of element ai plotted against the index a0 … a7]
① Bitonic Sequence: Examples
{ 3, 5, 7, 9, 11, 13, 15, 17 }
{ 5, 3, 1, 2, 4, 6, 8, 7 }
[Figure: for each sequence, the value of element ai plotted against the index a0 … a7]
Bitonic Sort: basic idea
Consider a bitonic sequence S of size n where
– the first half ( {a0, a1, …, an/2-1} ) is increasing, and the second half ( {an/2, an/2+1, …, an-1} ) is decreasing
[Figure: element values of S with a0, an/2-1, an/2 and an-1 marked, and the two subsequences S1 and S2]
Compare ai with ai+n/2 for every i in 0 … n/2-1: S1 collects the minima and S2 the maxima
There exists
– an element b in S1 such that all elements before b are increasing and all elements after b are decreasing
– an element c in S2 such that all elements before c are decreasing and all elements after c are increasing
S1 and S2
– Both S1 and S2 are bitonic sequences
– Every element in S1 ≤ every element in S2 (because b ≤ c, b is the maximum value in S1 and c is the minimum value in S2)
[Figure: values of S1 and S2, with b marked in S1 and c in S2]
Pair-wise min-max comparison
e.g. { 2, 4, 6, 8, 7, 5, 3, 1 }
Compare and exchange { 2, 4, 6, 8 } with { 7, 5, 3, 1 }, element by element
=> S1 = {2, 4, 3, 1}
S2 = {7, 5, 6, 8}
A bitonic sequence of size 8
=> 2 bitonic sequences of size 4
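The pair-wise comparison in this example can be written in a few lines. This is a minimal sketch under the assumption that the sequence length is even; the function name `bitonic_split` is hypothetical and does not come from the slides.

```python
def bitonic_split(seq):
    """Compare element i with element i + n/2; return (minima, maxima)."""
    half = len(seq) // 2
    s1 = [min(seq[i], seq[i + half]) for i in range(half)]  # lower values
    s2 = [max(seq[i], seq[i + half]) for i in range(half)]  # upper values
    return s1, s2


s1, s2 = bitonic_split([2, 4, 6, 8, 7, 5, 3, 1])
print(s1)  # [2, 4, 3, 1]
print(s2)  # [7, 5, 6, 8]
```

On a real network all n/2 comparisons happen in the same step; the list comprehensions only simulate that step sequentially.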
② Bitonic Split
The split is applicable to any bitonic sequence.
The 1st half need not be increasing (or decreasing), nor the 2nd half decreasing (or increasing).
Sorting an arbitrarily ordered sequence
Using bitonic merge repeatedly
Definition:
⊕BM[n]: increasing bitonic merge of size n
• a bitonic merge that sorts a bitonic sequence of size n into a monotonically increasing sequence
⊖BM[n]: decreasing bitonic merge of size n
• a bitonic merge that sorts a bitonic sequence of size n into a monotonically decreasing sequence
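The two merges defined above can be sketched as one recursive function. This is only an illustration under stated assumptions: the input is bitonic, its length is a power of two, and the name `bitonic_merge` and its `ascending` flag are hypothetical (the flag selects the increasing or the decreasing merge).

```python
def bitonic_merge(seq, ascending=True):
    """Sort a bitonic sequence whose length is a power of two."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    # Bitonic split: pair element i with element i + n/2.
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    # Both halves are bitonic again, so merge each of them recursively.
    if ascending:
        return bitonic_merge(lo, True) + bitonic_merge(hi, True)
    return bitonic_merge(hi, False) + bitonic_merge(lo, False)


print(bitonic_merge([2, 4, 6, 8, 7, 5, 3, 1], ascending=True))
print(bitonic_merge([2, 4, 6, 8, 7, 5, 3, 1], ascending=False))
```

The recursion depth is log n, which corresponds to log n compare-exchange steps when each level runs in parallel.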
Steps:
Divide the sequence into groups of 2
– any sequence of size 2 is a bitonic sequence: either the increasing part is of size 2 and the decreasing part is of size 0, or vice versa
Use an increasing BM[2] on one group to form an increasing sequence, and a decreasing BM[2] on the adjacent group to form a decreasing sequence
Concatenate the two groups to form a bitonic sequence of size 4
Steps:
Repeat the above steps on the other groups
Repeat the above steps recursively, until a bitonic sequence of size n is formed
Use bitonic merge once more to turn the bitonic sequence into a sorted sequence
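The steps above can be sketched bottom-up: merge groups of 2, then 4, and so on, alternating the direction between adjacent groups. This is a sequential illustration, not the circuit itself; the names `bitonic_merge` and `bitonic_sort` are hypothetical, and the input length is assumed to be a power of two.

```python
def bitonic_merge(seq, ascending):
    """Sort a bitonic sequence (length a power of two) in the given direction."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    lo = [min(a, b) for a, b in zip(seq[:half], seq[half:])]
    hi = [max(a, b) for a, b in zip(seq[:half], seq[half:])]
    first, second = (lo, hi) if ascending else (hi, lo)
    return bitonic_merge(first, ascending) + bitonic_merge(second, ascending)


def bitonic_sort(seq):
    """Sort any sequence whose length is a power of two, in increasing order."""
    data = list(seq)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    size = 2
    while size <= n:
        merged = []
        for group, start in enumerate(range(0, n, size)):
            # Even groups become increasing, odd groups decreasing, so that
            # each adjacent pair of groups forms a bitonic sequence of 2 * size.
            merged += bitonic_merge(data[start:start + size], group % 2 == 0)
        data = merged
        size *= 2
    return data


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

In the last round there is a single group of size n, which is merged in increasing order and gives the sorted result.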
Bitonic Sorting Circuit: BS(18)
Hypercube connections!
Try to write the bitonic sorting algorithm on a hypercube.
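One possible answer to this exercise, as a hedged sketch rather than the intended solution: each of the 2^d nodes holds one element, and in every step a node compare-exchanges with the neighbour whose index differs in exactly one bit, which is a hypercube link. The function name `hypercube_bitonic_sort` is hypothetical, and the inner loop over nodes only simulates what all nodes would do in parallel.

```python
def hypercube_bitonic_sort(values):
    """Simulate bitonic sort with one element per hypercube node."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")
    d = n.bit_length() - 1                      # hypercube dimension
    for stage in range(1, d + 1):               # builds sorted runs of 2**stage
        for bit in range(stage - 1, -1, -1):    # dimension used in this step
            for node in range(n):               # conceptually simultaneous
                partner = node ^ (1 << bit)     # neighbour across that dimension
                if node < partner:              # handle each pair only once
                    # Bit number `stage` of the index picks the run direction.
                    ascending = (node >> stage) & 1 == 0
                    if ascending:
                        out_of_order = a[node] > a[partner]
                    else:
                        out_of_order = a[node] < a[partner]
                    if out_of_order:
                        a[node], a[partner] = a[partner], a[node]
    return a


print(hypercube_bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

There are d(d+1)/2 compare-exchange steps in total, so with n = 2^d nodes the parallel time is O(log² n).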
Bitonic Sort on Butterfly
PRAM Model
[Figure: processors P1, P2, P3, …, Pn connected to a shared Memory]
Access time from any processor to any memory unit is equal
This is impossible in practice
So it is an idealized model for parallel computing
It lets us focus only on algorithm design
PRAM Model
[Figure: prefix sums of a(1), …, a(8) computed on the PRAM]
Intermediate step: a(1), a(2), a(3), a(4), a(1)+a(5), a(2)+a(6), a(3)+a(7), a(4)+a(8)
Result: a(1), a(1)+a(2), a(1)+a(2)+a(3), a(1)+a(2)+a(3)+a(4), a(1)+…+a(5), a(1)+…+a(6), a(1)+…+a(7), a(1)+…+a(8)
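Assuming this slide illustrates prefix sums on the PRAM, the computation can be sketched with the standard doubling scheme: in each synchronous step, processor i adds the value held `stride` positions to its left. The function name `pram_prefix_sums` is hypothetical, and the order in which the strides are applied here (1, 2, 4, …) may differ from the order drawn in the figure; the final result is the same.

```python
def pram_prefix_sums(a):
    """Simulate a PRAM prefix-sum computation with one processor per element."""
    x = list(a)
    n = len(x)
    stride = 1
    while stride < n:
        # One synchronous step: every processor reads the old values first,
        # then writes, which the new list built from the old one imitates.
        x = [x[i] + x[i - stride] if i >= stride else x[i] for i in range(n)]
        stride *= 2
    return x


print(pram_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```

With n processors this takes about log n steps, compared with n - 1 additions done one after another on a single processor.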
Hypercube Model
Suppose node N(x) holds element a(x), where x is the value of the node index x1x2…xn
for j = 1 to n
    parallel do
        N(00…0 (xj=0) xj+1…xn) ← N(00…0 (xj=1) xj+1…xn);
        a(00…0 (xj=0) xj+1…xn) =
            a(00…0 (xj=0) xj+1…xn) + a(00…0 (xj=1) xj+1…xn)
    endpar
endfor
After the loop, node N(00…0) holds the sum of all a(x)
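The summation on the hypercube can be simulated as follows. This is a sketch under stated assumptions: a single pass over the n dimensions is used, x1 is taken to be the most significant bit of the node index, and the function name `hypercube_sum` is hypothetical.

```python
def hypercube_sum(a):
    """Simulate summing one value per node of an n-dimensional hypercube."""
    vals = list(a)
    size = len(vals)
    if size == 0 or size & (size - 1):
        raise ValueError("number of nodes must be a power of two")
    n = size.bit_length() - 1            # hypercube dimension
    for j in range(1, n + 1):
        bit = 1 << (n - j)               # bit x_j, counting from the left
        # Receivers have x_1 … x_{j-1} = 0 and x_j = 0, i.e. index < bit.
        # Each receives from the neighbour whose x_j is 1 and adds its value.
        for x in range(bit):             # conceptually simultaneous
            vals[x] += vals[x | bit]
    return vals[0]                       # node 00…0 holds the total


print(hypercube_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

Each step halves the number of active nodes, so the sum reaches node 00…0 after n = log(number of nodes) communication steps.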