Chapter 03
Multiprocessors, and
Multicomputers
Processor Organizations
(interconnection networks)
Mesh
Binary Trees
Hypertrees
Hypercube
Pyramid
Butterfly
Cube-Connected Cycles
Shuffle-Exchange
De Bruijn
Flynn’s Taxonomy
Processor Arrays
Multiprocessors
Multicomputers
Scaled Speedup & Parallelizability
Processor organization
(interconnection network)
We evaluate processor organizations
according to the following criteria:
a) Diameter: the largest distance between any two
nodes.
b) Bisection width: the minimum number of
edges that must be removed in order to
divide the network into two halves.
c) Degree: the maximum number of edges per node.
d) Maximum edge length: it should be a
constant, independent of the network size.
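As a minimal sketch of criteria (a) and (c), the code below measures diameter and degree on any network given as an adjacency dictionary. The function names and the 6-node ring used as input are assumptions made for illustration; they are not part of the slides.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop counts from `start` to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path distance between any two nodes."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def max_degree(adj):
    """Maximum number of edges incident to a single node."""
    return max(len(neighbors) for neighbors in adj.values())


# Hypothetical example network: a ring of 6 nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(diameter(ring), max_degree(ring))
```

Bisection width is left out here because finding the minimum cut into two halves is much more expensive than a breadth-first search.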
1) Mesh Networks
The nodes are arranged into a q-dimensional
lattice.
Communication is allowed only
between neighboring nodes; hence
interior nodes communicate with
2q other processors.
The diameter of a q-dimensional
mesh with k^q nodes is q(k-1).
…
The bisection width of a q-dimensional
mesh with k^q nodes is
k^{q-1}.
The max number of edges per node is 2q.
The max edge length is constant for
two- and three-dimensional meshes.
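The mesh formulas can be checked on small cases. The sketch below evaluates them and counts the edges that cross a cut through the middle of the first dimension. It assumes k is even, and it only shows that a cut of size k^{q-1} exists, not that it is the minimum. The function names are hypothetical.

```python
from itertools import product


def mesh_formulas(q, k):
    """Metrics of a q-dimensional mesh with k**q nodes, as stated on the slides."""
    return {
        "nodes": k ** q,
        "diameter": q * (k - 1),
        "bisection_width": k ** (q - 1),
        "max_degree": 2 * q,
    }


def edges_across_middle_cut(q, k):
    """Count mesh edges joining the two halves split along axis 0 (k even)."""
    half = k // 2
    crossing = 0
    for node in product(range(k), repeat=q):
        for axis in range(q):
            if node[axis] + 1 < k:
                # Neighbor one step further along this axis.
                nbr = node[:axis] + (node[axis] + 1,) + node[axis + 1:]
                if (node[0] < half) != (nbr[0] < half):
                    crossing += 1
    return crossing


print(mesh_formulas(2, 4))
print(edges_across_middle_cut(2, 4))
```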
Ex: MPP, MasPar, Intel Paragon
XP/S
…
[Figure: 4 × 4 mesh]
…
[Figure: mesh with wraparound connections within the same row or column]
…
[Figure: mesh with wraparound connections between adjacent rows or columns]
Connectivity
An interior processor P_{i,j} has a link
with the following processors:
P_{i,j+1}
P_{i,j-1}
P_{i+1,j}
P_{i-1,j}
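The four links listed above can be sketched as a small function. The name `mesh2d_links` and the `wrap` flag are assumptions for illustration; `wrap=True` models wraparound within the same row or column.

```python
def mesh2d_links(i, j, rows, cols, wrap=False):
    """Processors linked to P(i, j) in a rows x cols 2-D mesh."""
    candidates = [(i, j + 1), (i, j - 1), (i + 1, j), (i - 1, j)]
    links = []
    for r, c in candidates:
        if wrap:
            # Wraparound within the same row or column.
            links.append((r % rows, c % cols))
        elif 0 <= r < rows and 0 <= c < cols:
            # Without wraparound, border processors have fewer links.
            links.append((r, c))
    return links


print(mesh2d_links(1, 1, 4, 4))             # interior processor
print(mesh2d_links(0, 0, 4, 4))             # corner, no wraparound
print(mesh2d_links(0, 0, 4, 4, wrap=True))  # corner, with wraparound
```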
Binary Tree Network
It has 2^k - 1 nodes, where k is the # of
levels.
A node has at most three links.
An interior node can communicate
with its two children and its parent.
The diameter is 2(k-1), which is low.
It has a poor bisection width, which
is one.
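A brute-force check of these figures for small k is sketched below. It assumes heap-style numbering (root = 1, children of v are 2v and 2v+1), which is a convention chosen for this example and not taken from the slides.

```python
def tree_distance(a, b):
    """Hops between nodes a and b; the parent of node v is v // 2."""
    hops = 0
    while a != b:
        # The larger index is never closer to the root, so move it up.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops


def binary_tree_metrics(k):
    """Node count, diameter, and max degree of a complete tree with k levels."""
    n = 2 ** k - 1
    nodes = range(1, n + 1)

    def degree(v):
        parent_links = 1 if v > 1 else 0
        child_links = sum(1 for child in (2 * v, 2 * v + 1) if child <= n)
        return parent_links + child_links

    return {
        "nodes": n,
        "diameter": max(tree_distance(a, b) for a in nodes for b in nodes),
        "max_degree": max(degree(v) for v in nodes),
    }


print(binary_tree_metrics(3))
```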
…
[Figure: binary tree with Level 0 (root), Levels 1 and 2 (internal nodes), and Level 3 (leaves)]
k = 4, # of levels
a) # of nodes
b) Diameter
c) Bisection width
d) Degree
Hypertree Networks
It has a low diameter and an improved
bisection width compared with the
binary tree.
A 4-ary hypertree with depth d has 4^d
leaf nodes and 2^d (2^{d+1} - 1) nodes in all.
For d = 2:
The diameter is 2d = 4.
The bisection width is 2^{d+1} = 8.
The max degree is 6.
Ex: Connection Machine CM-5
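The hypertree formulas can be evaluated for a given depth as sketched below. The per-level count (2^{d+j} nodes on level j, counting the root level as 0) is an assumption about the usual construction and is used only as a cross-check on the total. The function name is hypothetical.

```python
def hypertree_metrics(d):
    """Metrics of a 4-ary hypertree of depth d, using the formulas on the slide."""
    leaves = 4 ** d
    total = 2 ** d * (2 ** (d + 1) - 1)

    # Assumption: level j (root level = 0, leaf level = d) holds 2**(d + j) nodes.
    per_level = [2 ** (d + j) for j in range(d + 1)]
    assert sum(per_level) == total
    assert per_level[-1] == leaves

    return {
        "leaves": leaves,
        "nodes": total,
        "diameter": 2 * d,
        "bisection_width": 2 ** (d + 1),
    }


print(hypertree_metrics(2))
```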
…
[Figure: 4-ary hypertree with d = 2, side view and front view]
# of leaf nodes = 4^2 = 16
# of all nodes = 2^2 (2^3 - 1) = 28
Pyramid Network
It is an attempt to combine the
advantages of mesh and tree
networks.
A pyramid of size k^2 is a complete 4-ary
rooted tree of height log_2 k with
additional interprocessor links so
that the processors in every tree
level form a 2-D mesh.
[Figure: pyramid of size 16 = 4^2, from the base (Level 0) through Level 1 to the apex (Level 2)]
…
A pyramid of size k^2 has as its base a 2-D
mesh network of size k^2.
The total # of processors is (4/3)k^2 - (1/3).
Ex: when k = 4, (4/3)4^2 - (1/3) = 21.
Every interior node is connected to 9
other nodes: 4 mesh neighbors + 4 children + 1 parent = 9.
Its diameter is 2 log_2 k (= 4 when k = 4),
which is lower than that of the mesh.
It has a bisection width of 2k.
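The processor count can be cross-checked by summing the mesh sizes level by level, as in the sketch below. It assumes k is a power of two; the function name is hypothetical.

```python
def pyramid_level_sizes(k):
    """Mesh sizes from the base (k x k) up to the apex (1 x 1); k a power of 2."""
    sizes = []
    side = k
    while side >= 1:
        sizes.append(side * side)
        side //= 2  # each level up halves the side of the mesh
    return sizes


levels = pyramid_level_sizes(4)
print(levels, sum(levels))

# The closed form (4/3)k^2 - 1/3 written with integer arithmetic.
print((4 * 4 * 4 - 1) // 3)
```

The number of levels minus one gives the tree height log_2 k, and twice that height is the diameter quoted above.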
Butterfly Network
[Figure: butterfly network with columns 0 through 7 and ranks 0 through 3]
…
It consists of (k+1)2^k nodes divided
into k+1 rows (ranks), each
containing
n = 2^k nodes; the ranks are
labeled 0 through k.
Ex: when k = 3,
(k+1)2^k = 4*8 = 32 nodes.
The diameter is 2k = 6.
The bisection width is 2^k = 8.
Connectivity
Node (i,j) refers to the jth node on
the ith rank.
Node (i,j) is connected to two nodes
on rank i-1: node (i-1,j) and node
(i-1,m), where m is the integer found
by inverting the ith most significant
bit in the binary representation of
j.
Ex:
Node (2,6) is connected to nodes
(1,4) and (1,6).
How to get node (1,4)?
6 = 110; inverting the 2nd most significant bit gives 100 = 4, hence (1,4).
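The bit-inversion rule can be sketched with an XOR mask. The function names are hypothetical, and bit positions are counted from 1 at the most significant end of a k-bit label, matching the example above.

```python
def invert_msb(j, i, k):
    """Flip the i-th most significant bit (1-based) of the k-bit number j."""
    return j ^ (1 << (k - i))


def butterfly_links_up(i, j, k):
    """The two rank i-1 nodes connected to node (i, j) in a rank-k butterfly."""
    if not 1 <= i <= k:
        raise ValueError("rank i must satisfy 1 <= i <= k")
    return [(i - 1, j), (i - 1, invert_msb(j, i, k))]


# Node (2, 6) with k = 3: 6 = 110, flipping the 2nd bit gives 100 = 4.
print(butterfly_links_up(2, 6, 3))
```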
Hypercube Networks
Also called a binary n-cube.
Consists of 2^k nodes forming a k-dimensional
hypercube.
The nodes are labeled 0, 1, …, 2^k - 1;
two nodes are adjacent if their
labels differ in exactly one bit
position.
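The adjacency rule maps directly onto bit operations, as the sketch below shows. The function names are hypothetical. The distance between two nodes is taken to be the number of differing label bits, since each hop changes exactly one bit.

```python
def hypercube_neighbors(node, k):
    """Labels that differ from `node` in exactly one of k bit positions."""
    return [node ^ (1 << bit) for bit in range(k)]


def hypercube_distance(a, b):
    """Number of bit positions in which labels a and b differ."""
    return bin(a ^ b).count("1")


k = 3
print(hypercube_neighbors(5, k))  # neighbors of node 101
print(max(hypercube_distance(a, b)
          for a in range(2 ** k) for b in range(2 ** k)))  # largest distance
```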
…
[Figure: 3-dimensional hypercube with nodes 0 through 7; each edge is labeled with its dimension (1D, 2D, or 3D)]
…
The diameter of a hypercube with
n = 2^k nodes is log_2(2^k) = k.
The bisection width is 2^{k-1}; for k = 3, 2^2 = 4.
The degree is k.
It is the most popular processor
organization.
Ex: nCUBE, Connection Machine CM-200.
Cube-Connected Cycles
Network
It is a k-dimensional hypercube whose 2^k
[Figure: cube-connected cycles network]
Connectivity
Node (i,j) is connected to node
(i,m) if and only if m is the result of
inverting the ith most significant
bit of the binary representation of j.
Ex: node (2,5) is connected to
node (2,7). How?
Invert the 2nd most significant bit of j = 5 = 101,
which gives 111, and that is 7.
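A minimal sketch of this rule follows. It covers only the link described above, not the links within each cycle. The function name is hypothetical, and k is the number of bits in j.

```python
def ccc_cube_link(i, j, k):
    """Node reached from (i, j) by flipping the i-th most significant bit of j."""
    if not 1 <= i <= k:
        raise ValueError("i must satisfy 1 <= i <= k")
    return (i, j ^ (1 << (k - i)))


# Node (2, 5) with k = 3: 5 = 101 becomes 111 = 7.
print(ccc_cube_link(2, 5, 3))

# The link is symmetric: following it twice returns to the starting node.
print(ccc_cube_link(*ccc_cube_link(2, 5, 3), 3))
```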
…
The degree is constant, which is 3
(advantage over the hypercube).
The diameter is 2k, twice that of
the hypercube (disadvantage).
The bisection width is 2^{k-1}, which is
lower than that of a hypercube with the same number of nodes.
For k = 3, 2^{k-1} = 2^2 = 4.
Shuffle-Exchange
Networks
It consists of n = 2^k nodes,
numbered 0, 1, …, n-1.
It has two kinds of connections:
Exchange: links pairs of nodes whose
numbers differ in their least significant
bit (bidirectional).
Shuffle: links node i with node 2i mod
(n-1), with the exception that node n-1
is connected to itself (directed).
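The two kinds of connections can be sketched as follows. The function names are hypothetical. The last function is an assumption-free cross-check: the shuffle rule is the same as rotating the k-bit label left by one position.

```python
def exchange(i):
    """Partner of node i: the label that differs in the least significant bit."""
    return i ^ 1


def shuffle(i, n):
    """Target of the shuffle link from node i in a network of n = 2**k nodes."""
    return i if i == n - 1 else (2 * i) % (n - 1)


def rotate_left(i, k):
    """Rotate the k-bit label i left by one bit position."""
    return ((i << 1) | (i >> (k - 1))) & ((1 << k) - 1)


k = 3
n = 2 ** k
print([shuffle(i, n) for i in range(n)])
print([exchange(i) for i in range(n)])
print(all(shuffle(i, n) == rotate_left(i, k) for i in range(n)))
```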
Ex: k = 3
[Figure: shuffle-exchange network with nodes 0 through 7, labeled 000 through 111]
…
Diameter is 2k - 1.
# of edges per node is
constant.
Bisection width for a network
with 2^k nodes is 2^{k-1}; for k = 3, 2^2 = 4.
Ex: Triton/1.
Flynn’s Taxonomy
Flynn bases his taxonomy on the dual concepts of
instruction stream and data stream.
An instruction stream is a sequence of
instructions performed by a computer.
A data stream is a sequence of data
manipulated by an instruction stream.
Categories depend on the multiplicity of
hardware used to manipulate the
instruction and data streams.
Flynn’s Taxonomy
SISD: Single Instruction, Single Data.
SIMD: Single Instruction, Multiple Data.
MISD: Multiple Instruction, Single Data.
MIMD: Multiple Instruction, Multiple Data.
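The four categories follow from two counts, as this small sketch shows. The function name and the idea of passing stream counts as integers are assumptions made for illustration.

```python
def flynn_category(instruction_streams, data_streams):
    """Name the Flynn category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("each count must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


print(flynn_category(1, 1))   # one instruction stream, one data stream
print(flynn_category(1, 64))  # one instruction stream, many data streams
print(flynn_category(4, 4))   # many of each
```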
SISD
[Figure: sequential machine with one processor (P) and one memory (M)]
• One instruction per unit time.
• Instruction execution may be pipelined.
• The computer may have multiple functional units,
but a single control unit.
SISD: A Conventional Computer
[Figure: instructions and a data input stream enter a single processor, which produces a data output stream]
[Figure: CP, processors P2 through Pn, and an interconnection network (NW)]
Processor arrays
Ex: The Connection Machine CM-200
SIMD Architecture
[Figure: one instruction stream drives processors A, B, and C; each processor has its own data input stream and data output stream (A, B, C)]
Systolic array or pipeline
[Figure: processors A, B, and C, each with its own instruction stream, between a data input stream and a data output stream]
[Figure: processors P2 through Pn and an interconnection network (NW); processors A, B, and C each have their own data input stream and data output stream]
(NUMA).
UMA Multiprocessors
The shared memory is centralized.
All processors work through a
central switching mechanism to
reach a centralized shared memory.
The switching mechanism can be a:
Common bus to global memory
Crossbar switch
Packet-switched network
[Figure: UMA multiprocessor, with several CPUs connected to a switching mechanism]
[Figure: speedup versus # of processors, with one curve per problem size n]
H.W
Read about the following:
1) CM-200
2) Symmetry
3) TC2000
4) nCUBE2
5) CM-5
6) Paragon XP/S