
Chapter 03

Chapter 3 discusses various processor organizations such as mesh, binary trees, hypertrees, hypercubes, and more, focusing on their interconnection networks. It also covers Flynn's Taxonomy, categorizing computer architectures based on instruction and data streams, including SISD, SIMD, MISD, and MIMD. Additionally, the chapter explains the characteristics and examples of processor arrays, multiprocessors, and multicomputers, highlighting their configurations and performance metrics.

Uploaded by

abdallahm.alsoud

Chapter 3: Processor Arrays, Multiprocessors, and Multicomputers
 Processor Organizations
(interconnection networks)

Mesh

Binary Trees

Hypertrees

Hypercube

Pyramid

Butterfly

Cube-Connected Cycles

Shuffle-Exchange

De Bruijn
 Flynn’s Taxonomy
 Processor Arrays
 Multiprocessors
 Multicomputers
 Scaled Speedup & Parallelizability
Processor Organizations
(interconnection networks)
 We evaluate processor organizations according to the following criteria:
a) Diameter: largest distance between two
nodes.
b) Bisection width: the minimum number of
edges that must be removed in order to
divide the network into two halves.
c) Degree: max # of edges per node.
d) Maximum edge length: we need it to
be a constant.
1) Mesh Networks
 The nodes are arranged in a q-dimensional lattice.
 Communication is allowed only between neighboring nodes; hence interior nodes communicate with 2q other processors.
 The diameter of a q-dimensional mesh with k^q nodes is q(k-1).
 The bisection width of a q-dimensional mesh with k^q nodes is k^(q-1).
 The max # of edges per node is 2q.
 The max edge length is constant for two- and three-dimensional meshes.
 Ex: MPP, MasPar, Intel Paragon XP/S
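These mesh formulas can be checked with a short Python sketch (the function names are mine, not from the chapter):

```python
# Metrics of a q-dimensional mesh with k^q nodes, as stated in the slides.

def mesh_diameter(q: int, k: int) -> int:
    """Largest distance between two nodes: q * (k - 1)."""
    return q * (k - 1)

def mesh_bisection_width(q: int, k: int) -> int:
    """Minimum # of edges removed to split the network in half: k^(q-1)."""
    return k ** (q - 1)

def mesh_max_degree(q: int) -> int:
    """Interior nodes communicate with 2q neighbors."""
    return 2 * q

# The 4x4 mesh from the figure: q = 2, k = 4.
print(mesh_diameter(2, 4))         # -> 6
print(mesh_bisection_width(2, 4))  # -> 4
print(mesh_max_degree(2))          # -> 4
```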

[Figures: a 4x4 mesh; a mesh with wrap-around within the same row or column; a mesh with wrap-around to adjacent rows or columns]
Connectivity
 An interior processor P(i,j) has a link with the following processors:
 P(i,j+1)
 P(i,j-1)
 P(i+1,j)
 P(i-1,j)
Binary Tree Network
 It has 2^k - 1 nodes, where k is the # of levels.
 A node has at most three links.
 An interior node can communicate with its two children and its parent.
 The diameter is 2(k-1), which is low.
 It has a poor bisection width, which is one.

[Figure: a 4-level binary tree; level 0 is the root, levels 1 and 2 are internal nodes, level 3 holds the leaves]

 k = 4 levels
 # of nodes = 2^k - 1 = 2^4 - 1 = 15

H.W: Quad Trees
Find out the following:
a) # of nodes
b) Diameter
c) Bisection width
d) Degree
Hypertree Networks
 It has a low diameter and an improved bisection width, an improvement on the binary tree.
 A 4-ary hypertree with depth d has 4^d leaf nodes and 2^d (2^(d+1) - 1) nodes in all.
 The diameter is 2d (= 4 for d = 2).
 The bisection width is 2^(d+1) (= 8 for d = 2).
 The max degree is 6.
 Ex: Connection Machine CM-5
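A quick sketch of the hypertree counts, using the formulas above (helper names are illustrative):

```python
# 4-ary hypertree of depth d, metrics as given in the slides.

def hypertree_leaves(d: int) -> int:
    return 4 ** d                        # 4^d leaf nodes

def hypertree_nodes(d: int) -> int:
    return 2 ** d * (2 ** (d + 1) - 1)   # 2^d (2^(d+1) - 1) nodes in all

def hypertree_diameter(d: int) -> int:
    return 2 * d

def hypertree_bisection(d: int) -> int:
    return 2 ** (d + 1)

print(hypertree_leaves(2), hypertree_nodes(2))        # -> 16 28
print(hypertree_diameter(2), hypertree_bisection(2))  # -> 4 8
```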

[Figure: side and front views of a hypertree]

H.W: Draw or construct a 3-dimensional hypertree with d = 2.

Ex: Hypertree with d = 2
# of leaf nodes = 4^2 = 16
# of all nodes = 2^2 (2^3 - 1) = 28
Pyramid Network
 It is an attempt to obtain the advantages of both mesh & tree networks.
 A pyramid of size k^2 is a complete 4-ary rooted tree of height log2 k with additional interprocessor links, so that the processors in every tree level form a 2-D mesh.
[Figure: a pyramid of size 16 = 4^2; the apex is at level 2, and the base (level 0) is a 4x4 mesh]

 A pyramid of size k^2 has as its base a 2-D mesh network of size k^2.
 The total # of processors is (4/3)k^2 - (1/3).
   Ex: when k = 4, (4/3)4^2 - (1/3) = 21.
 Every interior node is connected to 9 other nodes: 4 mesh neighbors + 4 children + 1 parent = 9.
 It has a lower diameter than the mesh: 2 log2 k (= 4 when k = 4).
 Its bisection width is 2k.
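The pyramid counts above can be verified in a few lines (names are mine; the node total is computed exactly in integer arithmetic):

```python
import math

def pyramid_processors(k: int) -> int:
    """Total # of processors: (4/3) k^2 - 1/3, exact for integer k."""
    return (4 * k * k - 1) // 3

def pyramid_diameter(k: int) -> int:
    """Diameter 2 log2 k (k a power of two)."""
    return 2 * int(math.log2(k))

print(pyramid_processors(4))  # -> 21
print(pyramid_diameter(4))    # -> 4
```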
Butterfly Network

[Figure: a butterfly with k = 3; ranks 0 through 3, each containing nodes 0 through 7]

 It consists of (k+1)2^k nodes divided into k+1 rows (ranks), each containing n = 2^k nodes; the ranks are labeled 0 through k.
 Ex: when k = 3:
   (k+1)2^k = 4*8 = 32 nodes
   The diameter is 2k = 6.
   The bisection width is 2^k = 8.
Connectivity
 Node (i,j) refers to the jth node on the ith rank.
 Node (i,j) is connected to two nodes on rank i-1: node (i-1,j) and node (i-1,m), where m is the integer obtained by inverting the ith most significant bit in the binary representation of j.
Ex:
 Node (2,6) is connected to nodes (1,4) and (1,6).
 How to get node (1,4)?
   6 -> 110 -> invert the 2nd most significant bit -> 100 -> 4 -> (1,4)
Hypercube Networks
 Also called a binary n-cube.
 It consists of 2^k nodes forming a k-dimensional hypercube.
 The nodes are labeled 0, 1, ..., 2^k - 1; two nodes are adjacent if their labels differ in exactly one bit position.

[Figure: hypercubes for k = 1 (N = 2, nodes 0-1), k = 2 (N = 4, nodes 00-11), and k = 3 (N = 8, nodes 000-111)]
Ex: k = 3

[Figure: the k = 3 hypercube with nodes 0 through 7, showing the 1-D, 2-D, and 3-D links]
 The diameter of a hypercube with n = 2^k nodes is log2(2^k) = k.
 The bisection width is 2^(k-1) (= 2^2 = 4 for k = 3).
 The degree is k.
 It is the most popular processor organization.
 Ex: nCUBE, Connection Machine CM-200.
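The adjacency rule (labels differ in exactly one bit) reduces to checking that the XOR of the two labels is a power of two; a minimal sketch:

```python
def hypercube_adjacent(a: int, b: int) -> bool:
    """Two hypercube labels are adjacent iff they differ in exactly one bit."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0   # true iff x is a power of two

# k = 3 cube: 110 and 111 differ only in the last bit; 000 and 011 differ in two.
print(hypercube_adjacent(0b110, 0b111))  # -> True
print(hypercube_adjacent(0b000, 0b011))  # -> False
```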
Cube-Connected Cycles Network
 It is a k-dimensional hypercube whose 2^k vertices are actually cycles of k nodes.
 For each dimension, every cycle has a node connected to a node in the neighboring cycle in that dimension.
 The # of nodes is k 2^k (= 3*8 = 24 for k = 3).
Ex: N = 24

[Figure: a cube-connected cycles network with k = 3 and N = k 2^k = 3*8 = 24 nodes; each hypercube vertex is replaced by a 3-node cycle, with cycle links in one dimension and cross links in the other dimensions]
Connectivity
 Node (i,j) is connected to node (i,m) if and only if m is the result of inverting the ith most significant bit of the binary representation of j.
 Ex: node (2,5) is connected to node (2,7). How?
   Invert the 2nd most significant bit of j = 5 = 101, which gives 111 = 7.

 The degree is a constant 3 (an advantage over the hypercube).
 The diameter is 2k, twice that of the hypercube (a disadvantage).
 The bisection width is 2^(k-1), which is lower than that of the hypercube: 2^(k-1) = 2^2 = 4.
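The cross-link rule is the same bit inversion used in the butterfly; a minimal sketch, assuming k-bit cycle labels (the remaining two links of a node are its neighbors within its own cycle):

```python
def ccc_cross_neighbor(i: int, j: int, k: int):
    """Cross link of node (i, j) in a cube-connected cycles network:
    node (i, m), where m is j with its ith most significant bit
    (of k bits) inverted."""
    m = j ^ (1 << (k - i))
    return (i, m)

# The example from the slide: node (2,5) with k = 3.
print(ccc_cross_neighbor(2, 5, 3))  # -> (2, 7)
```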
Shuffle-Exchange Networks
 It consists of n = 2^k nodes, numbered 0, 1, ..., n-1.
 It has two kinds of connections:
 Exchange: links pairs of nodes whose numbers differ in their least significant bit (bidirectional).
 Shuffle: links node i with node 2i mod (n-1), with the exception that node n-1 is connected to itself (directed).
Ex: k = 3

[Figure: nodes 0 through 7, with binary labels 000 through 111, joined by shuffle and exchange links]

Ex: Node 2 is connected to node 3 through an exchange link, and node 2 is connected to node (2*2) mod 7 = 4 through a shuffle link.

 Connectivity via a left cyclic shift: node a(k-1) a(k-2) .. a1 a0 is connected to node a(k-2) .. a1 a0 a(k-1) by a shuffle.
 Ex: 001 -> 010 -> 100 -> 001, and this is called a necklace.
 Diameter = 2k - 1
 Bisection width = 2^(k-1)/k
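Both link types can be sketched directly from their definitions; the necklace example above reappears as repeated shuffles (function names are mine):

```python
def exchange(i: int) -> int:
    """Exchange link: flip the least significant bit."""
    return i ^ 1

def shuffle(i: int, n: int) -> int:
    """Shuffle link: 2i mod (n - 1), except node n-1 maps to itself."""
    return i if i == n - 1 else (2 * i) % (n - 1)

n = 8  # k = 3
print(exchange(2))    # -> 3
print(shuffle(2, n))  # -> 4
# The necklace 001 -> 010 -> 100 -> 001 as repeated shuffles:
print(shuffle(1, n), shuffle(2, n), shuffle(4, n))  # -> 2 4 1
```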
De Bruijn Networks
 It consists of n = 2^k nodes.
 Let a(k-1) a(k-2) .. a1 a0 be a node; then the two nodes reachable from it via directed edges are:
   a(k-2) a(k-3) .. a1 a0 0
   a(k-2) a(k-3) .. a1 a0 1
Ex: k = 3

[Figure: the 8-node De Bruijn network with nodes 000 through 111]

 The diameter is k.
 The # of edges per node is constant.
 The bisection width for a network with 2^k nodes is 2^(k-1) (= 2^2 = 4 for k = 3).
 Ex: Triton/1.
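Dropping the leading bit and appending a 0 or 1 is a shift-and-mask on the node label; a minimal sketch:

```python
def de_bruijn_successors(j: int, k: int):
    """The two nodes reachable from node a(k-1)..a0 via directed edges:
    left-shift the k-bit label (dropping the old MSB), then append 0 or 1."""
    s = (2 * j) % (2 ** k)   # a(k-2)..a0 followed by 0
    return (s, s | 1)        # ...and followed by 1

# k = 3: node 101 reaches 010 and 011.
print(de_bruijn_successors(0b101, 3))  # -> (2, 3)
```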
Flynn’s Taxonomy
 Flynn bases his taxonomy on the dual concepts of instruction stream and data stream.
 An instruction stream is a sequence of instructions performed by a computer.
 A data stream is a sequence of data manipulated by an instruction stream.
 The categories depend on the multiplicity of hardware used to manipulate the instruction and data streams.
Flynn’s Taxonomy
 SISD Single Instruction Single
Data.
 SIMD Single Instruction Multiple
Data.
 MISD Multiple Instruction Single
Data.
 MIMD Multiple Instruction Multiple
Data.
SISD

[Figure: a single processor P connected to a memory M, the sequential machine]

•One instruction is executed per unit time.
•Instruction execution may be pipelined.
•The computer may have multiple functional units, but a single control unit.
SISD: A Conventional Computer

[Figure: a single processor receives an instruction stream, consuming a data input stream and producing a data output stream]

 Speed is limited by the rate at which the computer can transfer information internally.
 Ex: PCs, workstations
SIMD

[Figure: a control processor (CP) drives processors P1, P2, ..., Pn through an interconnection network]

 Processor arrays
 Ex: The Connection Machine CM-200
SIMD Architecture

[Figure: a single instruction stream drives processors A, B, and C, each with its own data input stream and data output stream]

Ex: CRAY vector processing machines, Thinking Machines CM*, Intel MMX (multimedia support)
SIMD cont…
 In this configuration, N processing elements are connected via an interconnection network.
 Each processing element (PE) is a processor with local memory.
 The PEs execute the instructions that are distributed to them by the CU via a broadcast bus.
 Each PE then operates on data stored in its own memory and on data broadcast by the CU.
 Data is exchanged among PEs via a unidirectional interconnection network, and the I/O bus is used to transfer data from the PEs to the I/O interface and vice versa.
 To transfer results from particular PEs to the CU,
MISD

[Figure: processors P1, P2, ..., Pn arranged in a chain with a memory M, a systolic array or pipeline]

 More of an intellectual exercise than a practical configuration: few were built, and none are commercially available.
The MISD Architecture

[Figure: instruction streams A, B, and C drive processors A, B, and C; a single data input stream passes through the processors to produce the data output stream]
MIMD

[Figure: processors P1, P2, ..., Pn connected through an interconnection network]

Ex: nCUBE, CM-5, TC2000, Paragon XP/S
MIMD Architecture

[Figure: independent instruction streams A, B, and C drive processors A, B, and C, each with its own data input stream and data output stream]

 Shared-memory (tightly coupled) MIMD
 Distributed-memory (loosely coupled) MIMD
Processor Arrays
 A vector computer is a computer whose instruction set includes operations on vectors as well as scalars.
 There are two ways to implement a vector computer:
a) Pipeline Vector Processor
   It streams vectors from memory to the CPU, where pipelined arithmetic units manipulate them.
   Ex: CRAY-1, CYBER-205

b) Processor Array
   It is a vector computer implemented as a sequential computer connected to a set of identical, synchronized processing elements capable of simultaneously performing the same operation on different data.
   Ex: CM-200, manufactured by Thinking Machines Corporation.
Multiprocessors
 A multiple-CPU computer consists of a # of fully programmable processors, each capable of executing its own program.
 Multiprocessors are multiple-CPU computers with a shared memory.
 Two types of shared memory access:
 Uniform Memory Access (UMA)
 Non-Uniform Memory Access (NUMA)
UMA Multiprocessors
 The shared memory is centralized.
 All processors work through a central switching mechanism to reach the centralized shared memory.
 The switching mechanism can be:
 a common bus to global memory
 a crossbar switch
 a packet-switched network
UMA

[Figure: CPUs connected through a switching mechanism to memory banks and I/O devices]

 Ex: Symmetry by Sequent Computer Systems, Inc.
 The central problem is how to ensure cache consistency:
 write-through policy
 copy-back policy
NUMA Multiprocessors
 The shared memory is distributed.
 Ex: TC2000 by BBN Systems & Technologies.
 Every processor has some nearby memory, and the shared address space is formed by combining these local memories.
 The time needed to access a particular memory location depends on whether that location is local to the processor.
Multicomputers
 A multicomputer has no shared memory. Each processor has its own private memory, and process interaction occurs through message passing.
 Ex: Paragon XP/S, nCUBE, Thinking Machines CM-5
 An important distinction between early (first-generation) multicomputers and second-generation multicomputers is how processors communicate:
 Store-and-forward message passing
 Circuit-switched message routing
Store-and-Forward
 In store-and-forward message passing, to send a message from one processor to a nonadjacent processor, every intermediate processor along the message’s path must store the entire message and then forward it to the next processor down the line.
 This means the CPU is interrupted every time a transfer is initiated.
 Ex: nCUBE/10, Intel iPSC, T800 Transputer
Circuit-Switched
 Every processor has a routing logic card called the Direct-Connect Module (DCM).
 The DCM sets up a circuit from the source node to the destination node; the message then flows in a pipelined fashion from the source node to the destination node, and none of the intermediate nodes store the message.
 This way the CPUs of the intermediate nodes are not interrupted.
 Ex: iPSC/2, nCUBE 2
 Advantages of circuit-switched message passing:
a) No interrupts for intermediate CPUs.
b) Faster, since the message is simply switched through rather than stored.
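To see why avoiding intermediate stores pays off, here is a rough latency model; the formulas, bandwidth B, and per-hop startup time s are my own illustrative assumptions, not figures from the chapter:

```python
# Latency of an L-byte message crossing h hops, under a simple model:
#   store-and-forward: every hop stores the whole message -> h * (L/B + s)
#   circuit-switched:  only the circuit setup pays per hop -> h * s + L/B
# B = link bandwidth (bytes/s), s = per-hop setup time (s).

def store_and_forward_latency(L: float, h: int, B: float, s: float) -> float:
    return h * (L / B + s)

def circuit_switched_latency(L: float, h: int, B: float, s: float) -> float:
    return h * s + L / B

# A 1 MB message over 5 hops at 100 MB/s with 10 us setup per hop:
L, h, B, s = 1_000_000, 5, 100e6, 1e-5
print(store_and_forward_latency(L, h, B, s))  # -> 0.05005 (seconds)
print(circuit_switched_latency(L, h, B, s))   # -> 0.01005 (seconds)
```

Under this model the gap grows with the hop count h, which is why circuit switching matters most on high-diameter networks.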
Scaled Speedup and Parallelizability
 Speedup: the ratio between the time taken by a parallel computer executing the fastest serial algorithm using one processor and the time taken by the same parallel computer executing the corresponding parallel algorithm using p processors.

 Efficiency: the efficiency of a parallel algorithm running on p processors is the speedup divided by p.
 Parallelizability: the ratio between the time taken by a parallel computer executing a parallel algorithm on one processor and the time taken by the same parallel computer executing the same parallel algorithm on p processors.
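The speedup and efficiency definitions translate directly into code; the timing values below are made-up illustrative numbers:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Fastest serial time on one processor / parallel time on p processors."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Speedup divided by the number of processors p."""
    return speedup(t_serial, t_parallel) / p

# Hypothetical: a 100 s serial run takes 25 s on 8 processors.
print(speedup(100.0, 25.0))        # -> 4.0
print(efficiency(100.0, 25.0, 8))  # -> 0.5
```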
Amdahl’s Law
 Let f be the fraction of operations in a computation that must be performed sequentially, where 0 <= f <= 1.
 Amdahl’s law states that the maximum speedup S achievable by a parallel computer with p processors performing the computation is:

   S <= 1 / (f + (1 - f)/p)
 This implies the following corollary: a small # of sequential operations can significantly limit the speedup achievable by a parallel computer.
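A minimal sketch of the bound, useful for seeing how quickly even a small sequential fraction f caps the speedup:

```python
def amdahl_bound(f: float, p: int) -> float:
    """Maximum speedup with sequential fraction f on p processors:
    1 / (f + (1 - f)/p)."""
    assert 0.0 <= f <= 1.0
    return 1.0 / (f + (1.0 - f) / p)

# With only 10% sequential work, 10 processors yield at most ~5.26x;
# as p grows without bound, the speedup can never exceed 1/f = 10.
print(amdahl_bound(0.1, 10))
print(amdahl_bound(0.1, 1_000_000))
```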
Scaled Speedup
 The ratio between the time taken by a sequential algorithm running on a single processor of a parallel computer and the time taken by the parallel algorithm on the parallel machine.
Amdahl’s Effect
 It is the phenomenon whereby speedup is an increasing function of the problem size.
[Figure: speedup vs. # of processors for several problem sizes n; larger problems give higher speedup curves, illustrating Amdahl’s effect]
H.W
Read about the following:
1) CM-200

2) Symmetry

3) TC2000

4) nCUBE2

5) CM-5

6) Paragon XP/S
