Chapter 2, Part 3

The document discusses multiple issue scheduling in processors, including static and dynamic methods, as well as speculation for executing instructions simultaneously. It covers hardware multithreading techniques, including fine-grained and simultaneous multithreading (SMT), and explores Flynn's Taxonomy of parallel computing architectures such as SIMD and MIMD. Additionally, it addresses shared and distributed memory systems, interconnection networks, and the importance of bisection width in communication performance.


Multiple Issue (2)

• Static multiple issue – functional units are scheduled at compile time.

• Dynamic multiple issue – functional units are scheduled at run time.



Speculation (1)
• In order to make use of multiple issue, the system must find instructions that can be executed simultaneously.

• In speculation, the compiler or the processor makes a guess about an instruction, and then executes the instruction on the basis of the guess.



Speculation (2)

z = x + y;
if (z > 0)      /* speculation: z will be positive */
    w = x;
else
    w = y;

If the system speculates incorrectly, it must go back and recalculate w = y.



Hardware multithreading (1)
• There aren't always good opportunities for simultaneous execution of different threads.

• Hardware multithreading provides a means for systems to continue doing useful work when the task currently being executed has stalled.



Hardware multithreading (2)
• Fine-grained – the processor switches between threads after each instruction, skipping threads that are stalled.

  ◦ Pros: potential to avoid wasted machine time due to stalls.
  ◦ Cons: a thread that's ready to execute a long sequence of instructions may have to wait to execute every instruction.



Hardware multithreading (3)
• Simultaneous multithreading (SMT) – a variation on fine-grained multithreading.

• Allows multiple threads to make use of the multiple functional units.




PARALLEL HARDWARE



Flynn’s Taxonomy
• SISD – single instruction stream, single data stream (the classic von Neumann machine).
• SIMD – single instruction stream, multiple data streams.
• MISD – multiple instruction streams, single data stream (not covered).
• MIMD – multiple instruction streams, multiple data streams.



SIMD
• Parallelism achieved by dividing data among the processors.

• Applies the same instruction to multiple data items.

• Called data parallelism.



SIMD example

[Figure: a single control unit drives n ALUs (ALU1, ALU2, …, ALUn); the n data items x[1], x[2], …, x[n] are distributed one per ALU.]

for (i = 0; i < n; i++)
    x[i] += y[i];



SIMD
• What if we don't have as many ALUs as data items?
• Divide the work and process iteratively (a small C sketch follows the table below).
• Ex. m = 4 ALUs and n = 15 data items.

Round   ALU1    ALU2    ALU3    ALU4
  1     X[0]    X[1]    X[2]    X[3]
  2     X[4]    X[5]    X[6]    X[7]
  3     X[8]    X[9]    X[10]   X[11]
  4     X[12]   X[13]   X[14]
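A minimal C sketch of this round-by-round grouping, written as an ordinary sequential loop just to make the grouping explicit (the macros M and N stand for the m ALUs and n data items from the table; on a real SIMD machine each inner group would execute as a single instruction on the m ALUs):

#include <stdio.h>

#define M 4     /* number of ALUs in the example        */
#define N 15    /* number of data items in the example  */

int main(void) {
    double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

    /* Process the items in rounds of M; round r handles
       x[r*M] .. x[r*M + M - 1] (the last round is partial). */
    for (int start = 0; start < N; start += M)
        for (int i = start; i < start + M && i < N; i++)
            x[i] += y[i];          /* same operation on every item */

    printf("x[%d] = %g\n", N - 1, x[N - 1]);   /* prints x[14] = 15 */
    return 0;
}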



Graphics Processing Units (GPU)
• Real-time graphics application programming interfaces (APIs) use points, lines, and triangles to internally represent the surface of an object.



GPUs
• A graphics processing pipeline converts the internal representation into an array of pixels that can be sent to a computer screen.

• Several stages of this pipeline (called shader functions) are programmable.
  ◦ Typically just a few lines of C code (a rough sketch follows).
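Not actual shader-language code, but a rough C sketch of the idea (the pixel type and the brighten function are invented for illustration): a small function is written once and is conceptually applied to every element of the stream, so the hardware is free to run many copies at once.

typedef struct { float r, g, b; } pixel;

/* A "shader-like" function: just a few lines of C applied per element. */
static pixel brighten(pixel p) {
    p.r *= 1.2f;  p.g *= 1.2f;  p.b *= 1.2f;
    return p;
}

/* Conceptually, the pipeline maps the function over the whole stream;
   each call is independent, so the calls can run in parallel.         */
void apply_shader(pixel *frame, int n) {
    for (int i = 0; i < n; i++)
        frame[i] = brighten(frame[i]);
}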



GPUs
• Shader functions are also implicitly parallel, since they can be applied to multiple elements in the graphics stream.

• GPUs can often optimize performance by using SIMD parallelism.
  ◦ The current generation of GPUs uses SIMD parallelism, although they are not pure SIMD systems.



MIMD
• Supports multiple simultaneous instruction streams operating on multiple data streams.

• Typically consists of a collection of fully independent processing units or cores, each of which has its own control unit and its own ALU.



Shared Memory System (1)
• A collection of autonomous processors is connected to a memory system via an interconnection network.
• Each processor can access each memory location.
• The processors usually communicate implicitly by accessing shared data structures (a minimal sketch follows).
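A minimal Pthreads sketch (an illustrative assumption, not from the slides) of such implicit communication: two threads update the same shared variable, and the only "communication" between them is the value left behind in shared memory. Compile with -pthread.

#include <pthread.h>
#include <stdio.h>

int shared_sum = 0;                               /* shared data structure */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    int my_part = *(int *)arg;
    pthread_mutex_lock(&lock);                    /* communicate implicitly */
    shared_sum += my_part;                        /* through shared memory  */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    int a = 3, b = 4;
    pthread_create(&t0, NULL, worker, &a);
    pthread_create(&t1, NULL, worker, &b);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("shared_sum = %d\n", shared_sum);      /* prints 7 */
    return 0;
}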



Shared Memory System (2)
• Most widely available shared memory systems use one or more multicore processors.



Shared Memory System

Figure 2.3



Distributed Memory System
• Clusters (most popular)
  ◦ A collection of commodity systems.
  ◦ Connected by a commodity interconnection network.

• Nodes of a cluster are individual computation units joined by a communication network (a minimal message-passing sketch follows).
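Since the nodes have no common address space, they communicate explicitly by sending messages over the network. A minimal MPI sketch (MPI is one common way to program a cluster; this example is an illustration, not from the slides) in which the process with rank 1 sends a value to the process with rank 0. It can typically be compiled with mpicc and run with mpiexec -n 2.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {                       /* node 1 sends a message */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {                /* node 0 receives it     */
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}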



Distributed Memory System

Figure 2.4



Interconnection networks
• Affects performance of both distributed and shared memory systems.

• Two categories:
  ◦ Shared memory interconnects
  ◦ Distributed memory interconnects



Shared memory interconnects
• Bus interconnect
  ◦ A collection of parallel communication wires together with some hardware that controls access to the bus.
  ◦ The communication wires are shared by the devices that are connected to the bus.
  ◦ As the number of devices connected to the bus increases, contention for use of the bus increases, and performance decreases.



Shared memory interconnects
• Switched interconnect
  ◦ Uses switches to control the routing of data among the connected devices.

  ◦ Crossbar –
    - Allows simultaneous communication among different devices.
    - Faster than buses.
    - But the cost of the switches and links is relatively high.



Distributed memory interconnects
• Two groups
  ◦ Direct interconnect
    - Each switch is directly connected to a processor-memory pair, and the switches are connected to each other.
  ◦ Indirect interconnect
    - Switches may not be directly connected to a processor.



Bisection width
• A measure of "number of simultaneous communications" or "connectivity".

• How many simultaneous communications can take place "across the divide" between the halves? (A worked example follows.)
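For example, in a ring of nodes (Figure 2.9), cutting the network into two halves severs only two links, so at most two communications can cross the divide at the same time: the bisection width of a ring is 2.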



Two bisections of a ring

Figure 2.9



Fully connected network
• Each switch is directly connected to every other switch (see the worked example below).
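Worked example: split the p switches into two halves of p/2 each. Every switch in one half has a direct link to every switch in the other half, so (p/2) × (p/2) = p²/4 links cross the divide; the bisection width of a fully connected network is therefore p²/4 (for example, 16 when p = 8).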

Figure 2.11

