
Parallel Computing and Programming
Lecture 4: Instruction Level Parallelism: Pipelining Intro

Dr. Rony Kassam
IEF Tishreen Uni
S1 - 2021
Index
- Von Neumann vs Dataflow Models
- ISA vs Microarchitecture
- Single-cycle vs Multi-cycle Microarchitectures
- Instruction Level Parallelism: Pipelining Intro
- Instruction Level Parallelism: Issues in Pipeline Design
- Thread Level Parallelism: Data Dependence Solutions
- Thread Level Parallelism: Shared Memory and OpenMP
Pipelining: Basic Idea
- More systematically: pipeline the execution of multiple instructions
  - Analogy: "assembly line processing" of instructions
- Idea:
  - Divide the instruction processing cycle into distinct "stages" of processing
  - Ensure there are enough hardware resources to process one instruction in each stage
  - Process a different instruction in each stage
    - Instructions consecutive in program order are processed in consecutive stages
- Benefit: increases instruction processing throughput (1/CPI)
- Downside: start thinking about this...
Example: Execution of Four Independent ADDs
- Multi-cycle: 4 cycles per instruction

  F D E W
          F D E W
                  F D E W
                          F D E W
  ------------------------------> Time

- Pipelined: 4 cycles per 4 instructions (steady state)

  F D E W
    F D E W
      F D E W
        F D E W
  ----------> Time

Is life always this beautiful?
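The cycle counts in this example can be checked with a short sketch (the helper names are mine, not from the lecture):

```python
# Cycle counts for n independent instructions on a k-stage machine.

def multicycle_cycles(n_insts: int, k_stages: int) -> int:
    # Non-pipelined multi-cycle: each instruction finishes all k stages
    # before the next one starts.
    return n_insts * k_stages

def pipelined_cycles(n_insts: int, k_stages: int) -> int:
    # Pipelined: the first instruction takes k cycles to fill the pipe;
    # each later one completes exactly one cycle after its predecessor.
    return k_stages + (n_insts - 1)

# Four independent ADDs through F, D, E, W (k = 4):
print(multicycle_cycles(4, 4))  # 16 cycles
print(pipelined_cycles(4, 4))   # 7 cycles; ~1 instruction/cycle in steady state
```

In steady state the pipelined machine retires one instruction per cycle, which is where the throughput benefit comes from.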
The Laundry Analogy
[Figure: four laundry loads A-D done one after another on a 6 PM-2 AM timeline. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

- "Place one dirty load of clothes in the washer."
- "When the washer is finished, place the wet load in the dryer."
- "When the dryer is finished, take out the dry load and fold."
- "When folding is finished, ask your roommate (??) to put the clothes away."

- Steps to do a load are sequentially dependent
- No dependence between different loads
- Different steps do not share resources
Pipelining Multiple Loads of Laundry
[Figure: loads A-D overlapped on the 6 PM-2 AM timeline, each load starting as soon as the washer frees up. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

- 4 loads of laundry in parallel
- No additional resources
- Throughput increased by 4x
- Latency per load is the same
Pipelining Multiple Loads of Laundry: In Practice
[Figure: overlapped loads A-D where the dryer step takes longer than the other steps, delaying every following load. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

The slowest step decides throughput.
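The "slowest step" rule can be made concrete with a toy model; the step times below are illustrative assumptions, not numbers from the slide:

```python
# Steady-state throughput of a synchronized pipeline is set by the
# slowest stage: every stage advances once per "cycle" of that length.
# The per-step minutes here are made up for illustration.

stage_minutes = {"wash": 30, "dry": 60, "fold": 15, "put away": 15}

cycle = max(stage_minutes.values())  # 60 min: the dryer bounds everything
print(60 / cycle)                    # 1.0 load per hour

# Replicating the dryer halves its effective occupancy per load,
# so the next-slowest stage (the washer) sets the pace:
stage_minutes["dry"] /= 2            # two dryers serve alternate loads
cycle = max(stage_minutes.values())  # 30 min
print(60 / cycle)                    # 2.0 loads per hour
```

This is exactly the "throughput restored using 2 dryers" fix on the following slide: replicate the bottleneck resource rather than speed it up.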
Pipelining Multiple Loads of Laundry: In Practice (cont.)
[Figure: the same timeline with the dryer replicated; consecutive loads alternate between the two dryers. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Throughput restored (2 loads per hour) using 2 dryers.
An Ideal Pipeline
- Goal: increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
Ideal Pipelining
Splitting a block of combinational logic of delay T into k equal stages multiplies throughput (BW) by k:

  1 stage:  combinational logic (F,D,E,M,W), T ps          -> BW = ~(1/T)
  2 stages: T/2 ps (F,D,E) | T/2 ps (M,W)                  -> BW = ~(2/T)
  3 stages: T/3 ps (F,D) | T/3 ps (E,M) | T/3 ps (M,W)     -> BW = ~(3/T)
More Realistic Pipeline: Throughput
- Nonpipelined version with delay T:

  BW = 1 / (T + S), where S = latch delay

- k-stage pipelined version:

  BW_k-stage = 1 / (T/k + S)          (latch delay reduces throughput)
  BW_max = 1 / (1 gate delay + S)     (S is the switching overhead between stages)
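These formulas can be evaluated directly; the T and S values below are assumed example numbers, not from the slide:

```python
# Throughput (BW) model with latch/switching overhead S per stage.

def bw_nonpipelined(T: float, S: float) -> float:
    # One long block of logic with delay T, plus one latch.
    return 1.0 / (T + S)

def bw_k_stage(T: float, S: float, k: int) -> float:
    # k equal stages: each has T/k of logic plus a full latch delay S.
    return 1.0 / (T / k + S)

T, S = 800.0, 50.0  # picoseconds (illustrative)
print(1 / bw_nonpipelined(T, S))  # 850.0 ps per result
print(1 / bw_k_stage(T, S, 5))    # 210.0 ps per result: ~4x, not 5x, because of S
```

Note that as k grows, the S term dominates T/k, which is why BW_max is capped by one gate delay plus the latch overhead.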
More Realistic Pipeline: Cost
- Nonpipelined version with combinational cost G:

  Cost = G + L, where L = latch cost

- k-stage pipelined version (each stage has G/k gates plus a latch):

  Cost_k-stage = G + L*k              (latches increase hardware cost)
Pipelining Instruction Processing

Remember: The Instruction Processing Cycle
The conceptual steps (Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result) map onto five stages:

1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute / evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store / writeback result (WB)
Remember the Single-Cycle Uarch
[Figure: MIPS single-cycle datapath - PC, instruction memory, register file, ALU, data memory, sign extension, and a control unit generating RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc1 = Jump, PCSrc2 = Branch taken. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

The whole instruction executes in one long combinational delay T, so BW = ~(1/T).
Dividing Into Stages
[Figure: the single-cycle datapath divided into five stages with these latencies - IF: instruction fetch (200 ps), ID: instruction decode / register file read (100 ps), EX: execute / address calculation (200 ps), MEM: memory access (200 ps), WB: write back (100 ps). Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Is this the correct partitioning?
Why not 4 or 6 stages? Why not different boundaries?
Instruction Pipeline Throughput
[Figure: three loads executed back to back. Nonpipelined, each instruction takes 800 ps (instruction fetch, register read, ALU, data access, register write), so one instruction completes every 800 ps. Pipelined, a new instruction starts every 200 ps.]

Example instructions:
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)

The 5-stage speedup is 4x (800 ps / 200 ps), not 5x as predicted by the ideal model. Why?
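The 4x-rather-than-5x result follows directly from the stage latencies chosen on the "Dividing Into Stages" slide:

```python
# Stage latencies (ps) from the 5-stage division: IF, ID, EX, MEM, WB.
stage_ps = [200, 100, 200, 200, 100]

single_cycle_ps = sum(stage_ps)  # 800 ps: one instruction per 800 ps
pipelined_ps = max(stage_ps)     # 200 ps: the clock must fit the SLOWEST stage

print(single_cycle_ps / pipelined_ps)  # 4.0: the actual steady-state speedup

# The ideal model instead assumes 5 uniform stages of 800/5 = 160 ps each:
print(single_cycle_ps / (single_cycle_ps / 5))  # 5.0: the ideal speedup
```

The gap between 4x and 5x is the internal fragmentation discussed at the end of the lecture: the 100 ps stages idle for half of every 200 ps cycle.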
Enabling Pipelined Processing: Pipeline Registers
No resource is used by more than 1 stage!

[Figure: the five-stage datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between stages. Latched values carried forward include PCF, IRD, PCD+4, PCE+4, AE, BE, ImmE, nPCM, AoutM, BM, AoutW, and MDRW. Each stage now takes roughly T/k ps. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Pipelined Operation Example
[Figure: a single lw instruction flowing through the pipelined datapath one stage per clock - instruction fetch, instruction decode, execution, memory, write back. Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

All instruction classes must follow the same path and timing through the pipeline stages. Any performance impact?
lw
0
0 M Instruction decode lw
M
u
u
x 19
Based on original figure from [P&H
x CO&D,
1 COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Write back
1
memory
memory Read 00x ALU Read u
Write
Write data 22
data result
result Addressmemory Read 11x
register
data
register M data
data
1M M
uu Data 0M
Write
Write xx
Write uu
data memory
memory xx
data
data 11

Pipelined Operation Example


16 32 Write 00
Write
Sign data
data
extend
16
16 32
32
Sign
Sign
extend
extend

Clock 1
Clock
Clock 5 3

lw $10,
sub $11,20($1)
$2, $3 lw $10,
sub $11,20($1)
$2, $3 lw $10, 20($1)
Instruction fetch Instruction decode Execution
sub $11, $2, $3 lw $10,
sub $11,20($1)
$2, $3 sub $11,20($1)
lw $10, $2, $3
00
M
MM
u
uu Execution Memory Write back
Write back
xx
11

IF/ID
IF/ID ID/EX
ID/EX EX/MEM
EX/MEM MEM/WB
MEM/WB

Add
Add
Add

Add AddAdd
Add
44 Add
Add result
result
result
Shift
Shift
left
left 22

Read
Read
Instruction
Instruction
Instruction

PC Address
Address register 11
register
register 1 Read
PC
PC Address Read
Read
Read data 11
data
data 1
Read
Read Zero
Instruction register 22
register Zero
Zero
Instruction
Instruction Registers Read
Registers Read
Read ALU ALU
ALU ALU
ALU
memory
memory Write 0
00 Address Read
Read
Read
Write
Write data 22
data
data result
result
result Address 1
11
register
register
register M
M
M data
data
u M
M
M
uu Data
Data u

Is life always this beautiful?


Write
Write xxx uu
memory
memory xx
data
data
data 1
11 0
Write 00
Write
Write
data
data
data
16
16
16 32
32
32
Sign
Sign
extend
extend
extend

Clock
Clock
Clock56 21 43
Clock
Clock

sub $11, $2, $3 lw $10, 20($1) 20


Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Instruction00 fetch Instruction decode sub $11, $2, $3 lw $10, 20($1) sub $11, $2, $3
Illustrating Pipeline Operation: Operation View

        t0   t1   t2   t3   t4   t5
Inst0   IF   ID   EX   MEM  WB
Inst1        IF   ID   EX   MEM  WB
Inst2             IF   ID   EX   MEM
Inst3                  IF   ID   EX
Inst4                       IF   ID

Steady state (full pipeline) is reached once every stage is busy.
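The operation view can be generated programmatically: instruction i occupies stage t − i at tick t. The stage names follow the slide; the formatting helper is my own:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def operation_view(n_insts: int, n_ticks: int) -> list[str]:
    # Row i shows instruction i, which is in stage (t - i) at tick t;
    # blank cells mean the instruction is not in the pipeline yet/anymore.
    rows = []
    for i in range(n_insts):
        cells = []
        for t in range(n_ticks):
            s = t - i
            cells.append(STAGES[s] if 0 <= s < len(STAGES) else "   ")
        rows.append(f"Inst{i}  " + " ".join(c.ljust(3) for c in cells).rstrip())
    return rows

for row in operation_view(5, 6):
    print(row)
```

The same indexing read column-wise gives the resource view on the next slide: at tick t, stage s holds instruction t − s.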
Illustrating Pipeline Operation: Resource View

      t0  t1  t2  t3  t4  t5  t6  t7  t8  t9  t10
IF    I0  I1  I2  I3  I4  I5  I6  I7  I8  I9  I10
ID        I0  I1  I2  I3  I4  I5  I6  I7  I8  I9
EX            I0  I1  I2  I3  I4  I5  I6  I7  I8
MEM               I0  I1  I2  I3  I4  I5  I6  I7
WB                    I0  I1  I2  I3  I4  I5  I6
Remember: An Ideal Pipeline
- Goal: increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
Instruction Pipeline: Not An Ideal Pipeline
- Identical operations ... NOT!
  => different instructions -> not all need the same stages
  Forcing different instructions to go through the same pipe stages
  -> external fragmentation (some pipe stages idle for some instructions)
- Uniform suboperations ... NOT!
  => different pipeline stages -> not the same latency
  Need to force each stage to be controlled by the same clock
  -> internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)
- Independent operations ... NOT!
  => instructions are not independent of each other
  Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results
  -> pipeline stalls (pipeline is not always moving)
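A toy model of the third violation, inter-instruction dependence, shows where stall cycles come from. The stall rule below (a consumer may not decode until after its producer writes back, i.e., no forwarding) is a simplifying assumption, not the full hazard-detection logic:

```python
def completion_cycles(insts, k=5):
    # insts: list of (dest_reg, source_regs) tuples in program order.
    # Without forwarding, an instruction that reads a register must not
    # reach decode (stage 2) until after its producer's writeback (stage k).
    start = []  # cycle in which each instruction enters IF
    for i, (dest, srcs) in enumerate(insts):
        s = 0 if i == 0 else start[-1] + 1
        for j in range(i):
            if insts[j][0] in srcs:
                s = max(s, start[j] + k - 1)  # ID lands after producer's WB
        start.append(s)
    return start[-1] + k  # cycles until the last instruction leaves WB

dependent = [("r1", {"r2", "r3"}), ("r4", {"r1", "r5"}), ("r6", {"r7", "r8"})]
independent = [("r1", {"r2"}), ("r4", {"r5"}), ("r6", {"r7"})]
print(completion_cycles(dependent))    # 10: three bubble cycles waiting on r1
print(completion_cycles(independent))  # 7: the ideal k + (n - 1)
```

Data dependence solutions that remove most of these bubbles (forwarding, scheduling) are the subject of the next lectures in the index.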
