L11 Pipelined Datapath and

Uploaded by

Carlos Araujo
Copyright © All Rights Reserved

A pipeline diagram
A pipeline diagram shows the execution of a series of instructions.
The instruction sequence is shown vertically, from top to bottom.
Clock cycles are shown horizontally, from left to right.
Each instruction is divided into its component stages. (We show five
stages for every instruction, which will make the control unit easier to design.)
This clearly indicates the overlapping of instructions. For example, there
are three instructions active in the third cycle of the diagram below.
The lw instruction is in its Execute stage.
Simultaneously, the sub is in its Instruction Decode stage.
Also, the and instruction is just being fetched.
Clock cycle            1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1           IF   ID   EX   MEM  WB
and $t1, $t2, $t3                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $sp, $sp, -4                           IF   ID   EX   MEM  WB
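The staircase shape of the diagram follows from one rule. In an ideal pipeline with no stalls, instruction i (counting from 0) is in stage s (IF = 0 through WB = 4) during clock cycle i + s + 1. Here is a minimal Python sketch of that rule. The helper names `pipeline_diagram` and `active_in_cycle` are hypothetical and chosen only for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Map each instruction to {clock cycle: stage} in an ideal 5-stage pipeline."""
    rows = []
    for i, instr in enumerate(instructions):
        # Instruction i is fetched in cycle i + 1 and moves forward one stage per cycle.
        rows.append((instr, {i + s + 1: stage for s, stage in enumerate(STAGES)}))
    return rows


def active_in_cycle(rows, cycle):
    """Return (instruction, stage) for every instruction in flight during `cycle`."""
    return [(instr, stages[cycle]) for instr, stages in rows if cycle in stages]


program = ["lw $t0, 4($sp)", "sub $v0, $a0, $a1", "and $t1, $t2, $t3",
           "or $s0, $s1, $s2", "add $sp, $sp, -4"]
rows = pipeline_diagram(program)
for instr, stage in active_in_cycle(rows, 3):
    print(f"{stage:>3}  {instr}")
```

Querying cycle 3 returns lw in EX, sub in ID, and and in IF. These are the three active instructions described above.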
Pipeline terminology
The pipeline depth is the number of stages, in this case five.
In the first four cycles here, the pipeline is filling, since some functional
units are still unused.
In cycle 5, the pipeline is full. Five instructions are being executed
simultaneously, so all hardware units are in use.
In cycles 6-9, the pipeline is emptying.
                       filling             full emptying
Clock cycle            1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1           IF   ID   EX   MEM  WB
and $t1, $t2, $t3                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $sp, $sp, -4                           IF   ID   EX   MEM  WB
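The filling, full, and emptying phases can be computed the same way. Consider a pipeline of depth k running n instructions with no stalls. The last instruction leaves WB in cycle k + n - 1, and the pipeline is full only in cycles where k instructions are in flight. The small Python sketch below assumes n ≥ k. `in_flight` and `pipeline_phases` are hypothetical helper names.

```python
def in_flight(cycle, n_instructions, depth=5):
    """Number of instructions occupying some stage during `cycle` (1-based, no stalls)."""
    # Instruction i (0-based) is in the pipeline during cycles i + 1 .. i + depth.
    return sum(1 for i in range(n_instructions) if i + 1 <= cycle <= i + depth)


def pipeline_phases(n_instructions, depth=5):
    """Label every cycle as 'filling', 'full', or 'emptying' (assumes n >= depth)."""
    total_cycles = depth + n_instructions - 1
    phases = {}
    for cycle in range(1, total_cycles + 1):
        if in_flight(cycle, n_instructions, depth) == depth:
            phases[cycle] = "full"
        elif cycle < depth:
            phases[cycle] = "filling"
        else:
            phases[cycle] = "emptying"
    return phases


print(pipeline_phases(5))
```

For the five-instruction example, `pipeline_phases(5)` labels cycles 1-4 as filling, cycle 5 as full, and cycles 6-9 as emptying.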
Pipelined datapath and control
Now we'll see a basic implementation of a pipelined processor.
The datapath and control unit share similarities with both the single-
cycle and multicycle implementations that we already saw.
An example execution highlights important pipelining concepts.
In future lectures, we'll discuss several complications of pipelining that
we're hiding from you for now.
Pipelining concepts
A pipelined processor allows multiple instructions to execute at once, and
each instruction uses a different functional unit in the datapath.
This increases throughput, so programs can run faster.
One instruction can finish executing on every clock cycle, and simpler
stages also lead to shorter cycle times.

Clock cycle            1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1           IF   ID   EX   MEM  WB
and $t1, $t2, $t3                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $t5, $t6, $0                           IF   ID   EX   MEM  WB
Pipelined Datapath
The whole point of pipelining is to allow multiple instructions to execute
at the same time.
We may need to perform several operations in the same cycle.
Increment the PC and add registers at the same time.
Fetch one instruction while another one reads or writes data.
Thus, like the single-cycle datapath, a pipelined processor will need to
duplicate hardware elements that are needed several times in the same
clock cycle.
Clock cycle            1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1           IF   ID   EX   MEM  WB
and $t1, $t2, $t3                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $t5, $t6, $0                           IF   ID   EX   MEM  WB
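One way to see why the hardware must be duplicated is to list the shared units each stage needs. Then check, cycle by cycle, whether two in-flight instructions want the same unit. The Python sketch below does this for two hypothetical designs:
- one with a single memory and a single ALU that also increments the PC, as in the multicycle datapath;
- one with separate instruction and data memories plus a dedicated PC adder.

The stage-to-unit mapping is a simplifying assumption. The branch-target adder is ignored. The register file is left out because its separate read and write ports are discussed on the next slide.

```python
from collections import Counter

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Hypothetical design 1: one memory and one ALU shared by all stages
# (the ALU also computes PC + 4, as in the multicycle datapath).
SHARED = {"IF": ["memory", "alu"], "ID": [], "EX": ["alu"], "MEM": ["memory"], "WB": []}

# Hypothetical design 2: separate instruction/data memories and a dedicated PC adder.
DUPLICATED = {"IF": ["imem", "pc_adder"], "ID": [], "EX": ["alu"], "MEM": ["dmem"], "WB": []}
# The register file is omitted: ID reads and WB writes use separate ports.


def conflicts(design, n_instructions):
    """Return {cycle: units wanted by more than one instruction}, assuming no stalls."""
    found = {}
    for cycle in range(1, len(STAGES) + n_instructions):
        demand = Counter()
        for i in range(n_instructions):
            s = cycle - 1 - i  # stage index of instruction i during this cycle
            if 0 <= s < len(STAGES):
                demand.update(design[STAGES[s]])
        busy = sorted(unit for unit, count in demand.items() if count > 1)
        if busy:
            found[cycle] = busy
    return found


print(conflicts(SHARED, 5))
print(conflicts(DUPLICATED, 5))
```

For the five-instruction example, the shared design reports conflicts in cycles 3 to 5: the ALU in cycle 3, and both the ALU and the memory in cycles 4 and 5. The duplicated design reports none.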
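One register file is enough
We need only one register file to support both the ID and WB stages.
Reads and writes go to separate ports on the register file.
Writes occur in the first half of the cycle, and reads occur in the second half.
As a result, an instruction in ID can read a value written by the instruction in WB during the same cycle.
[Figure: the register file, with inputs Read register 1, Read register 2, Write register, and Write data, and outputs Read data 1 and Read data 2.]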
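The following behavioral sketch of the half-cycle convention models each clock cycle as a write phase followed by a read phase. The class `RegisterFile` and its `cycle` method are hypothetical names. The point is that the instruction in ID sees a value written by the instruction in WB during the same cycle. Register $0 is kept at zero, as in MIPS.

```python
class RegisterFile:
    """32 MIPS registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, read_reg1, read_reg2, write_reg=None, write_data=0, reg_write=False):
        """Simulate one clock cycle: write in the first half, then read in the second half."""
        # First half: the WB stage writes (register $0 always stays zero).
        if reg_write and write_reg is not None and write_reg != 0:
            self.regs[write_reg] = write_data
        # Second half: the ID stage reads its two source registers.
        return self.regs[read_reg1], self.regs[read_reg2]


rf = RegisterFile()
rf.regs[8] = 108  # old value of $8
print(rf.cycle(8, 29, write_reg=8, write_data=99, reg_write=True))
```

Suppose WB writes 99 into $8 in the same cycle that ID reads $8. The read returns 99 rather than the old value 108.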
Single-cycle datapath, slightly rearranged
[Figure: the single-cycle datapath redrawn from left to right:
- the PC, instruction memory, and the PC + 4 adder;
- the register file, sign extend, and the RegDst mux selecting Instr [20-16] or Instr [15-11];
- the ALU (ALUSrc, ALUOp) and the branch adder with shift left 2;
- data memory (MemRead, MemWrite);
- the MemToReg mux feeding the register file's Write data.
PCSrc selects the next PC.]
Registers added to the multi-cycle
[Figure: the multicycle datapath with its intermediate registers (Instruction register, Memory data register, A, B, and ALUOut) and the control signals PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, MemToReg, RegWrite, ALUSrcA, ALUSrcB, ALUOp, and PCSource.]
Pipeline registers
We'll add intermediate registers to our pipelined datapath too.
There's a lot of information to save, however. We'll simplify our diagrams
by drawing just one big pipeline register between each pair of stages.
The registers are named for the stages they connect.
IF/ID ID/EX EX/MEM MEM/WB
No register is needed after the WB stage, because after WB the
instruction is done.

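As a rough sketch of what each pipeline register holds, here are Python dataclasses. Their fields are assumptions based on the signal names in this lecture's datapath figures:
- PC + 4 and Read data 1 and 2;
- the sign-extended Instr [15-0], Instr [20-16], and Instr [15-11];
- the ALU result and Zero flag;
- the memory read data.

Control signals are left out here, and a real design may carry more or fewer fields. There is deliberately no class after MEM/WB, because the instruction is done after WB.

```python
from dataclasses import dataclass


@dataclass
class IF_ID:
    pc_plus_4: int = 0
    instruction: int = 0     # the 32-bit instruction word


@dataclass
class ID_EX:
    pc_plus_4: int = 0
    read_data1: int = 0
    read_data2: int = 0
    imm: int = 0             # sign-extended Instr [15-0]
    rt: int = 0              # Instr [20-16]
    rd: int = 0              # Instr [15-11]


@dataclass
class EX_MEM:
    branch_target: int = 0   # PC + 4 + (imm << 2)
    zero: bool = False
    alu_result: int = 0
    write_data: int = 0      # Read data 2, the value a store would write
    dest_reg: int = 0        # rt or rd, as chosen by RegDst


@dataclass
class MEM_WB:
    mem_read_data: int = 0
    alu_result: int = 0
    dest_reg: int = 0


# Example: lw $8, 4($29) moving from ID/EX to EX/MEM (values from this lecture's example).
id_ex = ID_EX(pc_plus_4=1004, read_data1=129, imm=4, rt=8)
ex_mem = EX_MEM(branch_target=id_ex.pc_plus_4 + (id_ex.imm << 2),
                alu_result=id_ex.read_data1 + id_ex.imm,
                write_data=id_ex.read_data2,
                dest_reg=id_ex.rt)  # RegDst = 0 for lw selects rt
print(ex_mem)
```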
Pipelined datapath
[Figure: the single-cycle datapath split into five stages by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.]
Propagating values forward
Any data values required in later stages must be propagated through the
pipeline registers.
The most extreme example is the destination register.
The rd field of the instruction word (or the rt field, for lw), retrieved in the first stage (IF),
determines the destination register. But that register isn't updated
until the fifth stage (WB).
Thus, the destination register number must be passed through all of the pipeline stages,
as shown on the next slide.
Why can't we keep a single instruction register like we did in the multicycle
datapath?
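To see why one instruction register is not enough, the Python sketch below tracks the destination register number held in each pipeline register. It makes two assumptions:
- every pipeline register is written on every clock edge, as the notes on the control diagram state;
- there are no stalls.

It uses the destination registers of the example program that appears later in this lecture ($8, $2, $9, $16, $13). `latch_contents` is a hypothetical helper name.

```python
# Destination registers of the example program: lw $8, sub $2, and $9, or $16, add $13.
DEST_REGS = [8, 2, 9, 16, 13]
LATCHES = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def latch_contents(cycle, dest_regs=DEST_REGS):
    """Destination register number held in each pipeline register at the end of `cycle`.

    Every pipeline register is overwritten on every clock edge, so latch k holds
    the instruction fetched k cycles before the current one.
    """
    contents = {}
    for k, name in enumerate(LATCHES):
        i = cycle - 1 - k  # index of the instruction now in this latch
        contents[name] = dest_regs[i] if 0 <= i < len(dest_regs) else None
    return contents


print(latch_contents(4))
```

At the end of cycle 4, the four pipeline registers hold destination registers 16, 9, 2, and 8. These belong to the or, and, sub, and lw instructions that the ID, EX, MEM, and WB stages work on during cycle 5. A single shared instruction register could hold only one of them.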
The destination register
[Figure: the pipelined datapath with the destination register number (Instr [20-16] or Instr [15-11], as selected by the RegDst mux) carried through ID/EX, EX/MEM, and MEM/WB to the register file's Write register input.]
What about control signals?
The control signals are generated in the same way as in the single-cycle
processor: after an instruction is fetched, the processor decodes it and
produces the appropriate control values.
But just like before, some of the control signals will not be needed until
some later stage and clock cycle.
These signals must be propagated through the pipeline until they reach
the appropriate stage. We can just pass them in the pipeline registers,
along with the other data.
Control signals can be categorized by the pipeline stage that uses them.

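The cycle-by-cycle diagrams later in this lecture show which stage uses each signal:
- EX: ALUSrc, ALUOp, and RegDst;
- MEM: MemRead and MemWrite;
- WB: MemToReg and RegWrite.

The Python sketch below assumes those groupings and the values those diagrams show for lw and the R-type instructions. It decodes once in ID and then drops each group after the stage that uses it. `decode_control` and `control_in_latches` are hypothetical names, and branches (PCSrc) are not handled.

```python
def decode_control(mnemonic):
    """Control signals grouped by the stage that uses them (lw and R-type only)."""
    if mnemonic == "lw":
        ex = {"ALUSrc": 1, "ALUOp": "add", "RegDst": 0}
        mem = {"MemRead": 1, "MemWrite": 0}
        wb = {"MemToReg": 1, "RegWrite": 1}
    elif mnemonic in ("add", "sub", "and", "or"):
        ex = {"ALUSrc": 0, "ALUOp": mnemonic, "RegDst": 1}
        mem = {"MemRead": 0, "MemWrite": 0}
        wb = {"MemToReg": 0, "RegWrite": 1}
    else:
        raise ValueError(f"not handled in this sketch: {mnemonic}")
    return {"EX": ex, "M": mem, "WB": wb}


def control_in_latches(mnemonic):
    """Control groups carried by each pipeline register after decoding in ID."""
    ctrl = decode_control(mnemonic)
    return {
        "ID/EX": ctrl,                                 # EX, M, and WB groups
        "EX/MEM": {"M": ctrl["M"], "WB": ctrl["WB"]},  # EX group already used
        "MEM/WB": {"WB": ctrl["WB"]},                  # only the WB group remains
    }


print(control_in_latches("lw"))
```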
Pipelined datapath and control
[Figure: the pipelined datapath with a Control unit in the ID stage. Its outputs are stored as EX, M, and WB groups in ID/EX; the M and WB groups move on to EX/MEM, and the WB group moves on to MEM/WB.]
Notes about the diagram
The control signals are grouped together in the pipeline registers, just to
make the diagram a little clearer.
Not all of the registers have a write enable signal.
Because the datapath fetches one instruction per cycle, the PC must
also be updated on each clock cycle. Including a write enable for the
PC would be redundant.
Similarly, the pipeline registers are also written on every cycle, so no
explicit write signals are needed.
An example execution sequence
Here's a sample sequence of instructions to execute (addresses in decimal).
1000: lw $8, 4($29)
1004: sub $2, $4, $5
1008: and $9, $10, $11
1012: or $16, $17, $18
1016: add $13, $14, $0
We'll make some assumptions, just so we can show actual data values.
Each register contains its number plus 100, except $0, which always holds zero.
For instance, register $8 contains 108, register $29 contains 129, and so forth.
Every data memory location contains 99.
Our pipeline diagrams will follow some conventions.
An X indicates values that aren't important, like the constant field of
an R-type instruction.
Question marks ??? indicate values we don't know, usually resulting
from instructions coming before and after the ones in our example.
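Before stepping through the pipeline, the expected result of each instruction can be worked out directly from these assumptions. The Python sketch below executes the five instructions one at a time, without pipelining, so the results can be compared with the WB stage in the cycle diagrams. None of these instructions reads a register written by an earlier one, so execution order does not affect the results. The tuple encoding `(op, dest, source, second source or offset)` is an assumption made for the sketch.

```python
MEMORY_WORD = 99  # every data memory location contains 99

# (op, destination, source register, second source register or lw offset)
PROGRAM = [
    ("lw", 8, 29, 4),      # 1000: lw  $8, 4($29)
    ("sub", 2, 4, 5),      # 1004: sub $2, $4, $5
    ("and", 9, 10, 11),    # 1008: and $9, $10, $11
    ("or", 16, 17, 18),    # 1012: or  $16, $17, $18
    ("add", 13, 14, 0),    # 1016: add $13, $14, $0
]


def initial_registers():
    """Register $n holds 100 + n, except $0, which is always zero."""
    regs = [100 + n for n in range(32)]
    regs[0] = 0
    return regs


def expected_results(program):
    """Execute each instruction in order; return (op, dest, value, lw address or None)."""
    regs = initial_registers()
    results = []
    for op, dest, src, second in program:
        address = None
        if op == "lw":
            address = regs[src] + second  # base register + offset
            value = MEMORY_WORD
        else:
            x, y = regs[src], regs[second]
            value = {"add": x + y, "sub": x - y, "and": x & y, "or": x | y}[op]
        regs[dest] = value
        results.append((op, dest, value, address))
    return results


for row in expected_results(PROGRAM):
    print(row)
```

The lw address is 129 + 4 = 133, and the five results are 99, -1, 110, 119, and 114.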
Cycle 1 (filling)
IF: lw $8, 4($29)   ID: ???   EX: ???   MEM: ???   WB: ???
[Figure: pipelined datapath and control in cycle 1. The PC holds 1000 and the adder computes PC + 4 = 1004. All values and control signals in the later stages are unknown.]
Cycle 2
IF: sub $2, $4, $5   ID: lw $8, 4($29)   EX: ???   MEM: ???   WB: ???
[Figure: pipelined datapath and control in cycle 2.
- IF: PC = 1004 and PC + 4 = 1008.
- ID: lw reads register 29 (value 129), sign-extends the offset 4, and passes rt = 8 along. X marks unused fields.
- EX, MEM, and WB: control signals are unknown.]
Cycle 3
IF: and $9, $10, $11   ID: sub $2, $4, $5   EX: lw $8, 4($29)   MEM: ???   WB: ???
[Figure: pipelined datapath and control in cycle 3.
- IF: PC = 1008 and PC + 4 = 1012.
- ID: sub reads registers 4 and 5 (values 104 and 105), with rd = 2.
- EX: lw has ALUSrc = 1, ALUOp = add, and RegDst = 0. It computes the address 129 + 4 = 133, with destination register 8.
- MEM and WB: signals are unknown.]
Cycle 4
IF: or $16, $17, $18   ID: and $9, $10, $11   EX: sub $2, $4, $5   MEM: lw $8, 4($29)   WB: ???
[Figure: pipelined datapath and control in cycle 4.
- IF: PC = 1012 and PC + 4 = 1016.
- ID: and reads 110 and 111, with rd = 9.
- EX: sub (ALUSrc = 0, ALUOp = sub, RegDst = 1) computes 104 - 105 = -1, with destination register 2.
- MEM: lw (MemRead = 1, MemWrite = 0) reads 99 from address 133, with destination register 8.
- WB: signals are unknown.]
Cycle 5 (full)
IF: add $13, $14, $0   ID: or $16, $17, $18   EX: and $9, $10, $11   MEM: sub $2, $4, $5   WB: lw $8, 4($29)
[Figure: pipelined datapath and control in cycle 5.
- IF: PC = 1016 and PC + 4 = 1020.
- ID: or reads 117 and 118, with rd = 16.
- EX: and (ALUSrc = 0, ALUOp = and, RegDst = 1) computes 110 AND 111 = 110, with destination register 9.
- MEM: sub (MemRead = 0, MemWrite = 0) passes -1 along, with destination register 2.
- WB: lw (MemToReg = 1, RegWrite = 1) writes 99 into register 8.]
Cycle 6 (emptying)
IF: ???   ID: add $13, $14, $0   EX: or $16, $17, $18   MEM: and $9, $10, $11   WB: sub $2, $4, $5
[Figure: pipelined datapath and control in cycle 6.
- IF: the PC holds 1020; the instruction fetched there is unknown.
- ID: add reads 114 and 0, with rd = 13.
- EX: or (ALUSrc = 0, ALUOp = or, RegDst = 1) computes 117 OR 118 = 119, with destination register 16.
- MEM: and (MemRead = 0, MemWrite = 0) passes 110 along, with destination register 9.
- WB: sub (MemToReg = 0, RegWrite = 1) writes -1 into register 2.]
Cycle 7
IF: ???   ID: ???   EX: add $13, $14, $0   MEM: or $16, $17, $18   WB: and $9, $10, $11
[Figure: pipelined datapath and control in cycle 7.
- EX: add (ALUSrc = 0, ALUOp = add, RegDst = 1) computes 114 + 0 = 114, with destination register 13.
- MEM: or (MemRead = 0, MemWrite = 0) passes 119 along, with destination register 16.
- WB: and (MemToReg = 0, RegWrite = 1) writes 110 into register 9.
- IF and ID: values are unknown.]
Cycle 8
IF: ???   ID: ???   EX: ???   MEM: add $13, $14, $0   WB: or $16, $17, $18
[Figure: pipelined datapath and control in cycle 8.
- MEM: add (MemRead = 0, MemWrite = 0) passes 114 along, with destination register 13.
- WB: or (MemToReg = 0, RegWrite = 1) writes 119 into register 16.
- IF, ID, and EX: values and control signals are unknown.]
Cycle 9
IF: ???   ID: ???   EX: ???   MEM: ???   WB: add $13, $14, $0
[Figure: pipelined datapath and control in cycle 9.
- WB: add (MemToReg = 0, RegWrite = 1) writes 114 into register 13.
- All other stages hold unknown instructions.]
That's a lot of diagrams there
Compare the last nine slides with the pipeline diagram below.
You can see how instruction executions are overlapped.
Each functional unit is used by a different instruction in each cycle.
The pipeline registers save control and data values generated in
previous clock cycles for later use.
When the pipeline is full in clock cycle 5, all of the hardware units
are utilized. This is the ideal situation, and what makes pipelined
processors so fast.
Try to understand this example or the similar one in the book at the end
of Section 6.3.
Clock cycle            1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1           IF   ID   EX   MEM  WB
and $t1, $t2, $t3                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $t5, $t6, $0                           IF   ID   EX   MEM  WB
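To tie the nine cycle diagrams together, here is a compact Python simulation of the same example under the same assumptions. It is a sketch, not a hardware model, and it makes these simplifications:
- each pipeline register is a plain value overwritten on every clock edge;
- the stages are evaluated from WB back to IF, so that WB writes the register file before ID reads it (the half-cycle convention);
- hazards are ignored, because none of these instructions depends on an earlier one.

The function, variable, and dict key names are hypothetical.

```python
PROGRAM = [("lw", 8, 29, 4), ("sub", 2, 4, 5), ("and", 9, 10, 11),
           ("or", 16, 17, 18), ("add", 13, 14, 0)]
ALU = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y,
       "and": lambda x, y: x & y, "or": lambda x, y: x | y}


def simulate(program, cycles=9):
    regs = [100 + n for n in range(32)]
    regs[0] = 0                          # $0 is always zero
    if_id = id_ex = ex_mem = mem_wb = None
    pc = 0                               # program index; byte address is 1000 + 4 * pc
    writes = {}                          # cycle -> (register, value) written in WB
    for cycle in range(1, cycles + 1):
        # WB (first half of the cycle): write the register file.
        if mem_wb is not None:
            regs[mem_wb["dest"]] = mem_wb["value"]
            writes[cycle] = (mem_wb["dest"], mem_wb["value"])
        # MEM: lw reads memory (every word holds 99); others pass the ALU result on.
        new_mem_wb = None
        if ex_mem is not None:
            value = 99 if ex_mem["op"] == "lw" else ex_mem["alu"]
            new_mem_wb = {"dest": ex_mem["dest"], "value": value}
        # EX: lw adds base + offset; R-type instructions apply their own operation.
        new_ex_mem = None
        if id_ex is not None:
            op = id_ex["op"]
            alu = ALU["add" if op == "lw" else op](id_ex["a"], id_ex["b"])
            new_ex_mem = {"op": op, "dest": id_ex["dest"], "alu": alu}
        # ID (second half of the cycle): read registers after WB has written.
        new_id_ex = None
        if if_id is not None:
            op, dest, s, t = if_id
            b = t if op == "lw" else regs[t]  # lw's last field is the offset
            new_id_ex = {"op": op, "dest": dest, "a": regs[s], "b": b}
        # IF: fetch the next instruction, if any remain.
        new_if_id = program[pc] if pc < len(program) else None
        pc += 1
        # Clock edge: every pipeline register is written on every cycle.
        if_id, id_ex, ex_mem, mem_wb = new_if_id, new_id_ex, new_ex_mem, new_mem_wb
    return writes, regs


writes, regs = simulate(PROGRAM)
for cycle, (reg, value) in sorted(writes.items()):
    print(f"cycle {cycle}: ${reg} <- {value}")
```

The recorded writes match the WB stage of cycles 5 through 9: $8 = 99, $2 = -1, $9 = 110, $16 = 119, and $13 = 114.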
Performance Revisited
Assuming the following functional unit latencies:
Inst mem   Reg Read   ALU        Data Mem   Reg Write
3 ns       2 ns       2 ns       3 ns       2 ns
What is the cycle time of a single-cycle implementation?
What is its throughput?
What is the cycle time of an ideal pipelined implementation?
What is its steady-state throughput?
How much faster is pipelining?
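The questions above reduce to a little arithmetic. The Python sketch below makes three assumptions:
- the five latencies listed (3, 2, 2, 3, and 2 ns);
- one instruction completed per cycle in steady state;
- no pipeline register overhead.

```python
LATENCIES_NS = {"IF": 3, "ID": 2, "EX": 2, "MEM": 3, "WB": 2}

# Single-cycle: every instruction passes through all units in one long cycle.
single_cycle_time = sum(LATENCIES_NS.values())
# Ideal pipeline: the slowest stage sets the clock period.
pipelined_cycle_time = max(LATENCIES_NS.values())

# Throughput in millions of instructions per second (1000 ns per microsecond).
single_cycle_mips = 1000 / single_cycle_time
pipelined_mips = 1000 / pipelined_cycle_time
speedup = single_cycle_time / pipelined_cycle_time

print(f"single-cycle: {single_cycle_time} ns/cycle, {single_cycle_mips:.1f} MIPS")
print(f"pipelined:    {pipelined_cycle_time} ns/cycle, {pipelined_mips:.1f} MIPS")
print(f"speedup:      {speedup:.1f}x")
```

Under these assumptions:
- the single-cycle clock is 12 ns, or about 83.3 million instructions per second;
- the pipelined clock is 3 ns, or about 333.3 million per second;
- pipelining is 4 times faster in steady state.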
Ideal speedup
In our pipeline, we can execute up to five instructions simultaneously.
This implies that the maximum speedup is 5 times.
In general, the ideal speedup equals the pipeline depth.
Why was our speedup on the previous slide only 4 times?
The pipeline stages are imbalanced: register file and ALU operations
take only 2 ns, but we must stretch them to 3 ns to keep the
ID, EX, and WB stages synchronized with IF and MEM.
Balancing the stages is one of the many hard parts in designing a
pipelined processor.
Clock cycle            1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1           IF   ID   EX   MEM  WB
and $t1, $t2, $t3                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $sp, $sp, -4                           IF   ID   EX   MEM  WB
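To see where a speedup of five would come from, suppose, hypothetically, that the same 12 ns of work could be divided into five perfectly balanced 2.4 ns stages. This Python sketch compares that case with the actual imbalanced stages, again ignoring pipeline register overhead. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def ideal_speedup(stage_latencies):
    """Single-cycle clock (sum of stages) over pipelined clock (slowest stage)."""
    return sum(stage_latencies) / max(stage_latencies)


imbalanced = [3, 2, 2, 3, 2]        # ns, from the previous slide
balanced = [Fraction(12, 5)] * 5    # hypothetical: 2.4 ns per stage

print(ideal_speedup(imbalanced))    # 4.0
print(ideal_speedup(balanced))      # 5
```

With balanced stages, the speedup equals the pipeline depth. With the real stages, the two 3 ns stages cap it at 4.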
The pipelining paradox
Pipelining does not improve the execution time of any single instruction.
Each instruction here actually takes longer to execute than in a single-cycle
datapath (five 3 ns stages, or 15 ns, vs. 12 ns)!
Instead, pipelining increases the throughput, or the amount of work done
per unit time. Here, several instructions are in progress during each
clock cycle.
The result is improved execution time for a sequence of instructions, such
as an entire program.
Clock cycle            1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1           IF   ID   EX   MEM  WB
and $t1, $t2, $t3                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $sp, $sp, -4                           IF   ID   EX   MEM  WB
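Here is a sketch of the latency-versus-throughput trade-off. It assumes the 12 ns single-cycle design, the 3 ns five-stage pipeline, and no stalls. Each instruction's latency grows to 5 × 3 = 15 ns. However, n instructions finish in 5 + n - 1 cycles, so the speedup for a whole program approaches 4 as n grows. The function names are hypothetical.

```python
def single_cycle_time(n, cycle_ns=12):
    """Total time for n instructions on the single-cycle datapath."""
    return n * cycle_ns


def pipelined_time(n, depth=5, cycle_ns=3):
    """Total time for n instructions: the first needs `depth` cycles, each later one adds one."""
    return (depth + n - 1) * cycle_ns


for n in (1, 5, 100, 1_000_000):
    s, p = single_cycle_time(n), pipelined_time(n)
    print(f"n={n}: single-cycle {s} ns, pipelined {p} ns, speedup {s / p:.2f}")
```

A single instruction is slower in the pipeline (15 ns vs. 12 ns). The five-instruction example takes 27 ns instead of 60 ns.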
Instruction set architectures and pipelining
The MIPS instruction set was designed especially for easy pipelining.
All instructions are 32 bits long, so the instruction fetch stage just
needs to read one word on every clock cycle.
Fields are in the same position in different instruction formats: the
opcode is always the first six bits, rs is the next five bits, etc. This
makes things easy for the ID stage.
MIPS is a register-to-register architecture, so arithmetic operations
cannot contain memory references. This keeps the pipeline shorter
and simpler.
Pipelining is harder for older, more complex instruction sets.
If different instructions had different lengths or formats, the fetch
and decode stages would need extra time to determine the actual
length of each instruction and the position of the fields.
With memory-to-memory instructions, additional pipeline stages may
be needed to compute effective addresses and read memory before
the EX stage.
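Because every field sits at a fixed bit position, the ID stage can extract all of them with shifts and masks before knowing which format applies. The Python sketch below uses the standard MIPS field names. The two instruction words are the standard encodings of lw $8, 4($29) and sub $2, $4, $5 from the example earlier in this lecture. The `decode` function itself is a hypothetical helper.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    imm = word & 0xFFFF
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "rd": (word >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt": (word >> 6) & 0x1F,    # bits 10-6 (R-type)
        "funct": word & 0x3F,           # bits 5-0 (R-type)
        "imm": imm - 0x10000 if imm & 0x8000 else imm,  # bits 15-0, sign-extended
    }


lw = decode(0x8FA80004)   # lw  $8, 4($29)
sub = decode(0x00851022)  # sub $2, $4, $5
print(lw["opcode"], lw["rs"], lw["rt"], lw["imm"])     # 35 29 8 4
print(sub["rs"], sub["rt"], sub["rd"], sub["funct"])   # 4 5 2 34
```

The same shifts work for both words. Only the later control logic decides whether rd or the immediate is meaningful.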
Summary
The pipelined datapath combines ideas from the single-cycle and multicycle
processors that we saw earlier.
It uses multiple memories and ALUs.
Instruction execution is split into several stages.
Pipeline registers propagate data and control values to later stages.
The MIPS instruction set architecture supports pipelining with uniform
instruction formats and simple addressing modes.

In the next lecture, we'll start talking about hazards.