
CS104: Computer Organization: 30 March, 2020

This document discusses the single-cycle processor and its datapath. It has the following key points:
- A single-cycle processor has simple logic and a simple clock, but it uses functional units inefficiently because different instructions take different amounts of time; the cycle time must accommodate the longest instruction.
- It shows the datapath of a single-cycle processor, including the control unit. R-type instructions use the ALU, while load instructions also access data memory.
- Control signals select the appropriate operations and data paths for each instruction type. For example, a load instruction uses the ALU to compute its address and activates the memory read path.
- It then introduces the five-stage MIPS pipeline and the structural, data, and control hazards it can cause.

Uploaded by Om Prakash
Copyright © All Rights Reserved

L12

30/03/2020

CS104: Computer Organization
30th March, 2020
Manojit Ghose
Indian Institute of Information Technology Guwahati
Jan-Apr 2020

Computer Organization (IS F242)
Lecture 1: Introductory Thoughts
Dip Sankar Banerjee
[email protected]
Department of CS & IS

Single Cycle Processor


• Advantages
– Single cycle per instruction makes logic and clock simple
– All machines would have a CPI of 1
• Disadvantages
– Inefficient utilization of memory and functional units, since different
instructions take different lengths of time
• Each functional unit is used only once per clock cycle
• e.g., the ALU computes values only a small fraction of the time
– Cycle time is set by the worst-case path → long cycle times!
• Load instruction
– PC CLK-to-Q +
– instruction memory access time +
– register file access time +
– ALU delay +
– data memory access time +
– register file setup time +
– clock skew
– All machines would have a CPI of 1, with cycle time set by the
longest instruction!
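The load path above is simply a sum of delays, and with CPI = 1 the run time is the instruction count times that sum. A minimal Python sketch, assuming made-up picosecond values (the slide names the components but gives no numbers):

```python
# Hypothetical component delays in picoseconds (illustrative only; the
# slide lists these components of the load path but gives no values).
LOAD_PATH_PS = {
    "pc_clk_to_q": 30,
    "instruction_memory_access": 200,
    "register_file_access": 100,
    "alu_delay": 200,
    "data_memory_access": 200,
    "register_file_setup": 20,
    "clock_skew": 10,
}


def single_cycle_period_ps(path_delays: dict) -> int:
    """The clock period must cover every delay on the worst-case (load) path."""
    return sum(path_delays.values())


def run_time_ps(instruction_count: int, period_ps: int) -> int:
    """With CPI = 1, every instruction takes exactly one (long) cycle."""
    return instruction_count * period_ps


period = single_cycle_period_ps(LOAD_PATH_PS)
print(period)                     # 760
print(run_time_ps(1000, period))  # 760000
```

Every instruction pays the full 760 ps here, even one whose own path is much shorter.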
Single Cycle Datapath with Control Unit
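[Figure: single-cycle MIPS datapath. The Control Unit decodes Instr[31-26] and drives PCSrc, Branch, ALUOp, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite, and RegDst. Instr[25-21] and Instr[20-16] supply Read Addr 1 and Read Addr 2 of the Register File; RegDst selects Instr[20-16] or Instr[15-11] as the Write Addr. Instr[15-0] is sign-extended from 16 to 32 bits; ALUSrc selects Read Data 2 or the extended immediate as the second ALU input; ALU control uses Instr[5-0]; the ALU produces zero and ovf outputs. MemtoReg selects the ALU result or the Data Memory Read Data for register write-back. One adder computes PC + 4, the other adds the offset shifted left 2 for branches, and PCSrc selects between them.]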
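The Control Unit in this figure is, in effect, a lookup from the opcode field Instr[31-26] to the signal values. A minimal Python sketch of that lookup: the opcode values are the standard MIPS encodings, the first four rows match the pipeline control table later in this lecture, and the jump row (don't-cares shown as None) is an assumption, since the slides give no values for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Control:
    """Main control outputs; None stands for a don't-care (X) value."""
    reg_dst: Optional[int]
    alu_src: Optional[int]
    mem_to_reg: Optional[int]
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: Optional[int]  # 2-bit ALUOp: 0b10 = use funct, 0b00 = add, 0b01 = subtract
    jump: int = 0


# Keyed by the opcode field Instr[31-26]; the j row is an assumption.
CONTROL_TABLE = {
    0x00: Control(1, 0, 0, 1, 0, 0, 0, 0b10),                   # R-type
    0x23: Control(0, 1, 1, 1, 1, 0, 0, 0b00),                   # lw
    0x2B: Control(None, 1, None, 0, 0, 1, 0, 0b00),             # sw
    0x04: Control(None, 0, None, 0, 0, 0, 1, 0b01),             # beq
    0x02: Control(None, None, None, 0, 0, 0, 0, None, jump=1),  # j
}


def decode(instr: int) -> Control:
    """Pick the control word from the 6-bit opcode."""
    opcode = (instr >> 26) & 0x3F
    return CONTROL_TABLE[opcode]


c = decode(0x8D280004)  # lw $8, 4($9)
print(c.mem_read, c.mem_to_reg, c.reg_write)  # 1 1 1
```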
R-type Instruction Data/Control Flow
[Figure: the single-cycle datapath, showing the control and data paths active for an R-type instruction.]
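For the R-type flow, the active path reads rs and rt from the register file, picks the ALU operation from the funct field, and writes the result to rd. A minimal Python sketch of one such step, assuming a 32-entry list as the register file and covering only four funct codes (add, sub, and, or); the ovf trap that add raises on overflow is not modeled.

```python
MASK32 = 0xFFFFFFFF

# ALU operation selected by the funct field Instr[5-0] (a small subset).
ALU_BY_FUNCT = {
    0x20: lambda a, b: (a + b) & MASK32,  # add
    0x22: lambda a, b: (a - b) & MASK32,  # sub
    0x24: lambda a, b: a & b,             # and
    0x25: lambda a, b: a | b,             # or
}


def execute_rtype(instr: int, regs: list) -> None:
    """One single-cycle R-type step: read rs and rt, run the ALU, write rd."""
    rs = (instr >> 21) & 0x1F   # Instr[25-21] -> Read Addr 1
    rt = (instr >> 16) & 0x1F   # Instr[20-16] -> Read Addr 2
    rd = (instr >> 11) & 0x1F   # Instr[15-11] -> Write Addr (RegDst = 1)
    funct = instr & 0x3F        # Instr[5-0] -> ALU control
    result = ALU_BY_FUNCT[funct](regs[rs], regs[rt])  # ALUSrc = 0
    if rd != 0:                 # $0 is hard-wired to zero
        regs[rd] = result       # RegWrite = 1, MemtoReg = 0


regs = [0] * 32
regs[1], regs[2] = 5, 7
execute_rtype(0x00221820, regs)  # add $3, $1, $2
print(regs[3])  # 12
```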
Load Word Instruction Data/Control Flow
[Figure: the single-cycle datapath.]
Find the active control & datapath connections.
Load Word Instruction Data/Control Flow
[Figure: the single-cycle datapath, showing the control and data paths active for a load word (lw) instruction.]
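For a load word, the immediate Instr[15-0] is sign-extended and added to rs by the ALU, the data memory is read at that address, and the result is written to rt. A minimal Python sketch, assuming a dict keyed by byte address as a hypothetical stand-in for the data memory:

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit field (the 'Sign Extend 16 -> 32' box)."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_lw(instr: int, regs: list, data_mem: dict) -> None:
    rs = (instr >> 21) & 0x1F                # base register
    rt = (instr >> 16) & 0x1F                # destination (RegDst = 0)
    offset = sign_extend16(instr & 0xFFFF)   # Instr[15-0], ALUSrc = 1
    address = (regs[rs] + offset) & MASK32   # ALU adds base + offset
    if address % 4:
        raise ValueError("word accesses must be aligned")
    if rt != 0:
        regs[rt] = data_mem.get(address, 0)  # MemRead = 1, MemtoReg = 1


regs = [0] * 32
regs[9] = 0x1004
data_mem = {0x1000: 42}                  # hypothetical memory contents
execute_lw(0x8D28FFFC, regs, data_mem)   # lw $8, -4($9)
print(regs[8])  # 42
```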
Branch Instruction Data/Control Flow
[Figure: the single-cycle datapath.]
Find the active control & datapath connections.
Branch Instruction Data/Control Flow
[Figure: the single-cycle datapath, showing the control and data paths active for a branch (beq) instruction.]
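For beq, one adder produces PC + 4, the other adds the sign-extended offset shifted left by 2, and the ALU subtracts the two registers to produce zero; PCSrc picks the target only when Branch and zero are both 1. A minimal Python sketch under those assumptions, with a hypothetical encoding of `beq $1, $2, -1` at address 0x400 (a branch to itself):

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_beq(pc: int, instr: int, regs: list) -> int:
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    pc_plus_4 = (pc + 4) & MASK32                  # first adder
    offset = sign_extend16(instr & 0xFFFF) << 2    # shift left 2
    target = (pc_plus_4 + offset) & MASK32         # branch adder
    zero = ((regs[rs] - regs[rt]) & MASK32) == 0   # ALU subtracts (ALUOp = 01)
    return target if zero else pc_plus_4           # PCSrc = Branch AND zero


regs = [0] * 32
regs[1] = regs[2] = 3
print(hex(next_pc_beq(0x400, 0x1022FFFF, regs)))  # 0x400 (taken)
regs[2] = 4
print(hex(next_pc_beq(0x400, 0x1022FFFF, regs)))  # 0x404 (not taken)
```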
Adding the Jump Operation
[Figure: the single-cycle datapath extended for jumps. Instr[25-0] (26 bits) is shifted left 2 to give 28 bits, which are combined with PC+4[31-28] to form the 32-bit jump address. A new mux, controlled by the Jump output of the Control Unit, selects between this address and the PC + 4 / branch result chosen by PCSrc.]
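The jump address is formed by concatenation, not addition. A minimal Python sketch, with hypothetical PC values chosen to show that the upper four bits come from PC + 4:

```python
def jump_target(pc: int, instr: int) -> int:
    """Form the 32-bit jump address from PC+4[31-28] and Instr[25-0] << 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000        # PC+4[31-28]
    lower28 = (instr & 0x03FFFFFF) << 2    # 26 bits -> 28 bits
    return upper4 | lower28


print(hex(jump_target(0x00400000, 0x08100000)))  # 0x400000
print(hex(jump_target(0x90000000, 0x08100000)))  # 0x90400000
```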
Instruction Times (Critical Paths)
• What is the clock cycle time, assuming negligible delays for muxes, control unit, sign extend, PC access, shift left 2, wires, and setup and hold times, but with:
  • Instruction and Data Memory (200 ps)
  • ALU and adders (200 ps)
  • Register File access (reads or writes) (100 ps)

Instr.    I Mem   Reg Rd  ALU Op  D Mem   Reg Wr  Total
R-type
load
store
beq
jump
Instruction Critical Paths
• Same assumptions as the previous slide: Instruction and Data Memory (200 ps), ALU and adders (200 ps), Register File access (reads or writes) (100 ps); all other delays negligible

Instr.    I Mem   Reg Rd  ALU Op  D Mem   Reg Wr  Total
R-type    200     100     200             100     600
load      200     100     200     200     100     800
store     200     100     200     200             700
beq       200     100     200                     500
jump      200                                     200

The clock cycle must fit the slowest instruction (load), so the cycle time is 800 ps.
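The totals and the resulting single-cycle clock period can be recomputed from the per-unit delays above. A small Python sketch; it also reports how much of the 800 ps cycle each instruction class leaves idle, which is the waste discussed on the next slide.

```python
# Delays from the slide (ps); muxes, control, sign extend, etc. are ignored.
DELAY_PS = {"I Mem": 200, "Reg Rd": 100, "ALU Op": 200, "D Mem": 200, "Reg Wr": 100}

# Units each instruction class uses, in order.
USES = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}

totals = {name: sum(DELAY_PS[u] for u in steps) for name, steps in USES.items()}
cycle_ps = max(totals.values())   # single cycle: fit the slowest instruction
idle_ps = {name: cycle_ps - t for name, t in totals.items()}

print(totals)    # {'R-type': 600, 'load': 800, 'store': 700, 'beq': 500, 'jump': 200}
print(cycle_ps)  # 800
print(idle_ps)   # {'R-type': 200, 'load': 0, 'store': 100, 'beq': 300, 'jump': 600}
```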
Single Cycle Disadvantages & Advantages
Uses the clock cycle inefficiently – the clock cycle must be timed to accommodate the slowest instruction
• especially problematic for more complex instructions like floating-point multiply

[Figure: clock waveform over Cycle 1 and Cycle 2. lw uses all of Cycle 1; sw finishes before the end of Cycle 2, and the rest of that cycle is waste.]

May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
is simple and easy to understand

How Can We Make It Faster?

Start fetching and executing the next instruction before the current one has completed
• Pipelining – all modern processors are pipelined for performance
• Remember the performance equation: CPU time = CPI * CC * IC

Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
• A five-stage pipeline is nearly five times faster because the CC is nearly five times shorter

Fetch (and execute) more than one instruction at a time
• Superscalar processing – stay tuned
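The claim that speedup approaches the number of pipe stages follows from counting cycles: a k-stage pipeline needs k cycles to fill and then completes one instruction per cycle, so n instructions take k + n - 1 short cycles instead of n long ones. A minimal Python sketch, assuming perfectly balanced stages, no hazards, and no pipeline-register overhead:

```python
def ideal_pipeline_speedup(n_instructions: int, k_stages: int) -> float:
    """Single-cycle vs. ideal k-stage pipeline, time measured in stage delays.

    Assumes balanced stages, so the single-cycle CC equals k stage delays.
    """
    single_cycle = n_instructions * k_stages   # n cycles, each k stage delays long
    pipelined = k_stages + n_instructions - 1  # fill time, then one per cycle
    return single_cycle / pipelined


for n in (1, 10, 1_000_000):
    print(n, round(ideal_pipeline_speedup(n, 5), 3))
```

A single instruction gains nothing; the speedup only approaches 5 for long instruction streams.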

The Five Stages of Load Instruction


          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw        IFetch   Dec      Exec     Mem      WB

IFetch: Instruction fetch and PC update
Dec: Register fetch and instruction decode
Exec: Execute R-type; calculate memory address
Mem: Read/write the data from/to the Data Memory
WB: Write the result data into the register file
A Pipelined MIPS Processor
Start the next instruction before the current one has completed
• improves throughput – the total amount of work done in a given time
• instruction latency (execution time, delay time, response time – the time from the start of an instruction to its completion) is not reduced

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
lw        IFetch   Dec      Exec     Mem      WB
sw                 IFetch   Dec      Exec     Mem      WB
R-type                      IFetch   Dec      Exec     Mem      WB

- the clock cycle (pipeline stage time) is limited by the slowest stage
- some stages don’t need the whole clock cycle (e.g., WB)
- for some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)
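The chart above can be generated mechanically: instruction i enters IFetch in cycle i + 1 and advances one stage per cycle. A small Python sketch that prints such a chart for the three instructions shown:

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def pipeline_chart(instrs: list) -> list:
    """Rows of (instruction, stage-per-cycle); one new instruction per cycle."""
    n_cycles = len(STAGES) + len(instrs) - 1
    rows = []
    for i, name in enumerate(instrs):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during cycle i + s + 1
        rows.append((name, row))
    return rows


chart = pipeline_chart(["lw", "sw", "R-type"])
print("".ljust(8) + "".join(f"Cycle {c + 1}".ljust(9) for c in range(len(chart[0][1]))))
for name, row in chart:
    print(name.ljust(8) + "".join(cell.ljust(9) for cell in row))
```

Three instructions need only 7 cycles; each extra instruction adds one more.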

Single Cycle versus Pipeline


Single Cycle Implementation (CC = 800 ps):
[Figure: clock waveform over Cycle 1 and Cycle 2. lw uses all of Cycle 1; sw runs in Cycle 2 and leaves the rest of the cycle as waste.]

Pipeline Implementation (CC = 200 ps): 400 ps
lw        IFetch   Dec      Exec     Mem      WB
sw                 IFetch   Dec      Exec     Mem      WB
R-type                      IFetch   Dec      Exec     Mem      WB

To complete an entire instruction in the pipelined case takes 1000 ps (compared to 800 ps in the single-cycle case). Why?
How long does each take to complete 1,000,000 adds?
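Under the slide’s numbers, one possible answer: every instruction spends a full 200 ps cycle in each of the five stages even when a stage needs only 100 ps (register read or write), so its latency is 5 × 200 = 1000 ps; but once the pipeline is full, one instruction finishes every 200 ps. A minimal Python sketch of both totals for 1,000,000 adds, assuming no hazards:

```python
STAGES = 5
SINGLE_CYCLE_CC_PS = 800
PIPELINE_CC_PS = 200


def single_cycle_time_ps(n: int) -> int:
    return n * SINGLE_CYCLE_CC_PS             # one 800 ps cycle per instruction


def pipelined_time_ps(n: int) -> int:
    return (STAGES + n - 1) * PIPELINE_CC_PS  # fill the pipe, then one per cycle


print(pipelined_time_ps(1))             # 1000 (latency of one instruction)
print(single_cycle_time_ps(1_000_000))  # 800000000 ps = 800 µs
print(pipelined_time_ps(1_000_000))     # 200000800 ps, about 200 µs
```

The speedup is about 4 rather than 5 because the stages are unbalanced: the 200 ps clock is set by the memory and ALU stages.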

Pipelining the MIPS ISA


What makes it easy
• all instructions are the same length (32 bits)
 - can fetch in the 1st stage and decode in the 2nd stage
• few instruction formats (only 3) with symmetry across formats
 - can begin reading the register file in the 2nd stage
• memory operations occur only in loads and stores
 - can use the execute stage to calculate memory addresses
• each instruction writes at most one result (i.e., changes the machine state) and does so in the last few pipeline stages (MEM or WB)
• operands must be aligned in memory, so a single data transfer takes only one data memory access
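Because every field sits at a fixed bit position, the decode stage can slice all of them in parallel before it knows the format, and can start reading rs and rt from the register file right away. A minimal Python sketch using the standard MIPS field positions:

```python
def decode_fields(instr: int) -> dict:
    """Slice every field position at once, before knowing the format."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # all formats
        "rs":     (instr >> 21) & 0x1F,   # R and I formats
        "rt":     (instr >> 16) & 0x1F,   # R and I formats
        "rd":     (instr >> 11) & 0x1F,   # R format
        "shamt":  (instr >> 6) & 0x1F,    # R format
        "funct":  instr & 0x3F,           # R format
        "imm16":  instr & 0xFFFF,         # I format
        "addr26": instr & 0x03FFFFFF,     # J format
    }


f = decode_fields(0x8D280004)  # lw $8, 4($9)
print(f["opcode"], f["rs"], f["rt"], f["imm16"])  # 35 9 8 4
```

Fields that do not apply to the actual format are simply ignored by later stages.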
MIPS Pipeline Datapath Additions/Mods
State registers between each pipeline stage to isolate them
Stages: IF: IFetch | ID: Dec | EX: Execute | MEM: MemAccess | WB: WriteBack
[Figure: the MIPS datapath divided into five stages by the state registers IF/ID, ID/EX, EX/MEM, and MEM/WB, all loaded on the System Clock. It contains the PC, Instruction Memory, the PC + 4 adder, the Register File, sign extend (16 to 32 bits), the shift-left-2 branch adder, the ALU, and the Data Memory.]
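The state registers isolate the stages because all of them are loaded on the same clock edge: each stage works only from the register on its left, so no value can race through two stages in one cycle. A minimal Python sketch, modelling each register as holding just the name of the instruction in it (a deliberate simplification):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineRegs:
    """Contents latched between stages; None means a bubble (no instruction)."""
    if_id: Optional[str] = None
    id_ex: Optional[str] = None
    ex_mem: Optional[str] = None
    mem_wb: Optional[str] = None


def clock_edge(cur: PipelineRegs, fetched: Optional[str]):
    """Every stage reads only the register on its left; all four registers
    are then overwritten together, so no stage sees another's new value."""
    nxt = PipelineRegs(if_id=fetched, id_ex=cur.if_id,
                       ex_mem=cur.id_ex, mem_wb=cur.ex_mem)
    finished = cur.mem_wb  # completed WB in the cycle that just ended
    return nxt, finished


def completion_cycles(program: list) -> dict:
    regs = PipelineRegs()
    done = {}
    stream = list(program) + [None] * 4  # keep clocking to drain the pipe
    for cycle, fetched in enumerate(stream, start=1):
        regs, finished = clock_edge(regs, fetched)
        if finished is not None:
            done[finished] = cycle
    return done


print(completion_cycles(["lw", "sw", "add"]))  # {'lw': 5, 'sw': 6, 'add': 7}
```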
MIPS Pipeline Control Path Modifications
All control signals can be determined during Decode
• and held in the state registers between pipeline stages

[Figure: the pipelined datapath with a Control unit in the ID stage whose outputs travel through ID/EX, EX/MEM, and MEM/WB. RegDst, ALUOp, and ALUSrc (with ALU cntrl) are used in EX; Branch, MemRead, and MemWrite in MEM, where PCSrc selects the next PC; RegWrite and MemtoReg in WB.]

Pipeline Control
IF Stage: read Instr Memory (always asserted) and write PC (on System Clock)
ID Stage: no optional control signals to set

       EX Stage                          MEM Stage                   WB Stage
       RegDst  ALUOp1  ALUOp0  ALUSrc    Brch  MemRead  MemWrite     RegWrite  MemtoReg
R      1       1       0       0         0     0        0            1         0
lw     0       0       0       1         0     1        0            1         1
sw     X       0       0       1         0     0        1            0         X
beq    X       0       1       0         1     0        0            0         X
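Since every control signal is known after Decode, the control word can be split into EX, MEM, and WB groups that ride along in the pipeline registers, each register dropping the group its stage has consumed. A minimal Python sketch using the values from the table above, with None for X:

```python
X = None  # don't-care

# Control bits from the table, grouped by the stage that uses them.
CONTROL = {
    "R":   {"EX":  {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
            "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
            "WB":  {"RegWrite": 1, "MemtoReg": 0}},
    "lw":  {"EX":  {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
            "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
            "WB":  {"RegWrite": 1, "MemtoReg": 1}},
    "sw":  {"EX":  {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
            "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
            "WB":  {"RegWrite": 0, "MemtoReg": X}},
    "beq": {"EX":  {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
            "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
            "WB":  {"RegWrite": 0, "MemtoReg": X}},
}

# Each pipeline register carries only the groups still needed downstream.
CARRIED = {"ID/EX": ("EX", "MEM", "WB"), "EX/MEM": ("MEM", "WB"), "MEM/WB": ("WB",)}


def control_in(pipeline_reg: str, instr: str) -> dict:
    return {group: CONTROL[instr][group] for group in CARRIED[pipeline_reg]}


print(control_in("MEM/WB", "lw"))  # {'WB': {'RegWrite': 1, 'MemtoReg': 1}}
```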

Graphically Representing MIPS Pipeline

[Figure: one instruction drawn as the stage sequence IM → Reg → ALU → DM → Reg.]

Can help with answering questions like:
• How many cycles does it take to execute this code?
• What is the ALU doing during cycle 4?
• Is there a hazard, why does it occur, and how can it be fixed?

Why Pipeline? For Performance!


[Figure: Inst 0 through Inst 4 in instruction order, each flowing through IM, Reg, ALU, DM, Reg and starting one clock cycle after the previous one (horizontal axis: time in clock cycles). The first cycles are marked “Time to fill the pipeline”.]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1


Can Pipelining Get Us Into Trouble?
Yes: Pipeline Hazards
• structural hazards: two different instructions attempt to use the same resource at the same time
• data hazards: an instruction attempts to use data before it is ready
 - An instruction’s source operand(s) are produced by a prior instruction still in the pipeline
• control hazards: an attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated
 - branch and jump instructions, exceptions

Can usually resolve hazards by waiting
• pipeline control must detect the hazard
• and take action to resolve it
A Single Memory Would Be a Structural Hazard
[Figure: lw followed by Inst 1 through Inst 4 over time (clock cycles), with instruction fetch and data access both drawn on the same single Mem. lw “Reading data from memory” falls in the same cycle as a later instruction “Reading instruction from memory”.]
Fix with separate instruction and data memories (I$ and D$)
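The conflict on a single shared memory can be found by listing, cycle by cycle, who needs the memory: every instruction fetches in its first cycle, and a load or store accesses data in its fourth. A minimal Python sketch, assuming (since the slide does not say) that Inst 1 to Inst 4 never access data memory:

```python
from collections import defaultdict


def single_memory_conflicts(program: list) -> dict:
    """program: (name, uses_data_memory) pairs, one issued per cycle.

    Returns the cycles in which more than one access hits the one memory.
    """
    accesses = defaultdict(list)
    for i, (name, uses_data) in enumerate(program):
        accesses[i + 1].append(f"{name}: instruction fetch")  # IF stage
        if uses_data:
            accesses[i + 4].append(f"{name}: data access")     # MEM (4th) stage
    return {cycle: who for cycle, who in accesses.items() if len(who) > 1}


program = [("lw", True), ("Inst 1", False), ("Inst 2", False),
           ("Inst 3", False), ("Inst 4", False)]
print(single_memory_conflicts(program))
# {4: ['lw: data access', 'Inst 3: instruction fetch']}
```

With separate instruction and data memories, the two accesses in cycle 4 no longer compete.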
How About Register File Access?
[Figure: add $1, followed by Inst 1, Inst 2, and add $2,$1, over time (clock cycles), each flowing through IM, Reg, ALU, DM, Reg. Annotations mark the clock edge that controls register writing and the clock edge that controls loading of pipeline state registers.]
Fix the register file access hazard by doing reads in the second half of the cycle and writes in the first half
Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: add $1, followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5, each flowing through IM, Reg, ALU, DM, Reg.]
Read-after-write (RAW) data hazard in this code
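Backward dependencies like these can be found by comparing each instruction’s source registers with the destinations of the instructions just before it. A minimal Python sketch, assuming no forwarding and the split-cycle register file from the previous slide, so only the two instructions right after a producer are exposed; `Instr` is a hypothetical helper type, and add’s own source registers are left out because the slide does not show them.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[int]    # register written, if any
    srcs: Tuple[int, ...]  # registers read


def raw_hazards(program: list, window: int = 2) -> list:
    """Report (producer, consumer, register) for RAW hazards.

    window = 2: a consumer three or more instructions after its producer reads
    in the same cycle the producer writes (write first half, read second half).
    """
    found = []
    for i, ins in enumerate(program):
        for src in ins.srcs:
            if src == 0:
                continue  # $0 never changes
            # nearest earlier writer of src within the window
            for j in range(i - 1, max(-1, i - 1 - window), -1):
                if program[j].dest == src:
                    found.append((program[j].text, ins.text, src))
                    break
    return found


program = [
    Instr("add $1", 1, ()),  # sources not shown on the slide
    Instr("sub $4,$1,$5", 4, (1, 5)),
    Instr("and $6,$1,$7", 6, (1, 7)),
    Instr("or $8,$1,$9", 8, (1, 9)),
    Instr("xor $4,$1,$5", 4, (1, 5)),
]
for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer} reads ${reg} too early (written by {producer})")
```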


Loads Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: lw $1,4($2) followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5 in instruction order, each flowing through IM, Reg, ALU, DM, Reg.]
Load-use data hazard
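One way to resolve the hazard is to wait: hold each dependent instruction back until its producer has written the register file. A simplified Python timing sketch, assuming no forwarding and the split-cycle register file, so a consumer can enter IFetch no earlier than three cycles after its producer. Under these assumptions the load behaves like any other producer, and the two stall cycles push CPI above 1.

```python
from typing import NamedTuple, Optional, Tuple

STAGES = 5


class Instr(NamedTuple):
    text: str
    dest: Optional[int]
    srcs: Tuple[int, ...]


def fetch_cycles(program: list) -> list:
    """Cycle in which each instruction enters IF, delaying dependents.

    A consumer's Dec (register read, second half of the cycle) may not come
    before its producer's WB (first half): consumer IF >= producer IF + 3.
    """
    fetch = []
    for i, ins in enumerate(program):
        cycle = fetch[-1] + 1 if fetch else 1
        for j in range(i):
            producer = program[j]
            if producer.dest and producer.dest in ins.srcs:
                cycle = max(cycle, fetch[j] + 3)
        fetch.append(cycle)
    return fetch


program = [
    Instr("lw $1,4($2)", 1, (2,)),
    Instr("sub $4,$1,$5", 4, (1, 5)),
    Instr("and $6,$1,$7", 6, (1, 7)),
    Instr("or $8,$1,$9", 8, (1, 9)),
    Instr("xor $4,$1,$5", 4, (1, 5)),
]
fetch = fetch_cycles(program)
total_cycles = fetch[-1] + STAGES - 1
stalls = total_cycles - (len(program) + STAGES - 1)
cpi = (total_cycles - (STAGES - 1)) / len(program)
print(fetch)                 # [1, 4, 5, 6, 7]
print(total_cycles, stalls)  # 11 2
print(cpi)                   # 1.4
```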


Branch Instructions Cause Control Hazards
Dependencies backward in time cause hazards
[Figure: beq followed by lw, Inst 3, and Inst 4 in instruction order, each flowing through IM, Reg, ALU, DM, Reg.]
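If the pipeline simply waits for each branch to resolve, every branch adds a fixed number of bubbles. A minimal Python sketch, assuming the branch outcome is known only at the end of its MEM stage (where this pipeline uses the Branch signal), which leaves three empty fetch slots, and a hypothetical 20% branch frequency:

```python
def cpi_with_branch_stalls(branch_fraction: float, stall_cycles: int = 3) -> float:
    """Ideal CPI of 1 plus the bubbles every branch adds.

    stall_cycles = 3 assumes beq resolves at the end of MEM (its 4th cycle),
    so the next correct fetch happens three cycles later than normal.
    """
    return 1.0 + branch_fraction * stall_cycles


print(cpi_with_branch_stalls(0.20))  # hypothetical: 20% of instructions are branches
```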
Other Pipeline Structures Are Possible
What about the (slow) multiply operation?
• Make the clock twice as slow or …
• let it take two cycles (since it doesn’t use the DM stage)
[Figure: MUL flowing through IM, Reg, ALU, DM, Reg.]

What if the data memory access is twice as slow as the instruction memory?
• make the clock twice as slow or …
• let data memory access take two cycles (and keep the same clock rate)
[Figure: IM, Reg, ALU, DM1, DM2, Reg.]
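The two options for a slow data memory can be compared with the same fill-then-one-per-cycle count. A minimal Python sketch, assuming hypothetical timings of a 200 ps stage and a 400 ps data memory; splitting DM into DM1 and DM2 adds one stage, and therefore one cycle of latency, but keeps the short clock.

```python
def total_time_ps(n_instructions: int, stages: int, cc_ps: int) -> int:
    """Ideal pipeline: fill the stages, then finish one instruction per cycle."""
    return (stages + n_instructions - 1) * cc_ps


N = 1_000_000
slow_clock = total_time_ps(N, stages=5, cc_ps=400)  # stretch the clock to fit DM
split_dm = total_time_ps(N, stages=6, cc_ps=200)    # DM1 + DM2, clock unchanged
print(slow_clock, split_dm)  # 400001600 200001000
```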


Other Sample Pipeline Alternatives
ARM7: IM | Reg | EX
• IM: PC update, IM access
• Reg: decode, reg access
• EX: ALU op, shift/rotate, DM access, commit result (write back)

XScale: IM1 | IM2 | Reg | SHFT | ALU | DM1 | DM2 | Reg
• stage activities, in order: PC update, BTB access, start IM access; IM access; decode, reg 1 access; shift/rotate, reg 2 access; ALU op; start DM access, exception; DM write, reg write

Summary
All modern-day processors use pipelining
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Potential speedup: a CPI of 1 and a fast CC
Pipeline rate is limited by the slowest pipeline stage
• Unbalanced pipe stages make for inefficiencies
• The time to “fill” the pipeline and the time to “drain” it can reduce speedup for deep pipelines and short code runs
Must detect and resolve hazards
• Stalling negatively affects CPI (makes CPI greater than the ideal of 1)
