onur-447-spring15-lecture5-uarch-afterlecture
Computer Architecture
Lecture 5: Intro to Microarchitecture:
Single-Cycle
Single-cycle Microarchitectures
Multi-cycle Microarchitectures
Microprogrammed Microarchitectures
Pipelining
Assignment for You
Not to be turned in
As you learn the MIPS ISA, think about what tradeoffs the
designers have made
in terms of the ISA properties we talked about
RISC
Simple instructions
Fixed length
Uniform decode
Few addressing modes
CISC
Complex instructions
Variable length
Non-uniform decode
Many addressing modes
Now That We Have an ISA
How do we implement it?
Implementing the ISA:
Microarchitecture Basics
How Does a Machine Process Instructions?
What does processing an instruction mean?
Remember the von Neumann model
Process instruction
A Very Basic Instruction Processing Engine
Single-cycle machine
[Figure: single-cycle machine. Combinational logic computes the next architectural state AS’ from the current state AS, which is held in sequential logic (state).]
[Figure: programmer-visible state.
Memory: array of storage locations M[0] … M[N-1], indexed by an address.
Registers: given special names in the ISA (as opposed to addresses); general vs. special purpose.
Program Counter: memory address of the current instruction.]
Multi-cycle machines
Instruction processing broken into multiple cycles/stages
State updates can be made during an instruction’s execution
Architectural state updates made only at the end of an instruction’s
execution
Advantage over single-cycle: the slowest “stage”, rather than the slowest instruction, determines the cycle time
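A rough numerical sketch of this advantage. The stage names and latencies below are made-up assumptions for illustration, not numbers from the lecture, and the slowest instruction is assumed to pass through every stage.

```python
# Hypothetical per-stage latencies in picoseconds (illustrative assumptions).
STAGE_LATENCY_PS = {
    "fetch": 200,
    "decode_regread": 50,
    "execute": 100,
    "memory": 200,
    "writeback": 50,
}


def single_cycle_clock(stages):
    """Single-cycle: one clock period must cover all stages back to back."""
    return sum(stages.values())


def multi_cycle_clock(stages):
    """Multi-cycle: the clock period only has to cover the slowest stage."""
    return max(stages.values())


def multi_cycle_latency(stages, stages_used):
    """Latency of one instruction that occupies one cycle per stage it uses."""
    return len(stages_used) * multi_cycle_clock(stages)


single_clock = single_cycle_clock(STAGE_LATENCY_PS)
multi_clock = multi_cycle_clock(STAGE_LATENCY_PS)
short_inst = multi_cycle_latency(STAGE_LATENCY_PS, ["fetch", "decode_regread"])
print(single_clock, multi_clock, short_inst)
```

Under these assumed numbers, an instruction that needs only two stages finishes sooner on the multi-cycle machine than the single-cycle clock period would allow.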
Instruction Processing “Cycle”
Instructions are processed under the direction of a “control
unit” step by step.
Instruction cycle: Sequence of steps to process an instruction
Fundamentally, there are six phases:
Fetch
Decode
Evaluate Address
Fetch Operands
Execute
Store Result
Not all instructions require all six stages (see P&P Ch. 4)
Instruction Processing “Cycle” vs. Machine Clock Cycle
Single-cycle machine:
All six phases of the instruction processing cycle take a single
machine clock cycle to complete
Multi-cycle machine:
All six phases of the instruction processing cycle can take
multiple machine clock cycles to complete
In fact, each phase can take multiple clock cycles to complete
Instruction Processing Viewed Another Way
Instructions transform Data (AS) to Data’ (AS’)
This transformation is done by functional units
Units that “operate” on data
These units need to be told what to do to the data
Multi-cycle machine:
Control signals needed in the next cycle can be generated in
the current cycle
Latency of control processing can be overlapped with latency
of datapath operation (more parallelism)
Remember…
Single-cycle machine
[Figure: single-cycle machine. Combinational logic computes AS’ from AS; sequential logic (state) holds AS.]
Let’s Start with the State Elements
[Figure: state elements of the datapath: (a) instruction memory, addressed by the PC; (b) program counter; (c) adder; register file with 5-bit read and write register numbers, two read data ports, a write data port, and a RegWrite control; data memory with address, write data, read data, and MemRead; 16-to-32-bit sign-extend unit.]
Combinational read
output of the read data port is a combinational function of the
register file contents and the corresponding read select port
Synchronous write
the selected register is updated on the positive edge clock
transition when write enable is asserted
Cannot affect read output in between clock edges
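The read and write behavior described above can be modeled in a few lines. This is a toy sketch, not the lecture's implementation: the class and method names are hypothetical, and the MIPS convention that register 0 always reads as zero is not modeled.

```python
class RegisterFile:
    """Toy register file: combinational read, synchronous (edge-triggered) write."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs
        self._pending = None  # (index, value) waiting for the next clock edge

    def read(self, index):
        # Combinational read: depends only on current contents and the select input.
        return self.regs[index]

    def set_write_port(self, index, value, write_enable):
        # Driving the write port changes nothing until the clock edge arrives.
        self._pending = (index, value) if write_enable else None

    def clock_edge(self):
        # Positive clock edge: commit the pending write if write enable was asserted.
        if self._pending is not None:
            index, value = self._pending
            self.regs[index] = value & 0xFFFFFFFF
            self._pending = None


rf = RegisterFile()
rf.set_write_port(5, 42, write_enable=True)
before_edge = rf.read(5)   # write has not taken effect yet
rf.clock_edge()
after_edge = rf.read(5)    # write is now visible
```

Reading between `set_write_port` and `clock_edge` still returns the old value, which is the "cannot affect read output in between clock edges" property.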
[Figure: datapath overview by stage. IF: PC and instruction memory; ID/RF: register file read; EX/AG: ALU; MEM: data memory; WB: write-back to the register file.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
What Is To Come: The Full MIPS Datapath
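[Figure: the full single-cycle MIPS datapath, including the jump-address path (Instruction [25–0] shifted left 2 to form Jump address [31–0], selected by PCSrc1 = Jump) and the ALU control driven by Instruction [5–0].]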
R-Type ALU Instructions
Assembly (e.g., register-register signed addition)
ADD rd_reg rs_reg rt_reg
Machine encoding
0      rs     rt     rd     0      ADD      R-type
6-bit  5-bit  5-bit  5-bit  5-bit  6-bit
Semantics
if MEM[PC] == ADD rd rs rt
GPR[rd] ← GPR[rs] + GPR[rt]
PC ← PC + 4
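The R-type field layout above can be packed and unpacked with shifts and masks. A minimal sketch: the function names are hypothetical, and the funct value 0x20 for ADD comes from the standard MIPS encoding rather than from this slide.

```python
ADD_FUNCT = 0x20  # funct field value for ADD in the standard MIPS encoding


def encode_rtype(rs, rt, rd, funct, shamt=0, opcode=0):
    """Pack the six R-type fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_rtype(word):
    """Split a 32-bit word back into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


word = encode_rtype(rs=1, rt=2, rd=3, funct=ADD_FUNCT)  # ADD $3, $1, $2
fields = decode_rtype(word)
print(hex(word), fields)
```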
ALU Datapath
[Figure: ALU datapath. The PC addresses the instruction memory; Instruction[25:21] and Instruction[20:16] select the two read registers and Instruction[15:11] selects the write register; the ALU result (with a Zero output) is written back to the register file with RegWrite = 1; a separate adder computes PC + 4. Stage labels: IF, ID, EX, MEM, WB.]
if MEM[PC] == ADD rd rs rt
GPR[rd] ← GPR[rs] + GPR[rt]
PC ← PC + 4
(combinational state update logic)
I-Type ALU Instructions
Assembly (e.g., register-immediate signed additions)
ADDI rt_reg rs_reg immediate_16
Machine encoding
ADDI   rs     rt     immediate   I-type
6-bit  5-bit  5-bit  16-bit
Semantics
if MEM[PC] == ADDI rt rs immediate
GPR[rt] ← GPR[rs] + sign-extend (immediate)
PC ← PC + 4
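The only new step relative to the R-type case is the sign extension of the 16-bit immediate. A minimal sketch with hypothetical helper names; the signed-overflow trap that real MIPS ADDI raises is not modeled here.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_addi(gpr, pc, rt, rs, imm16):
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); returns the next PC (PC + 4)."""
    gpr[rt] = (gpr[rs] + sign_extend16(imm16)) & MASK32
    return (pc + 4) & MASK32


gpr = [0] * 32
gpr[1] = 10
next_pc = execute_addi(gpr, pc=0x100, rt=2, rs=1, imm16=0xFFFF)  # adds -1
print(gpr[2], hex(next_pc))
```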
Datapath for R and I-Type ALU Insts.
[Figure: datapath for R-type and I-type ALU instructions. RegDest = isItype selects the write register between Instruction[20:16] and Instruction[15:11]; ALUSrc = isItype selects the second ALU input between read data 2 and the sign-extended (16-to-32-bit) immediate; RegWrite = 1. Stage labels: IF, ID, EX, MEM, WB.]
if MEM[PC] == ADDI rt rs immediate
GPR[rt] ← GPR[rs] + sign-extend (immediate)
PC ← PC + 4
(combinational state update logic)
Single-Cycle Datapath for
Data Movement Instructions
Load Instructions
Assembly (e.g., load 4-byte word)
LW rt_reg offset_16 (base_reg)
Machine encoding
LW     base   rt     offset   I-type
6-bit  5-bit  5-bit  16-bit
Semantics
if MEM[PC] == LW rt offset16 (base)
EA = sign-extend(offset) + GPR[base]
GPR[rt] ← MEM[ translate(EA) ]
PC ← PC + 4
LW Datapath
[Figure: LW datapath. ALU operation = add; RegDest = isItype; ALUSrc = isItype; MemRead = 1; MemWrite = 0; RegWrite = 1. The ALU result is the data memory address and the data memory read data is written to the register file.]
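Store Instructions
Assembly (e.g., store 4-byte word)
SW rt_reg offset_16 (base_reg)
Machine encoding
SW     base   rt     offset   I-type
6-bit  5-bit  5-bit  16-bit
Semantics
if MEM[PC] == SW rt offset16 (base)
EA = sign-extend(offset) + GPR[base]
MEM[ translate(EA) ] ← GPR[rt]
PC ← PC + 4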
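Loads and stores share the same effective-address computation and differ only in the direction of the data transfer. The sketch below assumes an identity `translate` (no virtual memory), models memory as a dictionary keyed by address, and uses hypothetical function names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def translate(ea):
    # Assumption: identity address translation (no virtual memory).
    return ea


def effective_address(gpr, base, offset16):
    """EA = sign-extend(offset) + GPR[base], wrapped to 32 bits."""
    return (sign_extend16(offset16) + gpr[base]) & MASK32


def execute_lw(gpr, mem, pc, rt, base, offset16):
    """GPR[rt] <- MEM[translate(EA)]; returns PC + 4."""
    gpr[rt] = mem.get(translate(effective_address(gpr, base, offset16)), 0)
    return (pc + 4) & MASK32


def execute_sw(gpr, mem, pc, rt, base, offset16):
    """MEM[translate(EA)] <- GPR[rt]; returns PC + 4."""
    mem[translate(effective_address(gpr, base, offset16))] = gpr[rt]
    return (pc + 4) & MASK32


gpr, mem = [0] * 32, {}
gpr[4], gpr[5] = 0x1000, 0xDEADBEEF
pc = execute_sw(gpr, mem, 0, rt=5, base=4, offset16=0xFFFC)   # offset is -4
pc = execute_lw(gpr, mem, pc, rt=6, base=4, offset16=0xFFFC)
```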
SW Datapath
[Figure: SW datapath. ALU operation = add; ALUSrc = isItype; MemWrite = 1; MemRead = 0; RegWrite = 0. The ALU result is the data memory address and read data 2 is the data memory write data.]
[Figure: combined load-store datapath. ALU operation = add; RegDest = isItype; ALUSrc = isItype; MemWrite = isStore; RegWrite = !isStore; MemRead = isLoad; MemtoReg = isLoad.]
Single-Cycle Datapath for
Control Flow Instructions
Unconditional Jump Instructions
Assembly
J immediate_26
Machine encoding
J      immediate   J-type
6-bit  26-bit
Semantics
if MEM[PC] == J immediate26
target = { PC[31:28], immediate26, 2’b00 }
PC ← target
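The concatenation in the jump semantics translates directly into masks and shifts. A minimal sketch following the formula as written on the slide; the function name is hypothetical.

```python
def jump_target(pc, imm26):
    """target = { PC[31:28], immediate26, 2'b00 }."""
    upper = pc & 0xF0000000                 # PC[31:28] kept in place
    lower = (imm26 & 0x03FFFFFF) << 2       # 26-bit immediate followed by two zero bits
    return upper | lower


t1 = jump_target(0x10000008, 0x3)
t2 = jump_target(0x00400000, 0x03FFFFFF)
print(hex(t1), hex(t2))
```

Because only the low 28 bits come from the instruction, a J instruction can only reach targets that share the upper four PC bits.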
Unconditional Jump Datapath
[Figure: unconditional jump datapath. PCSrc = isJ selects the concatenated jump target instead of PC + 4; ALU operation and ALUSrc are don't-cares (X); MemWrite = 0 and RegWrite = 0.]
if MEM[PC] == J immediate26
PC = { PC[31:28], immediate26, 2’b00 }
What about JR, JAL, JALR?
Aside: MIPS Cheat Sheet
http://www.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=mips_reference_data.pdf
Conditional Branch Instructions
Assembly (e.g., branch if equal)
BEQ rs_reg rt_reg immediate_16
Machine encoding
Conditional Branch Datapath (for you to finish)
[Figure: conditional branch datapath, annotated “watch out”. A second adder forms the branch target from PC + 4 (from the instruction datapath) and the sign-extended 16-bit immediate shifted left by 2; the ALU performs sub and its bcond (Zero) output goes to the branch control logic that drives PCSrc; RegWrite = 0.]
How to uphold the delayed branch semantics?
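Setting the delay-slot question aside, the next-PC selection performed by the branch datapath can be sketched as below. This assumes no branch delay slot and the usual MIPS target formula (PC + 4 plus the sign-extended immediate shifted left by 2); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Return the next PC for BEQ, assuming no delay slot."""
    fall_through = (pc + 4) & MASK32
    target = (fall_through + (sign_extend16(imm16) << 2)) & MASK32
    # The ALU subtracts the operands; a zero result means they are equal.
    bcond = ((rs_val - rt_val) & MASK32) == 0
    return target if bcond else fall_through


taken = beq_next_pc(0x100, 7, 7, 3)
not_taken = beq_next_pc(0x100, 7, 8, 3)
backward = beq_next_pc(0x100, 0, 0, 0xFFFF)   # offset -1 word: branches to itself
print(hex(taken), hex(not_taken), hex(backward))
```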
Putting It All Together
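[Figure: the full single-cycle MIPS datapath, including the jump-address path (Instruction [25–0] shifted left 2 to form Jump address [31–0], selected by PCSrc1 = Jump) and the ALU control driven by Instruction [5–0].]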
Single-Cycle Hardwired Control
As combinational function of Inst=MEM[PC]
[Figure: instruction word with field boundaries at bits 31, 26, 21, 16, 11, 6, 0]
Consider
All R-type and I-type ALU instructions
LW and SW
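For this subset, the control signals can be written as a pure function of the instruction word. The sketch below uses the signal equations annotated on the datapath figures earlier in the lecture (isItype, isLoad, isStore); the opcode constants are standard MIPS values, while the function name, the limited I-type ALU opcode set, and the string encodings of the mux selects are illustrative assumptions.

```python
OP_RTYPE, OP_LW, OP_SW = 0x00, 0x23, 0x2B
OP_ALUI = {0x08, 0x0C, 0x0D}  # ADDI, ANDI, ORI (a small assumed subset)


def control_signals(inst):
    """Hardwired control as a combinational function of Inst = MEM[PC]."""
    opcode = (inst >> 26) & 0x3F
    if opcode not in {OP_RTYPE, OP_LW, OP_SW} | OP_ALUI:
        raise ValueError(f"opcode {opcode:#x} is outside the subset modeled here")
    is_load = opcode == OP_LW
    is_store = opcode == OP_SW
    is_itype = opcode != OP_RTYPE   # LW, SW and ALUi all use the I-type format
    return {
        "RegDest": "rt" if is_itype else "rd",    # which field names the write register
        "ALUSrc": "imm" if is_itype else "reg",   # second ALU operand
        "MemtoReg": is_load,
        "RegWrite": not is_store,
        "MemRead": is_load,
        "MemWrite": is_store,
    }


add_ctl = control_signals(0x00221820)   # ADD $3, $1, $2
lw_ctl = control_signals(0x8C000000)    # opcode field = LW
sw_ctl = control_signals(0xAC000000)    # opcode field = SW
```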
Single-Bit Control Signals
JAL and JALR require additional RegDest and MemtoReg options
Single-Bit Control Signals
JR and JALR require additional PCSrc options
ALU Control
case opcode
‘0’ → select operation according to funct
‘ALUi’ → select operation according to opcode
‘LW’ → select addition
‘SW’ → select addition
‘Bxx’ → select bcond generation function
__ → don’t care
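The case statement above maps onto a small lookup function. This is a sketch only: the opcode and funct constants are standard MIPS values for a handful of instructions, and the operation names returned as strings are an assumed encoding.

```python
FUNCT_OPS = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or"}   # ADD, SUB, AND, OR
ALUI_OPS = {0x08: "add", 0x0C: "and", 0x0D: "or"}                 # ADDI, ANDI, ORI
OP_LW, OP_SW = 0x23, 0x2B
BRANCH_OPS = {0x04, 0x05}                                         # BEQ, BNE


def alu_control(opcode, funct=0):
    """Select the ALU operation from the opcode (and funct for R-type)."""
    if opcode == 0:
        return FUNCT_OPS[funct]       # '0': operation chosen by funct
    if opcode in ALUI_OPS:
        return ALUI_OPS[opcode]       # 'ALUi': operation chosen by opcode
    if opcode in (OP_LW, OP_SW):
        return "add"                  # address computation
    if opcode in BRANCH_OPS:
        return "bcond"                # branch condition generation
    return None                       # don't care


ops = [alu_control(0, 0x22), alu_control(OP_LW), alu_control(0x04), alu_control(0x02)]
print(ops)
```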
R-Type ALU
[Figure sequence: the full single-cycle datapath, one panel per instruction class, with the active paths highlighted. Control unit (input: Instruction [31–26]) outputs: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. ALU control takes Instruction [5–0] and produces the ALU operation; PCSrc1 = Jump; PCSrc2 = Br Taken, derived from bcond.]
Evaluating the Single-Cycle
Microarchitecture
A Single-Cycle Microarchitecture
Is this a good idea/design?
A Single-Cycle Microarchitecture: Analysis
Every instruction takes 1 cycle to execute
CPI (Cycles per instruction) is strictly 1
What is the Slowest Instruction to Process?
Let’s go back to the basics
[Figure sequence: the full single-cycle datapath, one panel per instruction class, with the active path highlighted and cumulative delays annotated along it. Annotations across the panels: 100ps, 200ps, 250ps, 350ps, 400ps, 550ps, 600ps; the largest is 600ps.]
What is the Slowest Instruction to Process?
Memory is not magic
Single Cycle uArch: Complexity
Contrived
All instructions run as slow as the slowest instruction
Inefficient
All instructions run as slow as the slowest instruction
Must provide worst-case combinational resources in parallel as required
by any instruction
Need to replicate a resource if it is needed more than once by an
instruction during different parts of the instruction processing cycle
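The cost of running every instruction as slowly as the slowest one can be put in numbers. The per-class latencies below are assumed for illustration only; the function names are hypothetical.

```python
# Assumed datapath latency per instruction class, in picoseconds (illustrative only).
LATENCY_PS = {
    "R-type": 400,
    "I-type ALU": 400,
    "LW": 600,
    "SW": 550,
    "Branch": 350,
    "Jump": 200,
}


def single_cycle_clock(latencies):
    """The clock period must cover the slowest instruction class."""
    return max(latencies.values())


def idle_time(latencies):
    """Time each class spends waiting for the clock edge after it is done."""
    clock = single_cycle_clock(latencies)
    return {name: clock - latency for name, latency in latencies.items()}


clock_ps = single_cycle_clock(LATENCY_PS)
idle_ps = idle_time(LATENCY_PS)
for name, idle in idle_ps.items():
    print(f"{name:11s} idle for {idle} ps of every {clock_ps} ps cycle")
```

With these assumed numbers, the fastest class sits idle for two thirds of every cycle, which is the inefficiency the slide describes.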
Balanced design
Balance instruction/data flow through hardware components
Design to eliminate bottlenecks: balance the hardware for the
work
Single-Cycle Design vs. Design Principles
Critical path design
Balanced design
Aside: System Design Principles
When designing computer systems/architectures, it is
important to follow good principles
Aside: From Lecture 1
“architecture […] based upon principle, and not upon
precedent”
Aside: System Design Principles
We will continue to cover key principles in this course
Here are some references where you can learn more
Aside: One Important Principle
Keep it simple