Arch2 Microarchitecture Design Afterlecture
ETH Zürich
Spring 2023
30 March 2023
Extra Credit Assignment 1: Talk Analysis
Intelligent Architectures for Intelligent Machines
Watch and analyze this short lecture (33 minutes)
https://www.youtube.com/watch?v=WxHribseelw (Oct 2022)
Extra Credit Assignment 2: Moore’s Law
Guidelines on how to review papers critically
Readings
This week
Introduction to microarchitecture and single-cycle
microarchitecture
H&H, Chapter 7.1-7.3
P&P, Appendices A and C
Multi-cycle microarchitecture
H&H, Chapter 7.4
P&P, Appendices A and C
Recall: The Instruction Processing “Cycle”
FETCH
DECODE
EVALUATE ADDRESS
FETCH OPERANDS
EXECUTE
STORE RESULT
Instruction Processing “Cycle” vs. Machine Clock Cycle
Single-cycle machine:
All six phases of the instruction processing cycle take a single
machine clock cycle to complete
Multi-cycle machine:
All six phases of the instruction processing cycle can take
multiple machine clock cycles to complete
In fact, each phase can take multiple clock cycles to complete
Recall: Single-Cycle Machine
Single-cycle machine
[Figure: combinational logic computes the next architectural state AS' from the current state AS; sequential logic (state) holds AS.]
A Single-Cycle Microarchitecture
From the Ground Up
Single-Cycle Control Logic
Single-Cycle Uarch I (We Developed in Lectures)
[Figure: single-cycle MIPS datapath with control unit, including the jump path (PCSrc1 = Jump). The control unit decodes Op (Instr[31:26]) and Funct (Instr[5:0]) into MemtoReg, MemWrite, Branch, ALUControl[2:0], ALUSrc, RegDst, and RegWrite.]
A Single-Cycle Microarchitecture
Is this a good idea/design?
Performance Analysis Basics
Recall: Performance Analysis Basics
Execution time of a single instruction
{CPI} x {clock cycle time}
CPI: Number of cycles it takes to execute an instruction
Carnegie Mellon
Processor Performance
How fast is my program?
Every program consists of a series of instructions
Each instruction needs to be executed
Processor Performance
As a general formula
Our program consists of executing N instructions
Our processor needs CPI cycles (on average) for each instruction
The clock frequency of the processor is f
the clock period is therefore T=1/f
Execution time = N x CPI x (1/f) = N x CPI x T seconds
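The general formula above can be checked with a few lines of Python. This is a minimal sketch; the function name and the example workload (1 billion instructions, CPI of 2, 1 GHz clock) are illustrative assumptions, not values from the lecture.

```python
def execution_time_seconds(n_instructions: int, cpi: float, clock_hz: float) -> float:
    """Return N x CPI x T, where T = 1/f is the clock period in seconds."""
    clock_period = 1.0 / clock_hz
    return n_instructions * cpi * clock_period


if __name__ == "__main__":
    # Hypothetical workload: N = 1e9 instructions, CPI = 2, f = 1 GHz.
    t = execution_time_seconds(1_000_000_000, 2.0, 1e9)
    print(f"{t:.3f} s")
```

Halving the CPI or doubling the clock frequency each halves the result, which is why the two later designs trade one against the other.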
Performance Analysis of
Our Single-Cycle Design
A Single-Cycle Microarchitecture: Analysis
Every instruction takes 1 cycle to execute
CPI (Cycles per instruction) is strictly 1
What is the Slowest Instruction to Process?
Let’s go back to the basics
[Figure sequence: the single-cycle MIPS datapath (instruction memory, register file, sign extend, ALU, data memory, and the jump and branch paths selected by PCSrc1 = Jump and PCSrc2 = Br Taken), shown once per instruction type with the delay of the exercised path annotated. The annotated delays range from 100 ps to 600 ps.]
What is Really the Slowest Instruction to Process?
Real world: Memory is slow (not magic)
Single Cycle uArch: Complexity
Contrived
All instructions run as slow as the slowest instruction
Inefficient
All instructions run as slow as the slowest instruction
Must provide worst-case combinational resources in parallel as required by
any instruction
Need to replicate a resource if it is needed more than once by an
instruction during different parts of the instruction processing cycle
Balanced design
Balance instruction/data flow through hardware components
Design to eliminate bottlenecks: balance the hardware for the
work
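The point that every instruction runs as slowly as the slowest one can be made concrete with a small sketch. The per-instruction path delays below are assumed, illustrative values in picoseconds (the mapping of delays to instruction types is not stated in the text above), and the function names are hypothetical.

```python
# Assumed, illustrative path delays in picoseconds for each instruction type.
PATH_DELAY_PS = {"R-type": 400, "lw": 600, "sw": 550, "beq": 350, "j": 200}


def single_cycle_period_ps(path_delays):
    """In a single-cycle design the clock must fit the slowest instruction."""
    return max(path_delays.values())


def idle_time_ps(path_delays):
    """Time each instruction spends waiting for the end of the clock cycle."""
    period = single_cycle_period_ps(path_delays)
    return {name: period - delay for name, delay in path_delays.items()}


if __name__ == "__main__":
    print(single_cycle_period_ps(PATH_DELAY_PS))
    print(idle_time_ps(PATH_DELAY_PS))
```

Under these assumptions the jump would idle for two thirds of every cycle, which is the inefficiency a multi-cycle design tries to remove.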
Single-Cycle Design vs. Design Principles
Critical path design
Balanced design
Aside: System Design Principles
When designing computer systems/architectures, it is
important to follow good principles
Actually, this is true for *any* system design
Real architectures, buildings, bridges, train stations, …
Good consumer products
Security & safety-critical systems
Decision making systems
…
Aside: From Lecture 2
“architecture […] based upon principle, and not upon
precedent”
Recall: Takeaways
Aside: System Design Principles
We will continue to cover key principles in this course
Here are some references where you can learn more
A Key System Design Principle
Keep it simple
Multi-Cycle Microarchitectures
Multi-Cycle Microarchitectures
Goal: Let each instruction take (close to) only as much time as
it really needs
Idea
Determine clock cycle time independently of instruction
processing time
Each instruction takes as many clock cycles as it needs to take
Multiple state transitions per instruction
The sequence of states followed by each instruction is different
Recall: The “Process Instruction” Step
ISA specifies abstractly what AS’ should be, given an
instruction and AS
It defines an abstract finite state machine where
State = programmer-visible state
Next-state logic = instruction execution specification
From ISA point of view, there are no “intermediate states”
between AS and AS’ during instruction execution
One state transition per instruction
State 2: MDR is loaded with the instruction
State 3: The FSM asserts GateMDR and LD.IR
State 4: The FSM goes to the next state depending on opcode
State 63: JMP loads register into PC
This is an FSM Controlling a Multi-Cycle LC-3 Microarchitecture
Recall: Full State Machine for LC-3b
https://safari.ethz.ch/digitaltechnik/spring2022/lib/exe/fetch.php?media=pp-appendixc.pdf
Benefits of Multi-Cycle Design
Critical path design
Can keep reducing the critical path independently of the worst-
case processing time of any instruction
Balanced design
No need to provide more capability or resources than really
needed
An instruction that needs resource X multiple times does not require
multiple X’s to be implemented
Leads to more efficient hardware: Can reuse hardware components
needed multiple times for an instruction
Downsides of Multi-Cycle Design
Need to store the intermediate results at the end of each
clock cycle
Hardware overhead for microarchitectural registers
Register setup/hold overhead (i.e., sequencing overhead) is
paid multiple times for an instruction
Multi-Cycle LC-3 Data Path
[Figure: multi-cycle LC-3 data path, highlighting the Processing Unit and the extra registers that are not needed in a single-cycle design.]
Remember: Performance Analysis
Execution time of a single instruction
{CPI} x {clock cycle time} CPI: Cycles Per Instruction
An Elegant Multi-Cycle Processor Design
Maurice Wilkes, “The Best Way to Design an Automatic
Calculating Machine,” Manchester Univ. Computer
Inaugural Conf., 1951.
An elegant implementation:
The concept of microcoded/microprogrammed machines
Multi-Cycle Microarchitectures
Key Idea for Realization
Recall: The Instruction Processing “Cycle”
FETCH
DECODE
EVALUATE ADDRESS
FETCH OPERANDS
EXECUTE
STORE RESULT
A Basic Multi-Cycle Microarchitecture
Instruction processing cycle divided into “states”
A stage in the instruction processing cycle can take multiple states
State 2: MDR is loaded with the instruction
State 3: The FSM asserts GateMDR and LD.IR
State 4: The FSM goes to the next state depending on opcode
State 63: JMP loads register into PC
Another Example Multi-Cycle
Microarchitecture
Remember the Single-Cycle Uarch II (Readings)
[Figure: single-cycle MIPS datapath with control unit: separate Instruction Memory and Data Memory, Register File, Sign Extend, ALU, and adders for PCPlus4 and PCBranch.]
Multi-cycle microarchitecture:
+ higher clock frequency
+ simpler instructions take only a few clock cycles
+ reuse expensive hardware across multiple cycles
-- hardware overhead for storing intermediate results
-- sequential logic overhead paid many times for each instruction
What Can We Optimize with Multi-Cycle
Single-cycle microarchitecture uses two memories
One memory stores instructions, the other data
We want to use a single memory (lower cost)
Let’s Construct
the Multi-Cycle Datapath
To execute lw $t0, 0x20($t1), we need to:
Read the instruction from memory
Then read $t1 from register array
Add the immediate value (0x20) to calculate the memory address
Read the content of this address
Write to the register $t0 this content
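The five steps listed above can be sketched as a tiny Python model in which each step corresponds to one clock cycle. This is a behavioral illustration only: the dictionary-based memory, the decoded-instruction format, and the function name are assumptions made for the example, not part of the lecture's hardware design.

```python
def run_lw(memory, regs, pc):
    """Walk through lw one step per cycle; return the list of steps taken."""
    steps = []

    instr = memory[pc]                      # 1. read the instruction from memory
    steps.append("fetch")

    base = regs[instr["rs"]]                # 2. read the base register ($t1)
    steps.append("register read")

    address = base + instr["imm"]           # 3. add the immediate to form the address
    steps.append("address calculation")

    data = memory[address]                  # 4. read the content of that address
    steps.append("memory read")

    regs[instr["rt"]] = data                # 5. write the content to $t0
    steps.append("register write")
    return steps


if __name__ == "__main__":
    # Hypothetical machine state: one shared memory holds instructions and data.
    memory = {0x0: {"op": "lw", "rs": "$t1", "rt": "$t0", "imm": 0x20}, 0x120: 42}
    regs = {"$t0": 0, "$t1": 0x100}
    print(run_lw(memory, regs, pc=0x0), regs["$t0"])
```

Because steps 1 and 4 happen in different cycles, both can use the same memory, which is the cost saving the multi-cycle design is after.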
[Figure sequence: the multi-cycle datapath built up step by step for lw. A single Instr / Data Memory feeds an instruction register (enabled by IRWrite) and a Data register; the Register File is read through A1 (Instr[25:21]); Sign Extend produces SignImm from Instr[15:0]; the ALU result is held in ALUOut and can drive the memory address (Adr); the loaded data is written to the register named by Instr[20:16]; a constant 4 on the SrcB multiplexer lets the ALU increment the PC.]
I-Type format: op (6 bits), rs (5 bits), rt (5 bits), imm (16 bits)
Multi-Cycle Datapath: sw
Write data in rt to memory
[Figure sequence: the datapath extended for sw (RD2 is held in register B and drives the memory write data WD), for R-type instructions (multiplexers controlled by RegDst and MemtoReg select the destination register, Instr[20:16] or Instr[15:11], and the write-back value), and for beq (SignImm shifted left by 2 becomes a fourth SrcB input).]
Control Unit
Main Controller (FSM)
Input: Opcode[5:0]
Multiplexer selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB[1:0], ALUSrcA
Register enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite
Output to ALU Decoder: ALUOp[1:0]
ALU Decoder
Inputs: ALUOp[1:0], Funct[5:0]
Output: ALUControl[2:0]
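The ALU Decoder stage of the control unit can be sketched in Python as a pure function of ALUOp and Funct. The specific ALUControl encodings and funct codes below follow the common MIPS textbook convention and are an assumption here; the slides only name the signals.

```python
# Assumed ALUControl encodings (textbook MIPS convention, not given in the slides).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Assumed funct field values for the R-type instructions handled.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Map ALUOp[1:0] and Funct[5:0] to ALUControl[2:0]."""
    if alu_op == 0b00:      # address calculation and PC + 4: always add
        return ALU_ADD
    if alu_op == 0b01:      # beq: subtract and test Zero
        return ALU_SUB
    return FUNCT_TO_ALU[funct]  # ALUOp = 10: R-type, look at funct


if __name__ == "__main__":
    print(format(alu_decoder(0b10, 0b100010), "03b"))
```

Splitting the decoder from the main FSM keeps the FSM small: its states only choose among "add", "subtract", and "use funct".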
[Figure sequence: the complete multi-cycle datapath with its control unit (inputs Op = Instr[31:26] and Funct = Instr[5:0]; PCEn is derived from PCWrite, Branch, and Zero), annotated with the control signal values driven in successive clock cycles, starting from Reset.]
[Figure sequence: the control FSM built up state by state next to the datapath.]
S2: MemAdr (Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (Op = LW): IorD = 1
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (Op = SW): IorD = 1, MemWrite
Execute state: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite
[Figure sequence: the control FSM extended with the fetch, decode, execute, and branch states.]
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S6: Execute (Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S8: Branch (Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
[Figure: multi-cycle MIPS datapath with control unit, shown with the control FSM extended for addi.]
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite. Next: S1
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: S2 (Op = LW or SW), S6 (Op = R-type), S8 (Op = BEQ), S9 (Op = ADDI)
S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S3 (Op = LW), S5 (Op = SW)
S3: MemRead: IorD = 1. Next: S4
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite. Next: S0
S5: MemWrite: IorD = 1, MemWrite. Next: S0
S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: S7
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite. Next: S0
S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch. Next: S0
S9: ADDI Execute: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: S10
S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite. Next: S0
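The state transitions of this control FSM (states S0 to S10, before the jump extension) can be written down as a small next-state function. This is a sketch for checking cycle counts: the string labels for states and opcodes are illustrative, and the control outputs of each state are left out.

```python
def next_state(state: str, op: str) -> str:
    """Next-state logic of the multi-cycle control FSM (S0..S10)."""
    if state == "S0":                      # Fetch always goes to Decode
        return "S1"
    if state == "S1":                      # Decode dispatches on the opcode
        return {"LW": "S2", "SW": "S2", "R": "S6", "BEQ": "S8", "ADDI": "S9"}[op]
    if state == "S2":                      # MemAdr: load or store
        return "S3" if op == "LW" else "S5"
    follow = {"S3": "S4", "S6": "S7", "S9": "S10"}
    # S4, S5, S7, S8, and S10 finish the instruction and return to Fetch.
    return follow.get(state, "S0")


def cycles_for(op: str) -> int:
    """Count the clock cycles (states visited) for one instruction."""
    state, count = "S0", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "S0":
            return count


if __name__ == "__main__":
    for op in ("LW", "SW", "R", "BEQ", "ADDI"):
        print(op, cycles_for(op))
```

Counting states this way gives the per-instruction cycle counts that the CPI analysis of the multi-cycle design relies on.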
Extended Functionality: j
[Figure: multi-cycle datapath extended for j. PCSrc widens to two bits (PCSrc[1:0]) and a third PC source, PCJump, is formed from Instr[25:0] shifted left by 2 (bits 27:0) together with PC bits 31:28.]
Control FSM: j
[Figure: the control FSM from the previous slides, to be extended with a jump state.]
We Constructed a Multi-Cycle
MIPS Microarchitecture
Review: Single-Cycle MIPS Microarchitecture
[Figure: single-cycle MIPS datapath and control unit, including the jump path (PCJump).]
[Figure: single-cycle machine: combinational logic computes AS' from AS; sequential logic (state) holds AS.]
[Figure: multi-cycle MIPS datapath.]
[Figure: multi-cycle MIPS control FSM.]
What does this design assume about memory?
Recall: Full State Machine for LC-3b
[Figure: full FSM controlling a multi-cycle LC-3b microarchitecture, with the memory access states highlighted.]
https://safari.ethz.ch/digitaltechnik/spring2022/lib/exe/fetch.php?media=pp-appendixc.pdf
Another Example:
Microprogrammed Multi-Cycle
Microarchitecture
Recall: An Elegant Multi-Cycle Processor Design
Maurice Wilkes, “The Best Way to Design an Automatic
Calculating Machine,” Manchester Univ. Computer
Inaugural Conf., 1951.
An elegant implementation:
The concept of microcoded/microprogrammed machines
Microprogrammed Control Terminology
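Microinstruction: the control signals associated with the current state
[Figure: simple design of the control structure. The microsequencer (BEN is one of its inputs) produces a 6-bit address into the control store (2^6 x 35 bits). Each 35-bit microinstruction is split into 26 bits and 9 bits; the 9 bits (J, COND, IRD) feed back to the microsequencer.]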
Example uProgrammed Control & Datapath
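For your own study: P&P Revised Appendix C (on the website and in the backup slides)
[Figure: the data path exchanges 16-bit addresses and 16-bit data/instructions with memory and I/O. The control receives R, IR[15:11], and BEN, and produces 35 control signals, split into 26 bits and 9 bits.]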
[Figure: contents of the control store. Rows are the state addresses, shown from 000000 (State 0) through 011001 (State 25); columns are the microinstruction bits (IRD, Cond, J, followed by the load, gate, and multiplexer control signals).]
Each entry in the control store is a microinstruction corresponding to the FSM state
[Figure: state machine for the LC-3b. State 33 (MDR <- M) repeats until memory is ready (R), state 35 loads IR <- MDR, and state 32 sets BEN <- IR[11] & N + IR[10] & Z + IR[9] & P and dispatches on IR[15:12] to the per-opcode states (ADD, AND, XOR, BR, JMP, JSR, TRAP, SHF, LEA, LDB, LDW, STB, STW, RTI). Most instructions end with a transition to state 18.]
NOTES
B+off6 : Base + SEXT[offset6]
PC+off9 : PC + SEXT[offset9]
*OP2 may be SR2 or SEXT[imm5]
** [15:8] or [7:0] depending on MAR[0]
A Simple Datapath
Can Become
Very Powerful
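[Figure: microsequencer. COND1, COND0, and the J bits determine the 6-bit address of the next state; when IRD is set, the address is 0,0,IR[15:12] instead.]
Example state addresses: State 18 (010010), State 33 (100001), State 35 (100011), State 32 (100000), State 6 (000110), State 25 (011001), State 27 (011011)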
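The next-state selection performed by the microsequencer can be sketched as follows. The BEN expression and the IRD case (address 0,0,IR[15:12]) come from the slides; the meaning of the COND encodings (01 = memory ready, 10 = branch, 11 = addressing mode) and which J bit each condition sets are recalled from P&P Appendix C and should be treated as assumptions to verify against that appendix.

```python
def branch_enable(ir: int, n: int, z: int, p: int) -> int:
    """BEN <- IR[11] & N + IR[10] & Z + IR[9] & P."""
    return int(bool((((ir >> 11) & 1) and n) or
                    (((ir >> 10) & 1) and z) or
                    (((ir >> 9) & 1) and p)))


def next_state_address(j: int, cond: int, ird: int, ir: int,
                       ready: int = 0, ben: int = 0) -> int:
    """Return the 6-bit control store address of the next state."""
    if ird:
        return (ir >> 12) & 0xF          # 0,0,IR[15:12]: dispatch on the opcode
    address = j
    if cond == 0b01 and ready:           # assumed: memory ready sets J[1]
        address |= 0b000010
    if cond == 0b10 and ben:             # assumed: branch enable sets J[2]
        address |= 0b000100
    if cond == 0b11 and (ir >> 11) & 1:  # assumed: addressing mode sets J[0]
        address |= 0b000001
    return address


if __name__ == "__main__":
    # State 33 waits for memory: stays at 33 until ready, then moves to 35.
    print(next_state_address(0b100001, 0b01, 0, 0, ready=0),
          next_state_address(0b100001, 0b01, 0, 0, ready=1))
```

Under these assumptions a single J field plus a two-bit condition is enough to express every branch in the state diagram, which is what keeps the microsequencer small.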
The Power of Abstraction
The concept of a control store of microinstructions gives
the hardware designer a new abstraction:
microprogramming
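[Figure: a high-level language (HLL) separated by a small semantic gap from the x86-64 ISA, an ISA with complex instructions, data types, and addressing modes, which is implemented through a software or hardware translator.]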
How to Change the Semantic Gap Tradeoffs
Translate from one ISA into a different “implementation” ISA
[Figure: a high-level language (HLL) separated by a small semantic gap from an ISA with complex instructions, data types, and addressing modes, which is implemented through a hardware translator (the microsequencer).]
Advantages of Microprogrammed Control
Allows a very simple design to do powerful computation by
controlling the datapath (using a sequencer)
High-level ISA translated into microcode (sequence of u-instructions)
Microcode (u-code) enables a minimal datapath to emulate an ISA
Microinstructions can be thought of as a user-invisible ISA (u-ISA)
Historical Examples
IBM 370 Model 145: microcode stored in main memory, can be
updated after a reboot
IBM System z: Similar to 370/145.
Heller and Farrell, “Millicode in an IBM zSeries processor,” IBM
JR&D, May/Jul 2004.
B1700 microcode can be updated while the processor is running
User-microprogrammable machine!
Wilner, “Microprogramming environment on the Burroughs B1700”, CompCon 1972.
Systems today use microcode patches to fix HW bugs/issues
For More on Microprogrammed Designs
https://www.youtube.com/onurmutlulectures
Detailed Lectures on Microprogramming
https://www.youtube.com/watch?v=u4GhShuBP3Y&list=PL5Q2soXY2Zi_QedyPWtRmFUJ2F8DdYP7l&index=13
https://www.youtube.com/watch?v=_igvSl5h8cs&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=7
https://www.youtube.com/onurmutlulectures
Digital Design & Computer Arch.
Lecture 11: Multi-Cycle
Microarchitecture Design
ETH Zürich
Spring 2023
30 March 2023
Backup Slides
A Bit More on
Performance Analysis
[Figure: single-cycle MIPS datapath and control unit with the control signal values for one instruction annotated.]
Single-Cycle Performance
Single-cycle critical path:
Tc = t_pcq_PC + t_mem + max(t_RFread, t_sext + t_mux) + t_ALU + t_mem + t_mux + t_RFsetup
In most implementations, the limiting paths are memory, ALU, and register file:
Tc = t_pcq_PC + 2 t_mem + t_RFread + t_mux + t_ALU + t_RFsetup
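Both forms of the single-cycle critical path can be evaluated side by side. The element delays below are assumed, illustrative values in picoseconds chosen for the example; they are not taken from the slides.

```python
# Assumed, illustrative element delays in picoseconds.
DELAYS_PS = {
    "t_pcq_PC": 30,    # PC register clock-to-Q
    "t_mem": 250,      # memory read
    "t_RFread": 150,   # register file read
    "t_sext": 50,      # sign extend
    "t_mux": 25,       # multiplexer
    "t_ALU": 200,      # ALU
    "t_RFsetup": 20,   # register file setup
}


def single_cycle_tc_ps(d):
    """Full critical path, including the max() over the two SrcB paths."""
    return (d["t_pcq_PC"] + d["t_mem"]
            + max(d["t_RFread"], d["t_sext"] + d["t_mux"])
            + d["t_ALU"] + d["t_mem"] + d["t_mux"] + d["t_RFsetup"])


def single_cycle_tc_simplified_ps(d):
    """Simplified form, valid when the register file read dominates the max()."""
    return (d["t_pcq_PC"] + 2 * d["t_mem"] + d["t_RFread"]
            + d["t_mux"] + d["t_ALU"] + d["t_RFsetup"])


if __name__ == "__main__":
    print(single_cycle_tc_ps(DELAYS_PS), single_cycle_tc_simplified_ps(DELAYS_PS))
```

The two forms agree whenever t_RFread is at least t_sext + t_mux, which is the condition under which the simplification is justified.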
[Figure: single-cycle MIPS datapath and control unit.]
Single-Cycle Performance Example
Tc =
Example:
For a program with 100 billion instructions executing on a
single-cycle MIPS processor:
Multi-Cycle Performance: CPI
Instructions take different numbers of cycles:
3 cycles: beq, j
4 cycles: R-Type, sw, addi
5 cycles: lw
Are these cycle counts realistic?
CPI is a weighted average, e.g. for the SPECINT2000 benchmark:
25% loads
10% stores
11% branches
2% jumps
52% R-type
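The weighted average can be computed directly from the cycle counts and the instruction mix listed above. This is a minimal sketch; the dictionary keys and the function name are illustrative, and R-type stands in for all 4-cycle ALU instructions in the mix.

```python
# Cycles per instruction class in the multi-cycle design (from the list above).
CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "r_type": 4}

# Instruction mix from the list above (fractions of all instructions).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "r_type": 0.52}


def average_cpi(cycles, mix):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


if __name__ == "__main__":
    print(f"{average_cpi(CYCLES, MIX):.2f}")
```

The result is below the worst case of 5 cycles because loads, the only 5-cycle instructions, make up a quarter of the mix.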
[Figure: multi-cycle MIPS datapath with control unit.]
Multi-Cycle Performance: Cycle Time
Multi-cycle critical path:
Tc = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup
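The multi-cycle cycle time can be evaluated the same way. The delays used in the example call are assumed, illustrative values in picoseconds, and the function name is hypothetical.

```python
def multi_cycle_tc_ps(t_pcq, t_mux, t_alu, t_mem, t_setup):
    """Tc = t_pcq + t_mux + max(t_ALU + t_mux, t_mem) + t_setup."""
    return t_pcq + t_mux + max(t_alu + t_mux, t_mem) + t_setup


if __name__ == "__main__":
    # Assumed delays: 30 ps clock-to-Q, 25 ps mux, 200 ps ALU, 250 ps memory, 20 ps setup.
    print(multi_cycle_tc_ps(t_pcq=30, t_mux=25, t_alu=200, t_mem=250, t_setup=20))
```

Only one major element (ALU or memory) sits between registers in each cycle, so the cycle is much shorter than a single-cycle period built from the same elements, at the price of a CPI above 1.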
[Figure: multi-cycle MIPS datapath with control unit.]
Multi-Cycle Performance Example
Tc =
[Figure: single-cycle MIPS datapath and control unit, including the jump path (PCJump).]
Review: Single-Cycle MIPS FSM
Single-cycle machine
[Figure: combinational logic computes the next architectural state AS' from the current state AS, which is held in sequential logic (state)]
[Figure: multi-cycle MIPS datapath with jump (PCJump) support]
Review: Multi-Cycle MIPS FSM
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr (Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (Op = LW): IorD = 1
S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (Op = SW): IorD = 1, MemWrite
S6: Execute (Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch (Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDI Execute (Op = ADDI): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump (Op = J): PCSrc = 10, PCWrite
Transitions: S0 -> S1; S1 -> S2, S6, S8, S9, or S11 by opcode; S2 -> S3 (LW) or S5 (SW); S3 -> S4; S6 -> S7; S9 -> S10; S4, S5, S7, S8, S10, S11 -> S0
What is the shortcoming of this design?
What does this design assume about memory?
What If Memory Takes > One Cycle?
Stay in the same “memory access” state until memory
returns the data
“Memory Ready?” bit is an input to the control logic that
determines the next state
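The stall behavior can be sketched as a next-state function that takes the "Memory Ready?" bit as an input. The state names below follow the load path of the multi-cycle MIPS FSM (S2, S3, S4); the function names and the way latency is modeled are assumptions made for illustration.

```python
def next_state(state, mem_ready):
    """Next state along the load path; mem_ready is the "Memory Ready?" bit."""
    if state == "S2_MemAdr":
        return "S3_MemRead"
    if state == "S3_MemRead":
        # Stay in the memory access state until memory returns the data.
        return "S4_MemWriteback" if mem_ready else "S3_MemRead"
    if state == "S4_MemWriteback":
        return "S0_Fetch"
    raise ValueError(f"unexpected state: {state}")


def cycles_in_memread(mem_latency):
    """Clock cycles spent in S3 when memory needs mem_latency cycles."""
    state, cycles = "S3_MemRead", 0
    while state == "S3_MemRead":
        cycles += 1
        state = next_state(state, mem_ready=(cycles >= mem_latency))
    return cycles


print(cycles_in_memread(1), cycles_in_memread(4))  # 1 4
```

The control logic itself does not change with memory latency; only the number of times the self-loop is taken does.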
Backup Slides on
Microprogrammed Multi-Cycle
Microarchitectures
These Slides Are Covered in A Past Lecture
https://www.youtube.com/onurmutlulectures
Lectures on Microprogrammed Designs
https://www.youtube.com/watch?v=u4GhShuBP3Y&list=PL5Q2soXY2Zi_QedyPWtRmFUJ2F8DdYP7l&index=13
https://www.youtube.com/watch?v=_igvSl5h8cs&list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ&index=7
Another Example:
Microprogrammed Multi-Cycle
Microarchitecture
An Elegant Multi-Cycle Processor Design
Maurice Wilkes, “The Best Way to Design an Automatic
Calculating Machine,” Manchester Univ. Computer
Inaugural Conf., 1951.
An elegant implementation:
The concept of microcoded/microprogrammed machines
Recall: A Basic Multi-Cycle Microarchitecture
Instruction processing cycle divided into “states”
A stage in the instruction processing cycle can take multiple
states
Microprogrammed Control Terminology
Control signals associated with the current state: microinstruction
[Figure: simple design of the control structure. The microsequencer (input: BEN, among others) supplies a 6-bit address to the control store (2^6 x 35 bits), which outputs a 35-bit microinstruction: 26 control-signal bits and 9 sequencing bits (J, COND, IRD).]
What Happens In A Clock Cycle?
The control signals (microinstruction) for the current state
control two things:
Processing in the data path
Generation of control signals (microinstruction) for the next
cycle
See Supplemental Figure 1 (next-next slide)
Example uProgrammed Control & Datapath
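Read P&P Revised Appendix C (on website)
[Figure: LC-3b control and datapath. Control takes R, IR[15:11], and BEN as inputs and produces 35 bits: 9 feed back into control and 26 are control signals (23 to the datapath, 3 to memory and I/O). Memory and I/O exchange 16-bit addresses, data, and instructions with the datapath.]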
A Bad Clock Cycle!
A Simple LC-3b Control and Datapath
Read P&P Revised Appendix C (on website)
[Figure: LC-3b control and datapath]
An LC-3b State Machine
Patt and Patel, Revised Appendix C, Figure C.2
Examples
States 18 and 19 correspond to the beginning of the instruction
processing cycle
Fetch phase: state 18, 19 -> state 33 -> state 35
Decode phase: state 32
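The fetch and decode states can be read as a short sequence of register transfers. The sketch below models them in Python under simplifying assumptions: memory is a dictionary keyed by byte address that holds 16-bit words, and it responds in a single cycle so state 33 is visited once. The function names are hypothetical.

```python
def fetch(pc, memory):
    """Register transfers of the fetch phase (states 18/19, 33, 35)."""
    mar = pc                   # state 18, 19: MAR <- PC
    pc = (pc + 2) & 0xFFFF     # state 18, 19: PC <- PC + 2
    mdr = memory[mar]          # state 33: MDR <- M[MAR] (assumed ready)
    ir = mdr                   # state 35: IR <- MDR
    return pc, ir


def decode_opcode(ir):
    """State 32 dispatches on IR[15:12]."""
    return (ir >> 12) & 0xF


memory = {0x3000: 0x1042}      # hypothetical instruction word, opcode 0001
pc, ir = fetch(0x3000, memory)
print(hex(pc), hex(ir), decode_opcode(ir))
```

PC advances by 2 because the LC-3b is byte-addressable with 16-bit instructions.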
[Figure: LC-3b state machine (Patt and Patel, Revised Appendix C, Figure C.2)]
NOTES
B+off6: Base + SEXT[offset6]
PC+off9: PC + SEXT[offset9]
*OP2 may be SR2 or SEXT[imm5]
**[15:8] or [7:0] depending on MAR[0]
The FSM Implements the LC-3b ISA
P&P Appendix A
(revised):
https://safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=pp-appendixa.pdf
LC-3b State Machine: Some Questions
How many cycles does the fastest instruction take?
LC-3b Datapath
Patt and Patel, Revised Appendix C, Figure C.3
[Figure: LC-3b datapath]
[Figure: additional logic: (a) DRMUX selects DR from IR[11:9] or 111, (b) SR1MUX selects SR1 from IR[11:9] or IR[8:6], (c) logic that forms BEN from IR[11:9] and N, Z, P]
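The BEN logic is the expression evaluated in state 32 of the state machine: BEN <- IR[11] & N + IR[10] & Z + IR[9] & P. A minimal Python sketch, with the condition codes passed as 0/1 integers and a hypothetical function name:

```python
def branch_enable(ir, n, z, p):
    """BEN = IR[11]&N | IR[10]&Z | IR[9]&P, with n, z, p given as 0 or 1."""
    ir11 = (ir >> 11) & 1
    ir10 = (ir >> 10) & 1
    ir9 = (ir >> 9) & 1
    return (ir11 & n) | (ir10 & z) | (ir9 & p)


# A branch that tests only Z (IR[10] = 1) is enabled only when Z is set.
print(branch_enable(0x0400, n=0, z=1, p=0))  # 1
print(branch_enable(0x0400, n=1, z=0, p=0))  # 0
```

BEN is computed once in decode and stored, so the branch state only needs to test a single bit.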
LC-3b Datapath: Some Questions
How does instruction fetch happen in this datapath
according to the state machine?
LC-3b Microprogrammed Control Structure
Patt and Patel, Appendix C, Figure C.4
Three components:
Microinstruction, control store, microsequencer
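A sketch of how one 35-bit microinstruction divides into its 9 sequencing bits (IRD, COND, J) and 26 control-signal bits. The field widths follow the figure (6-bit J, 2-bit COND, 1-bit IRD); the bit ordering within the word and the helper names are assumptions made for illustration.

```python
CONTROL_STORE_ENTRIES = 2 ** 6   # one entry per 6-bit state number
MICROINSTRUCTION_BITS = 35
CONTROL_SIGNAL_BITS = 26         # control signals for datapath and memory


def pack(ird, cond, j, control_signals):
    """Build a microinstruction. ASSUMED layout: IRD | COND | J | signals."""
    sequencing = (ird << 8) | (cond << 6) | j          # 9 bits total
    return (sequencing << CONTROL_SIGNAL_BITS) | control_signals


def split(microinstruction):
    """Separate the sequencing fields from the control signals."""
    control_signals = microinstruction & ((1 << CONTROL_SIGNAL_BITS) - 1)
    sequencing = microinstruction >> CONTROL_SIGNAL_BITS
    return {
        "ird": (sequencing >> 8) & 0x1,
        "cond": (sequencing >> 6) & 0x3,
        "j": sequencing & 0x3F,
        "control_signals": control_signals,
    }


word = pack(ird=0, cond=0b01, j=33, control_signals=0x123)
print(split(word), CONTROL_STORE_ENTRIES * MICROINSTRUCTION_BITS, "bits total")
```

The 26 control-signal bits drive the current cycle; the 9 sequencing bits go to the microsequencer to choose the next control-store address.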
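[Figure: simple design of the control structure. The microsequencer supplies a 6-bit address to the control store (2^6 x 35 bits), which outputs a 35-bit microinstruction: 26 control-signal bits and 9 sequencing bits (J, COND, IRD).]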
[Figure: microsequencer. IRD selects between 0,0,IR[15:12] and the J bits as modified by COND1, COND0 together with BEN, R, and IR[11]; the output is the 6-bit address of the next state.]
[Figure: control store contents, one 35-bit microinstruction per row, for states 0 (000000) through 63 (111111); columns are the sequencing fields (IRD, COND, J) followed by the control signals]
LC-3b Microsequencer
Patt and Patel, Appendix C, Figure C.5
[Figure: LC-3b microsequencer. Inputs: COND1, COND0, BEN, R, IR[11], the J bits, IR[15:12], and IRD. Output: the 6-bit address of the next state.]
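The next-state address selection can be sketched as a small function. The encoding used here (COND = 01 tests R and sets J[1], COND = 10 tests BEN and sets J[2], COND = 11 tests IR[11] and sets J[0]) is my reading of Figure C.5 and should be treated as an assumption to check against the appendix.

```python
def next_state_address(j, cond, ird, ben, r, ir):
    """Address of the next control-store entry (6 bits)."""
    if ird:
        # Decode: next state is 0,0,IR[15:12].
        return (ir >> 12) & 0xF
    address = j
    if cond == 0b10 and ben:               # branch condition
        address |= 0b000100
    if cond == 0b01 and r:                 # memory ready
        address |= 0b000010
    if cond == 0b11 and (ir >> 11) & 1:    # addressing mode
        address |= 0b000001
    return address


# State 33 waits on memory: J = 33 (100001); once R = 1 the address is 35.
print(next_state_address(j=33, cond=0b01, ird=0, ben=0, r=0, ir=0))  # 33
print(next_state_address(j=33, cond=0b01, ird=0, ben=0, r=1, ir=0))  # 35
```

Because a condition only ORs one bit into J, the two possible successor states of a conditional state differ in exactly one address bit (33/35, 18/22, 20/21).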
The Microsequencer: Some Questions
When is the IRD signal asserted?
An Exercise in
Microprogramming
Handouts
7 pages of Microprogrammed LC-3b design
https://safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=lc3b-figures.pdf
A Simple LC-3b Control and Datapath
[Figure: LC-3b control and datapath]
[Figure: LC-3b state machine (Patt and Patel, Revised Appendix C, Figure C.2)]
A Simple Datapath
Can Become
Very Powerful
[Figure: LC-3b microsequencer (COND1, COND0, IRD, and 0,0,IR[15:12] select the 6-bit next-state address)]
State 18 (010010)
State 33 (100001)
State 35 (100011)
State 32 (100000)
State 6 (000110)
State 25 (011001)
State 27 (011011)
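The seven states listed above are the path of an LDW instruction through the state machine. The sketch below replays that path with a tiny table of sequencing fields. The J, COND, and IRD values in the table are assumptions chosen to be consistent with the state numbers on the slide, and only the states needed for LDW are included.

```python
# ASSUMED sequencing fields for the states on the LDW path.
MICROCODE = {
    18: dict(ird=0, cond=0b00, j=33),   # MAR <- PC, PC <- PC + 2
    33: dict(ird=0, cond=0b01, j=33),   # MDR <- M, wait for R
    35: dict(ird=0, cond=0b00, j=32),   # IR <- MDR
    32: dict(ird=1, cond=0b00, j=0),    # decode on IR[15:12]
    6:  dict(ird=0, cond=0b00, j=25),   # MAR <- B + LSHF(off6, 1)
    25: dict(ird=0, cond=0b01, j=25),   # MDR <- M[MAR], wait for R
    27: dict(ird=0, cond=0b00, j=18),   # DR <- MDR, set CC
}


def next_state(state, ir, ready):
    fields = MICROCODE[state]
    if fields["ird"]:
        return (ir >> 12) & 0xF
    address = fields["j"]
    if fields["cond"] == 0b01 and ready:
        address |= 0b000010             # R sets J[1]: 33 -> 35, 25 -> 27
    return address


def trace_ldw(ir, memory_latency=1):
    """States visited from fetch until control returns to state 18."""
    state, visited, waited = 18, [18], 0
    while True:
        waits_on_memory = MICROCODE[state]["cond"] == 0b01
        waited = waited + 1 if waits_on_memory else 0
        ready = waits_on_memory and waited >= memory_latency
        state = next_state(state, ir, ready)
        visited.append(state)
        if state == 18:
            return visited


print(trace_ldw(0x6000))                    # LDW opcode is 0110
print(trace_ldw(0x6000, memory_latency=2))  # states 33 and 25 repeat
```

With single-cycle memory the instruction takes seven cycles; each extra cycle of memory latency adds one cycle in state 33 and one in state 25.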
[Figures repeated for the exercise: (a) DRMUX, (b) SR1MUX, (c) BEN logic; the control structure (microsequencer, 2^6 x 35 control store, 35-bit microinstruction); the microsequencer; and the control store contents for states 0 (000000) through 63 (111111)]
End of the Exercise in
Microprogramming
Variable-Latency Memory
The ready signal (R) enables memory read/write to execute
correctly
Example: transition from state 33 to state 35 is controlled by
the R bit asserted by memory when memory data is available
The Microsequencer: Advanced Questions
What happens if the machine is interrupted?
The Power of Abstraction
The concept of a control store of microinstructions gives
the hardware designer a new abstraction:
microprogramming
x86 REP MOVS (String Copy) Instruction
REP MOVS (DEST SRC)
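As an illustration of how much work a single instruction can stand for, the sketch below models a byte-wise forward REP MOVS as a loop over a flat memory array. It is a simplified model (byte elements, ascending addresses, no segmentation or faults); on x86 the count and the two pointers live in registers, which are plain function arguments here, and the function name is hypothetical.

```python
def rep_movsb(memory, dest, src, count):
    """Copy count bytes from src to dest, one element per iteration."""
    while count != 0:
        memory[dest] = memory[src]   # MOVS: move one element
        dest += 1                    # advance both pointers
        src += 1
        count -= 1                   # REP: repeat until the count is zero
    return dest, src, count


memory = bytearray(b"hello" + bytes(5))
final = rep_movsb(memory, dest=5, src=0, count=5)
print(bytes(memory), final)
```

A microprogrammed machine can implement this loop in microcode, so one ISA-level instruction expands into many microinstructions.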
LC-3b has byte load and byte store instructions that move
data not aligned at the word-address boundary
Convenience to the programmer/compiler
Aside: Memory Mapped I/O
Address control logic determines whether the address specified
by LDW or STW refers to memory or to an I/O device
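A minimal sketch of this address decoding. The I/O address range (0xFE00 and above) is a hypothetical choice for the example, as are the class and method names; the point is only that the same load and store operations are routed by address.

```python
IO_BASE = 0xFE00   # ASSUMPTION: hypothetical start of the I/O address range


def is_io_address(address):
    return (address & 0xFFFF) >= IO_BASE


class AddressControl:
    """Routes word loads and stores to memory or to device registers."""

    def __init__(self):
        self.memory = {}
        self.devices = {}

    def stw(self, address, value):
        target = self.devices if is_io_address(address) else self.memory
        target[address] = value & 0xFFFF

    def ldw(self, address):
        source = self.devices if is_io_address(address) else self.memory
        return source.get(address, 0)


bus = AddressControl()
bus.stw(0x3000, 0x1234)   # ordinary memory
bus.stw(0xFE04, 0x0041)   # lands in a device register, same instruction
print(bus.memory, bus.devices)
```

No special I/O instructions are needed; the program uses the same LDW and STW for both.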
Advantages of Microprogrammed Control
Allows a very simple design to do powerful computation by
controlling the datapath (using a sequencer)
High-level ISA translated into microcode (sequence of u-instructions)
Microcode (u-code) enables a minimal datapath to emulate an ISA
Microinstructions can be thought of as a user-invisible ISA (u-ISA)
Examples
IBM 370 Model 145: microcode stored in main memory, can be
updated after a reboot
IBM System z: Similar to 370/145.
Heller and Farrell, “Millicode in an IBM zSeries processor,” IBM
JR&D, May/Jul 2004.
B1700 microcode can be updated while the processor is running
User-microprogrammable machine!
Wilner, “Microprogramming environment on the Burroughs B1700”, CompCon 1972.
Multi-Cycle vs. Single-Cycle uArch
Advantages
Disadvantages
Segue into Pipelining
Review: Single-Cycle MIPS Processor (I)
[Figure: single-cycle MIPS datapath with jump logic]
[Figure: single-cycle machine: combinational logic computes the next architectural state AS' from the current state AS, which is held in sequential logic (state)]
Review: Multi-Cycle MIPS Processor
[Figure: multi-cycle MIPS datapath and control unit]
[Figure: multi-cycle MIPS FSM, showing states S3: MemRead, S4: Mem Writeback, S5: MemWrite, S7: ALU Writeback, and S10: ADDI Writeback]
What does this design assume about memory?
Can We Do Better?
What limitations do you see with the multi-cycle design?
Limited concurrency
Some hardware resources are idle during different phases of the
instruction processing cycle
“Fetch” logic is idle when an instruction is being “decoded” or
“executed”
Most of the datapath is idle when a memory access is
happening
Can We Use the Idle Hardware to Improve Concurrency?
Can Have Different Instructions in Different Stages
[Figure: multi-cycle MIPS datapath and control unit]
Pipelining: Basic Idea
More systematically:
Pipeline the execution of multiple instructions
Analogy: “Assembly line processing” of instructions
Idea:
Divide the instruction processing cycle into distinct “stages” of
processing
Ensure there are enough hardware resources to process one
instruction in each stage
Process a different instruction in each stage
Instructions consecutive in program order are processed in
consecutive stages
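The idea can be sketched as a schedule that places instruction i in stage s during cycle i + s. The five stage names below are hypothetical labels, and the model is idealized: every stage takes one cycle and instructions never stall each other.

```python
STAGES = ["F", "D", "E", "M", "W"]   # ASSUMPTION: hypothetical stage names


def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each cycle to the (instruction, stage) pairs active in it."""
    schedule = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(stages):
            schedule.setdefault(instr + offset, []).append((instr, stage))
    return schedule


def cycles_pipelined(num_instructions, num_stages):
    return num_stages + num_instructions - 1


def cycles_one_at_a_time(num_instructions, num_stages):
    return num_instructions * num_stages


schedule = pipeline_schedule(3)
for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
print(cycles_pipelined(3, 5), "vs", cycles_one_at_a_time(3, 5))  # 7 vs 15
```

In cycle 2 three different instructions occupy three different stages, which is exactly the concurrency the multi-cycle design leaves unused.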