onur-447-spring15-lecture5-uarch-afterlecture

This lecture introduces microarchitecture concepts, focusing on single-cycle microarchitectures and comparing them with multi-cycle architectures. Key topics include instruction processing, the von Neumann model, and the differences in control and data handling between single-cycle and multi-cycle machines. The lecture also discusses the implications of design choices on performance, particularly in relation to the MIPS instruction set architecture.

Uploaded by Wanni Xie

18-447 Computer Architecture
Lecture 5: Intro to Microarchitecture: Single-Cycle

Prof. Onur Mutlu
Carnegie Mellon University
Spring 2015, 1/26/2015

Agenda for Today & Next Few Lectures
- Start Microarchitecture
  - Single-cycle Microarchitectures
  - Multi-cycle Microarchitectures
  - Microprogrammed Microarchitectures
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …

Recap of Two Weeks and Last Lecture
- Computer Architecture Today and Basics (Lectures 1 & 2)
- Fundamental Concepts (Lecture 3)
- ISA basics and tradeoffs (Lectures 3 & 4)
- Last Lecture: ISA tradeoffs continued + MIPS ISA
  - Instruction length
  - Uniform vs. non-uniform decode
  - Number of registers
  - Addressing modes
  - Aligned vs. unaligned access
  - RISC vs. CISC properties
  - MIPS ISA Overview

Assignment for You
- Not to be turned in
- As you learn the MIPS ISA, think about what tradeoffs the designers have made
  - in terms of the ISA properties we talked about
- And, think about the pros and cons of design choices
  - In comparison to ARM, Alpha
  - In comparison to x86, VAX
- And, think about the potential mistakes
  - Branch delay slot?
  - Load delay slot?
  - No FP, no multiply, MIPS (initial)
(Look Backward)

Food for Thought for You
- How would you design a new ISA?
  - Where would you place it?
  - What design choices would you make in terms of ISA properties?
- What would be the first question you ask in this process?
  - “What is my design point?”
(Look Forward & Up)

Review: Other Example ISA-level Tradeoffs
- Condition codes vs. not
- VLIW vs. single instruction
- SIMD (single instruction multiple data) vs. SISD
- Precise vs. imprecise exceptions
- Virtual memory vs. not
- Unaligned access vs. not
- Hardware interlocks vs. software-guaranteed interlocking
- Software vs. hardware managed page fault handling
- Cache coherence (hardware vs. software)
- …
Think Programmer vs. (Micro)architect

Review: A Note on RISC vs. CISC
- Usually, …
- RISC
  - Simple instructions
  - Fixed length
  - Uniform decode
  - Few addressing modes
- CISC
  - Complex instructions
  - Variable length
  - Non-uniform decode
  - Many addressing modes

Now That We Have an ISA
- How do we implement it?
  - i.e., how do we design a system that obeys the hardware/software interface?
- Aside: “System” can be solely hardware or a combination of hardware and software
  - Remember “Translation of ISAs”
  - A virtual ISA can be converted by “software” into an implementation ISA
- We will assume “hardware” for most lectures

Implementing the ISA: Microarchitecture Basics

How Does a Machine Process Instructions?
- What does processing an instruction mean?
- Remember the von Neumann model

    AS = Architectural (programmer visible) state before an instruction is processed
        ↓ Process instruction
    AS’ = Architectural (programmer visible) state after an instruction is processed

- Processing an instruction: Transforming AS to AS’ according to the ISA specification of the instruction

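As an illustration only (this is not code from the lecture), the AS → AS’ view can be sketched in Python as a pure function from one immutable architectural state to the next. The sketch assumes a toy machine: only the PC and 32 GPRs are modeled, instruction memory is a dict from PC to a hypothetical pre-decoded tuple `("ADD", rd, rs, rt)`, and addition wraps modulo 2^32.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class ArchState:
    """Programmer-visible (architectural) state of a toy machine."""
    pc: int
    gpr: Tuple[int, ...]  # 32 general-purpose registers


def process_instruction(state: ArchState, imem: Dict[int, tuple]) -> ArchState:
    """Transform AS into AS' according to the (toy) ISA specification.

    There is no intermediate state: AS' is produced in a single step.
    """
    op, rd, rs, rt = imem[state.pc]  # hypothetical pre-decoded instruction
    if op != "ADD":
        raise NotImplementedError(f"toy ISA only models ADD, got {op}")
    gpr = list(state.gpr)
    if rd != 0:  # assumption: register 0 is hardwired to zero, as in MIPS
        gpr[rd] = (gpr[rs] + gpr[rt]) & MASK32  # overflow trap not modeled
    return ArchState(pc=state.pc + 4, gpr=tuple(gpr))


# Usage: AS has GPR[1] = 5 and GPR[2] = 7; process ADD r3, r1, r2
imem = {0: ("ADD", 3, 1, 2)}
as0 = ArchState(pc=0, gpr=(0, 5, 7) + (0,) * 29)
as1 = process_instruction(as0, imem)
print(as1.pc, as1.gpr[3])  # 4 12
```

Because `ArchState` is immutable, AS is left untouched and AS’ is a new value, mirroring the single state transition the ISA defines.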
The “Process instruction” Step
- ISA specifies abstractly what AS’ should be, given an instruction and AS
  - It defines an abstract finite state machine where
    - State = programmer-visible state
    - Next-state logic = instruction execution specification
  - From ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution
    - One state transition per instruction
- Microarchitecture implements how AS is transformed to AS’
  - There are many choices in implementation
  - We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction
    - Choice 1: AS → AS’ (transform AS to AS’ in a single clock cycle)
    - Choice 2: AS → AS+MS1 → AS+MS2 → AS+MS3 → AS’ (take multiple clock cycles to transform AS to AS’)

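A minimal sketch, under toy assumptions (a single pre-decoded ADD held in a dict, 32 GPRs, wraparound arithmetic), of Choice 1 versus Choice 2. The intermediate registers `ir`, `a`, `b`, and `alu_out` are hypothetical names for programmer-invisible state (MS); the point is that both choices end in the same AS’, and only Choice 2 passes through intermediate states.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MASK32 = 0xFFFFFFFF


@dataclass
class Machine:
    # Architectural state (AS)
    pc: int
    gpr: List[int]
    # Programmer-invisible microarchitectural state (MS); names are hypothetical
    ir: Optional[tuple] = None
    a: int = 0
    b: int = 0
    alu_out: int = 0


def choice1_single_step(m: Machine, imem: Dict[int, tuple]) -> None:
    """Choice 1: AS -> AS' in one transition."""
    _, rd, rs, rt = imem[m.pc]
    if rd != 0:
        m.gpr[rd] = (m.gpr[rs] + m.gpr[rt]) & MASK32
    m.pc += 4


def choice2_multi_step(m: Machine, imem: Dict[int, tuple]) -> None:
    """Choice 2: AS -> AS+MS1 -> AS+MS2 -> AS+MS3 -> AS'."""
    m.ir = imem[m.pc]                  # AS+MS1: latch the fetched instruction
    _, rd, rs, rt = m.ir
    m.a, m.b = m.gpr[rs], m.gpr[rt]    # AS+MS2: latch the source operands
    m.alu_out = (m.a + m.b) & MASK32   # AS+MS3: latch the ALU result
    if rd != 0:                        # AS': only now is architectural state updated
        m.gpr[rd] = m.alu_out
    m.pc += 4


imem = {0: ("ADD", 3, 1, 2)}
m1 = Machine(pc=0, gpr=[0, 5, 7] + [0] * 29)
m2 = Machine(pc=0, gpr=[0, 5, 7] + [0] * 29)
choice1_single_step(m1, imem)
choice2_multi_step(m2, imem)
print((m1.pc, m1.gpr) == (m2.pc, m2.gpr))  # True
```

In hardware, each “step” of Choice 2 would occupy its own clock cycle; the software sketch only shows the sequence of states, not timing.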
A Very Basic Instruction Processing Engine
- Each instruction takes a single clock cycle to execute
- Only combinational logic is used to implement instruction execution
  - No intermediate, programmer-invisible state updates

    AS = Architectural (programmer visible) state at the beginning of a clock cycle
        ↓ Process instruction in one clock cycle
    AS’ = Architectural (programmer visible) state at the end of a clock cycle

A Very Basic Instruction Processing Engine
- Single-cycle machine

    [Figure: Sequential Logic (State) holds AS; Combinational Logic computes AS’ from AS, and AS’ is written back into the state]

- What is the clock cycle time determined by?
- What is the critical path of the combinational logic determined by?

Remember: Programmer Visible (Architectural) State
- Memory: array of storage locations M[0], M[1], …, M[N-1], indexed by an address
- Registers
  - given special names in the ISA (as opposed to addresses)
  - general vs. special purpose
- Program Counter: memory address of the current instruction

Instructions (and programs) specify how to transform the values of programmer visible state

Single-cycle vs. Multi-cycle Machines
- Single-cycle machines
  - Each instruction takes a single clock cycle
  - All state updates made at the end of an instruction’s execution
  - Big disadvantage: The slowest instruction determines cycle time → long clock cycle time
- Multi-cycle machines
  - Instruction processing broken into multiple cycles/stages
  - State updates can be made during an instruction’s execution
  - Architectural state updates made only at the end of an instruction’s execution
  - Advantage over single-cycle: The slowest “stage” determines cycle time
- Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitecture level

Instruction Processing “Cycle”
- Instructions are processed under the direction of a “control unit” step by step.
- Instruction cycle: Sequence of steps to process an instruction
- Fundamentally, there are six phases:
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
- Not all instructions require all six stages (see P&P Ch. 4)

Instruction Processing “Cycle” vs. Machine Clock Cycle
- Single-cycle machine:
  - All six phases of the instruction processing cycle take a single machine clock cycle to complete
- Multi-cycle machine:
  - All six phases of the instruction processing cycle can take multiple machine clock cycles to complete
  - In fact, each phase can take multiple clock cycles to complete

Instruction Processing Viewed Another Way
- Instructions transform Data (AS) to Data’ (AS’)
- This transformation is done by functional units
  - Units that “operate” on data
- These units need to be told what to do to the data
- An instruction processing engine consists of two components
  - Datapath: Consists of hardware elements that deal with and transform data signals
    - functional units that operate on data
    - hardware structures (e.g., wires and muxes) that enable the flow of data into the functional units and registers
    - storage units that store data (e.g., registers)
  - Control logic: Consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data

Single-cycle vs. Multi-cycle: Control & Data
- Single-cycle machine:
  - Control signals are generated in the same clock cycle as the one during which data signals are operated on
  - Everything related to an instruction happens in one clock cycle (serialized processing)
- Multi-cycle machine:
  - Control signals needed in the next cycle can be generated in the current cycle
  - Latency of control processing can be overlapped with latency of datapath operation (more parallelism)
- We will see the difference clearly in microprogrammed multi-cycle microarchitectures

Many Ways of Datapath and Control Design
- There are many ways of designing the data path and control logic
  - Single-cycle, multi-cycle, pipelined datapath and control
  - Single-bus vs. multi-bus datapaths
    - See your homework 2 question
  - Hardwired/combinational vs. microcoded/microprogrammed control
    - Control signals generated by combinational logic versus
    - Control signals stored in a memory structure
- Control signals and structure depend on the datapath design

Flash-Forward: Performance Analysis
- Execution time of an instruction
  - {CPI} x {clock cycle time}
- Execution time of a program
  - Sum over all instructions [{CPI} x {clock cycle time}]
  - {# of instructions} x {Average CPI} x {clock cycle time}
- Single cycle microarchitecture performance
  - CPI = 1
  - Clock cycle time = long
- Multi-cycle microarchitecture performance
  - CPI = different for each instruction
    - Average CPI → hopefully small
  - Clock cycle time = short
- Now, we have two degrees of freedom to optimize independently

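The two forms of program execution time above describe the same quantity; a small Python sketch can check this. The instruction mix and per-class CPIs below are made-up values chosen only to exercise the formulas, not measurements of any real machine.

```python
from typing import Dict, Tuple

# Hypothetical instruction mix: class -> (dynamic instruction count, CPI of that class)
Mix = Dict[str, Tuple[int, int]]


def exec_time_sum(mix: Mix, cycle_time_ps: float) -> float:
    """Sum over all instructions of {CPI} x {clock cycle time}."""
    return sum(count * cpi * cycle_time_ps for count, cpi in mix.values())


def exec_time_avg(mix: Mix, cycle_time_ps: float) -> float:
    """{# of instructions} x {Average CPI} x {clock cycle time}."""
    n = sum(count for count, _ in mix.values())
    avg_cpi = sum(count * cpi for count, cpi in mix.values()) / n
    return n * avg_cpi * cycle_time_ps


mix = {"alu": (50, 4), "load": (20, 5), "store": (10, 4), "branch": (20, 3)}
print(exec_time_sum(mix, 200), exec_time_avg(mix, 200))  # 80000 80000.0
```

For a single-cycle design every CPI is 1, so both forms reduce to {# of instructions} x {clock cycle time}.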
A Single-Cycle Microarchitecture: A Closer Look

Remember…
- Single-cycle machine

    [Figure: Sequential Logic (State) holds AS; Combinational Logic computes AS’ from AS, and AS’ is written back into the state]

Let’s Start with the State Elements
- Data and control inputs

    [Figure: datapath state elements and building blocks: instruction memory (instruction address in, instruction out), program counter, adder, register file (two 5-bit read register numbers, a 5-bit write register number, write data, RegWrite; read data 1 and 2), data memory unit (address, write data, MemRead, MemWrite; read data), and a 16-to-32-bit sign-extend unit]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
For Now, We Will Assume
- “Magic” memory and register file
- Combinational read
  - output of the read data port is a combinational function of the register file contents and the corresponding read select port
- Synchronous write
  - the selected register is updated on the positive edge clock transition when write enable is asserted
    - Cannot affect read output in between clock edges
- Single-cycle, synchronous memory
  - Contrast this with memory that tells when the data is ready
  - i.e., Ready bit: indicating the read or write is done

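The “magic” register file behavior can be sketched as a Python class in which `read` is combinational and writes take effect only at a clock edge. The class and method names are hypothetical, and keeping register 0 at zero is an added MIPS detail not stated on this slide.

```python
from typing import List, Optional, Tuple


class MagicRegisterFile:
    """Register file with combinational read and synchronous (edge-triggered) write."""

    def __init__(self, num_regs: int = 32) -> None:
        self._regs: List[int] = [0] * num_regs
        self._pending: Optional[Tuple[int, int]] = None  # write-port inputs this cycle

    def read(self, read_select: int) -> int:
        # Combinational read: depends only on current contents and the select input
        return self._regs[read_select]

    def drive_write_port(self, write_select: int, data: int, write_enable: bool) -> None:
        # Write-port inputs are sampled at the next clock edge, not applied now
        self._pending = (write_select, data) if write_enable else None

    def positive_clock_edge(self) -> None:
        # Synchronous write: state changes only at the edge, and only if enabled
        if self._pending is not None:
            sel, data = self._pending
            if sel != 0:  # assumption: MIPS register $0 stays zero
                self._regs[sel] = data
        self._pending = None


rf = MagicRegisterFile()
rf.drive_write_port(5, 42, write_enable=True)
print(rf.read(5))  # 0: the pending write cannot affect the read output between edges
rf.positive_clock_edge()
print(rf.read(5))  # 42
```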
Instruction Processing
- 5 generic steps (P&H book)
  - Instruction fetch (IF)
  - Instruction decode and register operand fetch (ID/RF)
  - Execute/Evaluate memory address (EX/AG)
  - Memory operand fetch (MEM)
  - Store/writeback result (WB)

    [Figure: PC → instruction memory (IF) → register file (ID/RF) → ALU (EX/AG) → data memory (MEM) → back to the register file (WB)]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

What Is To Come: The Full MIPS Datapath

    [Figure: the complete single-cycle MIPS datapath. The control unit decodes Instruction[31:26] into RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; ALU control uses Instruction[5:0]. The next PC is chosen by PCSrc1=Jump (jump address from Instruction[25:0] shifted left 2, combined with PC+4[31:28]) and PCSrc2=Br Taken (PC + 4 plus the sign-extended offset shifted left 2, selected when bcond is set). JAL, JR, JALR omitted.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Single-Cycle Datapath for Arithmetic and Logical Instructions

R-Type ALU Instructions
- Assembly (e.g., register-register signed addition)
    ADD rdreg rsreg rtreg
- Machine encoding
    0      rs     rt     rd     0      ADD    R-type
    6-bit  5-bit  5-bit  5-bit  5-bit  6-bit
- Semantics
    if MEM[PC] == ADD rd rs rt
        GPR[rd] ← GPR[rs] + GPR[rt]
        PC ← PC + 4

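To connect the encoding and the semantics, here is a hedged Python sketch that builds an ADD word and executes it. It assumes the MIPS funct value 0x20 for ADD, represents the GPRs as a list of 32 unsigned 32-bit integers, and does not model the overflow exception that MIPS ADD raises.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF
FUNCT_ADD = 0x20  # MIPS funct field for ADD (0b100000)


def encode_add(rd: int, rs: int, rt: int) -> int:
    """R-type layout: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5)=0 | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | FUNCT_ADD


def execute_add(inst: int, gpr: List[int], pc: int) -> Tuple[List[int], int]:
    """GPR[rd] <- GPR[rs] + GPR[rt]; PC <- PC + 4 (returns new GPRs and PC)."""
    opcode, funct = (inst >> 26) & 0x3F, inst & 0x3F
    if opcode != 0 or funct != FUNCT_ADD:
        raise ValueError("not an ADD instruction")
    rs, rt, rd = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F, (inst >> 11) & 0x1F
    new_gpr = list(gpr)
    if rd != 0:  # $0 is hardwired to zero
        # Wraps modulo 2^32; the overflow exception of MIPS ADD is not modeled
        new_gpr[rd] = (gpr[rs] + gpr[rt]) & MASK32
    return new_gpr, (pc + 4) & MASK32


inst = encode_add(rd=3, rs=1, rt=2)
print(hex(inst))  # 0x221820
gpr, pc = execute_add(inst, [0, 5, 7] + [0] * 29, pc=0x400000)
print(gpr[3], hex(pc))  # 12 0x400004
```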
ALU Datapath

    [Figure: R-type ALU datapath. The PC addresses instruction memory, and an adder computes PC + 4. Instruction[25:21] and [20:16] select the two read registers and Instruction[15:11] the write register; the ALU result (3-bit ALU operation) is written back with RegWrite = 1. Stages marked IF, ID, EX, MEM, WB.]

    if MEM[PC] == ADD rd rs rt
        GPR[rd] ← GPR[rs] + GPR[rt]
        PC ← PC + 4
    (Combinational state update logic)
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

I-Type ALU Instructions
- Assembly (e.g., register-immediate signed additions)
    ADDI rtreg rsreg immediate16
- Machine encoding
    ADDI   rs     rt     immediate   I-type
    6-bit  5-bit  5-bit  16-bit
- Semantics
    if MEM[PC] == ADDI rt rs immediate
        GPR[rt] ← GPR[rs] + sign-extend (immediate)
        PC ← PC + 4

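A similar sketch for ADDI, whose key extra step is sign extension of the 16-bit immediate. It assumes the MIPS ADDI opcode 0x08 and, as in the ADD sketch, ignores the overflow exception.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF
OPCODE_ADDI = 0x08  # MIPS opcode for ADDI (0b001000)


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def encode_addi(rt: int, rs: int, imm: int) -> int:
    """I-type layout: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (OPCODE_ADDI << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def execute_addi(inst: int, gpr: List[int], pc: int) -> Tuple[List[int], int]:
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); PC <- PC + 4."""
    if ((inst >> 26) & 0x3F) != OPCODE_ADDI:
        raise ValueError("not an ADDI instruction")
    rs, rt = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F
    new_gpr = list(gpr)
    if rt != 0:
        # Wraps modulo 2^32; the overflow exception of MIPS ADDI is not modeled
        new_gpr[rt] = (gpr[rs] + sign_extend16(inst)) & MASK32
    return new_gpr, (pc + 4) & MASK32


inst = encode_addi(rt=2, rs=1, imm=-1)
gpr, pc = execute_addi(inst, [0, 5] + [0] * 30, pc=0)
print(gpr[2], pc)  # 4 4
```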
Datapath for R and I-Type ALU Insts.

    [Figure: datapath extended for I-type ALU instructions. A RegDest mux (controlled by isItype) selects Instruction[20:16] or [15:11] as the write register; an ALUSrc mux (controlled by isItype) selects read data 2 or the 16-to-32-bit sign-extended immediate as the second ALU input; RegWrite = 1. Stages marked IF, ID, EX, MEM, WB.]

    if MEM[PC] == ADDI rt rs immediate
        GPR[rt] ← GPR[rs] + sign-extend (immediate)
        PC ← PC + 4
    (Combinational state update logic)
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Single-Cycle Datapath for Data Movement Instructions

Load Instructions
- Assembly (e.g., load 4-byte word)
    LW rtreg offset16 (basereg)
- Machine encoding
    LW     base   rt     offset   I-type
    6-bit  5-bit  5-bit  16-bit
- Semantics
    if MEM[PC] == LW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        GPR[rt] ← MEM[ translate(EA) ]
        PC ← PC + 4

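The LW semantics can be sketched in the same style. Here memory is a dict from word-aligned byte addresses to 32-bit words, `translate` is a hypothetical identity function standing in for address translation (no virtual memory), and the alignment check reflects the MIPS requirement that LW addresses be word-aligned. The opcode value 0x23 is the MIPS LW opcode.

```python
from typing import Dict, List, Tuple

MASK32 = 0xFFFFFFFF
OPCODE_LW = 0x23  # MIPS opcode for LW (0b100011)


def sign_extend16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def translate(ea: int) -> int:
    """Hypothetical address translation: identity (no virtual memory)."""
    return ea


def execute_lw(inst: int, gpr: List[int], mem: Dict[int, int],
               pc: int) -> Tuple[List[int], int]:
    """EA = sign-extend(offset) + GPR[base]; GPR[rt] <- MEM[translate(EA)]; PC <- PC + 4."""
    if ((inst >> 26) & 0x3F) != OPCODE_LW:
        raise ValueError("not an LW instruction")
    base, rt = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F
    ea = (sign_extend16(inst) + gpr[base]) & MASK32
    if ea % 4 != 0:
        raise ValueError(f"unaligned word address {ea:#x}")
    new_gpr = list(gpr)
    if rt != 0:
        new_gpr[rt] = mem.get(translate(ea), 0)  # assumption: unwritten words read as 0
    return new_gpr, (pc + 4) & MASK32


# LW $2, -4($4) with GPR[4] = 0x104, so EA = 0x100
inst = (OPCODE_LW << 26) | (4 << 21) | (2 << 16) | (-4 & 0xFFFF)
gpr, pc = execute_lw(inst, [0] * 4 + [0x104] + [0] * 27, {0x100: 0xDEADBEEF}, pc=0)
print(hex(gpr[2]), pc)  # 0xdeadbeef 4
```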
LW Datapath

    [Figure: the ALU datapath extended with a data memory unit for LW. The ALU operation is add and computes the address from read data 1 and the sign-extended offset (ALUSrc = isItype); the data memory read data is written to the register file (RegDest = isItype, RegWrite = 1); MemRead = 1, MemWrite = 0. Stages marked IF, ID, EX, MEM, WB.]

    if MEM[PC] == LW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        GPR[rt] ← MEM[ translate(EA) ]
        PC ← PC + 4
    (Combinational state update logic)

Store Instructions
- Assembly (e.g., store 4-byte word)
    SW rtreg offset16 (basereg)
- Machine encoding
    SW     base   rt     offset   I-type
    6-bit  5-bit  5-bit  16-bit
- Semantics
    if MEM[PC] == SW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        MEM[ translate(EA) ] ← GPR[rt]
        PC ← PC + 4

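SW differs from LW only in the direction of the data transfer: the effective address is computed the same way, but memory, rather than a GPR, is updated. The sketch uses the same assumptions as an LW model would (dict memory keyed by word-aligned byte address, a hypothetical identity `translate`), plus the MIPS SW opcode 0x2B.

```python
from typing import Dict, List, Tuple

MASK32 = 0xFFFFFFFF
OPCODE_SW = 0x2B  # MIPS opcode for SW (0b101011)


def sign_extend16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def translate(ea: int) -> int:
    """Hypothetical address translation: identity (no virtual memory)."""
    return ea


def execute_sw(inst: int, gpr: List[int], mem: Dict[int, int],
               pc: int) -> Tuple[Dict[int, int], int]:
    """EA = sign-extend(offset) + GPR[base]; MEM[translate(EA)] <- GPR[rt]; PC <- PC + 4."""
    if ((inst >> 26) & 0x3F) != OPCODE_SW:
        raise ValueError("not an SW instruction")
    base, rt = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F
    ea = (sign_extend16(inst) + gpr[base]) & MASK32
    if ea % 4 != 0:
        raise ValueError(f"unaligned word address {ea:#x}")
    new_mem = dict(mem)  # return a new memory state instead of mutating the input
    new_mem[translate(ea)] = gpr[rt]
    return new_mem, (pc + 4) & MASK32


# SW $2, 8($4) with GPR[4] = 0x100, so EA = 0x108
inst = (OPCODE_SW << 26) | (4 << 21) | (2 << 16) | 8
gpr = [0] * 32
gpr[2], gpr[4] = 0xCAFE, 0x100
mem, pc = execute_sw(inst, gpr, {}, pc=0)
print({hex(a): hex(v) for a, v in mem.items()}, pc)  # {'0x108': '0xcafe'} 4
```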
SW Datapath

    [Figure: the same datapath configured for SW. The ALU operation is add (address generation); read data 2 feeds the data memory write data; MemWrite = 1, MemRead = 0, RegWrite = 0. Stages marked IF, ID, EX, MEM, WB.]

    if MEM[PC] == SW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        MEM[ translate(EA) ] ← GPR[rt]
        PC ← PC + 4
    (Combinational state update logic)

Load-Store Datapath

    [Figure: one datapath for both loads and stores. ALU operation = add; MemWrite = isStore; MemRead = isLoad; RegWrite = !isStore; RegDest and ALUSrc driven by isItype.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Datapath for Non-Control-Flow Insts.

    [Figure: the load-store datapath plus a MemtoReg mux (MemtoReg = isLoad) that selects either the ALU result or the data memory read data as the register write data; MemWrite = isStore, MemRead = isLoad, RegWrite = !isStore; RegDest and ALUSrc driven by isItype.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Single-Cycle Datapath for Control Flow Instructions

Unconditional Jump Instructions
- Assembly
    J immediate26
- Machine encoding
    J      immediate   J-type
    6-bit  26-bit
- Semantics
    if MEM[PC] == J immediate26
        target = { PC[31:28], immediate26, 2’b00 }
        PC ← target

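The jump target is pure bit manipulation: keep the upper four PC bits, insert the 26-bit immediate, and append two zero bits. This sketch follows the slide and takes the upper bits from PC; the datapath figures use PC+4[31:28] (as does the MIPS specification, via the delay-slot address), which gives a different result only when PC is the last word of a 256 MB region. Passing `pc + 4` models that variant. The opcode value 0x02 is the MIPS J opcode.

```python
OPCODE_J = 0x02  # MIPS opcode for J (0b000010)


def jump_target(pc: int, imm26: int) -> int:
    """target = { PC[31:28], immediate26, 2'b00 }"""
    return (pc & 0xF0000000) | ((imm26 & 0x03FFFFFF) << 2)


def execute_j(inst: int, pc: int) -> int:
    """Return the next PC for a J instruction."""
    if ((inst >> 26) & 0x3F) != OPCODE_J:
        raise ValueError("not a J instruction")
    return jump_target(pc, inst & 0x03FFFFFF)


inst = (OPCODE_J << 26) | 0x0100000
print(hex(execute_j(inst, pc=0x00400018)))  # 0x400000
print(hex(jump_target(0x90000000, 1)))      # 0x90000004
```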
Unconditional Jump Datapath

    [Figure: datapath with a PCSrc mux (PCSrc = isJ) choosing between PC + 4 and a jump target formed by concatenating upper PC bits with the 26-bit immediate. ALU operation = X (don’t care), MemWrite = 0, RegWrite = 0; the RegDest setting is left as “?”.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

    if MEM[PC] == J immediate26
        PC = { PC[31:28], immediate26, 2’b00 }

What about JR, JAL, JALR?

Aside: MIPS Cheat Sheet
- http://www.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=mips_reference_data.pdf
  - On the 447 website

Conditional Branch Instructions
- Assembly (e.g., branch if equal)
    BEQ rsreg rtreg immediate16
- Machine encoding
    BEQ    rs     rt     immediate   I-type
    6-bit  5-bit  5-bit  16-bit
- Semantics (assuming no branch delay slot)
    if MEM[PC] == BEQ rs rt immediate16
        target = PC + 4 + sign-extend(immediate) x 4
        if GPR[rs] == GPR[rt] then PC ← target
        else PC ← PC + 4

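A sketch of the BEQ next-PC computation under the slide’s assumption of no branch delay slot. It assumes the MIPS BEQ opcode 0x04; the offset counts words, so it is sign-extended and multiplied by 4 before being added to PC + 4.

```python
MASK32 = 0xFFFFFFFF
OPCODE_BEQ = 0x04  # MIPS opcode for BEQ (0b000100)


def sign_extend16(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def encode_beq(rs: int, rt: int, imm: int) -> int:
    return (OPCODE_BEQ << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def execute_beq(inst: int, gpr: list, pc: int) -> int:
    """Return the next PC: the branch target if GPR[rs] == GPR[rt], else PC + 4."""
    if ((inst >> 26) & 0x3F) != OPCODE_BEQ:
        raise ValueError("not a BEQ instruction")
    rs, rt = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F
    target = (pc + 4 + sign_extend16(inst) * 4) & MASK32
    return target if gpr[rs] == gpr[rt] else (pc + 4) & MASK32


gpr = [0] * 32
gpr[1] = gpr[2] = 7
print(hex(execute_beq(encode_beq(1, 2, 3), gpr, pc=0x100)))   # 0x110 (taken)
print(hex(execute_beq(encode_beq(1, 3, 3), gpr, pc=0x100)))   # 0x104 (not taken)
print(hex(execute_beq(encode_beq(1, 2, -2), gpr, pc=0x100)))  # 0xfc (taken, backward)
```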
Conditional Branch Datapath (for you to finish)

    [Figure: an adder computes the branch target from PC + 4 (taken from the instruction datapath; marked “watch out”) and the sign-extended immediate shifted left 2. The ALU operation is sub, and its Zero/bcond output goes to the branch control logic, which drives PCSrc; RegWrite = 0.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

How to uphold the delayed branch semantics?
Putting It All Together

    [Figure: the complete single-cycle MIPS datapath with control unit (RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite), ALU control, and PCSrc1=Jump / PCSrc2=Br Taken next-PC muxes. JAL, JR, JALR omitted.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Single-Cycle Control Logic

Single-Cycle Hardwired Control
- As combinational function of Inst=MEM[PC]

    R-type:  0 [31:26]       | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
    I-type:  opcode [31:26]  | rs [25:21] | rt [20:16] | immediate [15:0]
    J-type:  opcode [31:26]  | immediate [25:0]

- Consider
  - All R-type and I-type ALU instructions
  - LW and SW
  - BEQ, BNE, BLEZ, BGTZ
  - J, JR, JAL, JALR

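Because the field positions are fixed per format, a hardwired decoder only needs to slice bits. The sketch below assumes opcodes 0x02 (J) and 0x03 (JAL) are the only J-type opcodes and treats every other nonzero opcode as I-type, which holds for the instructions listed above but not for all of MIPS.

```python
from typing import Dict, Tuple

J_TYPE_OPCODES = {0x02, 0x03}  # J and JAL use the J-type format


def decode(inst: int) -> Tuple[str, Dict[str, int]]:
    """Split a 32-bit MIPS instruction word into its format and named fields."""
    opcode = (inst >> 26) & 0x3F              # bits [31:26]
    rs = (inst >> 21) & 0x1F                  # bits [25:21]
    rt = (inst >> 16) & 0x1F                  # bits [20:16]
    if opcode == 0:                           # R-type
        return "R", {"opcode": 0, "rs": rs, "rt": rt,
                     "rd": (inst >> 11) & 0x1F,     # bits [15:11]
                     "shamt": (inst >> 6) & 0x1F,   # bits [10:6]
                     "funct": inst & 0x3F}          # bits [5:0]
    if opcode in J_TYPE_OPCODES:              # J-type
        return "J", {"opcode": opcode, "immediate": inst & 0x03FFFFFF}  # bits [25:0]
    return "I", {"opcode": opcode, "rs": rs, "rt": rt,
                 "immediate": inst & 0xFFFF}  # bits [15:0]


print(decode(0x00221820))  # ADD $3, $1, $2
print(decode(0x2002FFFF))  # ADDI $2, $0, -1
print(decode(0x08000100))  # J with immediate26 = 0x100
```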
Single-Bit Control Signals

Signal   | When de-asserted                                    | When asserted                                       | Equation
RegDest  | GPR write select according to rt, i.e., inst[20:16] | GPR write select according to rd, i.e., inst[15:11] | opcode==0
ALUSrc   | 2nd ALU input from 2nd GPR read port                | 2nd ALU input from sign-extended 16-bit immediate   | (opcode!=0) && (opcode!=BEQ) && (opcode!=BNE)
MemtoReg | Steer ALU result to GPR write port                  | Steer memory load to GPR write port                 | opcode==LW
RegWrite | GPR write disabled                                  | GPR write enabled                                   | (opcode!=SW) && (opcode!=Bxx) && (opcode!=J) && (opcode!=JR)

JAL and JALR require additional RegDest and MemtoReg options.

Single-Bit Control Signals

Signal   | When de-asserted      | When asserted                                      | Equation
MemRead  | Memory read disabled  | Memory read port returns load value                | opcode==LW
MemWrite | Memory write disabled | Memory write enabled                               | opcode==SW
PCSrc1   | According to PCSrc2   | Next PC is based on 26-bit immediate jump target   | (opcode==J) || (opcode==JAL)
PCSrc2   | Next PC = PC + 4      | Next PC is based on 16-bit immediate branch target | (opcode==Bxx) && “bcond is satisfied”

JR and JALR require additional PCSrc options.
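The equations in the two tables translate directly into Boolean expressions. This Python sketch follows the tables’ simplification of naming each instruction by a symbolic opcode (including JR, which real MIPS encodes as opcode 0 with a funct code) and does not model the extra JAL/JALR/JR options noted above. PCSrc2 takes `bcond`, a data-dependent input, so it cannot be computed from the instruction alone.

```python
from typing import Dict

BXX = {"BEQ", "BNE", "BLEZ", "BGTZ"}


def control_signals(op: str, bcond: bool = False) -> Dict[str, bool]:
    """Single-bit control signals as combinational functions of the opcode.

    `op` is a symbolic opcode name (an assumption of this sketch): "RTYPE"
    stands for opcode == 0; others are "ADDI", "LW", "SW", "BEQ", "J", "JR", ...
    `bcond` is the branch condition produced by the ALU.
    """
    return {
        "RegDest": op == "RTYPE",
        "ALUSrc": op not in ("RTYPE", "BEQ", "BNE"),
        "MemtoReg": op == "LW",
        "RegWrite": op != "SW" and op not in BXX and op not in ("J", "JR"),
        "MemRead": op == "LW",
        "MemWrite": op == "SW",
        "PCSrc1": op in ("J", "JAL"),
        "PCSrc2": op in BXX and bcond,
    }


for op in ("RTYPE", "ADDI", "LW", "SW"):
    asserted = [name for name, value in control_signals(op).items() if value]
    print(op, asserted)
```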
ALU Control
- case opcode
    ‘0’    → select operation according to funct
    ‘ALUi’ → select operation according to opcode
    ‘LW’   → select addition
    ‘SW’   → select addition
    ‘Bxx’  → select bcond generation function
    __     → don’t care
- Example ALU operations
  - ADD, SUB, AND, OR, XOR, NOR, etc.
  - bcond on equal, not equal, LE zero, GT zero, etc.

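The ALU control case statement can be sketched as a lookup on opcode and funct. Only a subset of operations is included; the numeric opcode and funct values are the MIPS ones, while the string operation names (e.g., `"BCOND_EQ"`) are hypothetical labels for illustration.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF
OP_LW, OP_SW = 0x23, 0x2B
ALU_OP_BY_FUNCT = {0x20: "ADD", 0x22: "SUB", 0x24: "AND", 0x25: "OR", 0x26: "XOR", 0x27: "NOR"}
ALU_OP_BY_OPCODE = {0x08: "ADD", 0x0C: "AND", 0x0D: "OR", 0x0E: "XOR"}  # ADDI, ANDI, ORI, XORI
BCOND_BY_OPCODE = {0x04: "EQ", 0x05: "NE", 0x06: "LEZ", 0x07: "GTZ"}    # BEQ, BNE, BLEZ, BGTZ


def alu_control(opcode: int, funct: int) -> Optional[str]:
    if opcode == 0:
        return ALU_OP_BY_FUNCT[funct]              # '0': select by funct
    if opcode in ALU_OP_BY_OPCODE:
        return ALU_OP_BY_OPCODE[opcode]            # 'ALUi': select by opcode
    if opcode in (OP_LW, OP_SW):
        return "ADD"                               # address generation
    if opcode in BCOND_BY_OPCODE:
        return "BCOND_" + BCOND_BY_OPCODE[opcode]  # bcond generation
    return None                                    # don't care (e.g., J)


def to_signed(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def bcond(fn: str, a: int, b: int) -> bool:
    """Branch condition; LEZ/GTZ compare GPR[rs] against zero and ignore b."""
    return {
        "BCOND_EQ": a == b,
        "BCOND_NE": a != b,
        "BCOND_LEZ": to_signed(a) <= 0,
        "BCOND_GTZ": to_signed(a) > 0,
    }[fn]


print(alu_control(0, 0x22), alu_control(OP_LW, 0), alu_control(0x06, 0))
```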
R-Type ALU

    [Figure: the complete single-cycle datapath with control signals set for an R-type ALU instruction; ALU control selects the operation from funct (Instruction[5:0]).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

I-Type ALU

    [Figure: the complete single-cycle datapath with control signals set for an I-type ALU instruction; ALU control selects the operation from the opcode.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
LW

    [Figure: the complete single-cycle datapath with control signals set for LW; ALU control selects Add.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
SW

    [Figure: the complete single-cycle datapath with control signals set for SW; ALU control selects Add, and some signals are don’t-care (X).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Branch (Not Taken)
Some control signals are dependent on the processing of data

    [Figure: the complete single-cycle datapath with control signals set for a branch; ALU control selects the bcond function, some signals are don’t-care (X), and since bcond is not satisfied the next PC is PC + 4.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Branch (Taken)
Some control signals are dependent on the processing of data

    [Figure: the complete single-cycle datapath with control signals set for a branch; ALU control selects the bcond function, some signals are don’t-care (X), and since bcond is satisfied the next PC is the branch target.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Jump

    [Figure: the complete single-cycle datapath with control signals set for J; PCSrc1=Jump selects the jump address, and the ALU operation and several other signals are don’t-care (X).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
What is in That Control Box?
- Combinational Logic → Hardwired Control
  - Idea: Control signals generated combinationally based on instruction
  - Necessary in a single-cycle microarchitecture…
- Sequential Logic → Sequential/Microprogrammed Control
  - Idea: A memory structure contains the control signals associated with an instruction
  - Control Store

Evaluating the Single-Cycle Microarchitecture

A Single-Cycle Microarchitecture
- Is this a good idea/design?
- When is this a good design?
- When is this a bad design?
- How can we design a better microarchitecture?

A Single-Cycle Microarchitecture: Analysis
- Every instruction takes 1 cycle to execute
  - CPI (Cycles per instruction) is strictly 1
- How long each instruction takes is determined by how long the slowest instruction takes to execute
  - Even though many instructions do not need that long to execute
- Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction
  - Critical path of the design is determined by the processing time of the slowest instruction

What is the Slowest Instruction to Process?
- Let’s go back to the basics
- All six phases of the instruction processing cycle take a single machine clock cycle to complete
  - Six phases: Fetch, Decode, Evaluate Address, Fetch Operands, Execute, Store Result
  - The corresponding five P&H steps:
    1. Instruction fetch (IF)
    2. Instruction decode and register operand fetch (ID/RF)
    3. Execute/Evaluate memory address (EX/AG)
    4. Memory operand fetch (MEM)
    5. Store/writeback result (WB)
- Does each of the above phases take the same time (latency) for all instructions?

Single-Cycle Datapath Analysis
- Assume
  - memory units (read or write): 200 ps
  - ALU and adders: 100 ps
  - register file (read or write): 50 ps
  - other combinational logic: 0 ps

    Steps       IF    ID    EX    MEM   WB    Delay
    Resources   mem   RF    ALU   mem   RF
    R-type      200   50    100         50    400
    I-type      200   50    100         50    400
    LW          200   50    100   200   50    600
    SW          200   50    100   200         550
    Branch      200   50    100               350
    Jump        200                           200
    (all values in ps)

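The per-instruction delays in the table follow from summing the latencies of the steps each instruction uses, and the single-cycle clock must accommodate the largest sum. A short sketch that recomputes the table, assuming (as the table implicitly does) that the PC + 4 and branch-target adders work in parallel and stay off these paths:

```python
from typing import Dict, List

# Latencies from the slide's assumptions (ps); other combinational logic is 0 ps
LATENCY_PS = {"mem": 200, "ALU": 100, "RF": 50}
RESOURCE_OF_STEP = {"IF": "mem", "ID": "RF", "EX": "ALU", "MEM": "mem", "WB": "RF"}

# Steps each instruction class uses, read off the table
STEPS_USED: Dict[str, List[str]] = {
    "R-type": ["IF", "ID", "EX", "WB"],
    "I-type": ["IF", "ID", "EX", "WB"],
    "LW": ["IF", "ID", "EX", "MEM", "WB"],
    "SW": ["IF", "ID", "EX", "MEM"],
    "Branch": ["IF", "ID", "EX"],
    "Jump": ["IF"],
}


def delay_ps(inst_class: str) -> int:
    """Sum the latencies of the resources used by each step of this class."""
    return sum(LATENCY_PS[RESOURCE_OF_STEP[s]] for s in STEPS_USED[inst_class])


delays = {c: delay_ps(c) for c in STEPS_USED}
slowest = max(delays, key=delays.get)
print(delays)
print("clock cycle time:", delays[slowest], "ps, set by", slowest)
```

Every instruction, including a 200 ps jump, is then charged the full 600 ps cycle set by LW.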
Let’s Find the Critical Path

    [Figure: the complete single-cycle MIPS datapath, used to trace the critical path of each instruction type]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

R-Type and I-Type ALU

    [Figure: critical path annotated on the datapath: PC + 4 ready at 100ps; instruction out of instruction memory at 200ps; register operands at 250ps; ALU result at 350ps; register write complete at 400ps]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

LW

    [Figure: critical path annotated on the datapath: PC + 4 ready at 100ps; instruction at 200ps; register operands at 250ps; address from the ALU at 350ps; data memory read at 550ps; register write complete at 600ps]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

SW

[Figure: single-cycle MIPS datapath with the SW path highlighted. Annotated arrival times: PC+4 at 100ps, instruction fetched at 200ps, registers read at 250ps, address computed by the ALU at 350ps, data memory write complete at 550ps (no register write).]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Branch Taken

[Figure: single-cycle MIPS datapath with the taken-branch path highlighted. Annotated arrival times: PC+4 at 100ps, instruction fetched at 200ps, registers read at 250ps, ALU comparison (bcond) at 350ps, which selects the branch target as the next PC at 350ps.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Jump

[Figure: single-cycle MIPS datapath with the jump path highlighted. Annotated arrival times: PC+4 at 100ps, instruction fetched at 200ps. The jump address (PC+4[31–28] concatenated with Instruction[25–0] shifted left by 2) and the next PC are ready at 200ps.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
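Summing per-component latencies along each instruction's path reproduces the arrival times annotated on the five datapath figures above. The sketch below assumes component delays inferred from those annotations: 200ps per memory access, 100ps per ALU or adder, 50ps per register-file read or write. It treats muxes, sign extension, shifts, and control as 0ps. The names `COMPONENT_PS`, `PATHS`, `path_delay_ps`, and `single_cycle_period_ps` are illustrative and do not come from the slides.

```python
# Assumed per-component latencies in picoseconds, inferred from the annotated
# figures; muxes, sign extension, shifts and control logic are taken as 0 ps.
COMPONENT_PS = {
    "imem": 200,      # instruction memory read
    "reg_read": 50,   # register file read
    "alu": 100,       # ALU: arithmetic, address calculation or branch compare
    "dmem": 200,      # data memory read or write
    "reg_write": 50,  # register file write
}

# Components on each instruction class's critical path, in order. The PC+4
# adder (100 ps) and branch-target adder run in parallel and finish earlier.
PATHS = {
    "R-type/I-type ALU": ["imem", "reg_read", "alu", "reg_write"],
    "LW": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "SW": ["imem", "reg_read", "alu", "dmem"],
    "Branch (taken)": ["imem", "reg_read", "alu"],
    "Jump": ["imem"],
}


def path_delay_ps(components):
    """Sum the latencies of the components along one path."""
    return sum(COMPONENT_PS[c] for c in components)


def single_cycle_period_ps(paths):
    """A single-cycle machine must clock at the slowest instruction's delay."""
    return max(path_delay_ps(p) for p in paths.values())


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(f"{name:18s} {path_delay_ps(path):4d} ps")
    print(f"{'clock period':18s} {single_cycle_period_ps(PATHS):4d} ps")
```

Under these assumptions LW sets the clock at 600ps. Every other instruction, including the 200ps jump, is then allotted the full 600ps cycle.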
What About Control Logic?
 How does that affect the critical path?
 Food for thought for you:
   Can control logic be on the critical path?
   A note on CDC 5600: control store access too long…

What is the Slowest Instruction to Process?
 Memory is not magic
 What if memory sometimes takes 100ms to access?
   Does it make sense to have a simple register-to-register add or jump take {100ms + all else needed to do a memory operation}?
 And what if you need to access memory more than once to process an instruction?
   Which instructions need this?
   Do you provide multiple ports to memory?
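The sketch below makes the 100ms scenario concrete with one assumption: only the data memory slows to 100ms. The other latencies keep the values implied by the annotated datapath figures (200ps instruction memory, 50ps register file, 100ps ALU). The names are illustrative.

```python
# Hypothetical scenario: only data memory slows to 100 ms; the other
# latencies (in picoseconds) are the values assumed from the figures.
PS_PER_MS = 10**9                      # 1 ms = 10^9 ps
IMEM_PS, REG_PS, ALU_PS = 200, 50, 100
DMEM_PS = 100 * PS_PER_MS              # assumed slow data memory access

# Delay of each instruction's own path through the single-cycle datapath.
OWN_PATH_PS = {
    "ADD": IMEM_PS + REG_PS + ALU_PS + REG_PS,            # no data-memory access
    "J": IMEM_PS,                                          # fetch, then next-PC select
    "LW": IMEM_PS + REG_PS + ALU_PS + DMEM_PS + REG_PS,   # data-memory read
}

# A single-cycle design clocks every instruction at the slowest path.
PERIOD_PS = max(OWN_PATH_PS.values())

if __name__ == "__main__":
    for name, own in OWN_PATH_PS.items():
        ratio = PERIOD_PS / own
        print(f"{name}: own path {own} ps, cycle {PERIOD_PS} ps ({ratio:,.0f}x)")
```

An ADD whose own path needs only 400ps is still held for the entire LW cycle of about 100ms.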

Single Cycle uArch: Complexity
 Contrived
   All instructions run as slow as the slowest instruction
 Inefficient
   All instructions run as slow as the slowest instruction
   Must provide worst-case combinational resources in parallel, as required by any instruction
   Need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle
 Not necessarily the simplest way to implement an ISA
   Single-cycle implementation of REP MOVS (x86) or INDEX (VAX)?
 Not easy to optimize/improve performance
   Optimizing the common case does not work (e.g., common instructions)
   Need to optimize the worst case all the time
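The points "all instructions run as slow as the slowest instruction" and "optimizing the common case does not work" can be illustrated by comparing two machines. One is a single-cycle machine. The other is an idealized machine in which each instruction takes only its own path delay. This is a sketch under stated assumptions: the path delays are the ones annotated on the datapath figures, and the instruction mix and the faster ALU-instruction path are hypothetical values chosen for illustration, not measurements. Integer percentages keep the arithmetic exact.

```python
# Per-instruction path delays (ps) as annotated on the single-cycle figures.
PATH_PS = {"ALU": 400, "LW": 600, "SW": 550, "BR": 350, "J": 200}

# Hypothetical dynamic instruction mix, in percent (illustrative only).
MIX_PCT = {"ALU": 50, "LW": 20, "SW": 10, "BR": 15, "J": 5}


def single_cycle_ps_per_100(mix, path_ps):
    """Time for 100 instructions when every one pays the slowest path."""
    period = max(path_ps[op] for op in mix)
    return sum(mix.values()) * period


def own_latency_ps_per_100(mix, path_ps):
    """Time for 100 instructions if each took only its own path delay."""
    return sum(pct * path_ps[op] for op, pct in mix.items())


if __name__ == "__main__":
    # Hypothetical optimization that shortens only the common ALU path.
    faster_alu = dict(PATH_PS, ALU=300)
    for label, delays in (("baseline", PATH_PS), ("faster ALU path", faster_alu)):
        print(label,
              single_cycle_ps_per_100(MIX_PCT, delays),
              own_latency_ps_per_100(MIX_PCT, delays))
```

With this mix, the single-cycle machine needs 60000ps per 100 instructions both before and after the ALU-path improvement, because LW still sets the clock. The idealized machine drops from 43750ps to 38750ps.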
(Micro)architecture Design Principles
 Critical path design
   Find and decrease the maximum combinational logic delay
   Break a path into multiple cycles if it takes too long
 Bread and butter (common case) design
   Spend time and resources where it matters most
     i.e., improve what the machine is really designed to do
   Common case vs. uncommon case
 Balanced design
   Balance instruction/data flow through hardware components
   Design to eliminate bottlenecks: balance the hardware for the work
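Common-case design is often quantified with Amdahl's Law, which this lecture's reading list associates with it. In the standard form, f is the fraction of original execution time that an improvement affects and s is the speedup of that fraction:

```latex
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - f) + \dfrac{f}{s}},
\qquad
\lim_{s \to \infty} \text{Speedup}_{\text{overall}} = \frac{1}{1 - f}
```

For example, doubling the speed (s = 2) of a part that accounts for half of execution time (f = 0.5) gives 1/(0.5 + 0.25) ≈ 1.33. Even an infinitely fast improvement to that part is capped at 1/(1 − 0.5) = 2, which is why effort is best spent where f is large.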
Single-Cycle Design vs. Design Principles
 Critical path design
 Bread and butter (common case) design
 Balanced design

How does a single-cycle microarchitecture fare in light of these principles?

Aside: System Design Principles
 When designing computer systems/architectures, it is important to follow good principles
 Remember: “principled design” from our first lecture
 Frank Lloyd Wright: “architecture […] based upon principle, and not upon precedent”

Aside: From Lecture 1
 “architecture […] based upon principle, and not upon precedent”

Aside: System Design Principles
 We will continue to cover key principles in this course
 Here are some references where you can learn more:
   Yale Patt, “Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution,” Proc. of IEEE, 2001. (Levels of transformation, design point, etc.)
   Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. (Flynn’s Bottleneck → Balanced design)
   Gene M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS Conference, April 1967. (Amdahl’s Law → Common-case design)
   Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
     http://research.microsoft.com/pubs/68221/acrobat.pdf

Aside: One Important Principle
 Keep it simple
 “Everything should be made as simple as possible, but no simpler.”
   Albert Einstein
 And, do not forget: “An engineer is a person who can do for a dime what any fool can do for a dollar.”
 For more, see:
   Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
     http://research.microsoft.com/pubs/68221/acrobat.pdf
Multi-Cycle Microarchitectures
