Computer Architecture: CSCE 350

This document discusses designing a multi-cycle processor datapath. It begins by reviewing the single-cycle processor datapath and control. It then explains the main limitation of the single-cycle approach: the cycle time is set by the slowest instruction. The document proposes partitioning the single-cycle datapath by adding registers between steps, so that each step fits in one short clock cycle. An example multi-cycle datapath is shown, and the process of deriving the physical register transfers from the logical register transfers is described for R-type, logical immediate, load, store, and branch instructions. Finally, the control model that specifies the register transfers for each cycle is discussed, followed by microprogrammed control.

Uploaded by

kherberos
CSCE 350

Computer Architecture



Designing a Multi-cycle Processor
Adapted from the lecture notes of John Kubiatowicz (UCB)

Recap: A Single Cycle Datapath
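[Figure: single-cycle datapath. The Instruction Fetch Unit (controlled by nPC_sel and Clk) delivers Instruction<31:0>, which is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. A register file of 32 32-bit registers (5-bit ports Rw, Ra, Rb; busW in; busA and busB out; RegWr; Clk) feeds the ALU (ALUctr, Equal output). Muxes select the destination register (RegDst: Rt or Rd), the second ALU operand (ALUSrc: busB or the 16-bit immediate after the Extender, controlled by ExtOp), and the write-back value (MemtoReg: ALU result or Data Memory output). Data Memory has Adr, Data In, WrEn (MemWr), and Clk.]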
Recap: The Truth Table for the Main Control
                   R-type    ori       lw        sw        beq       jump
op                 00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegDst             1         0         0         x         x         x
ALUSrc             0         1         1         1         0         x
MemtoReg           0         0         1         x         x         x
RegWrite           1         1         1         0         0         0
MemWrite           0         0         0         1         0         0
Branch             0         0         0         0         1         0
Jump               0         0         0         0         0         1
ExtOp              x         0         1         1         x         x
ALUop (Symbolic)   R-type    Or        Add       Add       Subtract  xxx
ALUop <2>          1         0         0         0         0         x
ALUop <1>          0         1         0         0         0         x
ALUop <0>          0         0         0         0         1         x

[Figure: Main Control takes the 6-bit op field and produces RegDst, ALUSrc, and the other control signals, plus the 3-bit ALUop. The local ALU Control combines ALUop with the 6-bit func field to produce the 3-bit ALUctr.]
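The truth table can be read as a lookup from opcode to control word. The following is a minimal Python sketch of that reading; the dictionary layout and the function name `decode` are assumptions made for illustration, and "x" stands for a don't-care exactly as in the table.

```python
# Main control truth table from the recap slide, keyed by 6-bit opcode.
# "x" marks a don't-care. The data layout is an illustrative assumption.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemWrite", "Branch", "Jump", "ExtOp", "ALUop")

MAIN_CONTROL = {
    "000000": ("R-type", (1, 0, 0, 1, 0, 0, 0, "x", "R-type")),
    "001101": ("ori",    (0, 1, 0, 1, 0, 0, 0, 0, "Or")),
    "100011": ("lw",     (0, 1, 1, 1, 0, 0, 0, 1, "Add")),
    "101011": ("sw",     ("x", 1, "x", 0, 1, 0, 0, 1, "Add")),
    "000100": ("beq",    ("x", 0, "x", 0, 0, 1, 0, "x", "Subtract")),
    "000010": ("jump",   ("x", "x", "x", 0, 0, 0, 1, "x", "xxx")),
}


def decode(opcode):
    """Return (mnemonic, {signal name: value}) for a 6-bit opcode string."""
    name, values = MAIN_CONTROL[opcode]
    return name, dict(zip(SIGNALS, values))


name, control = decode("100011")
print(name, control)
```

Because every instruction finishes in one cycle, a single control word per opcode is enough here; the multicycle design later in these notes needs one control word per state instead.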
The Big Picture: Where are We Now?
The Five Classic Components of a Computer
Processor (Control and Datapath), Memory, Input, Output
Today's Topic: Designing the Multiple Clock Cycle Datapath
Abstract View of our single cycle processor
looks like a FSM with PC as state
[Figure: PC -> Next PC and Instruction Fetch -> Register Fetch -> Ext and ALU -> Mem Access (Data Mem) -> Reg. Wrt (Result Store). Main Control decodes op and ALU control decodes fun; together they drive nPC_sel, ExtOp, ALUSrc, ALUctr, RegDst, MemRd, MemWr, and RegWr, with Equal fed back from the datapath.]
What's wrong with our CPI=1 processor?
Long Cycle Time
All instructions take as much time as the slowest
Real memory is not as nice as our idealized memory
cannot always get the job done in one (short) cycle
Arithmetic & Logical: PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
Load:                 PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup  (Critical Path)
Store:                PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
Branch:               PC -> Inst Memory -> Reg File -> cmp -> mux
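To make the cycle-time argument concrete, here is a small Python sketch that sums the component delays along each path and takes the maximum as the single-cycle clock period. The delay values are invented for illustration (the slides give no numbers); only the path shapes come from the list above.

```python
# Hypothetical component delays in nanoseconds. The slides give no numbers,
# so these values are assumptions chosen only to illustrate the calculation.
DELAY = {"PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
         "ALU": 8, "Data Mem": 10, "cmp": 4, "setup": 1}

PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "mux", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "mux", "setup"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}


def path_delay(path):
    """Total delay of one instruction's path through the datapath."""
    return sum(DELAY[component] for component in path)


latency = {kind: path_delay(path) for kind, path in PATHS.items()}
cycle_time = max(latency.values())          # every instruction pays this
critical = max(latency, key=latency.get)    # the path that sets the clock

for kind, ns in latency.items():
    print(f"{kind:22s} {ns:3d} ns, idle for {cycle_time - ns} ns each cycle")
print("cycle time set by:", critical)
```

With these assumed delays the load path sets the clock, and every other instruction class wastes the difference on each cycle.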
Reducing Cycle Time
Cut combinational dependency graph and insert register / latch
Do same work in two fast cycles, rather than one slow one
May be able to short-circuit path and remove some components
for some instructions!
[Figure: before, a single block of Acyclic Combinational Logic sits between two storage elements. After, the logic is cut into Acyclic Combinational Logic (A) and Acyclic Combinational Logic (B), with a third storage element inserted between them.]
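A short Python sketch of the arithmetic behind cutting the logic: the clock period is set by the slowest block between two storage elements plus the register overhead. The delay numbers and the function names are assumptions for illustration only.

```python
def single_block_cycle(logic_delay, reg_overhead):
    """Clock period when all the logic sits between two storage elements."""
    return logic_delay + reg_overhead


def split_cycle(delay_a, delay_b, reg_overhead):
    """Clock period after cutting the logic into blocks A and B.

    The slower block sets the clock, and each stage pays the register
    overhead (setup plus clock-to-output) once.
    """
    return max(delay_a, delay_b) + reg_overhead


# Hypothetical numbers: 36 ns of logic cut into 20 ns + 16 ns, 1 ns overhead.
slow = single_block_cycle(36, 1)
fast = split_cycle(20, 16, 1)

print("one slow cycle:      ", slow, "ns")
print("one fast cycle:      ", fast, "ns")
print("work needing A and B:", 2 * fast, "ns")
print("work needing only A: ", fast, "ns")
```

Work that needs both blocks takes slightly longer than before (two overheads, and the faster block idles), but anything that can skip block B finishes in one fast cycle, which is the "short-circuit" opportunity mentioned above.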

Basic Limits on Cycle Time
Next address logic
PC <= branch ? PC + offset : PC + 4
Instruction Fetch
InstructionReg <= Mem[PC]
Register Access
A <= R[rs]
ALU operation
R <= A + B
[Figure: the single-cycle datapath drawn as a chain: PC -> Next PC and Instruction Fetch -> Operand Fetch (Reg. File) -> Exec -> Mem Access (Data Mem) -> Result Store. Control drives nPC_sel, ExtOp, ALUSrc, ALUctr, RegDst, MemRd, MemWr, and RegWr.]
Partitioning the CPI=1 Datapath
Add registers between smallest steps
Place enables on all registers
[Figure: the datapath chain PC -> Next PC and Instruction Fetch -> Operand Fetch (Reg. File) -> Exec -> Mem Access (Data Mem) -> Result Store, with a register inserted between each pair of steps. Control signals shown: nPC_sel, ExtOp, ALUSrc, ALUctr, RegDst, MemRd, MemWr, RegWr; Equal is fed back from the datapath.]
Example Multicycle Datapath
Critical Path ?
[Figure: example multicycle datapath. Instruction Fetch loads IR; Operand Fetch reads the Reg File into registers A and B, with the comparison result held in E; Ext and the ALU produce S; Mem Access (Data Mem) produces M; Result Store writes the Reg. File. Control signals shown: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr; Equal is the condition output.]
Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers

Step 2: Components of the Datapath

Step 3: RTL + Components => Datapath

Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control
Step 4: R-type (add, sub, . . .)
Logical Register Transfer
inst    Logical Register Transfers
ADDU    R[rd] <= R[rs] + R[rt]; PC <= PC + 4

Physical Register Transfers
inst    Physical Register Transfers
        IR <= MEM[PC]
ADDU    A <= R[rs]; B <= R[rt]
        S <= A + B
        R[rd] <= S; PC <= PC + 4
[Figure: the multicycle datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis showing the transfers occurring in successive cycles.]
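The four physical register transfers for ADDU can be stepped through in software. This is a minimal Python sketch, not a hardware model: the function name `run_addu`, the list-based register file, and the 32-bit wrap are assumptions, and instruction memory is not modelled.

```python
def run_addu(R, rs, rt, rd, pc):
    """Step through the four ADDU physical register transfers.

    R is a list of 32 register values. Instruction memory is not modelled,
    so the fetch cycle appears in the trace only.
    """
    trace = ["IR <= MEM[PC]"]                  # cycle 1: instruction fetch
    A, B = R[rs], R[rt]                        # cycle 2: operand fetch
    trace.append("A <= R[rs]; B <= R[rt]")
    S = (A + B) & 0xFFFFFFFF                   # cycle 3: execute, wrap to 32 bits
    trace.append("S <= A + B")
    R[rd] = S                                  # cycle 4: write-back
    pc = (pc + 4) & 0xFFFFFFFF
    trace.append("R[rd] <= S; PC <= PC + 4")
    return pc, trace


regs = [0] * 32
regs[1], regs[2] = 5, 7
new_pc, cycles = run_addu(regs, rs=1, rt=2, rd=3, pc=0x100)
print(regs[3], hex(new_pc), len(cycles))
```

Each trace entry corresponds to one clock cycle, so the length of the trace is the cycle count for this instruction type.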
Step 4: Logical immed
Logical Register Transfer
inst    Logical Register Transfers
ORI     R[rt] <= R[rs] OR ZExt(Im16); PC <= PC + 4

Physical Register Transfers
inst    Physical Register Transfers
        IR <= MEM[PC]
ORI     A <= R[rs]; B <= R[rt]
        S <= A or ZExt(Im16)
        R[rt] <= S; PC <= PC + 4
[Figure: the multicycle datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis showing the transfers occurring in successive cycles.]
Step 4 : Load
Logical Register Transfer
inst    Logical Register Transfers
LW      R[rt] <= MEM[R[rs] + SExt(Im16)]; PC <= PC + 4

Physical Register Transfers
inst    Physical Register Transfers
        IR <= MEM[PC]
LW      A <= R[rs]; B <= R[rt]
        S <= A + SExt(Im16)
        M <= MEM[S]
        R[rt] <= M; PC <= PC + 4
[Figure: the multicycle datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis showing the transfers occurring in successive cycles.]
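The load sequence adds a memory cycle between execute and write-back. Below is a hedged Python sketch of the five steps; the helper names, the dictionary used as data memory, and the example addresses are assumptions. The loaded word goes to R[rt], as in the logical register transfer.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(R, mem, rs, rt, imm16, pc):
    """Apply the LW physical register transfers; returns the new PC.

    R is a list of 32 registers and mem maps word addresses to values.
    Cycle 1 (IR <= MEM[PC]) is omitted because instruction memory is not
    modelled in this sketch.
    """
    A = R[rs]                                        # cycle 2: A <= R[rs]
    S = (A + sign_extend16(imm16)) & 0xFFFFFFFF      # cycle 3: S <= A + SExt(Im16)
    M = mem[S]                                       # cycle 4: M <= MEM[S]
    R[rt] = M                                        # cycle 5: R[rt] <= M
    return (pc + 4) & 0xFFFFFFFF                     # cycle 5: PC <= PC + 4


regs = [0] * 32
regs[4] = 0x1000
memory = {0x0FFC: 99}
new_pc = run_lw(regs, memory, rs=4, rt=5, imm16=0xFFFC, pc=0x200)  # offset is -4
print(regs[5], hex(new_pc))
```

The intermediate names A, S, and M mirror the registers inserted into the datapath, so each assignment marks a value that must be held across a clock edge.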

Step 4 : Store
Logical Register Transfer
inst    Logical Register Transfers
SW      MEM[R[rs] + SExt(Im16)] <= R[rt]; PC <= PC + 4

Physical Register Transfers
inst    Physical Register Transfers
        IR <= MEM[PC]
SW      A <= R[rs]; B <= R[rt]
        S <= A + SExt(Im16)
        MEM[S] <= B; PC <= PC + 4
[Figure: the multicycle datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis showing the transfers occurring in successive cycles.]

Step 4 : Branch
Logical Register Transfer
inst    Logical Register Transfers
BEQ     if R[rs] == R[rt]
        then PC <= PC + 4 + (SExt(Im16) || 00)
        else PC <= PC + 4

Physical Register Transfers
inst    Physical Register Transfers
        IR <= MEM[PC]
BEQ     E <= (R[rs] == R[rt])
        if !E then PC <= PC + 4
        else PC <= PC + 4 + (SExt(Im16) || 00)
[Figure: the multicycle datapath (PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access / Data Mem, M, Reg. File) with a Time axis showing the transfers occurring in successive cycles.]
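The branch target arithmetic is easy to get wrong, so here is a small Python sketch of it. The function names are assumptions; `SExt(Im16) || 00` is implemented as sign extension followed by a left shift by two, which is what appending two zero bits means.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, imm16, equal):
    """PC update for BEQ: PC + 4 + (SExt(Im16) || 00) if taken, else PC + 4."""
    fall_through = pc + 4
    if not equal:
        return fall_through & 0xFFFFFFFF
    word_offset = sign_extend16(imm16) << 2     # append two zero bits
    return (fall_through + word_offset) & 0xFFFFFFFF


def run_beq(R, rs, rt, imm16, pc):
    """E <= (R[rs] == R[rt]), then choose the next PC from E."""
    E = R[rs] == R[rt]
    return next_pc(pc, imm16, E)


regs = [0] * 32
regs[1] = regs[2] = 42
print(hex(run_beq(regs, 1, 2, imm16=3, pc=0x100)))       # taken, forward
print(hex(run_beq(regs, 1, 3, imm16=3, pc=0x100)))       # not taken
print(hex(run_beq(regs, 1, 2, imm16=0xFFFF, pc=0x100)))  # taken, offset of -1 word
```

Note that the offset is relative to PC + 4, not to the address of the branch itself, so an immediate of -1 branches back to the branch instruction.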

Alternative data-path (book): Multiple Cycle Datapath
Minimizes Hardware: 1 memory, 1 adder
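[Figure: the textbook multiple-cycle datapath. A single Ideal Memory (RAdr, WrAdr, Din, Dout, MemWr) serves both instructions and data, with IorD selecting the address source. Dout loads the Instruction Reg (IRWr). The Reg File (Ra, Rb, Rw, busA, busB, busW, RegWr) is addressed by Rs and Rt, with RegDst choosing Rt or Rd as the write register and MemtoReg choosing the write data. ALUSelA chooses the first ALU operand; ALUSelB chooses the second from a multi-input mux whose inputs include the constant 4, busB, the extended 16-bit Imm (ExtOp), and the extended immediate shifted left by 2. ALU Control (ALUOp) drives the single ALU, which produces ALU Out and Zero. A Target register (BrWr) holds the branch target; PCWr, PCWrCond, and PCSrc control the PC update.]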

Our Control Model
State specifies control points for Register Transfer
Transfer occurs upon exiting state (same falling edge)
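[Figure: the Control State register feeds the Next State Logic and the Output Logic. Inputs (conditions) enter the Next State Logic; the Output Logic produces the outputs (control points). In the state diagram, each State X is labelled with its Register Transfer and Control Points, and the outgoing arc taken depends on the input.]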
Step 4 Control Specification for multicycle proc
instruction fetch:        IR <= MEM[PC]
decode / operand fetch:   A <= R[rs]; B <= R[rt]

          Execute                 Memory                       Write-back
R-type    S <= A fun B                                         R[rd] <= S; PC <= PC + 4
ORi       S <= A or ZX                                         R[rt] <= S; PC <= PC + 4
LW        S <= A + SX             M <= MEM[S]                  R[rt] <= M; PC <= PC + 4
SW        S <= A + SX             MEM[S] <= B; PC <= PC + 4
BEQ       PC <= Next(PC,Equal)

Traditional FSM Controller
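[Figure: the controller as a truth table. Inputs: the current state (4 bits), the op field (6 bits), and the Equal condition. Outputs: the next state, which is loaded back into the State register, and the control points (11 lines) that drive the datapath.]
Truth Table columns: state, op, cond -> next state, control points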
Step 5 (datapath + state diagram control)
Translate RTs into control points
Assign states

Then go build the controller
Mapping RTs to Control Points
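State                  Register transfers            Control points
instruction fetch      IR <= MEM[PC]                 imem_rd, IRen
decode                 A <= R[rs]; B <= R[rt]        Aen, Ben, Een
R-type  Execute        S <= A fun B                  ALUfun, Sen
R-type  Write-back     R[rd] <= S; PC <= PC + 4      RegDst, RegWr, PCen
ORi     Execute        S <= A or ZX
ORi     Write-back     R[rt] <= S; PC <= PC + 4
LW      Execute        S <= A + SX
LW      Memory         M <= MEM[S]
LW      Write-back     R[rt] <= M; PC <= PC + 4
SW      Execute        S <= A + SX
SW      Memory         MEM[S] <= B; PC <= PC + 4
BEQ     Execute        PC <= Next(PC,Equal)
(The slide annotates control points only for the fetch, decode, and R-type states.)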

Assigning States
State   Stage               Instruction   Register transfers
0000    instruction fetch   all           IR <= MEM[PC]
0001    decode              all           A <= R[rs]; B <= R[rt]
0100    Execute             R-type        S <= A fun B
0101    Write-back          R-type        R[rd] <= S; PC <= PC + 4
0110    Execute             ORi           S <= A or ZX
0111    Write-back          ORi           R[rt] <= S; PC <= PC + 4
1000    Execute             LW            S <= A + SX
1001    Memory              LW            M <= MEM[S]
1010    Write-back          LW            R[rt] <= M; PC <= PC + 4
1011    Execute             SW            S <= A + SX
1100    Memory              SW            MEM[S] <= B; PC <= PC + 4
0011    Execute             BEQ           PC <= Next(PC)
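Once states are assigned, the next-state function is a small lookup. The Python sketch below encodes the state numbers from this slide; the table names and the helper `states_for` are assumptions for illustration.

```python
FETCH, DECODE = "0000", "0001"

# Decode dispatches on the instruction type to the first execute state.
DISPATCH = {"R-type": "0100", "ORi": "0110", "LW": "1000",
            "SW": "1011", "BEQ": "0011"}

# States that always advance to one fixed successor; every other state
# (0101, 0111, 1010, 1100, 0011) returns to instruction fetch.
SEQUENTIAL = {"0100": "0101", "0110": "0111",
              "1000": "1001", "1001": "1010", "1011": "1100"}


def next_state(state, op):
    """Next-state logic of the multicycle controller."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return DISPATCH[op]
    return SEQUENTIAL.get(state, FETCH)


def states_for(op):
    """List the states one instruction of type op passes through."""
    trace = [FETCH]
    while True:
        following = next_state(trace[-1], op)
        if following == FETCH:
            return trace
        trace.append(following)


for op in DISPATCH:
    print(f"{op:7s}", states_for(op), "cycles:", len(states_for(op)))
```

The length of each trace is the cycle count for that instruction type, which is the per-type CPI used in the performance evaluation.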

(Mostly) Detailed Control Specification (missing entries are 0)

                                       IR   PC       Ops     Exec             Mem     Write-Back
       State  Op field  Eq  Next       en   en sel   A B E   Ex Sr ALU  S     R W M   M-R Wr Dst
       0000   ??????    ?   0001       1
       0001   BEQ       x   0011                     1 1 1
       0001   R-type    x   0100                     1 1 1
       0001   ORI       x   0110                     1 1 1
       0001   LW        x   1000                     1 1 1
       0001   SW        x   1011                     1 1 1
BEQ:   0011   xxxxxx    0   0000            1  0                                      x   0  x
       0011   xxxxxx    1   0000            1  1                                      x   0  x
R:     0100   xxxxxx    x   0101                             0  1  fun  1
       0101   xxxxxx    x   0000            1  0                                      0   1  1
ORi:   0110   xxxxxx    x   0111                             0  0  or   1
       0111   xxxxxx    x   0000            1  0                                      0   1  0
LW:    1000   xxxxxx    x   1001                             1  0  add  1
       1001   xxxxxx    x   1010                                              1 0 1
       1010   xxxxxx    x   0000            1  0                                      1   1  0
SW:    1011   xxxxxx    x   1100                             1  0  add  1
       1100   xxxxxx    x   0000            1  0                              0 1 0

Note: the five decode rows (state 0001) have identical outputs; they are all the same in a Moore machine.
Performance Evaluation
What is the average CPI?
state diagram gives CPI for each instruction type
workload gives frequency of each type
Type          CPI_i for type   Frequency   CPI_i x freq_i
Arith/Logic   4                40%         1.6
Load          5                30%         1.5
Store         4                10%         0.4
branch        3                20%         0.6
Average CPI:                               4.1
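The average is a frequency-weighted sum of the per-type cycle counts. A minimal Python sketch using the numbers from the table above (the dictionary name `WORKLOAD` is an assumption):

```python
# (cycles per instruction, fraction of the workload) for each type,
# taken from the table above.
WORKLOAD = {
    "Arith/Logic": (4, 0.40),
    "Load":        (5, 0.30),
    "Store":       (4, 0.10),
    "Branch":      (3, 0.20),
}

# The frequencies should cover the whole workload.
assert abs(sum(freq for _, freq in WORKLOAD.values()) - 1.0) < 1e-9

average_cpi = sum(cpi * freq for cpi, freq in WORKLOAD.values())

for kind, (cpi, freq) in WORKLOAD.items():
    print(f"{kind:12s} {cpi} x {freq:.0%} = {cpi * freq:.1f}")
print(f"Average CPI: {average_cpi:.1f}")
```

A CPI above 4 only pays off if the multicycle clock is correspondingly faster than the single-cycle clock, since execution time is instruction count x CPI x cycle time.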
Controller Design
The state diagrams that define the controller for an
instruction set processor are highly structured
Use this structure to construct a simple
microsequencer
Control reduces to programming this very simple device
microprogramming
[Figure: sequencer control and datapath control. The micro-PC sequencer selects the next microinstruction; each microinstruction supplies the datapath control signals and the sequencer control for the next step.]
Our Microsequencer
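[Figure: the op-code indexes a Map ROM whose output can be loaded into the Micro-PC. The Micro-PC has Z (zero), I (increment), and L (load) controls. The current microinstruction supplies the datapath control signals; the taken condition is an input to the sequencer.]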
Microprogram Control Specification
                               IR   PC       Ops   Exec             Mem     Write-Back
       µPC    Taken  Next      en   en sel   A B   Ex Sr ALU  S     R W M   M-R Wr Dst
       0000   ?      inc       1
       0001   0      load                    1 1
BEQ:   0011   0      zero           1  0
       0011   1      zero           1  1
R:     0100   x      inc                           0  1  fun  1
       0101   x      zero           1  0                                    0   1  1
ORi:   0110   x      inc                           0  0  or   1
       0111   x      zero           1  0                                    0   1  0
LW:    1000   x      inc                           1  0  add  1
       1001   x      inc                                            1 0 1
       1010   x      zero           1  0                                    1   1  0
SW:    1011   x      inc                           1  0  add  1
       1100   x      zero           1  0                            0 1 0
Overview of Control
Control may be designed using one of several initial
representations. The choice of sequence control, and how logic is
represented, can then be determined independently; the control
can then be implemented with one of several methods using a
structured logic technique.
Initial Representation     Finite State Diagram     Microprogram

Sequencing Control         Explicit Next State      Microprogram counter
                           Function                 + Dispatch ROMs

Logic Representation       Logic Equations          Truth Tables

Implementation Technique   PLA                      ROM
                           (hardwired control)      (microprogrammed control)
Microprogramming (Maurice Wilkes)
Control is the hard part of processor design
Datapath is fairly regular and well-organized
Memory is highly regular
Control is irregular and global
Microprogramming:

-- A Particular Strategy for Implementing the Control Unit of a
processor by "programming" at the level of register transfer
operations


Microarchitecture:

-- Logical structure and functional capabilities of the hardware as
seen by the microprogrammer


Historical Note:
IBM 360 Series first to distinguish between architecture & organization
Same instruction set across wide range of implementations, each with
different cost/performance
Macroinstruction Interpretation
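[Figure: Main Memory holds the user program plus data (ADD, SUB, AND, ..., DATA); this can change. The CPU contains the execution unit and the control memory. The control memory holds one microsequence per macroinstruction, for example the AND microsequence: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s). Each instruction in main memory is mapped into one microsequence in control memory.]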
Microprogramming
Microprogramming is a fundamental concept
implement an instruction set by building a very simple processor
and interpreting the instructions
essential for very complex instructions and when few register
transfers are possible
overkill when ISA matches datapath 1-1
[Figure: the Opcode and other Inputs feed a Dispatch ROM and the µ-sequencer (fetch, dispatch, sequential), which supplies sequencer control to the micro-PC. The micro-PC addresses the µ-Code ROM; each microinstruction (µ) is decoded into datapath control signals that go to the datapath.]
Designing a Microinstruction Set
1) Start with list of control signals
2) Group signals together that make sense (vs. random): called
fields
3) Place fields in some logical order
(e.g., ALU operation & ALU operands first and
microinstruction sequencing last)
4) To minimize the width, encode operations that will never be
used at the same time
5) Create a symbolic legend for the microinstruction format,
showing name of field values and how they set the control
signals
Use computers to design computers
1&2) Start with list of control signals, grouped into fields

Single Bit Control
Signal name    Effect when deasserted          Effect when asserted
ALUSelA        1st ALU operand = PC            1st ALU operand = Reg[rs]
RegWrite       None                            Reg. is written
MemtoReg       Reg. write data input = ALU     Reg. write data input = memory
RegDst         Reg. dest. no. = rt             Reg. dest. no. = rd
MemRead        None                            Memory at address is read, MDR <= Mem[addr]
MemWrite       None                            Memory at address is written
IorD           Memory address = PC             Memory address = S
IRWrite        None                            IR <= Memory
PCWrite        None                            PC <= PCSource
PCWriteCond    None                            IF ALUzero then PC <= PCSource
PCSource       PCSource = ALU                  PCSource = ALUout
ExtOp          Zero Extended                   Sign Extended

Multiple Bit Control
Signal name    Value    Effect
ALUOp          00       ALU adds
               01       ALU subtracts
               10       ALU does function code
               11       ALU does logical OR
ALUSelB        00       2nd ALU input = 4
               01       2nd ALU input = Reg[rt]
               10       2nd ALU input = extended, shift left 2
               11       2nd ALU input = extended

3&4) Microinstruction Format: unencoded vs. encoded fields
Field Name         Width (wide)   Width (narrow)   Control Signals Set
ALU Control        4              2                ALUOp
SRC1               2              1                ALUSelA
SRC2               5              3                ALUSelB, ExtOp
ALU Destination    3              2                RegWrite, MemtoReg, RegDst
Memory             3              2                MemRead, MemWrite, IorD
Memory Register    1              1                IRWrite
PCWrite Control    3              2                PCWrite, PCWriteCond, PCSource
Sequencing         3              2                AddrCtl
Total width        24             15               bits
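The totals follow directly from the per-field widths. A short Python check, with the field table copied from above (the name `FIELD_WIDTHS` is an assumption):

```python
# (unencoded "wide" width, encoded "narrow" width) per field, in bits.
FIELD_WIDTHS = {
    "ALU Control":     (4, 2),
    "SRC1":            (2, 1),
    "SRC2":            (5, 3),
    "ALU Destination": (3, 2),
    "Memory":          (3, 2),
    "Memory Register": (1, 1),
    "PCWrite Control": (3, 2),
    "Sequencing":      (3, 2),
}

wide_total = sum(wide for wide, _ in FIELD_WIDTHS.values())
narrow_total = sum(narrow for _, narrow in FIELD_WIDTHS.values())
saved = wide_total - narrow_total

print("unencoded width:", wide_total, "bits")
print("encoded width:  ", narrow_total, "bits")
print("bits saved per microinstruction:", saved)
```

The saving is paid for with decoders between the control memory and the datapath, which is the usual trade-off between wide (horizontal) and narrow (vertical) microcode.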
5) Legend of Fields and Symbolic Names
Field Name        Values for Field   Function of Field with Specific Value
ALU               Add                ALU adds
                  Subt.              ALU subtracts
                  Func code          ALU does function code
                  Or                 ALU does logical OR
SRC1              PC                 1st ALU input = PC
                  rs                 1st ALU input = Reg[rs]
SRC2              4                  2nd ALU input = 4
                  Extend             2nd ALU input = sign ext. IR[15-0]
                  Extend0            2nd ALU input = zero ext. IR[15-0]
                  Extshft            2nd ALU input = sign ext., shifted left 2, IR[15-0]
                  rt                 2nd ALU input = Reg[rt]
destination       rd ALU             Reg[rd] = ALUout
                  rt ALU             Reg[rt] = ALUout
                  rt Mem             Reg[rt] = Mem
Memory            Read PC            Read memory using PC
                  Read ALU           Read memory using ALUout for addr
                  Write ALU          Write memory using ALUout for addr
Memory register   IR                 IR = Mem
PC write          ALU                PC = ALU
                  ALUoutCond         IF ALU Zero then PC = ALUout
Sequencing        Seq                Go to sequential microinstruction
                  Fetch              Go to the first microinstruction
                  Dispatch           Dispatch using ROM.
Quick check: what do these fieldnames mean?

Destination:
Code   Name      RegWrite   MemToReg   RegDest
00     ---       0          X          X
01     rd ALU    1          0          1
10     rt ALU    1          0          0
11     rt MEM    1          1          0

SRC2:
Code   Name      ALUSelB    ExtOp
000    ---       X          X
001    4         00         X
010    rt        01         X
011    ExtShft   10         1
100    Extend    11         1
111    Extend0   11         0
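Decoding an encoded field back into control signals is a table lookup, which is what the decoders after the control memory do. A minimal Python sketch using the two tables above; the function names are assumptions, and "X" stands for a don't-care.

```python
# Encoded field value -> (symbolic name, control signal values).
DESTINATION = {
    "00": ("---",    {"RegWrite": 0, "MemToReg": "X", "RegDest": "X"}),
    "01": ("rd ALU", {"RegWrite": 1, "MemToReg": 0,   "RegDest": 1}),
    "10": ("rt ALU", {"RegWrite": 1, "MemToReg": 0,   "RegDest": 0}),
    "11": ("rt MEM", {"RegWrite": 1, "MemToReg": 1,   "RegDest": 0}),
}

SRC2 = {
    "000": ("---",     {"ALUSelB": "X",  "ExtOp": "X"}),
    "001": ("4",       {"ALUSelB": "00", "ExtOp": "X"}),
    "010": ("rt",      {"ALUSelB": "01", "ExtOp": "X"}),
    "011": ("ExtShft", {"ALUSelB": "10", "ExtOp": 1}),
    "100": ("Extend",  {"ALUSelB": "11", "ExtOp": 1}),
    "111": ("Extend0", {"ALUSelB": "11", "ExtOp": 0}),
}


def decode_destination(code):
    """Control signals selected by a 2-bit Destination field."""
    return DESTINATION[code][1]


def decode_src2(code):
    """Control signals selected by a 3-bit SRC2 field."""
    return SRC2[code][1]


print(DESTINATION["11"][0], decode_destination("11"))
print(SRC2["111"][0], decode_src2("111"))
```

Extend and Extend0 select the same ALUSelB input and differ only in ExtOp, which is why ExtOp is folded into the SRC2 field instead of getting a field of its own.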
Specific Sequencer
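Sequencer-based control unit
Called "microPC" or "µPC" vs. state register
Code   Name       Effect
00     fetch      Next address = 0
01     dispatch   Next address = dispatch ROM
10     seq        Next address = address + 1

[Figure: Address Select Logic controls a three-input Mux (inputs 0, 1, 2) that chooses the next microPC value from the constant 0, the dispatch ROM output (indexed by the Opcode), and the Adder output (microPC + 1).]

ROM:
Instruction   Opcode   microPC
R-type        000000   0100
BEQ           000100   0011
ori           001101   0110
LW            100011   1000
SW            101011   1011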
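The sequencer's address-select logic can be sketched in a few lines of Python. The dispatch ROM contents are taken from the table above; the function name `next_micro_pc` and the use of the symbolic names instead of the 2-bit codes are assumptions.

```python
# Dispatch ROM: opcode -> first execute microinstruction address.
DISPATCH_ROM = {
    "000000": 0b0100,  # R-type
    "000100": 0b0011,  # BEQ
    "001101": 0b0110,  # ori
    "100011": 0b1000,  # LW
    "101011": 0b1011,  # SW
}


def next_micro_pc(micro_pc, sequencing, opcode):
    """Select the next microPC from the sequencing field of the current
    microinstruction: fetch (00), dispatch (01), or seq (10)."""
    if sequencing == "fetch":
        return 0
    if sequencing == "dispatch":
        return DISPATCH_ROM[opcode]
    if sequencing == "seq":
        return micro_pc + 1
    raise ValueError(f"unknown sequencing code: {sequencing!r}")


# Walk a load through the microprogram: fetch, decode, then three LW steps.
opcode = "100011"
micro_pc = 0
for sequencing in ["seq", "dispatch", "seq", "seq", "fetch"]:
    print(f"{micro_pc:04b} -> ", end="")
    micro_pc = next_micro_pc(micro_pc, sequencing, opcode)
    print(f"{micro_pc:04b}  ({sequencing})")
```

Only the dispatch step looks at the opcode; every other step either increments or returns to address 0, which is why the microsequencer is so much simpler than a general next-state table.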
Microprogram it yourself!
Label    ALU    SRC1   SRC2   Dest.   Memory     Mem. Reg.   PC Write   Sequencing
Fetch:   Add    PC     4              Read PC    IR          ALU        Seq

Microprogramming Pros and Cons
Ease of design
Flexibility
Easy to adapt to changes in organization, timing, technology
Can make changes late in design cycle, or even in the field
Can implement very powerful instruction sets (just more
control memory)
Generality
Can implement multiple instruction sets on same machine.
Can tailor instruction set to application.
Compatibility
Many organizations, same instruction set
Costly to implement
Slow
Summary
Microprogramming is a fundamental concept
implement an instruction set by building a very simple processor and
interpreting the instructions
essential for very complex instructions and when few register
transfers are possible
Control design reduces to Microprogramming
Design of a Microprogramming language
1. Start with list of control signals
2. Group signals together that make sense (vs. random): called fields
3. Place fields in some logical order (e.g., ALU operation & ALU
operands first and microinstruction sequencing last)
4. To minimize the width, encode operations that will never be used at
the same time
5. Create a symbolic legend for the microinstruction format, showing
name of field values and how they set the control signals
