Cpu
ICS 233
Computer Architecture and Assembly Language
Dr. Aiman El-Maleh
College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals
Outline
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
The Main Controller and ALU Controller
Drawback of the single-cycle processor design
Instruction fields:
Op6: 6-bit opcode of the instruction
Rs5, Rt5, Rd5: 5-bit source and destination register numbers
sa5: 5-bit shift amount used by shift instructions
funct6: 6-bit function field for R-type instructions
immediate16: 16-bit immediate value or address offset
immediate26: 26-bit target address of the jump instruction
Single Cycle Processor Design ICS 233 KFUPM Muhamed Mudawar slide 5
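MIPS subset of instructions:
ALU instructions (R-type): add, sub, and, or, xor, slt
Immediate instructions (I-type): addi, slti, andi, ori, xori
Load and store (I-type): lw, sw
Branch (I-type): beq, bne
Jump (J-type): j
This subset does not include all the integer instructions, but it is sufficient to illustrate the design of the datapath and control. The concepts used to implement the MIPS subset are used to construct a broad spectrum of computers.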
Instruction          Meaning            Format
add  rd, rs, rt      addition           op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x20
sub  rd, rs, rt      subtraction        op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x22
and  rd, rs, rt      bitwise and        op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x24
or   rd, rs, rt      bitwise or         op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x25
xor  rd, rs, rt      exclusive or       op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x26
slt  rd, rs, rt      set on less than   op6 = 0 | rs5 | rt5 | rd5 | 0 | 0x2a
addi rt, rs, im16    add immediate      0x08 | rs5 | rt5 | im16
slti rt, rs, im16    slt immediate      0x0a | rs5 | rt5 | im16
andi rt, rs, im16    and immediate      0x0c | rs5 | rt5 | im16
ori  rt, rs, im16    or immediate       0x0d | rs5 | rt5 | im16
xori rt, rs, im16    xor immediate      0x0e | rs5 | rt5 | im16
lw   rt, im16(rs)    load word          0x23 | rs5 | rt5 | im16
sw   rt, im16(rs)    store word         0x2b | rs5 | rt5 | im16
beq  rs, rt, im16    branch if equal    0x04 | rs5 | rt5 | im16
bne  rs, rt, im16    branch not equal   0x05 | rs5 | rt5 | im16
j    im26            jump               0x02 | im26
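To make the three formats concrete, here is a minimal Python sketch that packs fields into a 32-bit instruction word and unpacks them again. The helper names (`encode_r`, `encode_i`, `encode_j`, `decode_fields`) are hypothetical; the opcode and funct values are the ones listed above, and negative immediates are assumed to be stored as 16-bit two's complement.

```python
# op6 and funct6 values for the MIPS subset listed above.
OPCODES = {"addi": 0x08, "slti": 0x0A, "andi": 0x0C, "ori": 0x0D, "xori": 0x0E,
           "lw": 0x23, "sw": 0x2B, "beq": 0x04, "bne": 0x05, "j": 0x02}
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25, "xor": 0x26, "slt": 0x2A}


def encode_r(name: str, rd: int, rs: int, rt: int, sa: int = 0) -> int:
    """R-type: op6 = 0 | rs5 | rt5 | rd5 | sa5 | funct6."""
    return (((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | ((rd & 0x1F) << 11)
            | ((sa & 0x1F) << 6) | FUNCT[name])


def encode_i(name: str, rt: int, rs: int, imm: int) -> int:
    """I-type: op6 | rs5 | rt5 | im16 (imm is truncated to 16 bits)."""
    return (OPCODES[name] << 26) | ((rs & 0x1F) << 21) | ((rt & 0x1F) << 16) | (imm & 0xFFFF)


def encode_j(target: int) -> int:
    """J-type: op6 | im26."""
    return (OPCODES["j"] << 26) | (target & 0x3FFFFFF)


def decode_fields(word: int) -> dict:
    """Split a 32-bit word into every field; which fields matter depends on op."""
    return {"op": word >> 26, "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F, "sa": (word >> 6) & 0x1F, "funct": word & 0x3F,
            "im16": word & 0xFFFF, "im26": word & 0x3FFFFFF}
```

For example, `encode_r("add", rd=8, rs=9, rt=10)` yields `0x012A4020`, and `decode_fields` recovers `rd = 8` and `funct = 0x20` from that word.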
RTL Description
Instruction  RTL Description
ADD          Reg(Rd) ← Reg(Rs) + Reg(Rt); PC ← PC + 4
SUB          Reg(Rd) ← Reg(Rs) - Reg(Rt); PC ← PC + 4
ORI          Reg(Rt) ← Reg(Rs) | zero_ext(Im16); PC ← PC + 4
LW           Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)]; PC ← PC + 4
SW           MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt); PC ← PC + 4
BEQ          if (Reg(Rs) == Reg(Rt)) PC ← PC + 4 + 4 × sign_ext(Im16) else PC ← PC + 4
J            PC ← PC[31:28] ‖ Im26 ‖ '00' (‖ denotes concatenation)
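The RTL above can be exercised directly with a small interpreter. This is a sketch under stated assumptions: `State` and `step` are hypothetical names, memory is a Python dict keyed by byte address that holds whole 32-bit words (unwritten locations read as 0), and only the instructions whose RTL is listed are modeled.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(im16: int) -> int:
    """Sign-extend a 16-bit value; the result is an unsigned 32-bit bit pattern."""
    im16 &= 0xFFFF
    return (im16 | 0xFFFF0000) if im16 & 0x8000 else im16


def zero_ext(im16: int) -> int:
    return im16 & 0xFFFF


class State:
    def __init__(self):
        self.reg = [0] * 32  # R0 stays zero because write_reg ignores it
        self.mem = {}        # byte address -> 32-bit word (assumption: word accesses only)
        self.pc = 0

    def write_reg(self, r: int, value: int) -> None:
        if r != 0:
            self.reg[r] = value & MASK32


def step(s: State, op: str, rs: int = 0, rt: int = 0, rd: int = 0,
         im16: int = 0, im26: int = 0) -> None:
    """Apply the RTL of one instruction to the state, then update the PC."""
    next_pc = (s.pc + 4) & MASK32
    if op == "add":
        s.write_reg(rd, s.reg[rs] + s.reg[rt])
    elif op == "sub":
        s.write_reg(rd, s.reg[rs] - s.reg[rt])  # masking gives two's complement
    elif op == "ori":
        s.write_reg(rt, s.reg[rs] | zero_ext(im16))
    elif op == "lw":
        s.write_reg(rt, s.mem.get((s.reg[rs] + sign_ext(im16)) & MASK32, 0))
    elif op == "sw":
        s.mem[(s.reg[rs] + sign_ext(im16)) & MASK32] = s.reg[rt]
    elif op == "beq":
        if s.reg[rs] == s.reg[rt]:
            next_pc = (s.pc + 4 + 4 * sign_ext(im16)) & MASK32
    elif op == "j":
        next_pc = (s.pc & 0xF0000000) | ((im26 & 0x3FFFFFF) << 2)
    else:
        raise ValueError(f"instruction not modeled: {op}")
    s.pc = next_pc
```

Because all arithmetic is reduced modulo 2^32, a negative branch offset such as `im16 = 0xFFFE` moves the PC backward, exactly as `PC + 4 + 4 × sign_ext(Im16)` requires.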
Registers
32 32-bit general purpose registers, R0 is always zero
Read source register Rs
Read source register Rt
Write destination register Rt or Rd
Program counter PC register and adder to increment PC
Sign and zero extender for immediate constant
ALU for executing instructions
Datapath Components and Clocking
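Combinational Elements (figure): Extender (32-bit output); 2-to-1 multiplexer (inputs 0 and 1, select); ALU (two 32-bit inputs and ALU control; outputs a 32-bit ALU result, zero, and overflow)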
Storage Elements
PC: 32-bit register
Instruction Memory: 32-bit Address in, 32-bit Instruction out
Data Memory: 32-bit Address, Data_in, and Data_out; MemRead and MemWrite controls
Registers: 5-bit RA, RB, RW; 32-bit BusA, BusB, BusW; Clock and RegWrite inputs
Clocking methodology: timing of reads and writes
Register Element
Register: similar to the D-type flip-flop
n-bit input (Data_In) and n-bit output (Data_Out)
Clock input and Write Enable input
Write Enable: enables or disables writing of the register
Negated (0): Data_Out will not change
Asserted (1): Data_Out will become Data_In after the clock edge
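The behavior of the register element can be captured in a few lines of Python. This is a behavioral sketch, not a gate-level model: the clock edge is represented by calling the hypothetical method `clock_edge`, and `Register` is an illustrative name.

```python
class Register:
    """n-bit edge-triggered register with a Write Enable input (behavioral sketch)."""

    def __init__(self, n_bits: int, init: int = 0):
        self.mask = (1 << n_bits) - 1
        self.data_out = init & self.mask

    def clock_edge(self, data_in: int, write_enable: int) -> None:
        # Write Enable negated (0): Data_Out does not change.
        # Write Enable asserted (1): Data_Out becomes Data_In at the clock edge.
        if write_enable:
            self.data_out = data_in & self.mask
```

Between calls to `clock_edge`, `data_out` stays constant, which is what lets the rest of the datapath read a stable value for the whole cycle.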
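Register File
RA and RB: 5-bit numbers of the 2 registers to be read; RW: 5-bit number of the register to be written
BusA and BusB: 32-bit output buses for reading 2 registers
BusW: 32-bit input bus for writing a register when RegWrite is 1
Clock input: the register selected by RW is written at the clock edge
Two registers are read and one is written in a cycle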
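A behavioral Python sketch of this register file follows, assuming reads are combinational and the write happens only when the hypothetical `clock_edge` method is called. `RegisterFile`, `read`, and `clock_edge` are illustrative names; register 0 is modeled as hardwired to zero by discarding writes to it.

```python
from typing import Tuple


class RegisterFile:
    """32 x 32-bit register file: read ports RA/BusA and RB/BusB, write port RW/BusW."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> Tuple[int, int]:
        # Combinational read: BusA and BusB show the selected registers immediately.
        return self.regs[ra & 0x1F], self.regs[rb & 0x1F]

    def clock_edge(self, rw: int, bus_w: int, reg_write: int) -> None:
        # Write only when RegWrite = 1; R0 always reads as zero.
        if reg_write and (rw & 0x1F) != 0:
            self.regs[rw & 0x1F] = bus_w & 0xFFFFFFFF
```

Reading a register and then writing it at the same `clock_edge` returns the old value during the cycle, which matches edge-triggered behavior: two reads and one write per cycle.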
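Tri-State Buffers
Allow multiple sources to drive a single bus
Two inputs: data signal (Data_in) and output enable
One output (Data_out): equals Data_in when enabled; otherwise the output is in the high-impedance (Z) state and does not drive the bus
Several tri-state buffers with separate enables (select signals) can connect different sources to one bus, provided that at most one of them is enabled at a time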
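A small Python sketch of bus resolution with tri-state buffers, assuming `None` stands for the high-impedance state Z. The names `tri_state` and `resolve_bus` are hypothetical, and a real bus with two enabled drivers is an electrical conflict; here it is reported as an error.

```python
from typing import Optional, Sequence, Tuple

Z = None  # high impedance: the buffer does not drive the bus


def tri_state(data_in: int, enable: int) -> Optional[int]:
    """Data_out = Data_in when enabled, otherwise Z."""
    return data_in if enable else Z


def resolve_bus(drivers: Sequence[Tuple[int, int]]) -> Optional[int]:
    """Each driver is (data_in, enable); at most one enable may be asserted."""
    active = [v for v in (tri_state(d, e) for d, e in drivers) if v is not Z]
    if len(active) > 1:
        raise ValueError("bus conflict: more than one tri-state buffer enabled")
    return active[0] if active else Z
```

In the register file, a decoder guarantees that exactly one buffer per read bus is enabled, so the conflict case never arises.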
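Register File Implementation (figure): RA and RB each feed a 5-bit decoder whose outputs enable tri-state buffers that place one of R1 through R31 onto BusA or BusB, with register 0 supplying the constant "0"; RW feeds a third decoder that, together with RegWrite and the Clock, selects the register written from BusW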
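32-bit ALU (figure): a Shifter, an Adder (with carry-in c0 and a sign output), and a Logic Unit feed a multiplexer that selects the 32-bit ALU Result; the ALU also outputs zero and overflow
Logic Unit operations: AND = 00, OR = 01, NOR = 10, XOR = 11
SLT: the ALU does a SUB and checks the sign and overflow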
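The arithmetic and logic paths of this ALU can be sketched in Python as follows. Assumptions: operations are selected by hypothetical string names rather than by a binary ALU control code, the Shifter is omitted, SUB is computed as a + ~b + 1 (the adder with carry-in c0 = 1), and SLT reports no overflow of its own.

```python
MASK32 = 0xFFFFFFFF


def _sign(x: int) -> int:
    return (x >> 31) & 1


def alu(a: int, b: int, op: str) -> tuple:
    """Return (result, zero, overflow) for 32-bit operands a and b."""
    a &= MASK32
    b &= MASK32
    overflow = 0
    if op in ("ADD", "SUB", "SLT"):
        b_in = b if op == "ADD" else (~b & MASK32)  # invert b for subtraction
        c0 = 0 if op == "ADD" else 1                # carry-in completes a - b
        s = (a + b_in + c0) & MASK32
        # Overflow: both adder inputs share a sign that the sum does not.
        overflow = int(_sign(a) == _sign(b_in) and _sign(s) != _sign(a))
        if op == "SLT":
            # a < b (signed) iff sign(a - b) XOR overflow; slt itself never overflows.
            result, overflow = _sign(s) ^ overflow, 0
        else:
            result = s
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "NOR":
        result = ~(a | b) & MASK32
    elif op == "XOR":
        result = a ^ b
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return result, int(result == 0), overflow
```

The SLT case shows why the sign alone is not enough: for a = 0x7FFFFFFF and b = 0x80000000 the subtraction overflows and its sign bit is 1, yet a is not less than b, so the sign is corrected by XOR with overflow.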
Instruction and Data Memories
Instruction Memory: 32-bit Address input, 32-bit Instruction output
Data Memory: 32-bit Address, Data_in, and Data_out; Clock, MemRead, and MemWrite inputs
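A behavioral Python sketch of the data memory, showing how MemRead gates the read and MemWrite gates the clocked write. The class and method names are hypothetical, and two further assumptions are made for illustration: addresses must be word-aligned, and locations never written read as 0.

```python
class DataMemory:
    """Data memory sketch: read gated by MemRead, write by MemWrite at the clock edge."""

    def __init__(self):
        self.words = {}  # byte address -> 32-bit word

    @staticmethod
    def _check(address: int) -> None:
        if address % 4:
            raise ValueError("word accesses must use addresses that are multiples of 4")

    def read(self, address: int, mem_read: int):
        if not mem_read:
            return None  # Data_out is not meaningful when MemRead = 0
        self._check(address)
        return self.words.get(address, 0)

    def clock_edge(self, address: int, data_in: int, mem_write: int) -> None:
        if mem_write:
            self._check(address)
            self.words[address] = data_in & 0xFFFFFFFF
```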
Clocking Methodology
Clocks are needed in a sequential We assume edgelogic to decide when a state element triggered clocking (register) should be updated All state changes
occur on the same To ensure correctness, a clocking clock edge methodology defines when data can Data must be valid be written and read
Register 1 Register 2
Combinational logic
clock
rising edge
falling edge
Edge-triggered clocking allows a register to be read and written during the same clock cycle
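Determining the Clock Cycle (figure: Register 1 → combinational logic → Register 2, with data captured at the clock writing edge)
Tclk-q: clock-to-output delay through a register
Tmax_comb: longest delay through the combinational logic
Ts: setup time that the input to a register must be stable before the arrival of the clock edge
Th: hold time that the input to a register must hold after the arrival of the clock edge
Hold time (Th) is normally satisfied since Tclk-q > Th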
Clock Skew
Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements.
Clock skew is the difference in absolute time between when two storage elements see a clock edge.
With clock skew, the clock cycle time is increased.
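The timing terms above combine into the usual constraints on an edge-triggered design. The following is a sketch of those relations, assuming T_skew denotes the worst-case skew between the launching and capturing registers and T_min_comb the shortest delay through the combinational logic (both symbols are introduced here, not defined in the slides):

```latex
\begin{aligned}
T_{cycle} &\ge T_{clk\text{-}q} + T_{max\_comb} + T_{s} + T_{skew} \\
T_{clk\text{-}q} + T_{min\_comb} &\ge T_{h} \qquad \text{(hold, ignoring skew)}
\end{aligned}
```

The first line says the cycle must cover the slowest register-to-register path; the second shows why hold time is normally satisfied, since T_min_comb ≥ 0 and Tclk-q > Th.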
Assembling an Adequate Datapath
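Instruction Fetching Datapath (figure): the 32-bit PC provides the address to the Instruction Memory, and an adder (Add) computes PC + 4 as the next PC
Improved Datapath (figure): instruction addresses are multiples of 4, so the two least significant bits of the PC are always 00; only the upper 30 bits of the PC are stored and incremented by +1, and 00 is appended to form the 32-bit instruction address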
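A quick Python check of why the improved datapath works: incrementing the 30-bit word address by 1 and appending 00 gives the same byte address as adding 4 to the full 32-bit PC. The function names are hypothetical.

```python
def next_pc_30bit(pc30: int) -> int:
    """Improved datapath: store only PC[31:2] and increment it by 1."""
    return (pc30 + 1) & 0x3FFFFFFF


def instruction_address(pc30: int) -> int:
    """Append the two low-order 00 bits to form the 32-bit byte address."""
    return (pc30 << 2) & 0xFFFFFFFF
```

The +1 incrementer is a 30-bit adder instead of a 32-bit one, and the two constant 00 bits cost only wiring.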
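Datapath for R-type Instructions (figure): the 30-bit PC (+1, with 00 appended) fetches the instruction; its Rs, Rt, and Rd fields (5 bits each) drive RA, RB, and RW of the Registers; BusA and BusB feed the ALU, and the 32-bit ALU result is written back to register Rd; the sa5 and funct6 fields complete the instruction
Control signals
ALUCtrl is derived from the funct field because Op = 0 for R-type
RegWrite is used to enable the writing of the ALU result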
Datapath for I-type ALU Instructions (figure): Rs drives RA and Rt drives RW; the 16-bit immediate (Imm16) passes through the Extender to form the second ALU input; the ALU result is written back to register Rt
Control signals
ALUCtrl is derived from the Op field
Second ALU input comes from the extended immediate; RB and BusB are not used
RegWrite is used to enable the writing of the ALU result
ExtOp is used to control the extension of the 16-bit immediate
Combining R-type and I-type Datapaths (figure)
A mux, controlled by RegDst, selects either Rt or Rd as the destination register number on RW
Another mux, controlled by ALUSrc, selects the 2nd ALU input as either source register Rt data on BusB or the extended immediate
Control signals
ALUCtrl is derived from either the Op or the funct field
RegWrite enables the writing of the ALU result
Controlling ALU Instructions: R-type (figure with RegDst = 1, ALUSrc = 0, RegWrite = 1)
For R-type ALU instructions, RegDst is 1 to select Rd on RW and ALUSrc is 0 to select BusB as the second ALU input. RegWrite is 1 to write the ALU result. The active part of the datapath is shown in green in the figure.
Controlling ALU Instructions: I-type (figure with RegDst = 0, ALUSrc = 1, RegWrite = 1)
For I-type ALU instructions, RegDst is 0 to select Rt on RW and ALUSrc is 1 to select the extended immediate as the second ALU input. The active part of the datapath is shown in green in the figure.
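Details of the Extender
Control signal ExtOp indicates the type of extension
ExtOp = 0: zero-extension, Upper16 = 0
ExtOp = 1: sign-extension, every bit of Upper16 = sign bit (msb) of Imm16
Lower 16 bits = Imm16
Extender implementation: wiring and one AND gate (ExtOp AND the sign bit of Imm16 drives all upper 16 bits)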
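The "wiring and one AND gate" implementation translates almost literally into Python. In this sketch (the function name `extender` is hypothetical), the single AND gate computes ExtOp AND Imm16[15], and wiring replicates that bit into all 16 upper positions.

```python
def extender(imm16: int, ext_op: int) -> int:
    """Extend Imm16 to 32 bits: ext_op = 1 sign-extends, ext_op = 0 zero-extends."""
    imm16 &= 0xFFFF
    upper_bit = (ext_op & 1) & (imm16 >> 15)  # the one AND gate
    upper16 = 0xFFFF if upper_bit else 0x0000  # wiring copies it to Upper16
    return (upper16 << 16) | imm16             # Lower 16 bits pass straight through
```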
Adding Data Memory to the Datapath (figure): the ALU result provides the Data Memory Address; MemRead, MemWrite, and MemtoReg are added as control signals
A 3rd mux, controlled by MemtoReg, selects the data on BusW as either the ALU result or the memory Data_out
BusB is connected to Data_in of the Data Memory for store instructions
Controlling the Execution of Load (lw)
RegDst = 0, RegWrite = 1, ExtOp = sign, ALUSrc = 1, ALUCtrl = ADD, MemRead = 1, MemWrite = 0, MemtoReg = 1
RegDst = 0 selects Rt as the destination register
ALUSrc = 1 selects the extended immediate as the second ALU input
ALUCtrl = ADD to calculate the data memory address as Reg(Rs) + sign-extend(Imm16)
MemRead = 1 to read data memory
MemtoReg = 1 places the data read from memory on BusW
RegWrite = 1 to write the memory data on BusW to register Rt
Controlling the Execution of Store (sw)
RegDst = x, RegWrite = 0, ExtOp = sign, ALUSrc = 1, ALUCtrl = ADD, MemRead = 0, MemWrite = 1, MemtoReg = x
ALUSrc = 1 to select the extended immediate as the second ALU input
ALUCtrl = ADD to calculate the data memory address as Reg(Rs) + sign-extend(Imm16)
MemWrite = 1 to write the value of register Rt (on BusB) into data memory
RegWrite = 0 because no register is written, so RegDst and MemtoReg are don't care (x)
Adding Jump and Branch to the Datapath (figure): a Next PC block receives the 30-bit PC, the immediate, and the ALU zero flag; a mux controlled by PCSrc selects the next PC as either PC + 1 or the target computed by Next PC
Next PC computes the jump or branch target instruction address
For branch, the ALU does a subtraction to compare Reg(Rs) with Reg(Rt); the zero flag is 1 when they are equal
Details of Next PC
Branch target address: the incremented 30-bit PC is added (ADD) to Imm16 sign-extended (SE) to 30 bits
Jump target address: upper 4 bits of PC are concatenated with Imm26
A mux, controlled by J, selects between the branch and jump targets to produce the 30-bit Branch or Jump Target Address
PCSrc = J + (Beq · Zero) + (Bne · NOT Zero)
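The Next PC logic and the PCSrc equation can be sketched in Python over 30-bit word addresses. Assumptions: `next_pc` is a hypothetical name, all control inputs are 0/1 integers, and, following the wording above, the jump target takes its upper 4 bits from the current PC.

```python
def next_pc(pc30: int, imm16: int, imm26: int,
            beq: int, bne: int, j: int, zero: int) -> int:
    """Return the next 30-bit word address PC[31:2]."""
    pc_plus1 = (pc30 + 1) & 0x3FFFFFFF
    # SE: interpret Imm16 as a signed word offset and add it to the incremented PC.
    se = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    branch_target = (pc_plus1 + se) & 0x3FFFFFFF
    # Jump target: upper 4 bits of the 30-bit PC concatenated with Imm26.
    jump_target = (pc30 & 0x3C000000) | (imm26 & 0x3FFFFFF)
    target = jump_target if j else branch_target  # mux controlled by J
    pc_src = j | (beq & zero) | (bne & (1 - zero))
    return target if pc_src else pc_plus1
```

Working on 30-bit word addresses means the offset needs no multiplication by 4; appending 00 afterward gives the byte address PC + 4 + 4 × sign_ext(Im16) from the RTL.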
Controlling the Execution of Jump (j)
J = 1, so PCSrc = 1 and the Next PC block supplies the jump target
MemRead = 0, MemWrite = 0, RegWrite = 0
RegDst, ExtOp, ALUSrc, ALUCtrl, and MemtoReg are x (don't care)
Controlling the Execution of Branch (beq, bne)
Beq = 1 (or Bne = 1); ALUSrc = 0 selects BusB and ALUCtrl = SUB, so the ALU compares Reg(Rs) with Reg(Rt) and sets the zero flag
Next PC logic determines PCSrc according to the zero flag (PCSrc = 1 when the branch is taken)
MemRead = 0, MemWrite = 0, RegWrite = 0
RegDst = ExtOp = MemtoReg = x
The Main Controller and ALU Controller
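Main Control and ALU Control (figure): Op6 drives the Main Control, which generates the datapath control signals; ALUOp and funct6 drive the ALU Control, which generates ALUCtrl for the ALU
Main Control
Input: 6-bit opcode field from the instruction
Output: 10 control signals for the datapath (RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, Beq, Bne, J) and ALUOp for the ALU Control
ALU Control
Input: ALUOp from the Main Control and the 6-bit function field from the instruction
Output: ALUCtrl signal for the ALU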
Single-Cycle Datapath + Control (figure): the Main Control decodes Op and drives RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, and ALUOp; the ALU Ctrl block converts ALUOp into ALUCtrl; the Next PC block, together with the zero flag, produces PCSrc
Main Control Signals (effect when 0; effect when 1)
RegDst: 0 → destination register = Rt; 1 → destination register = Rd
RegWrite: 0 → none; 1 → destination register is written with the data value on BusW
ExtOp: 0 → 16-bit immediate is zero-extended; 1 → 16-bit immediate is sign-extended
ALUSrc: 0 → second ALU operand comes from the second register file output (BusB); 1 → second ALU operand comes from the extended 16-bit immediate
MemRead: 0 → none; 1 → data memory is read, Data_out ← Memory[address]
MemWrite: 0 → none; 1 → data memory is written, Memory[address] ← Data_in
MemtoReg: 0 → BusW = ALU result; 1 → BusW = Data_out from Memory
Beq, Bne: 0 → PC ← PC + 4; 1 → PC ← branch target address, if branch is taken
J: 0 → PC ← PC + 4; 1 → PC ← jump target address
ALUOp: this multi-bit signal specifies the ALU operation as a function of the opcode
Main Control Signal Values (x = don't care)
Op       RegDst  RegWrite  ExtOp   ALUSrc  ALUOp   Beq  Bne  J  MemRead  MemWrite  MemtoReg
R-type   1 = Rd  1         x       0=BusB  R-type  0    0    0  0        0         0
addi     0 = Rt  1         1=sign  1=Imm   ADD     0    0    0  0        0         0
slti     0 = Rt  1         1=sign  1=Imm   SLT     0    0    0  0        0         0
andi     0 = Rt  1         0=zero  1=Imm   AND     0    0    0  0        0         0
ori      0 = Rt  1         0=zero  1=Imm   OR      0    0    0  0        0         0
xori     0 = Rt  1         0=zero  1=Imm   XOR     0    0    0  0        0         0
lw       0 = Rt  1         1=sign  1=Imm   ADD     0    0    0  1        0         1
sw       x       0         1=sign  1=Imm   ADD     0    0    0  0        1         x
beq      x       0         x       0=BusB  SUB     1    0    0  0        0         x
bne      x       0         x       0=BusB  SUB     0    1    0  0        0         x
j        x       0         x       x       x       0    0    1  0        0         x
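The table above maps directly onto a lookup from opcode to control signals. In this Python sketch, `CONTROL_TABLE`, `SIGNALS`, and `main_control` are hypothetical names, `None` stands for a don't-care (x), and ALUOp is kept as a symbolic string rather than a binary code.

```python
X = None  # don't care

SIGNALS = ("RegDst", "RegWrite", "ExtOp", "ALUSrc", "ALUOp",
           "Beq", "Bne", "J", "MemRead", "MemWrite", "MemtoReg")

# One row per opcode, in the column order of SIGNALS.
CONTROL_TABLE = {
    0x00: (1, 1, X, 0, "R-type", 0, 0, 0, 0, 0, 0),  # R-type
    0x08: (0, 1, 1, 1, "ADD", 0, 0, 0, 0, 0, 0),     # addi
    0x0A: (0, 1, 1, 1, "SLT", 0, 0, 0, 0, 0, 0),     # slti
    0x0C: (0, 1, 0, 1, "AND", 0, 0, 0, 0, 0, 0),     # andi
    0x0D: (0, 1, 0, 1, "OR", 0, 0, 0, 0, 0, 0),      # ori
    0x0E: (0, 1, 0, 1, "XOR", 0, 0, 0, 0, 0, 0),     # xori
    0x23: (0, 1, 1, 1, "ADD", 0, 0, 0, 1, 0, 1),     # lw
    0x2B: (X, 0, 1, 1, "ADD", 0, 0, 0, 0, 1, X),     # sw
    0x04: (X, 0, X, 0, "SUB", 1, 0, 0, 0, 0, X),     # beq
    0x05: (X, 0, X, 0, "SUB", 0, 1, 0, 0, 0, X),     # bne
    0x02: (X, 0, X, X, X, 0, 0, 1, 0, 0, X),         # j
}


def main_control(op6: int) -> dict:
    """Return the control signals for a 6-bit opcode of the MIPS subset."""
    if op6 not in CONTROL_TABLE:
        raise ValueError(f"opcode {op6:#04x} is not in the MIPS subset")
    return dict(zip(SIGNALS, CONTROL_TABLE[op6]))
```

In hardware, the don't-care entries are the freedom a designer uses to simplify each signal's logic equation; a lookup table simply records them.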
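Logic Equations for Control Signals (figure): a decoder on Op6 produces one signal per instruction: R-type, addi, slti, andi, ori, xori, lw, sw, Beq, Bne, J
Each control signal is a logic function of these decoder outputs, for example MemtoReg <= lw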
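ALU Control Truth Table
Op6      funct6   ALUCtrl  4-bit Encoding
R-type   add      ADD      0000
R-type   sub      SUB      0010
R-type   and      AND      0100
R-type   or       OR       0101
R-type   xor      XOR      0110
R-type   slt      SLT      1010
addi     x        ADD      0000
slti     x        SLT      1010
andi     x        AND      0100
ori      x        OR       0101
xori     x        XOR      0110
lw       x        ADD      0000
sw       x        ADD      0000
beq      x        SUB      0010
bne      x        SUB      0010
j        x        x        x
The 4-bit encoding for ALUCtrl is chosen here to be equal to the last 4 bits of the function field. Other binary encodings are also possible; the idea is to choose a binary encoding that minimizes the logic for the ALU Control.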
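Because the encoding equals the last 4 bits of funct, the ALU Control reduces to a very small function. This Python sketch assumes ALUOp is passed as a symbolic string (as in the main control table), and `alu_control` is a hypothetical name.

```python
FUNCT_TO_OP = {0x20: "ADD", 0x22: "SUB", 0x24: "AND", 0x25: "OR", 0x26: "XOR", 0x2A: "SLT"}
OP_TO_FUNCT = {name: funct for funct, name in FUNCT_TO_OP.items()}


def alu_control(alu_op: str, funct6: int) -> int:
    """Return the 4-bit ALUCtrl, encoded as the low 4 bits of the matching funct."""
    if alu_op == "R-type":
        if funct6 not in FUNCT_TO_OP:
            raise ValueError(f"funct {funct6:#04x} is not in the MIPS subset")
        return funct6 & 0xF           # R-type: pass the low funct bits through
    return OP_TO_FUNCT[alu_op] & 0xF  # I-type, lw/sw, branches: funct is ignored
```

For R-type instructions the ALU Control is just wiring of funct[3:0]; logic is needed only to choose between those wires and the encoding implied by ALUOp.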
Drawback of the single-cycle processor design
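Drawback of the Single-Cycle Design (figure): every instruction must complete in one clock cycle, so the cycle time is set by the instruction with the longest delay, the load: Instruction Fetch, Reg Read, ALU, Memory Read, Reg Write. ALU, store (Memory Write), branch, and jump instructions finish earlier and leave part of the cycle unused.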
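Multicycle Implementation
Break instruction execution into five steps, starting with instruction fetch
Each instruction uses only the cycles it needs: Branch takes 3 cycles and Jump takes 2 cycles, while the other instruction classes take 4 or 5 cycles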
Performance Example
Assume the following operation times for components:
Instruction and data memories: 200 ps
ALU operation: 180 ps
Register file read or write: 150 ps
Solution
Instruction Class  Instruction Memory  Register Read  ALU Operation  Data Memory  Register Write  Total
ALU                200                 150            180                         150             680 ps
Load               200                 150            180            200          150             880 ps
Store              200                 150            180            200                          730 ps
Branch             200                 150            180                                         530 ps
Jump               200                 150                                                        350 ps
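The totals in the table can be recomputed from the component times with a short Python script. The dictionary layout is an illustrative choice; the times and the steps used by each class are those in the table. In a single-cycle design every instruction must fit in one cycle, so the clock cycle equals the longest total.

```python
# Component operation times in ps, as given in the example.
T_IMEM = T_DMEM = 200
T_ALU = 180
T_REG = 150  # register file read or write

STEPS = {
    "ALU":    [T_IMEM, T_REG, T_ALU, T_REG],
    "Load":   [T_IMEM, T_REG, T_ALU, T_DMEM, T_REG],
    "Store":  [T_IMEM, T_REG, T_ALU, T_DMEM],
    "Branch": [T_IMEM, T_REG, T_ALU],
    "Jump":   [T_IMEM, T_REG],
}

totals = {cls: sum(times) for cls, times in STEPS.items()}
single_cycle_clock = max(totals.values())  # set by the slowest class (Load)

print(totals)               # {'ALU': 680, 'Load': 880, 'Store': 730, 'Branch': 530, 'Jump': 350}
print(single_cycle_clock)   # 880
```

Since every instruction takes exactly one 880 ps cycle, even a 350 ps jump pays the full load delay, which is the drawback the multicycle implementation addresses.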
Summary
5 steps to design a processor:
1. Analyze instruction set => datapath requirements
2. Select datapath components & establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine control signals
5. Assemble the control logic