CG2028 Lecture 4
Acknowledgement / References:
◼ Text by Patterson and Hennessy
◼ Text by Harris and Harris
CPU Clocking
◼ Operation of digital hardware governed by a
constant-rate clock
[Figure: clock waveform marking the clock period (cycles), annotated with "Data transfer and computation" and "Write registers"]
[Figure: two flip-flop (FF) circuits whose logic blocks have delays of 10 ns, 15 ns and 25 ns]
◼ Max clock = 1/Critical path delay
◼ First circuit: 1/40 ns = 25 MHz
◼ Second circuit: 1/25 ns = 40 MHz
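The relation between critical path delay and maximum clock frequency can be checked with a short calculation. This is a minimal sketch; the function name `max_clock_mhz` is made up for illustration.

```python
def max_clock_mhz(critical_path_ns: float) -> float:
    """Maximum clock frequency in MHz for a critical path delay in ns."""
    # f = 1 / T. With T in nanoseconds, 1/T is in GHz, so scale by 1000 for MHz.
    return 1000.0 / critical_path_ns


print(max_clock_mhz(40))  # 25.0
print(max_clock_mhz(25))  # 40.0
```

Shortening the critical path from 40 ns to 25 ns raises the maximum clock from 25 MHz to 40 MHz.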
Architecture vs Microarchitecture
◼ Architecture: programmer’s view of computer
◼ Defined by instructions & operand locations
◼ Assembly language: human-readable format of
instructions
◼ Machine language: computer-readable format
(1’s and 0’s)
◼ Assembly language -> Machine language
conversion is done by the assembler
◼ one to one correspondence
DP Register Operand2 Format
X=unused/irrelevant
Bits          31:28  27:26  25  24:21  20  19:16  15:12  11:5  4  3:0
Field         X      op     I   cmd    S   Rn     Rd     X     M  Rm
Width (bits)  4      2      1   4      1   4      4      7     1  4
funct = Instr[25:20] = {I, cmd, S} (6 bits)
OP{S} Rd, Rn, Rm (OP => ADD, SUB, AND, ORR)
◼ Operands
◼ Rn : first source register
◼ Rm : second source register
◼ Rd : destination register
◼ Control fields
◼ op : the operation code or opcode
◼ op = 0b00 for data-processing (DP) instructions
◼ funct is composed of cmd, I-bit, and S-bit
◼ cmd = 0b0000 for AND, 0b0010 for SUB, 0b0100 for ADD, 0b1100 for ORR
◼ I = immediate = 0b0 for register Operand2
◼ S = set flags = 0b1 if the suffix S is specified, for example, ADDS, ANDS
◼ M = 0b0
DP Register Operand2 Example
◼ ADDS R2, R3, R5
Field  X     op  I  cmd   S  Rn    Rd    X        M  Rm
Value  0000  00  0  0100  1  0011  0010  0000000  0  0101
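The register Operand2 format can be exercised with a small encoder. This is only a sketch of the lecture's simplified format, not a full ARM assembler; the names `CMD` and `encode_dp_register` are made up for illustration, and all X (unused) bits are assumed to be 0. The bit pattern in the example above corresponds to ADDS R2, R3, R5.

```python
# cmd values for the four operations listed on the format slide
CMD = {"AND": 0b0000, "SUB": 0b0010, "ADD": 0b0100, "ORR": 0b1100}


def encode_dp_register(op_name, rd, rn, rm, set_flags=False):
    """Encode OP{S} Rd, Rn, Rm in the lecture's DP register format."""
    word = 0b00 << 26              # op = 00 for data processing
    word |= 0 << 25                # I = 0: register Operand2
    word |= CMD[op_name] << 21     # cmd
    word |= int(set_flags) << 20   # S
    word |= rn << 16               # first source register
    word |= rd << 12               # destination register
    word |= rm                     # second source register (M = 0, X bits = 0)
    return word


word = encode_dp_register("ADD", rd=2, rn=3, rm=5, set_flags=True)
print(f"{word:032b}")  # 00000000100100110010000000000101
print(hex(word))       # 0x932005
```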
DP Immediate Operand2 Format
Bits          31:28  27:26  25  24:21  20  19:16  15:12  11:8  7:0
Field         X      op     I   cmd    S   Rn     Rd     X     imm8
Width (bits)  4      2      1   4      1   4      4      4     8
funct = Instr[25:20] = {I, cmd, S} (6 bits)
Example: SUB R2, R3, #0xAB
Field  X     op  I  cmd   S  Rn    Rd    X     imm8
Value  0000  00  1  0010  0  0011  0010  0000  10101011
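The immediate Operand2 format differs from the register format only in I = 1 and the imm8 field. The sketch below encodes it under the same assumptions as the format table (X bits set to 0); `CMD` and `encode_dp_immediate` are illustrative names, not part of any toolchain.

```python
CMD = {"AND": 0b0000, "SUB": 0b0010, "ADD": 0b0100, "ORR": 0b1100}


def encode_dp_immediate(op_name, rd, rn, imm8, set_flags=False):
    """Encode OP{S} Rd, Rn, #imm8 in the lecture's DP immediate format."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    return (
        (0b00 << 26)                 # op = 00 for data processing
        | (1 << 25)                  # I = 1: immediate Operand2
        | (CMD[op_name] << 21)       # cmd
        | (int(set_flags) << 20)     # S
        | (rn << 16)
        | (rd << 12)
        | imm8
    )


word = encode_dp_immediate("SUB", rd=2, rn=3, imm8=0xAB)
print(f"{word:032b}")  # 00000010010000110010000010101011
print(hex(word))       # 0x24320ab
```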
Memory Instruction Format
Bits          31:28  27:26  25  24  23  22  21  20  19:16  15:12  11:8  7:0
Field         X      op     X   P   U   X   W   L   Rn     Rd     X     imm8
Width (bits)  4      2      1   1   1   1   1   1   4      4      4     8
funct = Instr[25:20] (6 bits)
◼ U : Add
Example: STR R11, [R5, #-26]
Field  X     op  X  P  U  X  W  L  Rn    Rd    X     imm8
Value  0000  01  0  1  0  0  0  0  0101  1011  0000  00011010
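A load or store in offset mode can be encoded the same way. The sketch below assumes offset addressing only (P = 1, W = 0), treats a negative offset as U = 0 with the magnitude in imm8, and sets all X bits to 0; `encode_memory` is a made-up helper name.

```python
def encode_memory(load, rd, rn, offset):
    """Encode LDR/STR Rd, [Rn, #offset] (offset mode: P = 1, W = 0)."""
    imm8 = abs(offset)
    if imm8 > 0xFF:
        raise ValueError("offset magnitude must fit in 8 bits")
    u = 1 if offset >= 0 else 0      # U = 1 adds the offset, U = 0 subtracts it
    return (
        (0b01 << 26)                 # op = 01 for memory instructions
        | (1 << 24)                  # P = 1
        | (u << 23)                  # U
        | (0 << 21)                  # W = 0
        | (int(load) << 20)          # L = 1 for load, 0 for store
        | (rn << 16)
        | (rd << 12)
        | imm8
    )


word = encode_memory(load=False, rd=11, rn=5, offset=-26)
print(f"{word:032b}")  # 00000101000001011011000000011010
print(hex(word))       # 0x505b01a
```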
Branch Instruction Format
Bits          31:28  27:26  25  24  23  22  21  20  19:8  7:0
Field         cond   op     X   X   U   X   X   X   X     imm8
Width (bits)  4      2      1   1   1   1   1   1   12    8
funct = Instr[25:20] (6 bits)
B{cond} LABEL
◼ LABEL is encoded as #±imm8
◼ BTA = branch target address: the address of the next instruction if the branch is taken
◼ imm8 = # of bytes BTA is away from current PC+4
◼ U : add
◼ 0b1 -> BTA = PC+4+imm8; 0b0 -> BTA = PC+4-imm8
Branch Instruction Example
0x8040  TEST: LDR R5, [R0, #4]    <- BTA
0x8044        STR R5, [R1, #1]
0x8048        ADD R3, R3, #1
0x804C        MOV R5, R4
0x8050        B TEST              <- PC
0x8054        LDR R3, [R1]        <- PC+4
0x8058        SUB R4, R3, #9
• PC = 0x8050
• PC+4 = 0x8054
• BTA = TEST = 0x8040
• offset = 0x8040-0x8054 = -0x14, encoded as U = 0b0, imm8 = 0x14
◼ B TEST
◼ cond = 0b1110 for unconditional branch
Field  cond  op  X  X  U  X  X  X  X             imm8
Value  1110  10  0  0  0  0  0  0  000000000000  00010100
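The offset arithmetic for B TEST can be reproduced in a few lines. This sketch follows the rule that imm8 is the distance in bytes between the BTA and PC+4, with U giving the direction; `encode_branch` is an illustrative name and all X bits are assumed to be 0.

```python
def encode_branch(pc, bta, cond=0b1110):
    """Encode B{cond} for a branch at address pc targeting address bta."""
    offset = bta - (pc + 4)          # distance from PC+4 to the target
    u = 1 if offset >= 0 else 0      # U = 1: BTA = PC+4+imm8, U = 0: BTA = PC+4-imm8
    imm8 = abs(offset)
    if imm8 > 0xFF:
        raise ValueError("target too far away for an 8-bit offset")
    word = (cond << 28) | (0b10 << 26) | (u << 23) | imm8
    return word, u, imm8


word, u, imm8 = encode_branch(pc=0x8050, bta=0x8040)
print(u, hex(imm8))    # 0 0x14
print(f"{word:032b}")  # 11101000000000000000000000010100
```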
◼ Processor
◼ Datapath: functional blocks
◼ Control: control signals
◼ Basic styles
◼ Single-cycle: Each instruction executes in a
single cycle
◼ Multicycle: Each instruction is broken up into a
series of shorter steps
◼ Pipelined: Each instruction is broken up into a series
of steps & multiple instructions execute at once
Architectural State Elements
◼ Architectural state determines everything about a
processor (state of program execution)
◼ 16 registers (including PC)
◼ Memory
[Figure: state elements, all clocked by CLK: PC register, Status register (D, Q, En; holds Z), Instruction Memory (A, RD -> Instr), Register File (A1, A2, A3, WD3, WE3, RD1, RD2), Data Memory (A, WD, WE, RD)]
Datapath : LDR – Fetch
[Figure: state elements with the fetch path added: PC drives address A of the Instruction Memory, whose read port RD outputs Instr]
Datapath : LDR – Read RF
[Figure: memory instruction bit fields (31:28, 27:26, 25, 24, 23, 22, 21, 20, 19:16, 15:12, 11:8, 7:0) shown above the datapath; Instr[19:16] (Rn) drives register file port A1 and the base address is read on RD1]
Datapath : LDR – Extend Immediate
◼ Instr[7:0] (imm8) -> Extend -> ExtImm
[Figure: datapath with the Extend block added below the register file]
Datapath : LDR – Data Mem Address
◼ ALUControl = 0100 (ADD) or 0010 (SUB)
[Figure: datapath with the ALU added: SrcA and SrcB (ExtImm) feed the ALU, and ALUResult drives address A of the Data Memory]
Datapath : LDR – Read Data Mem
◼ ALUControl = 0100 or 0010
◼ RegWrite = 1
[Figure: Data Memory port RD outputs ReadData, which returns as Result to register file port WD3; Instr[15:12] (Rd) drives A3]
Datapath : LDR – PC Increment
◼ ALUControl = 0100 or 0010
◼ RegWrite = 1
[Figure: LDR datapath with an adder that computes PCPlus4 = PC + 4 and feeds it back to the PC register]
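The LDR steps above (fetch, read RF, extend immediate, compute address, read data memory, write back, PC increment) can be traced in software. This is a behavioural sketch rather than a hardware description: memories are modelled as Python dicts keyed by byte address, the PC is kept separately from the register list, imm8 is assumed to be zero-extended, and the names `step_ldr`, `state`, `instr_mem` and `data_mem` are hypothetical.

```python
def step_ldr(state, instr_mem, data_mem):
    """Execute one LDR in a single step, mirroring the datapath slides."""
    instr = instr_mem[state["pc"]]       # Fetch: Instr = InstrMem[PC]
    u = (instr >> 23) & 1                # U bit selects add or subtract
    rn = (instr >> 16) & 0xF             # Instr[19:16] -> A1
    rd = (instr >> 12) & 0xF             # Instr[15:12] -> A3
    src_a = state["regs"][rn]            # Read RF: base address on RD1
    ext_imm = instr & 0xFF               # Extend: zero-extended imm8 (assumed)
    # Data memory address: ALUControl = 0100 (add) or 0010 (subtract)
    alu_result = (src_a + ext_imm if u else src_a - ext_imm) & 0xFFFFFFFF
    read_data = data_mem[alu_result]     # Read data memory
    state["regs"][rd] = read_data        # RegWrite = 1: write ReadData to Rd
    state["pc"] += 4                     # PC increment: PC = PCPlus4
    return state


state = {"pc": 0x8040, "regs": [0] * 16}
state["regs"][0] = 0x100
instr_mem = {0x8040: 0x05905004}         # LDR R5, [R0, #4] in the lecture format
data_mem = {0x104: 42}
step_ldr(state, instr_mem, data_mem)
print(state["regs"][5], hex(state["pc"]))  # 42 0x8044
```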
Datapath : STR
◼ ALUControl = 0100 or 0010
◼ RegWrite = 0
◼ MemWrite = 1
[Figure: LDR datapath extended for STR: register file output RD2 supplies WriteData to Data Memory port WD; the address is still ALUResult]
Datapath : Data Processing (Immediate)
DP immediate: OP{S} Rd, Rn, #imm8 (OP => ADD, AND, ..)
Bits          31:28  27:26  25  24:21  20  19:16  15:12  11:8  7:0
Field         X      op     I   cmd    S   Rn     Rd     X     imm8
Width (bits)  4      2      1   4      1   4      4      4     8
funct = Instr[25:20] (6 bits)
◼ ALUControl = cmd
◼ RegWrite = 1
◼ MemWrite = 0
◼ MemtoReg = 0
[Figure: datapath with a MemtoReg multiplexer (inputs 1 and 0) choosing between ReadData and ALUResult as Result; the ALU also outputs ALUFlags]
Datapath : Data Processing (Register)
DP register: OP{S} Rd, Rn, Rm (OP => ADD, AND, ..)
Bits          31:28  27:26  25  24:21  20  19:16  15:12  11:5  4  3:0
Field         X      op     I   cmd    S   Rn     Rd     X     M  Rm
Width (bits)  4      2      1   4      1   4      4      7     1  4
funct = Instr[25:20] (6 bits)
◼ ALUControl = cmd
◼ RegWrite = 1
◼ ALUSrc = 0
◼ RegSrc = 0
◼ MemWrite = 0
◼ MemtoReg = 0
[Figure: datapath with Instr[3:0] (Rm) added as a register file address input, plus multiplexers controlled by RegSrc and ALUSrc]
Datapath : Branch
Branch: B{cond} LABEL (LABEL encoded as #±imm8)
Bits          31:28  27:26  25  24  23  22  21  20  19:8  7:0
Field         cond   op     X   X   U   X   X   X   X     imm8
Width (bits)  4      2      1   1   1   1   1   1   12    8
funct = Instr[25:20] (6 bits)
◼ ALUControl = 0100 or 0010
◼ RegWrite = 0
◼ ALUSrc = 11
◼ PCSrc = 1
◼ RegSrc = X
◼ MemWrite = 0
◼ MemtoReg = 0
[Figure: datapath with a PCSrc multiplexer selecting the next PC, and multiplexers on the ALU inputs SrcA and SrcB]
Single-Cycle Processor with Control
[Figure: complete single-cycle datapath with the control unit added]
◼ Decoder: inputs op = Instr[27:26] and funct = Instr[25:20]; outputs PCS, FlagWrite, MemtoReg, MemWrite, ALUControl, ALUSrc, RegWrite, RegSrc
◼ Conditional Unit: inputs cond = Instr[31:28], PCS, FlagWrite and ALUFlags; contains the Condition Check block and the Status register (Z, with enable En); output PCSrc
Control Unit Design
Decoder
Note: All expressions use C syntax; all values are in binary (0b prefix not explicitly written, for convenience)
◼ PCS = (op==10)
◼ op = Instr[27:26]
◼ Asserted only for branch, to write branch target to PC. Passed
through conditional unit before being used in the datapath
◼ FlagWrite = (op==00) && (S==1)
◼ S = funct[0] = Instr[20]
◼ Asserted for DP with S suffix, as only they modify flags
◼ MemtoReg = (op==01) && (L==1)
◼ L = funct[0] = Instr[20]
◼ Asserted only for load, as the destination register gets data read
from the data memory
◼ MemWrite = (op==01) && (L==0)
◼ Asserted only for store, as store alone writes to the data memory
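The four decoder equations above translate directly into code. The sketch below evaluates them for a 32-bit instruction word in the lecture's format; `decode_control` is an illustrative name, and the remaining decoder outputs (ALUControl, ALUSrc, RegWrite, RegSrc) are left out because their equations are not given here.

```python
def decode_control(instr):
    """Compute PCS, FlagWrite, MemtoReg and MemWrite from an instruction word."""
    op = (instr >> 26) & 0b11        # op = Instr[27:26]
    bit20 = (instr >> 20) & 1        # funct[0]: S for DP, L for memory
    return {
        "PCS": int(op == 0b10),                       # branch
        "FlagWrite": int(op == 0b00 and bit20 == 1),  # DP with S suffix
        "MemtoReg": int(op == 0b01 and bit20 == 1),   # load
        "MemWrite": int(op == 0b01 and bit20 == 0),   # store
    }


print(decode_control(0x00932005))  # ADDS: only FlagWrite is 1
print(decode_control(0x0505B01A))  # STR: only MemWrite is 1
print(decode_control(0xE8000014))  # B: only PCS is 1
```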
Single-Cycle Performance
[Figure: the complete single-cycle processor, same diagram as on the "Single-Cycle Processor with Control" slide]
◼ Four loads: Speedup = 8/3.5 = 2.3
◼ Non-stop (n loads): Speedup = 2n/(0.5n + 1.5) ≈ 4 for large n = # of stages
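The speedup formula can be evaluated for different numbers of loads to see it approach the number of stages. The sketch assumes the reading implied by the formula: each load takes 2 time units when run back to back (2n in total), while the pipelined version takes 0.5n + 1.5. The function name `pipeline_speedup` is made up for illustration.

```python
def pipeline_speedup(n):
    """Speedup of the pipelined schedule over the sequential one for n loads."""
    sequential = 2.0 * n             # each load takes 2 time units on its own
    pipelined = 0.5 * n + 1.5        # one load finishes every 0.5 units once full
    return sequential / pipelined


print(round(pipeline_speedup(4), 1))     # 2.3
print(round(pipeline_speedup(1000), 2))  # 3.99
```

The value stays below 4 for any finite n, which is why the slide writes ≈ 4.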
Other Formats : Self Reading
Note :
◼ No need to memorize. Required info will be given in the exam if need be
◼ Some bits which were left as don’t cares are used in the following slides
DP Operations : cmd
cmd   Instruction  Operation                    Notes
0000  AND          Logical AND
0001  EOR          Logical Exclusive OR
0010  SUB          Subtract
0011  RSB          Reverse Subtract
0100  ADD          Add
0101  ADC          Add with Carry
0110  SBC          Subtract with Carry
0111  RSC          Reverse Subtract with Carry
1000  TST          Test                         Update flags after AND
1001  TEQ          Test Equivalence             Update flags after EOR
1010  CMP          Compare                      Update flags after SUB
1011  CMN          Compare Negated              Update flags after ADD
1100  ORR          Logical OR
1101  MOV          Move
1110  BIC          Bit Clear
1111  MVN          Move Not
Note : Multiplication is not one of the 16 ALU operations, though it is considered a DP operation.
Multiplication is done in a separate multiplication unit and is a bit different from other DP operations.
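The cmd table can be used to turn a DP instruction word back into assembly text. The sketch below handles only the three-operand instructions (OP{S} Rd, Rn, Operand2) in the lecture's simplified format and assumes M = 0 (not a multiply); `MNEMONICS`, `THREE_OPERAND` and `disassemble_dp` are illustrative names.

```python
# Index = cmd value, in the order of the table above
MNEMONICS = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
             "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]
THREE_OPERAND = {"AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
                 "ORR", "BIC"}


def disassemble_dp(instr):
    """Return assembly text for a three-operand DP instruction word."""
    if (instr >> 26) & 0b11 != 0b00:
        raise ValueError("not a data-processing instruction")
    name = MNEMONICS[(instr >> 21) & 0xF]
    if name not in THREE_OPERAND:
        raise NotImplementedError(f"{name} does not use the three-operand form")
    suffix = "S" if (instr >> 20) & 1 else ""
    rn = (instr >> 16) & 0xF
    rd = (instr >> 12) & 0xF
    if (instr >> 25) & 1:            # I = 1: immediate Operand2
        operand2 = f"#{instr & 0xFF}"
    else:                            # I = 0: register Operand2
        operand2 = f"R{instr & 0xF}"
    return f"{name}{suffix} R{rd}, R{rn}, {operand2}"


print(disassemble_dp(0x00932005))  # ADDS R2, R3, R5
print(disassemble_dp(0x024320AB))  # SUB R2, R3, #171
```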
Multiply Instruction Format
Bits          31:28  27:26  25  24:21  20  19:16  15:12  11:8  7:5  4  3:0
Field         X      op     I   cmd    S   Rn     Rd     Rs    X    M  Rm
Width (bits)  4      2      1   4      1   4      4      4     3    1  4
funct = Instr[25:20] (6 bits)
Memory Instruction Format
Bits          31:28  27:26  25  24  23  22  21  20  19:16  15:12  11:8  7:0
Field         X      op     X   P   U   X   W   L   Rn     Rd     X     imm8
Width (bits)  4      2      1   1   1   1   1   1   4      4      4     8
funct = Instr[25:20] (6 bits)
◼ funct
◼ U : Add
◼ 0b1 -> positive offset; 0b0 -> negative offset
◼ L : Load
◼ 0b1 -> load; 0b0 -> store
◼ P : Preindex
◼ W : Writeback
◼ PW = 0b00 -> postindex; 0b01 -> unsupported; 0b10 -> offset; 0b11 -> preindex
Note: PC relative mode is identical to offset mode, with Rn=R15 and offset computed automatically by the assembler. Note that R15 is always read as PC+4. However, the processor we designed does not support reads from R15 except for branch instructions.
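The funct bits of a memory instruction can be unpacked to recover the operation, the signed offset and the addressing mode. This is a sketch of the lecture's simplified format only; `MODES` and `decode_memory` are made-up names.

```python
# Addressing mode selected by the two-bit value {P, W}
MODES = {0b00: "postindex", 0b01: "unsupported", 0b10: "offset", 0b11: "preindex"}


def decode_memory(instr):
    """Unpack the fields of a memory instruction word."""
    if (instr >> 26) & 0b11 != 0b01:
        raise ValueError("not a memory instruction")
    p = (instr >> 24) & 1
    u = (instr >> 23) & 1
    w = (instr >> 21) & 1
    l = (instr >> 20) & 1
    imm8 = instr & 0xFF
    return {
        "mnemonic": "LDR" if l else "STR",    # L: 1 -> load, 0 -> store
        "rn": (instr >> 16) & 0xF,
        "rd": (instr >> 12) & 0xF,
        "offset": imm8 if u else -imm8,       # U: 1 -> positive, 0 -> negative
        "mode": MODES[(p << 1) | w],
    }


print(decode_memory(0x0505B01A))
# {'mnemonic': 'STR', 'rn': 5, 'rd': 11, 'offset': -26, 'mode': 'offset'}
```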
Branch Condition Codes (cond)
cond  Mnemonic  Name       Condition Checked
0000  EQ        Equal      Z
0001  NE        Not equal  NOT Z
◼ Name (interpretation) is based on SUBS/CMP
◼ Flags are set by instructions with the S suffix
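The condition check can be modelled as a small function of cond and the Z flag. The sketch covers only the codes that appear in these slides (EQ, NE, and 0b1110 for an unconditional branch) and gates PCS with the result, in the spirit of the conditional unit; `condition_passed` and `pc_src` are hypothetical names.

```python
def condition_passed(cond, z):
    """Evaluate the branch condition for the codes listed in the slides."""
    if cond == 0b0000:               # EQ: Z set
        return z == 1
    if cond == 0b0001:               # NE: Z clear
        return z == 0
    if cond == 0b1110:               # unconditional
        return True
    raise NotImplementedError("condition code not covered in this sketch")


def pc_src(pcs, cond, z):
    """PCSrc is PCS gated by the condition check."""
    return int(bool(pcs) and condition_passed(cond, z))


print(pc_src(1, 0b0000, z=1))  # 1: BEQ taken when Z = 1
print(pc_src(1, 0b0001, z=1))  # 0: BNE not taken when Z = 1
print(pc_src(1, 0b1110, z=0))  # 1: B always taken
```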