CG2028 Lecture 4

The document provides an overview of the key components of a single cycle processor design, including: 1) It describes three instruction formats - data processing, memory access, and branch instructions. 2) It explains the datapath and controller design that are needed to implement the instruction set in hardware. 3) It discusses extending the design to support additional instruction formats and operations.


CG2028 Lecture 4 :

Single Cycle Processor Design

Rajesh Panicker, ECE, NUS

Acknowledgement / References:
◼ Text by Patterson and Hennessy
◼ Text by Harris and Harris
Contents

◼ ARM-like instruction encoding


◼ The design and encoding in this chapter is not compliant
with any version of ARM Architecture, though it is closer to
ARMv3 than ARMv7M covered in chapter 3
◼ We will be looking at a simplified version
(different from all books/references, but somewhat similar
to the one in Harris and Harris)
◼ Microarchitecture : Datapath design

◼ Microarchitecture : Controller design

◼ Extensions for other formats / instructions

CPU Clocking
◼ Operation of digital hardware governed by a
constant-rate clock
[Figure: clock waveform marking the clock period; data transfer and computation take place within each cycle, followed by register writes]

◼ Clock period: duration of a clock cycle
◼ e.g., 500 ps = 0.5 ns = 500×10⁻¹² s
◼ Clock frequency (rate): cycles per second
◼ e.g., 2.0 GHz = 2000 MHz = 2.0×10⁹ Hz
Critical Path

◼ Critical path = combinational path with maximum delay


◼ Determines the max. clock
◼ assuming FFs have negligible setup time and propagation delay

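[Figure: two example circuits built from flip-flops (FF) and combinational blocks with delays of 10 ns, 15 ns, 10 ns and 25 ns]
◼ Example 1: critical path delay = 40 ns, so max clock = 1/Critical path delay = 1/40 ns = 25 MHz
◼ Example 2: critical path delay = 25 ns, so max clock = 1/Critical path delay = 1/25 ns = 40 MHz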
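The relation between critical path delay and maximum clock frequency can be checked with a few lines of Python. This is only a sketch: the function name `max_clock_mhz` is made up for illustration, and, as on the slide, flip-flop setup time and propagation delay are assumed to be negligible.

```python
def max_clock_mhz(critical_path_ns: float) -> float:
    """Maximum clock frequency in MHz for a critical path delay given in ns.

    Assumes negligible flip-flop setup time and propagation delay.
    """
    # f = 1 / T, and a period of 1 ns corresponds to 1000 MHz
    return 1000.0 / critical_path_ns


print(max_clock_mhz(40))  # 25.0
print(max_clock_mhz(25))  # 40.0
```

Shortening the critical path from 40 ns to 25 ns raises the achievable clock from 25 MHz to 40 MHz, matching the two examples above.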
Architecture vs Microarchitecture
◼ Architecture: programmer’s view of computer
◼ Defined by instructions & operand locations
◼ Assembly language: human-readable format of
instructions
◼ Machine language: computer-readable format
(1’s and 0’s)
◼ Assembly language -> Machine language
conversion is done by the assembler
◼ one-to-one correspondence (except for pseudo-instructions)

◼ Microarchitecture: digital designer’s view of
the computer
◼ How to implement an architecture in hardware
◼ Different designs possible to execute the same code
◼ Price-performance trade-off
Architecture : Instruction Formats

◼ We will be looking at 3 instruction formats, with restrictions
◼ Data-processing (DP) - ADD, SUB, AND, ORR

◼ 2 variants – register and immediate Operand2


◼ Other DP operations not supported
◼ Memory - LDR, STR
◼ Only offset mode
◼ Magnitude of the offset is 8 bits
◼ Branch - B, BEQ
◼ Only unconditional and EQ
◼ Magnitude of the offset from PC+4 is 8 bits
◼ No reads or writes to/from PC/R15 other than by branch instruction

DP Register Operand2 Format
Bits    31:28  27:26  25  24:21  20  19:16  15:12  11:5  4  3:0
Field   X      op     I   cmd    S   Rn     Rd     X     M  Rm
Width   4      2      1   4      1   4      4      7     1  4
X = unused/irrelevant; funct = Instr[25:20] = {I, cmd, S} (6 bits)

OP{S} Rd, Rn, Rm, where OP => ADD, SUB, AND, ORR
◼ Operands
◼ Rn : first source register
◼ Rm : second source register
◼ Rd : destination register
◼ Control fields
◼ op : the operation code or opcode
◼ op = 0b00 for data-processing (DP) instructions
◼ funct is composed of cmd, I-bit, and S-bit
◼ cmd = 0b0000 for AND, 0b0010 for SUB, 0b0100 for ADD, 0b1100 for ORR
◼ I = immediate = 0b0 for register Operand2
◼ S = set flags = 0b1 if the suffix S is specified, for example, ADDS, ANDS
◼ M = 0b0
DP Register Operand2 Example

◼ ADDS R2, R3, R5 <OP Rd, Rn, Rm>


◼ op = 0b00 for DP instructions
◼ cmd = 0b0100 for ADD
◼ Operand2 is a register so I=0b0
◼ Sets the flags, so S=0b1
◼ Rd = 2, Rn = 3
◼ Rm = 5
◼ Assume X’s are 0s

Bits    31:28  27:26  25  24:21  20  19:16  15:12  11:5     4  3:0
Field   X      op     I   cmd    S   Rn     Rd     X        M  Rm
Value   0000   00     0   0100   1   0011   0010   0000000  0  0101

ADDS R2, R3, R5 => 0x00932005
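The field packing for the DP register format can be reproduced in a short Python sketch. The helper name `encode_dp_reg` and the `CMD` dictionary are illustrative only; unused (X) bits are assumed to be 0, as in the example above.

```python
# cmd values for the four supported DP operations (from the format description)
CMD = {"AND": 0b0000, "SUB": 0b0010, "ADD": 0b0100, "ORR": 0b1100}


def encode_dp_reg(op_name, rd, rn, rm, set_flags=False):
    """Encode OP{S} Rd, Rn, Rm in the simplified DP register format."""
    word = 0b00 << 26              # op = 00 for data processing
    word |= 0 << 25                # I = 0: Operand2 is a register
    word |= CMD[op_name] << 21     # cmd
    word |= int(set_flags) << 20   # S
    word |= (rn & 0xF) << 16       # Rn
    word |= (rd & 0xF) << 12       # Rd
    word |= 0 << 4                 # M = 0 (not a multiply)
    word |= rm & 0xF               # Rm
    return word


print(format(encode_dp_reg("ADD", rd=2, rn=3, rm=5, set_flags=True), "#010x"))
# 0x00932005
```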
DP Immediate Operand2 Format
Bits    31:28  27:26  25  24:21  20  19:16  15:12  11:8  7:0
Field   X      op     I   cmd    S   Rn     Rd     X     imm8
Width   4      2      1   4      1   4      4      4     8
funct = Instr[25:20] = {I, cmd, S} (6 bits)

OP{S} Rd, Rn, #imm8, where OP => ADD, SUB, AND, ORR
◼ Operands
◼ Rn : first source register
◼ imm8 : 8-bit unsigned immediate
◼ Rd : destination register
◼ Control fields
◼ op : the operation code or opcode
◼ op = 0b00 for data-processing (DP) instructions
◼ funct is composed of cmd, I-bit, and S-bit
◼ cmd = 0b0000 for AND, 0b0010 for SUB, 0b0100 for ADD, 0b1100 for ORR
◼ I = immediate = 0b1 for immediate Operand2
◼ S = set flags = 0b1 if the suffix S is specified, for example, ADDS, ANDS
DP Immediate Operand2 Example

◼ SUB R2, R3, #0xAB <OP Rd, Rn, #imm8>


◼ op = 0b00 for DP instructions
◼ cmd = 0b0010 for SUB
◼ Operand2 is an immediate so I=0b1
◼ Doesn’t set the flags, so S=0b0
◼ Rd = 2, Rn = 3
◼ imm8 = 0xAB
◼ Assume X’s are 0s

Bits    31:28  27:26  25  24:21  20  19:16  15:12  11:8  7:0
Field   X      op     I   cmd    S   Rn     Rd     X     imm8
Value   0000   00     1   0010   0   0011   0010   0000  10101011

SUB R2, R3, #0xAB => 0x024320AB
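The immediate variant differs only in the I bit and the low 8 bits. A minimal Python sketch follows; `encode_dp_imm` and `CMD` are hypothetical names, and X bits are assumed to be 0.

```python
# cmd values for the four supported DP operations (from the format description)
CMD = {"AND": 0b0000, "SUB": 0b0010, "ADD": 0b0100, "ORR": 0b1100}


def encode_dp_imm(op_name, rd, rn, imm8, set_flags=False):
    """Encode OP{S} Rd, Rn, #imm8 in the simplified DP immediate format."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must be an 8-bit unsigned value")
    word = 0b00 << 26              # op = 00 for data processing
    word |= 1 << 25                # I = 1: Operand2 is an immediate
    word |= CMD[op_name] << 21     # cmd
    word |= int(set_flags) << 20   # S
    word |= (rn & 0xF) << 16       # Rn
    word |= (rd & 0xF) << 12       # Rd
    word |= imm8                   # 8-bit unsigned immediate
    return word


print(format(encode_dp_imm("SUB", rd=2, rn=3, imm8=0xAB), "#010x"))
# 0x024320ab
```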
Memory Instruction Format
Bits    31:28  27:26  25  24  23  22  21  20  19:16  15:12  11:8  7:0
Field   X      op     X   P   U   X   W   L   Rn     Rd     X     imm8
Width   4      2      1   1   1   1   1   1   4      4      4     8
funct = Instr[25:20] (6 bits)

OP Rd, [Rn, #±imm8], where OP => LDR, STR
◼ Encodes LDR, STR
◼ op : 0b01 for memory instructions
◼ funct : 6 control bits
◼ U : Add
◼ 0b1 -> offset is positive, i.e., effective address = Rn + imm8
◼ 0b0 -> offset is negative, i.e., effective address = Rn – imm8
◼ L = 0b1 for load; 0b0 for store
◼ PW = 0b10 (offset mode)
◼ Rn : base register
◼ Rd : destination (load), source (store)
◼ imm8 : magnitude of offset
Memory Instruction Example

<OP Rd, [Rn, #±imm8]>


◼ STR R11, [R5, #-26]
◼ Operation: mem[R5-26] <= R11;
◼ op = 0b01 for memory instruction
◼ U = 0b0 (offset is negative, subtract)
◼ L = 0b0 (store)
◼ Rd = 11, Rn = 5, imm8 = 26 = 0x1A

Bits    31:28  27:26  25  24  23  22  21  20  19:16  15:12  11:8  7:0
Field   X      op     X   P   U   X   W   L   Rn     Rd     X     imm8
Value   0000   01     0   1   0   0   0   0   0101   1011   0000  00011010

STR R11, [R5, #-26] => 0x0505B01A
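The memory format splits a signed offset into a direction bit (U) and a magnitude (imm8). The Python sketch below illustrates that split; the name `encode_mem` is made up, PW is fixed at 0b10 (offset mode), and X bits are assumed to be 0.

```python
def encode_mem(load, rd, rn, offset):
    """Encode LDR/STR Rd, [Rn, #offset] in the simplified memory format."""
    imm8 = abs(offset)
    if imm8 > 0xFF:
        raise ValueError("offset magnitude must fit in 8 bits")
    word = 0b01 << 26                    # op = 01 for memory instructions
    word |= 1 << 24                      # P = 1 (with W = 0: offset mode)
    word |= int(offset >= 0) << 23       # U = 1 adds the offset, U = 0 subtracts
    word |= 0 << 21                      # W = 0
    word |= int(load) << 20              # L = 1 for LDR, 0 for STR
    word |= (rn & 0xF) << 16             # Rn: base register
    word |= (rd & 0xF) << 12             # Rd: destination (load) or source (store)
    word |= imm8                         # magnitude of the offset
    return word


print(format(encode_mem(load=False, rd=11, rn=5, offset=-26), "#010x"))
# 0x0505b01a
```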
Branch Instruction Format
Bits    31:28  27:26  25  24  23  22  21  20  19:8  7:0
Field   cond   op     X   X   U   X   X   X   X     imm8
Width   4      2      1   1   1   1   1   1   12    8
funct = Instr[25:20] (6 bits)

◼ Encodes B{cond} LABEL, with LABEL encoded as #±imm8
◼ cond : condition to be true for the branch to be taken
◼ EQ = 0b0000
◼ AL (always, a.k.a. unconditional) = 0b1110
◼ op = 0b10 for branch instructions
◼ imm8 : 8-bit immediate encoding Branch Target Address (BTA)
◼ BTA = address corresponding to LABEL = Next PC when branch taken
◼ imm8 = # of bytes BTA is away from current PC+4
◼ U : add
◼ 0b1 -> BTA = PC+4+imm8; 0b0 -> BTA = PC+4-imm8
Branch Instruction Example
0x8040  TEST: LDR R5, [R0, #4]    <- BTA
0x8044        STR R5, [R1, #1]
0x8048        ADD R3, R3, #1
0x804C        MOV R5, R4
0x8050        B TEST              <- PC
0x8054        LDR R3, [R1]        <- PC+4
0x8058        SUB R4, R3, #9

• PC = 0x8050
• PC+4 = 0x8054
• BTA = TEST = 0x8040
• offset = 0x8040-0x8054 = -0x14, encoded as U = 0b0, imm8 = 0x14

◼ B TEST
◼ cond = 0b1110 for unconditional branch
◼ op = 0b10 for branch instructions
◼ U = 0b0 (subtract); imm8 = 0x14

Bits    31:28  27:26  25  24  23  22  21  20  19:8          7:0
Field   cond   op     X   X   U   X   X   X   X             imm8
Value   1110   10     0   0   0   0   0   0   000000000000  00010100

B TEST => 0xE8000014
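The offset arithmetic in this example (BTA minus PC+4, then split into U and imm8) can be sketched in Python. The names `encode_branch` and `COND` are illustrative; only EQ and AL are included, matching the restrictions stated earlier, and X bits are assumed to be 0.

```python
COND = {"EQ": 0b0000, "AL": 0b1110}


def encode_branch(cond_name, pc, target):
    """Encode B{cond} LABEL, where `target` is the address of LABEL (the BTA)."""
    offset = target - (pc + 4)           # offset is measured from PC+4
    imm8 = abs(offset)
    if imm8 > 0xFF:
        raise ValueError("branch target out of range for an 8-bit offset")
    word = COND[cond_name] << 28         # cond
    word |= 0b10 << 26                   # op = 10 for branch instructions
    word |= int(offset >= 0) << 23       # U = 1: BTA = PC+4+imm8, U = 0: PC+4-imm8
    word |= imm8
    return word


print(format(encode_branch("AL", pc=0x8050, target=0x8040), "#010x"))
# 0xe8000014
```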
Review : Instruction Formats

DP register: OP{S} Rd, Rn, Rm (OP => ADD, AND, ..)
Bits    31:28  27:26  25  24:21  20  19:16  15:12  11:5  4  3:0
Field   X      op     I   cmd    S   Rn     Rd     X     M  Rm
Width   4      2      1   4      1   4      4      7     1  4

DP immediate: OP{S} Rd, Rn, #imm8 (OP => ADD, AND, ..)
Bits    31:28  27:26  25  24:21  20  19:16  15:12  11:8  7:0
Field   X      op     I   cmd    S   Rn     Rd     X     imm8
Width   4      2      1   4      1   4      4      4     8

Memory: OP Rd, [Rn, #±imm8] (OP => LDR, STR)
Bits    31:28  27:26  25  24  23  22  21  20  19:16  15:12  11:8  7:0
Field   X      op     X   P   U   X   W   L   Rn     Rd     X     imm8
Width   4      2      1   1   1   1   1   1   4      4      4     8

Branch: B{cond} LABEL (LABEL encoded as #±imm8)
Bits    31:28  27:26  25  24  23  22  21  20  19:8  7:0
Field   cond   op     X   X   U   X   X   X   X     imm8
Width   4      2      1   1   1   1   1   1   12    8

In every format, funct = Instr[25:20] (6 bits).

Take a printout of this page and keep it for quick reference for the rest of this chapter
Microarchitecture

◼ Processor
◼ Datapath: functional blocks
◼ Control: control signals

◼ Basic styles
◼ Single-cycle: Each instruction executes in a
single cycle
◼ Multicycle: Each instruction is broken up into a series of shorter steps
◼ Pipelined: Each instruction is broken up into a series of steps & multiple instructions execute at once

Architectural State Elements
◼ Architectural state determines everything about a processor (state of program execution)
◼ 16 registers (including PC)
◼ Memory
◼ Status register (flags)
◼ Changed by all instructions, at the clock edge following it

[Figure: architectural state elements: PC register, instruction memory (A, RD), register file (A1, A2, A3, WD3, WE3, RD1, RD2), data memory (A, WD, WE, RD) and a status register with enable (En) holding the Z flag; CLK inputs on the state elements]

15 registers (i.e., all except PC) shown as a register file

Datapath : LDR – Fetch

◼ STEP 1: Fetch instruction
◼ Assume memories can be read combinationally – no clock

[Figure: datapath with PC driving the instruction memory address input (A); the value on the read port (RD) is Instr]

Datapath : LDR – Read RF
LDR Rd, [Rn, #±imm8] (memory format: U = Instr[23], L = Instr[20], Rn = Instr[19:16], Rd = Instr[15:12], imm8 = Instr[7:0])

◼ STEP 2: Read first source operand (Rn) from Register File
◼ Reading register doesn’t need clock

[Figure: datapath with Instr[19:16] (Rn) connected to register file address input A1; the operand appears on RD1]

Datapath : LDR – Extend Immediate
LDR Rd, [Rn, #±imm8] (memory format: U = Instr[23], L = Instr[20], Rn = Instr[19:16], Rd = Instr[15:12], imm8 = Instr[7:0])

◼ STEP 3: Extend the immediate to make it 32 bit
◼ Insert 24 zeros in front (24 MSBs hard-wired to 0s)

[Figure: datapath with Instr[7:0] (imm8) feeding the Extend block, which outputs ExtImm]

Datapath : LDR – Data Mem Address
LDR Rd, [Rn, #±imm8] (memory format: U = Instr[23], L = Instr[20], Rn = Instr[19:16], Rd = Instr[15:12], imm8 = Instr[7:0])

◼ STEP 4: Compute the data memory address
◼ ALU performs add or sub depending on the offset sign

[Figure: datapath with RD1 as SrcA and ExtImm as SrcB of the ALU; ALUControl = 0100 or 0010; ALUResult drives the data memory address input (A)]

Datapath : LDR – Read Data Mem
LDR Rd, [Rn, #±imm8] (memory format: U = Instr[23], L = Instr[20], Rn = Instr[19:16], Rd = Instr[15:12], imm8 = Instr[7:0])

◼ STEP 5: Read data from memory and write it back to register file
◼ Writing register needs clock

[Figure: datapath with data memory ReadData routed as Result to register file WD3, Instr[15:12] (Rd) connected to A3, and RegWrite = 1 on WE3; ALUControl = 0100 or 0010]

Datapath : LDR – PC Increment

◼ STEP 6: Determine address of next instruction
◼ LDR Done!

[Figure: datapath with an adder computing PCPlus4 = PC + 4 as the next PC; ALUControl = 0100 or 0010, RegWrite = 1]

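The six LDR steps can be traced in software as a rough behavioural model. This Python sketch is not the hardware design: the name `run_ldr` is hypothetical, memories are modelled as dictionaries keyed by byte address, and the register file is a 15-entry list (PC kept separately, as in the figure).

```python
def run_ldr(instr_mem, data_mem, regs, pc):
    """Walk one LDR through the six datapath steps and return the next PC."""
    instr = instr_mem[pc]                     # Step 1: fetch instruction
    src_a = regs[(instr >> 16) & 0xF]         # Step 2: read Rn (Instr[19:16])
    ext_imm = instr & 0xFF                    # Step 3: zero-extend imm8 to 32 bits
    u_bit = (instr >> 23) & 1
    if u_bit:                                 # Step 4: ALU adds or subtracts
        alu_result = src_a + ext_imm
    else:
        alu_result = src_a - ext_imm
    alu_result &= 0xFFFFFFFF                  # keep the address to 32 bits
    read_data = data_mem[alu_result]          # Step 5: read data memory...
    regs[(instr >> 12) & 0xF] = read_data     # ...and write back to Rd (RegWrite=1)
    return pc + 4                             # Step 6: PCPlus4 is the next PC


# LDR R5, [R0, #4] encodes to 0x05905004 in the memory format
instr_mem = {0x8040: 0x05905004}
data_mem = {0x104: 0xDEADBEEF}
regs = [0] * 15
regs[0] = 0x100
next_pc = run_ldr(instr_mem, data_mem, regs, 0x8040)
print(hex(regs[5]), hex(next_pc))  # 0xdeadbeef 0x8044
```

In the real datapath the register write and PC update both happen at the clock edge ending the cycle; the sequential statements here only mirror the order in which the slides introduce the steps.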
Datapath : STR
STR Rd, [Rn, #±imm8] (memory format: U = Instr[23], L = Instr[20], Rn = Instr[19:16], Rd = Instr[15:12], imm8 = Instr[7:0])

◼ Write data in Rd to memory
◼ Note that Rd is a source operand for STR
◼ Writing memory needs a clock

[Figure: datapath with register file RD2 routed as WriteData to data memory WD; ALUControl = 0100 or 0010, RegWrite = 0, MemWrite = 1]

Datapath : Data Processing (Immediate)
OP{S} Rd, Rn, #imm8, OP => ADD, AND, .. (DP immediate format: I = Instr[25], cmd = Instr[24:21], S = Instr[20], Rn = Instr[19:16], Rd = Instr[15:12], imm8 = Instr[7:0])

◼ With immediate Operand2
◼ Read from Rn and ExtImm
◼ Write ALUResult to register file (Rd)

[Figure: datapath with a MemtoReg mux selecting ALUResult (0) or ReadData (1) as Result; the ALU also outputs ALUFlags; ALUControl = cmd, RegWrite = 1, MemWrite = 0, MemtoReg = 0]

Datapath : Data Processing (Register)
OP{S} Rd, Rn, Rm, OP => ADD, AND, .. (DP register format: I = Instr[25], cmd = Instr[24:21], S = Instr[20], Rn = Instr[19:16], Rd = Instr[15:12], M = Instr[4], Rm = Instr[3:0])

◼ With register Operand2
◼ Read from Rn and Rm (instead of ExtImm)

[Figure: datapath with a RegSrc mux selecting Instr[3:0] (Rm) or Instr[15:12] (Rd) for register file address A2, and an ALUSrc mux selecting RD2 or ExtImm as SrcB; ALUControl = cmd, RegWrite = 1, ALUSrc = 0, RegSrc = 0, MemWrite = 0, MemtoReg = 0]

Datapath : Branch
B{cond} LABEL, LABEL encoded as #±imm8 (branch format: cond = Instr[31:28], U = Instr[23], imm8 = Instr[7:0])

◼ Calculate branch target address
◼ BTA = (PC + 4) +/- (ExtImm)

[Figure: datapath with muxes selecting PCPlus4 as SrcA and ALUResult as the next PC; ALUControl = 0100 or 0010, RegWrite = 0, ALUSrc = 11, PCSrc = 1, RegSrc = X, MemWrite = 0, MemtoReg = 0]

Single-Cycle Processor with Control
[Figure: complete single-cycle datapath with control unit. The Decoder takes op (Instr[27:26]) and funct (Instr[25:20]) and generates PCS, FlagWrite, MemtoReg, MemWrite, ALUControl, ALUSrc, RegWrite and RegSrc. The Conditional Unit (Condition Check) takes cond (Instr[31:28]), PCS and the Z flag and generates PCSrc. The Z flag is held in the status register, which is loaded from ALUFlags when FlagWrite enables it.]

Control Unit Design
All expressions use C syntax; all values are in binary (0b prefix not explicitly written, for convenience)

Decoder
◼ PCS = (op==10)
◼ op = Instr[27:26]
◼ Asserted only for branch, to write branch target to PC. Passed through conditional unit before being used in the datapath
◼ FlagWrite = (op==00) && (S==1)
◼ S = funct[0] = Instr[20]
◼ Asserted for DP with S suffix, as only they modify flags
◼ MemtoReg = (op==01) && (L==1)
◼ L = funct[0] = Instr[20]
◼ Asserted only for load, as the destination register gets data read from the data memory
◼ MemWrite = (op==01) && (L==0)
◼ Asserted only for store, as store alone writes to the data memory
Control Unit Design

◼ ALUControl = (op==00)? cmd : (U? 0100:0010)


◼ U = funct[3] = Instr[23]
◼ 0100 – ALUControl for addition, 0010 – ALUControl for subtraction
◼ For DP, ALUControl is cmd. For memory and branch, U bit decides
whether imm8 is added or subtracted (i.e., whether the offset is
positive or negative)
◼ ALUSrc[0] = !( (op==00) && (I==0) )
◼ I = funct[5] = Instr[25]
◼ For all except DP with register as Operand2, ALU_SrcB is
immediate
◼ ALUSrc[1] = PCS
◼ ALU_SrcA is PCPlus4 only for branch (doesn’t matter whether
branch is taken or not. ALUResult is discarded when the branch is
not taken anyway)
Control Unit Design

◼ RegWrite = (op==00) || ( (op==01) && (L==1) )
◼ All DP instructions and load write to a destination register; branch and store don’t
◼ RegSrc = MemWrite
◼ For store, RA2 = Rd. For all other instructions reading a second register, RA2 is Rm

Condition Check
◼ PCSrc = PCS && ( (cond==0000) ? (Z==1) : 1 )
◼ For a branch instruction
◼ When the condition specified is EQ (0000) and the Z flag is set, the branch is taken
◼ When the condition specified is AL (1110), the branch is taken irrespective of the flags. For simplicity, we just ignore flags if the condition specified is not EQ
◼ This will cause ALUResult (PCPlus4+/-imm8) to be written to PC instead of PCPlus4
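The decoder and condition-check expressions from the Control Unit Design slides can be collected into one function and checked against the instruction encodings worked out earlier. This is a behavioural sketch in Python, not the hardware implementation; the function name `control_signals` and the dictionary keys are chosen here for illustration.

```python
def control_signals(instr, z_flag):
    """Evaluate the control expressions for a 32-bit instruction word."""
    cond = (instr >> 28) & 0xF
    op = (instr >> 26) & 0b11
    funct = (instr >> 20) & 0x3F
    i_bit = (funct >> 5) & 1      # I = funct[5] = Instr[25]
    cmd = (funct >> 1) & 0xF      # cmd = Instr[24:21]
    u_bit = (funct >> 3) & 1      # U = funct[3] = Instr[23]
    s_or_l = funct & 1            # S (DP) or L (memory) = funct[0] = Instr[20]

    pcs = op == 0b10
    mem_write = op == 0b01 and s_or_l == 0
    return {
        "PCS": pcs,
        "FlagWrite": op == 0b00 and s_or_l == 1,
        "MemtoReg": op == 0b01 and s_or_l == 1,
        "MemWrite": mem_write,
        "ALUControl": cmd if op == 0b00 else (0b0100 if u_bit else 0b0010),
        "ALUSrc0": not (op == 0b00 and i_bit == 0),
        "ALUSrc1": pcs,
        "RegWrite": op == 0b00 or (op == 0b01 and s_or_l == 1),
        "RegSrc": mem_write,
        # Flags are ignored unless the condition is EQ (0000)
        "PCSrc": pcs and ((z_flag == 1) if cond == 0b0000 else True),
    }


adds = control_signals(0x00932005, z_flag=0)   # ADDS R2, R3, R5
print(adds["RegWrite"], adds["FlagWrite"], adds["ALUControl"])  # True True 4
```

For example, `control_signals(0x0505B01A, 0)` (STR R11, [R5, #-26]) gives MemWrite and RegSrc asserted, RegWrite deasserted and ALUControl = 0b0010, since U = 0 selects subtraction.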
Extended Functionality : CMP

◼ Recall : CMP does subtraction, sets the flags and discards the result
◼ Has S = 1
◼ cmd = 1010
◼ Like SUBS, but the result is not written to a register. So we need to modify RegWrite signal
◼ RegWrite = ( (op==00) && !NoWrite ) || ( (op==01) && (L==1) )
◼ Where NoWrite = (cmd==1010)
◼ NoWrite can be easily extended to accommodate other DP instructions (TST, TEQ, CMN) which do not write the result
◼ No change to datapath needed for implementation!
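The modified RegWrite expression can be tried out directly. A minimal Python sketch follows; the function name `reg_write` is illustrative, and only CMP is covered by NoWrite, as on the slide.

```python
def reg_write(op, cmd, l_bit):
    """RegWrite with the CMP extension: CMP (cmd = 1010) must not write Rd."""
    no_write = cmd == 0b1010
    return (op == 0b00 and not no_write) or (op == 0b01 and l_bit == 1)


print(reg_write(op=0b00, cmd=0b0010, l_bit=0))  # SUB: True
print(reg_write(op=0b00, cmd=0b1010, l_bit=0))  # CMP: False
print(reg_write(op=0b01, cmd=0b0000, l_bit=1))  # LDR: True
```

Because NoWrite is only consulted when op == 00, the P, U and W bits of a memory instruction, which occupy the same bit positions as cmd, cannot suppress the write-back of a load.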
Single-Cycle Performance
[Figure: complete single-cycle datapath with control unit, identical to the one under "Single-Cycle Processor with Control"]

◼ Cycle time TC (and hence, performance) is limited by the critical path (LDR)

Single Cycle Design Summary

◼ Single-cycle - fetch, decode and execute each instruction in one clock cycle
◼ (+) simple
◼ (–) no datapath resource can be used more than once per instruction, so some must be duplicated
◼ separate memories for instruction and data
◼ 2 adders/ALUs
◼ (–) cycle time limited by longest instruction (LDR)

◼ How Can We Make It Faster?
◼ Pipelining
◼ Superscalar
◼ …… stay tuned for Chapter 6!
Pipelining Analogy. Coming Soon!
◼ Pipelined laundry: overlapping execution
◼ Parallelism improves performance

◼ Four loads: Speedup = 8/3.5 = 2.3
◼ Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = # of stages (for large n)
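The speedup figures can be reproduced numerically. The Python sketch below assumes the laundry analogy has 4 stages of 0.5 hours each, which is what the formula 2n/(0.5n + 1.5) implies; the function name `laundry_speedup` is made up for illustration.

```python
def laundry_speedup(n, stages=4, stage_hours=0.5):
    """Speedup of pipelined over sequential execution for n loads."""
    sequential = n * stages * stage_hours       # 2n hours with the defaults
    pipelined = (stages + n - 1) * stage_hours  # 0.5n + 1.5 hours with the defaults
    return sequential / pipelined


print(round(laundry_speedup(4), 1))     # 2.3
print(round(laundry_speedup(1000), 2))  # 3.99
```

As n grows, the 1.5-hour fill time becomes negligible and the speedup approaches the number of stages.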
Other Formats : Self Reading

Note :
◼ No need to memorize. Required info will be given in the exam if need be
◼ Some bits which were left as don’t cares are used in the following slides
DP Operations : cmd
cmd   Instruction  Operation
0000  AND          Logical AND
0001  EOR          Logical Exclusive OR
0010  SUB          Subtract
0011  RSB          Reverse Subtract
0100  ADD          Add
0101  ADC          Add with Carry
0110  SBC          Subtract with Carry
0111  RSC          Reverse Subtract with Carry
1000  TST          Test (update flags after AND)
1001  TEQ          Test Equivalence (update flags after EOR)
1010  CMP          Compare (update flags after SUB)
1011  CMN          Compare Negated (update flags after ADD)
1100  ORR          Logical OR
1101  MOV          Move
1110  BIC          Bit Clear
1111  MVN          Move Not

Note : Multiplication is not one of the 16 ALU operations, though it is considered a DP operation. Multiplication is done in a separate multiplication unit and is a bit different from other DP operations.
Multiply Instruction Format
Bits    31:28  27:26  25  24:21  20  19:16  15:12  11:8  7:5  4  3:0
Field   X      op     I   cmd    S   Rn     Rd     Rs    X    M  Rm
Width   4      2      1   4      1   4      4      4     3    1  4
funct = Instr[25:20] (6 bits)

◼ MUL Rd, Rm, Rs (Rd = Rm*Rs)
◼ MLA Rd, Rm, Rs, Rn (Rd = Rn + Rm*Rs)
◼ cmd = 0b0000 for MUL, 0b0001 for MLA
◼ M = 0b0 -> usual DP instructions such as ADD, AND, ..; 0b1 -> MUL and MLA
◼ MUL does not use Rn
◼ Assume MUL and MLA do not set any flags (S bit is 0b0) and cannot take immediate operands (I bit is 0b0)
◼ Design Hint : You will need a register file with 3 read ports to implement MLA, as 3 registers need to be read simultaneously

Memory Instruction Format
Bits    31:28  27:26  25  24  23  22  21  20  19:16  15:12  11:8  7:0
Field   X      op     X   P   U   X   W   L   Rn     Rd     X     imm8
Width   4      2      1   1   1   1   1   1   4      4      4     8
funct = Instr[25:20] (6 bits)

◼ funct
◼ U : Add
◼ 0b1 -> positive offset; 0b0 -> negative offset
◼ L : Load
◼ 0b1 -> load; 0b0 -> store
◼ P : Preindex
◼ W : Writeback
◼ PW = 0b00 -> postindex; 0b01 -> unsupported; 0b10 -> offset; 0b11 -> preindex

Note: PC relative mode is identical to offset mode, with Rn=R15 and offset computed automatically by the assembler. Note that R15 is always read as PC+4. However, the processor we designed does not support reads from R15 except for branch instruction.

Branch Condition Codes (cond)

cond  Mnemonic      Name                                 Condition Checked
0000  EQ            Equal                                Z
0001  NE            Not equal                            NOT Z
0010  CS / HS       Carry set / Unsigned higher or same  C
0011  CC / LO       Carry clear / Unsigned lower         NOT C
0100  MI            Minus / Negative                     N
0101  PL            Plus / Positive or zero              NOT N
0110  VS            Overflow / Overflow set              V
0111  VC            No overflow / Overflow clear         NOT V
1000  HI            Unsigned higher                      (NOT Z) AND C
1001  LS            Unsigned lower or same               Z OR (NOT C)
1010  GE            Signed greater than or equal         NOT (N ⊕ V)
1011  LT            Signed less than                     N ⊕ V
1100  GT            Signed greater than                  (NOT Z) AND NOT (N ⊕ V)
1101  LE            Signed less than or equal            Z OR (N ⊕ V)
1110  AL (or none)  Always / unconditional               ignored

The Name column is an interpretation based on SUBS/CMP.

◼ Flags are set by instructions with suffix S
◼ Example : ADDS affects flags, ADD doesn’t
◼ Exceptions : CMP, CMN, TST, TEQ which are used only to set flags (result is discarded)