L03

The lecture focuses on the RISC-V RV32I Instruction Set Architecture, outlining its structure, including the concepts of architecture, microarchitecture, and realization. It discusses the stored program architecture, instruction set architecture (ISA), and the various instruction classes such as arithmetic, data movement, and control flow operations. Additionally, it covers specific assembly programming examples, instruction encodings, and the organization of load/store instructions within the RV32I framework.

18-447 Lecture 3:

RISC-V Instruction Set Architecture

James C. Hoe
Department of ECE
Carnegie Mellon University

18-447-S24-L03-S1, James C. Hoe, CMU/ECE/CALCM, ©2024


Housekeeping
• Your goal today
– get bootstrapped on RISC-V RV32I to start Lab 1
(will return to visit general ISA issues on 4th meeting)
• Notices
– Check Canvas and Piazza regularly
– Student survey (on Canvas), due next Wed
– H02: Lab 1, Part A, due Week 3
– H03: Lab 1, Part B, due Week 4
• Readings
– P&H Ch2 (for today)
– P&H Ch4.1~4.4 (for next time)
What do we mean by “architecture”?
(with quotes)



How to specify a clock design?
function/performance/implementation
Must understand all to design a good clock

• “Architecture”

conceptual
– a clock has an hour hand and a minute hand, .....

Can read a clock w.o. knowing how it keeps time


Can make a clock w.o. knowing how time is used

physical
• Microarchitecture (think blueprint)
– a particular clockwork has a certain set of gears
arranged in a certain configuration
• Realization
– machined alloy gears vs stamped sheet metal

[Computer Architecture, Blaauw and Brooks, 1997]
How to specify a computer design?
• “Architecture”

conceptual
– a computer does ….????….
Computer Architecture

Can read a clock w.o. knowing how it keeps time


Can make a clock w.o. knowing how time is used

physical
• Microarchitecture (think blueprint)
– a particular computer design has a certain
datapath and a certain control logic

• Realization
– CMOS vs ECL vs vacuum tubes
[Computer Architecture, Blaauw and Brooks, 1997]
Stored Program Architecture
a.k.a. von Neumann
• Memory holds both program and data
– instructions and data in a linear memory array
– instructions can be modified as data
• Sequential instruction processing
1. program counter (PC) identifies current instruction
2. fetch instruction from memory
3. update state (e.g. PC and memory) as a function of
current state according to instruction
4. repeat
[Figure: the program counter points into a linear memory array with locations 0, 1, 2, 3, 4, 5, ...]

Dominant paradigm since its conception
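The sequential processing steps above can be sketched as a toy interpreter. This is only an illustration under assumptions: the tuple instructions ("add", "store", "jump", "halt") and the single accumulator are hypothetical, not RV32I.

```python
# Toy stored-program machine. The instruction format is hypothetical and
# chosen only to illustrate the fetch/update/repeat loop.
def run(memory, pc=0, max_steps=100):
    acc = 0                          # one accumulator register (assumed)
    for _ in range(max_steps):
        op, arg = memory[pc]         # fetch the instruction the PC identifies
        if op == "halt":
            break
        if op == "add":              # update state according to instruction
            acc += memory[arg]
            pc += 1
        elif op == "store":
            memory[arg] = acc        # program and data share one memory
            pc += 1
        elif op == "jump":
            pc = arg                 # control flow only changes the PC
    return acc, pc

# Locations 0..3 hold instructions, locations 4..6 hold data.
program = [("add", 4), ("add", 5), ("store", 6), ("halt", 0), 20, 22, 0]
acc, pc = run(program)
```

Because instructions and data live in the same array, a "store" aimed at locations 0..3 would modify the program itself.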
Instruction Set Architecture (ISA):
A Concrete Specification

[images from Wikipedia]


“ISA” in a nutshell
• A stable programming target (to last for decades)
– binary compatibility for SW investments
– permits adoption of foreseeable technology
Better to compromise immediate optimality for
future scalability and compatibility
• Dominant paradigm has been “von Neumann”
– programmer-visible state: mem, registers, PC, etc.
– instructions to modify state; each prescribes
• which state elements are read
• which state elements (including PC) are updated
• how to compute new values of updated state
Atomic, sequential, in-order
3 Instruction Classes (as convention)
• Arithmetic and logical operations
– fetch operands from specified locations
– compute a result as a function of the operands
– store result to a specified location
– update PC to next sequential instruction address
• Data “movement” operations (no compute)
– fetch operands from specified locations
– store operand values to specified locations
– update PC to next sequential instruction address
• Control flow operations (affects only PC)
– fetch operands from specified locations
– compute a branch condition and a target address
– if “branch condition is true” then PC ← target address
else PC ← next seq. inst addr
Complete “ISA” Picture
• User-level ISA
– state and instructions available to user programs
– single-user abstraction on top of a “virtualization”
For this course and for now, RV32I of RISC-V
• “Virtual Environment” Architecture
– state and instructions to control virtualization
(e.g., caches, sharing)
– user-level, but for need-to-know uses
• “Operating Environment” Architecture
– state and instructions to implement virtualization
– privileged/protected access reserved for the OS
RV32I Programmer-Visible State
• program counter
– 32-bit “byte” address of current instruction
• general purpose register file
– 32x 32-bit words named x0...x31
– **note** x0=0
• memory M[0], M[1], M[2], ..., M[N-1]
– 2^32 by 8-bit locations (4 GBytes)
– indexed using 32-bit “byte” addresses
(take this literally for now; magic to come)
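A minimal sketch of this state as a data structure may help when starting a simulator. The class name and the small default memory size are assumptions for illustration; a real model would cover the full 2^32-byte address space.

```python
class RV32IState:
    """Programmer-visible state: PC, 32 registers, byte-addressed memory."""

    def __init__(self, mem_bytes=64):
        self.pc = 0                       # byte address of current instruction
        self._x = [0] * 32                # x0..x31, 32 bits each
        self.mem = bytearray(mem_bytes)   # stand-in for 2**32 byte locations

    def read_reg(self, n):
        return self._x[n]

    def write_reg(self, n, value):
        if n != 0:                        # x0 always reads as zero
            self._x[n] = value & 0xFFFFFFFF

state = RV32IState()
state.write_reg(0, 5)     # silently discarded
state.write_reg(1, -1)    # stored as the 32-bit pattern 0xFFFFFFFF
```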
Register-Register ALU Instructions
• Assembly (e.g., register-register addition)
ADD rd, rs1, rs2
• Machine encoding: R-type
0000000 rs2 rs1 000 rd 0110011
7-bit 5-bit 5-bit 3-bit 5-bit 7-bit
• Semantics
– GPR[rd] ← GPR[rs1] + GPR[rs2]
– PC ← PC + 4
• Exceptions: none (ignore carry and overflow)
• Variations
– Arithmetic: {ADD, SUB}
– Compare: {signed, unsigned} set if less than
– Logical: {AND, OR, XOR}
– Shift: {Left, Right-Logical, Right-Arithmetic}
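The R-type field layout above can be checked by packing the fields into a 32-bit word. This is a sketch: the helper names are made up here, and the SUB funct7 value 0100000 comes from the RISC-V manual rather than from this slide.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=0b0110011):
    """Pack R-type fields: 7 | 5 | 5 | 3 | 5 | 7 bits, MSB first."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

def add32(a, b):
    """ADD semantics: the result wraps to 32 bits, carry/overflow ignored."""
    return (a + b) & 0xFFFFFFFF

add_word = encode_r_type(0b0000000, 2, 1, 0b000, 3)   # add x3, x1, x2
sub_word = encode_r_type(0b0100000, 2, 1, 0b000, 3)   # sub x3, x1, x2
wrapped = add32(0xFFFFFFFF, 1)                        # wraps around to 0
```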
R-Type Reg-Reg Instruction Encodings

32-bit R-type ALU

[The RISC-V Instruction Set Manual]


Assembly Programming 101
• Break down high-level program expressions into a
sequence of elemental operations
• E.g. High-level Code
f = ( g + h ) - ( i + j )

• Assembly Code
– suppose f, g, h, i, j are in rf, rg, rh, ri, rj
– suppose rtemp is a free register
add rtemp rg rh # rtemp = g+h
add rf ri rj # rf = i+j
sub rf rtemp rf # f = rtemp - rf
Reg-Immediate ALU Instructions
• Assembly (e.g., reg-immediate additions)
ADDI rd, rs1, imm12
• Machine encoding: I-type
imm[11:0] rs1 000 rd 0010011
12-bit 5-bit 3-bit 5-bit 7-bit
• Semantics
– GPR[rd] ← GPR[rs1] + sign-extend (imm)
– PC ← PC + 4
• Exceptions: none (ignore carry and overflow)
• Variations
– Arithmetic: {ADDI} (RV32I has no SUBI; use ADDI with a negative immediate)
– Compare: {signed, unsigned} set if less than
– Logical: {ANDI, ORI, XORI}
– **Shifts by unsigned imm[4:0]: {SLLI, SRLI, SRAI}
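The I-type layout and the sign extension of the 12-bit immediate can be sketched as follows. The helper names are illustrative assumptions. The example shows that subtracting a constant is done with ADDI and a negative immediate.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def encode_i_type(imm12, rs1, funct3, rd, opcode=0b0010011):
    """Pack I-type fields: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return (((imm12 & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)

def addi(rs1_value, imm12):
    """ADDI semantics: add the sign-extended immediate, wrap to 32 bits."""
    return (rs1_value + sign_extend(imm12, 12)) & 0xFFFFFFFF

addi_word = encode_i_type(-1, 0, 0b000, 1)   # addi x1, x0, -1
five_minus_one = addi(5, 0xFFF)              # imm field 0xFFF means -1
```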
I-Type Reg-Immediate ALU Inst. Encodings

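[Figure: 32-bit I-type ALU instruction encodings. The immediate is sign-extended, except for the shifts, whose unsigned shift amount sits in a layout that matches the R-type encoding.]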
Note: SLTIU does unsigned compare with sign-extended immediate
[The RISC-V Instruction Set Manual]
Load-Store Architecture
• RV32I ALU instructions
– operates only on register operands
– next PC always PC+4
• A distinct set of load and store instructions
– dedicated to copying data between register and
memory
– next PC always PC+4
• Another set of “control flow” instructions
– dedicated to manipulating PC (branch, jump, etc.)
– does not affect memory or other registers



Load Instructions
• Assembly (e.g., load 4-byte word)
LW rd, offset12(base) Note: base is rs1
• Machine encoding: I-type
offset[11:0] base 010 rd 0000011
12-bit 5-bit 3-bit 5-bit 7-bit
• Semantics
– byte_address32 = sign-extend(offset12) + GPR[base]
– GPR[rd] ← MEM32[byte_address]
– PC ← PC + 4
• Exceptions: none for now
• Variations: LW, LH, LHU, LB, LBU
e.g., LB :: GPR[rd] ← sign-extend(MEM8[byte_address])
LBU :: GPR[rd] ← zero-extend(MEM8[byte_address])
RV32I is byte-addressable, little-endian (until v20191213)
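The difference between the sign-extending and zero-extending load variations can be sketched on a small little-endian byte array. The `load` helper and the memory contents are assumptions for illustration only.

```python
def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def load(mem, addr, size, signed):
    """Read `size` bytes little-endian, then extend to a 32-bit register value."""
    raw = int.from_bytes(mem[addr:addr + size], "little")
    value = sign_extend(raw, 8 * size) if signed else raw
    return value & 0xFFFFFFFF

mem = bytearray([0xDB, 0x0F, 0x49, 0x40, 0x80])
lw = load(mem, 0, 4, signed=True)     # LW: whole word at address 0
lb = load(mem, 4, 1, signed=True)     # LB: 0x80 sign-extends
lbu = load(mem, 4, 1, signed=False)   # LBU: 0x80 zero-extends
```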
When data size > address granularity
• 32-bit signed or unsigned integer word is 4 bytes
• By convention we “write” MSB on left: 0x40:49:0f:db
MSB LSB
(most significant) 8-bit 8-bit 8-bit 8-bit (least significant)

• On a byte-addressable machine . . . . . . .
Big Endian (pointer points to the big end)
MSB LSB
byte 0 byte 1 byte 2 byte 3
byte 4 byte 5 byte 6 byte 7
byte 8 byte 9 byte 10 byte 11
byte 12 byte 13 byte 14 byte 15
byte 16 byte 17 byte 18 byte 19
Little Endian (pointer points to the little end)
MSB LSB
byte 3 byte 2 byte 1 byte 0
byte 7 byte 6 byte 5 byte 4
byte 11 byte 10 byte 9 byte 8
byte 15 byte 14 byte 13 byte 12
byte 19 byte 18 byte 17 byte 16
• What difference does it make?
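One way to see a difference is to lay out the example word 0x40490fdb in memory both ways and then read a single byte through the word's pointer. This is a sketch using Python's built-in integer byte conversions; the variable names are illustrative.

```python
word = 0x40490FDB

big = word.to_bytes(4, "big")        # bytes at addresses A, A+1, A+2, A+3
little = word.to_bytes(4, "little")

# A byte load from the word's own address sees a different byte.
first_byte_big = big[0]              # the most significant byte
first_byte_little = little[0]        # the least significant byte

# Reading little-endian bytes as if they were big-endian scrambles the value.
misread = int.from_bytes(little, "big")
```

The ordering is invisible as long as every access uses the same size; it shows up when the same memory is read at a different size or exchanged between machines with different conventions.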
Load/Store Data Alignment
MSB byte-3 byte-2 byte-1 byte-0 LSB
byte-7 byte-6 byte-5 byte-4
• Access granularity not same as addressing granularity
– physical implementations of memory and memory
interface optimize for natural alignment boundaries
(i.e., return an aligned 4-byte word per access)
– unaligned loads or stores would require 2 separate
accesses to memory
• Common for RISC ISAs to disallow misaligned
loads/stores; if necessary, use a code sequence of
aligned loads/stores and shifts
• RV32I (until v20191213) allowed misaligned loads/stores but warned that they could be very slow; if necessary, . . .
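The "code sequence of aligned loads/stores and shifts" can be sketched for a misaligned word load on a little-endian machine. The helper names are assumptions; this mirrors the idea, not any particular compiler's output.

```python
def load_aligned_word(mem, addr):
    """The only access the memory interface is assumed to support."""
    assert addr % 4 == 0, "memory interface returns aligned words only"
    return int.from_bytes(mem[addr:addr + 4], "little")

def load_misaligned_word(mem, addr):
    """Build a word at any byte address from at most 2 aligned accesses."""
    base = addr & ~3                 # aligned word containing the first byte
    shift = 8 * (addr & 3)           # bit offset of that byte within the word
    lo = load_aligned_word(mem, base)
    if shift == 0:
        return lo                    # already aligned: 1 access
    hi = load_aligned_word(mem, base + 4)
    return ((lo >> shift) | (hi << (32 - shift))) & 0xFFFFFFFF

mem = bytearray(range(8))            # byte at address k holds the value k
w1 = load_misaligned_word(mem, 1)    # bytes 1, 2, 3, 4
w3 = load_misaligned_word(mem, 3)    # bytes 3, 4, 5, 6
```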
Store Instructions
• Assembly (e.g., store 4-byte word)
SW rs2, offset12(base)
• Machine encoding: S-type
offset[11:5] rs2 base 010 ofst[4:0] 0100011
7-bit 5-bit 5-bit 3-bit 5-bit 7-bit
• Semantics
– byte_address32 = sign-extend(offset12) + GPR[base]
– MEM32[byte_address] ← GPR[rs2]
– PC ← PC + 4
• Exceptions: none for now
• Variations: SW, SH, SB
e.g., SB :: MEM8[byte_address] ← (GPR[rs2])[7:0]
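In the S-type format the 12-bit offset is split around the register fields. A sketch of the packing, with a made-up helper name:

```python
def encode_s_type(imm12, rs2, rs1, funct3, opcode=0b0100011):
    """Pack S-type fields: offset[11:5] | rs2 | rs1 | funct3 | offset[4:0] | opcode."""
    imm = imm12 & 0xFFF
    return (((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | ((imm & 0x1F) << 7) | opcode)

sw_pos = encode_s_type(32, 2, 1, 0b010)   # sw x2, 32(x1)
sw_neg = encode_s_type(-4, 2, 1, 0b010)   # sw x2, -4(x1)
```

Note that rs2 and rs1 land in the same bit positions as in the R-type format; only the immediate moves.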
Assembly Programming 201
• E.g. High-level Code
A[ 8 ] = h + A[ 0 ]
where A is an array of integers (4 bytes each)
• Assembly Code
– suppose &A, h are in rA, rh
– suppose rtemp is a free register
LW rtemp 0(rA) # rtemp = A[0]
add rtemp rh rtemp # rtemp = h + A[0]
SW rtemp 32(rA) # A[8] = rtemp
# note A[8] is 32 bytes
# from A[0]
Load/Store Encodings
• Both need 2 register operands and one 12-bit immediate

[The RISC-V Instruction Set Manual]


RV32I Immediate Encoding
• Most RISC ISAs use 1 register-immediate format
opcode rs rt immediate
6-bit 5-bit 5-bit 16-bit
– rt field used as a source (e.g., store) or dest (e.g., load)
– also common to opt for bigger 16-bit immediate
• RV32I adopts 2 different register-immediate formats
(I vs S) to keep rs2 operand at inst[24:20] always
• RV32I encodes immediate in non-consecutive bits



RV32I Instruction Formats
• All instructions are 4 bytes long and 4-byte aligned in memory
• R-type: 3 register operands

• I-type: 2 register operands (with dest) and 12-bit imm

• S/B-type: 2 register operands (no dest) and 12-bit imm

• U/J-type: 1 register operand (dest) and 20-bit imm

Aimed to simplify decoding and field extraction


Control Flow Instructions
• C-Code
{ code A }
if X==Y then
{ code B }
else
{ code C }
{ code D }
• Control Flow Graph
code A, then if X==Y: True goes to code B, False goes to code C; both continue to code D
• Assembly Code (linearized)
code A
if X==Y goto code B
code C
goto code D
code B
code D
basic blocks (1-way in, 1-way out, all or nothing)


(Conditional) Branch Instructions
• Assembly (e.g., branch if equal)
BEQ rs1, rs2, imm13 Note: implicit imm[0]=0
Note: real assembler expects a target label or address
• Machine encoding: B-type
imm[12|10:5] rs2 rs1 000 imm[4:1|11] 1100011
7-bit 5-bit 5-bit 3-bit 5-bit 7-bit
• Semantics
– target = PC + sign-extend(imm13)
– if GPR[rs1]==GPR[rs2] then PC ← target
else PC ← PC + 4
How far can you jump?
• Exceptions: misaligned target (4-byte) if taken
• Variations
– BEQ, BNE, BLT, BGE, BLTU, BGEU
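The scrambled B-type immediate and the reachable range can be worked out with a small encoder and decoder. The helper names are assumptions; the bit positions follow the encoding shown above.

```python
def encode_b_type(offset, rs2, rs1, funct3, opcode=0b1100011):
    """Pack imm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcode."""
    assert offset % 2 == 0 and -4096 <= offset <= 4094
    imm = offset & 0x1FFF                       # 13-bit two's complement
    return ((((imm >> 12) & 1) << 31) | (((imm >> 5) & 0x3F) << 25)
            | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
            | (((imm >> 1) & 0xF) << 8) | (((imm >> 11) & 1) << 7) | opcode)

def decode_b_offset(word):
    """Recover the signed byte offset; imm[0] is implicitly 0."""
    imm = ((((word >> 31) & 1) << 12) | (((word >> 7) & 1) << 11)
           | (((word >> 25) & 0x3F) << 5) | (((word >> 8) & 0xF) << 1))
    return imm - (1 << 13) if imm & (1 << 12) else imm

beq_fwd = encode_b_type(8, 2, 1, 0b000)     # beq x1, x2, PC+8
beq_back = encode_b_type(-4, 0, 0, 0b000)   # beq x0, x0, PC-4
```

With a 13-bit signed offset whose lowest bit is always 0, a branch reaches from PC-4096 to PC+4094 bytes.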
Assembly Programming 301
• E.g. High-level Code
if (i == j) then
e = g
else
e = h
f = e
[Figure: control flow forks into a "then" path and an "else" path, then joins before f = e]
• Assembly Code
– suppose e, f, g, h, i, j are in re, rf, rg, rh, ri, rj
bne ri rj L1 # L1 and L2 are addr labels
# assembler computes offset
add re rg x0 # e = g
beq x0 x0 L2 # goto L2 unconditionally
L1: add re rh x0 # e = h
L2: add rf re x0 # f = e
Assembly Programming 302
• If you write C code:
for (int i=0; i<16; i++) {
sum+=A[i];
}
• GCC -O generates code for:
for (int* a=&A[0]; a<&A[16]; a++) {
sum+=*a;
}
• Assembly Code (suppose sum, A, a are in rsum, rA, ra)
addi ra rA 0 # a=&A[0]
L1: lw rtmp 0(ra) # sum+=*a
add rsum rsum rtmp
addi ra ra 4 # a++
addi rtmp rA 64 # tmp=&A[16]
bltu ra rtmp L1
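The loop can be traced one instruction at a time. In this sketch the array address `A_BASE` and the array contents are hypothetical, and memory is modeled as a dictionary from byte address to word.

```python
A_BASE = 0x100                                       # hypothetical &A[0]
mem = {A_BASE + 4 * i: i + 1 for i in range(16)}     # A[i] = i + 1 (assumed)

rA, rsum = A_BASE, 0
ra = rA + 0                                 # addi ra rA 0    # a=&A[0]
iterations = 0
while True:                                 # L1:
    rtmp = mem[ra]                          # lw rtmp 0(ra)
    rsum = (rsum + rtmp) & 0xFFFFFFFF       # add rsum rsum rtmp
    ra = ra + 4                             # addi ra ra 4    # a++
    rtmp = rA + 64                          # addi rtmp rA 64 # tmp=&A[16]
    iterations += 1
    if not (ra < rtmp):                     # bltu ra rtmp L1 (loop if less)
        break
```

The pointer advances by 4 bytes per element, so the end address &A[16] is 64 bytes past &A[0], and the unsigned compare is the right one for addresses.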
Function Call and Return
......
A: BEQ x0, x0, F
......
......
B: BEQ x0, x0, F
......

F: ......
?
......
......
BEQ x0, x0, A+4

A function return needs to
1. jump back to different callers
2. know where to jump back to
Jump and Link Instruction
• Assembly
JAL rd, imm21 Note: implicit imm[0]=0
Note: real assembler expects a target label or address
• Machine encoding: J-type
imm[20|10:1|11|19:12] rd 1101111
20-bit 5-bit 7-bit
• Semantics
– target = PC + sign-extend(imm21)
– GPR[rd] ← PC + 4
– PC ← target
How far can you jump?
• Exceptions: misaligned target (4-byte)
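The J-type immediate ordering can be checked the same way as the other formats. The helper name is an assumption; the field order imm[20|10:1|11|19:12] is taken from the encoding above.

```python
def encode_j_type(offset, rd, opcode=0b1101111):
    """Pack imm[20|10:1|11|19:12] | rd | opcode; imm[0] is implicitly 0."""
    assert offset % 2 == 0 and -(1 << 20) <= offset < (1 << 20)
    imm = offset & 0x1FFFFF                     # 21-bit two's complement
    return ((((imm >> 20) & 1) << 31) | (((imm >> 1) & 0x3FF) << 21)
            | (((imm >> 11) & 1) << 20) | (((imm >> 12) & 0xFF) << 12)
            | (rd << 7) | opcode)

jal_fwd = encode_j_type(8, 1)     # jal x1, PC+8
jal_back = encode_j_type(-4, 0)   # jal x0, PC-4 (plain jump, link discarded)
```

A 21-bit signed offset with an implicit low 0 bit reaches from PC-1048576 to PC+1048574 bytes, i.e. about 1 MiB in either direction.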



Jump Indirect Instruction
• Assembly
JALR rd, rs1, imm12
• Machine encoding: I-type
imm[11:0] rs1 000 rd 1100111
12-bit 5-bit 3-bit 5-bit 7-bit
• Semantics
– target = GPR[rs1] + sign-extend(imm12)
– target &= 0xffff_fffe
– GPR[rd] ← PC + 4
– PC ← target
How far can you jump?
• Exceptions: misaligned target (4-byte)
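The JALR semantics, including clearing the lowest target bit, can be sketched as a pure function. The function name and return convention are assumptions for illustration.

```python
def sign_extend(value, bits):
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def jalr(pc, rs1_value, imm12):
    """Return (link value for rd, new PC) following the semantics above."""
    target = (rs1_value + sign_extend(imm12, 12)) & 0xFFFFFFFE  # clear bit 0
    link = (pc + 4) & 0xFFFFFFFF
    return link, target

odd_base = jalr(0x100, 0x2001, 0)        # low bit of the target is dropped
neg_offset = jalr(0x100, 0x2000, 0xFFC)  # imm field 0xFFC means -4
```

Because the base comes from a full 32-bit register, JALR can reach any address; the 12-bit immediate only adjusts it by -2048 to +2047 bytes.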



Assembly Programming 401
Caller:
... code A ...
JAL x1, _myfxn
... code C ...
JAL x1, _myfxn
... code D ...
Callee:
_myfxn: ... code B ...
JALR x0,x1,0

• ..... A call B return C call B return D .....


• How do you pass arguments between caller and callee?
• If A sets x10 to 1, what is the value of x10 when B returns to C?
• What registers can B use?
• What happens to x1 if B calls another function?
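The last question can be explored with a tiny model of JAL and JALR acting on a register file. All addresses here are made up, and the helpers only model the link and PC update.

```python
def jal(regs, pc, rd, target):
    """JAL: save return address in rd (unless rd is x0), jump to target."""
    if rd != 0:
        regs[rd] = pc + 4
    return target

def jalr(regs, pc, rd, rs1, imm=0):
    """JALR: jump to GPR[rs1] + imm with bit 0 cleared, link in rd."""
    target = (regs[rs1] + imm) & 0xFFFFFFFE
    if rd != 0:
        regs[rd] = pc + 4
    return target

regs = [0] * 32
pc = jal(regs, 0x1000, 1, 0x2000)       # A calls B: x1 = 0x1004
link_to_A = regs[1]
pc = jal(regs, 0x2008, 1, 0x3000)       # B calls another function: x1 = 0x200C
pc = jalr(regs, 0x3004, 0, 1)           # that function returns into B
back_in_B = pc
bad_return = jalr(regs, 0x2010, 0, 1)   # B "returns" using the clobbered x1
```

In this model B's return lands back inside B rather than in A, which is why a function that makes further calls has to preserve x1 somewhere first.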
Caller and Callee Saved Registers
• Callee-Saved Registers
– caller says to callee, “The values of these registers
should not change when you return to me.”
– callee says, “If I need to use these registers, I
promise to save the old values to memory first and
restore them before I return to you.”
• Caller-Saved Registers
– caller says to callee, “If there is anything I care
about in these registers, I already saved it myself.”
– callee says to caller, “Don’t count on them staying
the same values after I am done.”
• Unlike endianness, this is not arbitrary
When to use which?
RISC-V Register Usage Convention

[The RISC-V Instruction Set Manual]


Memory Usage Convention
high address
stack space (grows down; stack pointer in GPR[x2])
free space
dynamic data (grows up)
static data (from binary executable)
text (from binary executable)
reserved
low address
Basic Calling Convention
1. caller saves caller-saved registers
2. caller loads arguments into a0~a7 (x10~x17)
3. caller jumps to callee using JAL x1

prologue
4. callee allocates space on the stack (dec. stack pointer)
5. callee saves callee-saved registers to stack
function

....... body of callee (can “nest” additional calls) .......

6. callee loads results to a0, a1 (x10, x11)

epilogue
7. callee restores saved register values
8. JALR x0, x1

9. caller continues with return values in a0, a1


Terminologies
• Instruction Set Architecture
– machine state and functionality as observable and
controllable by the programmer
• Instruction Set
– set of commands supported
• Machine Code
– instructions encoded in binary format
– directly consumable by the hardware
• Assembly Code
– instructions in “textual” form, e.g. add r1, r2, r3
– converted to machine code by an assembler
– one-to-one correspondence with machine code
(mostly true: compound instructions, labels ....)
We didn’t talk about
• Privileged Modes
– user vs. supervisor
• Exception Handling
– trap to supervisor handling routine and back
• Virtual Memory
– each process has 4-GBytes of private, large, linear
and fast memory?
• Floating-Point Instructions
