The lecture introduces RISC-V ISA, a fifth-generation open instruction set architecture developed at UC Berkeley, emphasizing its design principles and applications. Key aspects include data storage, memory addressing modes, and instruction encoding, with a focus on the RV32I architecture. The lecture also covers the importance of simplicity and efficiency in instruction design, highlighting trade-offs in performance and complexity.
ELT3047 Computer Architecture

Lecture 4: RISC-V ISA (1)

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Last lecture review
❑ ISA design is hard
➢ Adhere to 4 qualitative principles
➢ Apply quantitative methods

❑ Five aspects of ISA design


➢ Data Storage choices: GPR (load/store, register-memory), Stack, Accumulator.
➢ Common addressing modes: displacement, immediate, register indirect
➢ Most important operations are simple instructions (96% of the instructions
executed) → make the common case fast.
➢ Instruction encoding: performance vs code size trade-off (fixed- vs
variable-length)
➢ To support the compiler performance: at least 16 (preferably 32) GPR’s,
aim for a minimalist instruction set, & ensure all addressing modes apply to
all data transfer instructions.

❑ Today’s lecture: Introduction to RISC-V ISA


➢ Showing how it follows previously covered design principles.
Overview

❑ Development
➢ Fifth generation of RISC developed at UC Berkeley as open ISA in 2010
➢ Now managed by the RISC-V Foundation & experiencing rapid uptake in
both industry and academia (60+ foundation members)

❑ Why study RISC-V?


➢ Good architectural model for study: elegant and easy to understand
➢ A high-quality, license-free, royalty-free RISC ISA specification
➢ Appropriate for all levels of computing system, from microcontrollers to
supercomputers
❖ 32-bit, 64-bit, and 128-bit variants (we’re using 32-bit in class)

❑ What will be covered?


➢ Application of ISA design principles to 5 aspects of RV32I architecture
➢ Illustrations of SW-HW interface via assembly language
1. Data Storage

2. Memory Addressing Modes

3. Operations in the Instruction Set

4. Encoding the Instruction Set

5. The role of compilers


RV32I storage model
❑ General-Purpose Register (GPR) with Load/Store design
➢ Recap: what are the trade-offs? E.g. Stack/Accumulator vs GPR,
Load/Store vs Memory-Memory

❑ Quantitative design
➢ How many GPR & trade-off? 32
➢ What is the GPR width & trade-off? 32 bit
RV32I storage model (block diagram)
❑ Processor
➢ Program Counter
➢ Register File: 32 GPRs numbered x0...x31, each holding a 32-bit word
**Note** x0=0
❑ Processor-Memory interface
➢ Addr. bus
➢ Data bus: Read data, Write data
➢ Control bus: W/R Enable
❑ Memory
➢ Words M[0], M[1], M[2], ..., M[N-1] at word addresses 0, 4, 8, ...
➢ Byte addressable
➢ 32-bit address (4 GB)
➢ Originally little-endian
➢ Does not require word alignment
❖ Note: textbook uses the 64-bit variant RV64I
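The storage model above can be sketched in a few lines of Python. This is only an illustrative model, not part of the lecture material: the class name RegisterFile and the helper word_index are hypothetical names chosen for this sketch.

```python
class RegisterFile:
    """32 general-purpose registers, 32 bits each; x0 always reads as 0."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        if n != 0:                               # writes to x0 are discarded
            self._regs[n] = value & 0xFFFFFFFF   # keep only 32 bits


def word_index(byte_address):
    """Index i of the word M[i] that contains the given byte address."""
    return byte_address // 4


regs = RegisterFile()
regs.write(0, 123)             # ignored: x0 is hard-wired to zero
regs.write(5, 0x1_0000_0001)   # truncated to 32 bits
print(regs.read(0), regs.read(5), word_index(8))
```

Byte address 8 falls in M[2] because memory is byte addressable and each word is 4 bytes wide.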
Addressing mode
❑ Recap:
Addressing mode      Example              Meaning
Register             Add R4,R3            R4 ← R4+R3
Immediate            Add R4,#3            R4 ← R4+3
Displacement         Add R4,100(R1)       R4 ← R4+Mem[100+R1]
Register indirect    Add R4,(R1)          R4 ← R4+Mem[R1]
Indexed / Base       Add R3,(R1+R2)       R3 ← R3+Mem[R1+R2]
Direct or absolute   Add R1,(1001)        R1 ← R1+Mem[1001]
Memory indirect      Add R1,@(R3)         R1 ← R1+Mem[Mem[R3]]
Auto-increment       Add R1,(R2)+         R1 ← R1+Mem[R2]; R2 ← R2+d
Auto-decrement       Add R1,–(R2)         R2 ← R2-d; R1 ← R1+Mem[R2]
Scaled               Add R1,100(R2)[R3]   R1 ← R1+Mem[100+R2+R3*d]
❖ RISC-V uses only the first 3 modes (register, immediate, displacement)

❑ More modes trade off:
✓ Better support programming constructs (arrays, pointer-based accesses) →
reduced number of instructions and code size
✗ More work for the compiler
✗ More complicated HW implementation
Operations in the instruction set
❑ Only includes a few basic operations; complex high-level
instructions are compiled into a sequence of basic instructions.
➢ Review: trade-off vs CISC?

❑ This course: not a full RISC-V instruction set


➢ Just a subset of real RV32I, sufficient for later implementation process.

Operator type            Examples
Arithmetic and Logical   Integer arithmetic and logical operations: add, and, subtract, or
Data Transfer            Load/Store (move instructions on machines with memory addressing)
Control                  Branch, jump, procedure call and return

❖ Design principles #2, #3: “smaller is faster”, “make the common case fast”
Levels of Representation
High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  ↓ Compiler
Assembly Language Program (e.g., RISC-V)
    lw t0, 0(x2)
    lw t1, 4(x2)
    sw t1, 0(x2)
    sw t0, 4(x2)
  ↓ Assembler
Machine Language Program (RISC-V)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
  ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
RISC-V Assembly Language
❑ A symbolic representation of what the processor understands
➢ Interface between HLL and machine code
➢ Human-readable format of instructions
➢ Usually has a counterpart in high level programming languages like C, Java

❑ Architectural representative
➢ The instruction semantics describe the data flow at the register-transfer
level, e.g. lw t1, 4(x2) describes the data flow t1 ← Mem[x2+4].

❑ Each line of assembly code contains at most 1 instruction


➢ Mnemonic: operation to perform, e.g. lw means load a word from a
memory location to a register
➢ Operands:
▪ source - on which the operation is performed, e.g. the data stored at
the memory location 4(x2)
▪ destination - to which the result is written, e.g. the register t1
➢ # (hash sign) is used for comments
RV32I instruction encoding
❑ Fixed instruction length of 32 bits
➢ Trade-off vs variable length?

❑ Use a rigid format for instructions in the same class


➢ Design principles #1: “simplicity favors regularity” - regularity makes
hardware implementation simpler → higher performance at lower cost.
➢ E.g., arithmetic instructions add x18,x19,x20 & sub x5,x6,x7 both have 1
operator and 3 operands in 3 registers, following a fixed order

❑ Arithmetic instructions’ encoding: R-type

7 bits    5 bits   5 bits   3 bits   5 bits   7 bits
funct7    rs2      rs1      funct3   rd       op
➢ Semantics: [rd] ← [rs1] op [rs2]
➢ Note the difference in the operand order between machine code & assembly.
➢ Examples:
                    funct7    rs2     rs1     funct3   rd      op
[x18]←[x19]+[x20]   0000000   10100   10011   000      10010   0110011
[x5]←[x6]-[x7]      0100000   00111   00110   000      00101   0110011
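The R-type layout can be checked with a small Python sketch that packs the six fields into one 32-bit word. The function name encode_r is a hypothetical helper for illustration; the field values are the ones from the two examples above.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack R-type fields: funct7 | rs2 | rs1 | funct3 | rd | op."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# add x18, x19, x20  and  sub x5, x6, x7
add_word = encode_r(0b0000000, 20, 19, 0b000, 18, 0b0110011)
sub_word = encode_r(0b0100000, 7, 6, 0b000, 5, 0b0110011)

print(f"{add_word:032b}")
print(f"{sub_word:032b}")
```

Only funct7 differs between add and sub here, which is what lets both instructions share one format and one opcode.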
Support for constant operands?
❑ ISA designers often receive many requests for additional
instructions that, in theory, will make the ISA better in some way.
➢ Apply quantitative approach to judge the tradeoffs between cost and benefits.

❑ Example: many programs use small constants frequently →
should we support them in ALU instructions?
➢ Quantitative analysis: simulate the impact of the ISA augmented with this
feature by running benchmark programs.
▪ >50% of executed arithmetic instructions (e.g. loop
increments, scaling indices)
▪ >80% of executed compare instructions (e.g. loop
termination condition)
▪ >25% of executed load instructions (e.g. offsets into data
structures)
❖ constant operands = common case → make it fast!
➢ Trade-off: saves registers, instructions but requires more complex control &
datapath logic → Good design demands good compromises.
Instructions with immediate operands
❑ Encoding: I-type
12 bits   5 bits   3 bits   5 bits   7 bits
imm       rs1      funct3   rd       op

➢ Keep the format as similar as possible to that of the R-type: same bits =
same meaning (op, rd, funct3, rs1) → simplicity favors regularity.
➢ The constant ranges from -2^11 to 2^11 - 1 (i.e. -2048 to 2047).

❑ Semantics:
➢ [rd] ← [rs1] op [sign-extend(imm)]
➢ Pseudo-instruction: mv (move) = addi instruction with zero immediate, i.e. mv rd, rs1 is addi rd, rs1, 0

❑ Assembly instructions use the same mnemonics, but with an “i”
suffix to indicate the second operand is a constant, e.g. addi
➢ Why don’t we need subi? (“smaller is faster”)
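A minimal Python sketch of the I-type layout, including the range check on the 12-bit signed constant. The helper name encode_i is hypothetical, and the opcode 0010011 and funct3 000 used for addi are taken from the RISC-V specification rather than from the slide.

```python
def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack I-type fields: imm[11:0] | rs1 | funct3 | rd | op."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    # imm & 0xFFF keeps the 12-bit two's-complement pattern of a negative value
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# addi x5, x6, -1 : subtracting 1 needs no separate subi instruction
addi_word = encode_i(-1, 6, 0b000, 5, 0b0010011)
print(f"{addi_word:#010x}")
```

Because the immediate is signed, addi with a negative constant already covers what a subi would do.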
Logical Operations

Logical operation   C operator   Java operator   RISC-V operator
Shift left          <<           <<              sll, slli
Shift right         >>           >>, >>>         srl, srli, sra, srai
Bitwise AND         &            &               and, andi
Bitwise OR          |            |               or, ori
Bitwise NOT         ~            ~               xori (with immediate -1)
Bitwise XOR         ^            ^               xor, xori

❑ New perspective:
➢ View register as 32 raw bits rather than as a single 32-bit number (as
arithmetic instructions) → operate on individual bits or bytes within a word.
➢ Share the same encoding with arithmetic instructions (R- & I- types).
Logical shift operations
❑ Semantics: move all the bits in a word to the left/right by a
number of positions; fill the emptied positions with zeroes.
❑ R-type encoding: shift amount is in (lower 5 bits of) a register
➢ sll (shift left logical): sll t0, t1, t2 # t0 = t1 << t2
➢ srl (shift right logical): srl t0, t1, t2 # t0 = t1 >> t2

❑ I-type encoding: shift amount is an immediate between 0 and 31


➢ slli (shift left logical immediate): slli t0, t1, 4 # t0 = t1 << 4
➢ srli (shift right logical immediate): srli t0, t1, 4 # t0 = t1 >> 4

❑ What’s the equivalent operation for shifting left/right n bits?


➢ Ans: Multiply/divide by 2^n (unsigned). Shifting is faster than multiplication/
division → a good compiler translates such operations into shift instructions
➢ Example: slli t0, t1, 4
t1   0000 1000 0000 0000 0000 0000 0000 1001
t0   1000 0000 0000 0000 0000 0000 1001 0000
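The logical shifts can be modelled in Python by masking results to 32 bits, since Python integers are unbounded. The names sll, srl and MASK32 are helpers invented for this sketch; they mimic the instructions rather than implement them.

```python
MASK32 = 0xFFFFFFFF


def sll(value, shamt):
    """Shift left logical: only the lower 5 bits of the shift amount are used."""
    return (value << (shamt & 31)) & MASK32


def srl(value, shamt):
    """Shift right logical: vacated upper bits are filled with zeroes."""
    return (value & MASK32) >> (shamt & 31)


t1 = 0b0000_1000_0000_0000_0000_0000_0000_1001
t0 = sll(t1, 4)            # same as slli t0, t1, 4
print(f"{t0:032b}")
print(sll(3, 2), srl(12, 2))   # multiply / divide by 2^2 for unsigned values
```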
Arithmetic shift operations
❑ Semantics: move all the bits in a word to the right by a number
of positions; fill the emptied positions with sign bit.
❑ R-type encoding: shift amount is in (lower 5 bits of) a register
➢ sra (shift right arithmetic): sra t0, t1, t2 # t0 = t1 >> t2

❑ I-type encoding: shift amount is an immediate between 0 and 31


➢ srai (shift right arithmetic immediate): srai t0, t1, 4 # t0 = t1 >> 4
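➢ Example: srai x10, x11, 4

x11   1111 1111 1111 1111 1111 1111 1110 0111   (-25 in decimal)
x10   1111 1111 1111 1111 1111 1111 1111 1110   (-2 in decimal)
Unfortunately, this is NOT the same as dividing by 2^4
▪ Fails for negative numbers that are not multiples of 2^n: the shift rounds
toward -∞ (-25 >> 4 = -2), while integer division in C rounds toward zero (-25 / 16 = -1)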
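The rounding difference can be reproduced with a short Python sketch. The helpers to_signed and sra are hypothetical names; Python's >> on a negative integer is already an arithmetic (sign-filling) shift, which is what the sketch relies on.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sra(value, shamt):
    """Shift right arithmetic: vacated upper bits are filled with the sign bit."""
    return (to_signed(value) >> (shamt & 31)) & MASK32


x11 = 0xFFFFFFE7                    # -25
x10 = sra(x11, 4)                   # same as srai x10, x11, 4
shifted = to_signed(x10)            # rounds toward minus infinity
c_style_quotient = int(-25 / 16)    # truncation toward zero, as in C
print(shifted, c_style_quotient)
```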
Logical Operations: Bitwise AND

Mnemonic: and ( bitwise AND )


Bitwise operation that leaves a 1 only if both the bits
of the operands are 1

❑ E.g.: and x9, x10, x11

x10 0110 0011 0010 1111 0000 1101 0101 1001


mask x11 0000 0000 0000 0000 0011 1100 0000 0000
x9 0000 0000 0000 0000 0000 1100 0000 0000

❑ and can be used for masking operation:


➢ Place 0s into the positions to be ignored → bits will turn into 0s
➢ Place 1s for interested positions → bits will remain the same as the original.
Exercise: Bitwise AND
❑ We are interested in the last 12 bits of the word in register x10.
Result to be stored in x9.
➢ Q: What’s the mask to use?

x10 0000 1001 1100 0011 0101 1101 1001 1100

mask 0000 0000 0000 0000 0000 1111 1111 1111

x9 0000 0000 0000 0000 0000 1101 1001 1100

Notes:
The and instruction has an immediate version, andi
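The exercise can be checked with a few lines of Python. The helper low_bits is a hypothetical name used only here. One caveat worth keeping in mind: the 12-bit immediate of andi is sign-extended, so this particular mask (bit 11 set) would have to be placed in a register and applied with and.

```python
def low_bits(value, n):
    """Keep the n least significant bits of value, clear all the others."""
    mask = (1 << n) - 1        # n ones in the low positions
    return value & mask


x10 = 0b0000_1001_1100_0011_0101_1101_1001_1100
x9 = low_bits(x10, 12)
print(f"{x9:032b}")

# The earlier example: isolate bits 13..10 with the mask 0x3C00
field = 0x632F0D59 & 0x00003C00
print(f"{field:032b}")
```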
Logical Operations: Bitwise OR

Mnemonic: or ( bitwise OR )
Bitwise operation that places a 1 in the result if
either operand bit is 1
Example: or x9, x10, x11

➢ The or instruction has an immediate version ori
➢ Can be used to force certain bits to 1s
➢ E.g.: ori x9, x10, 0x7FF (the 12-bit immediate is sign-extended, so 0x7FF is
the largest positive mask; an immediate with bit 11 set would also set all upper bits)

x10     0000 1001 1100 0011 0101 1101 1001 1100
0x7FF   0000 0000 0000 0000 0000 0111 1111 1111
x9      0000 1001 1100 0011 0101 1111 1111 1111
Logical Operations: Bitwise NOT
❑ Strange fact 1:
➢ There is no not instruction in RISC-V to toggle the bits (1 → 0, 0 → 1)

❑ RV32I has no nor instruction either (unlike MIPS, where nor with the
zero register gives not)

❑ Question: How do we get a not operation?
xori x9, x10, -1 # the immediate -1 sign-extends to 32 ones
➢ Assemblers accept this as the pseudo-instruction not x9, x10

❑ Question: What do you think is the reason for not providing a
not instruction?
Design principle #2: smaller is faster.
Logical Operations: Bitwise XOR

Mnemonic: xor ( bitwise xor )


Example: xor x9, x10, x11

❑ Question: Can we also get not operation from xor?

Yes, let x11 contain all 1s:


xor x9, x9, x11

❑ Strange Fact 2:
➢ There is no nor or nori in RISC-V, but there is xori
➢ Why?
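A short Python sketch of why xori is enough to build a bitwise not. The helper names sign_extend12 and xori are invented for this illustration; the key assumption, taken from the I-type semantics, is that the 12-bit immediate is sign-extended before the operation.

```python
MASK32 = 0xFFFFFFFF


def sign_extend12(imm):
    """Sign-extend a 12-bit immediate to a 32-bit pattern."""
    imm &= 0xFFF
    if imm & 0x800:            # bit 11 is the sign bit
        imm -= 0x1000
    return imm & MASK32


def xori(value, imm):
    """Model of xori: XOR with the sign-extended 12-bit immediate."""
    return (value ^ sign_extend12(imm)) & MASK32


x10 = 0x09C35D9C
x9 = xori(x10, -1)             # -1 becomes 32 ones, so every bit is toggled
print(f"{x9:#010x}")
```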
Load instructions
❑ Encoding: I-type
12 bits   5 bits   3 bits    5 bits   7 bits
imm       rs1      funct3    rd       op
                   010: lw            0000011
❑ Semantics
➢ lw rd, imm(rs1) #rd ← Mem[rs1+imm], load a word from Memory
➢ Address calculation is like addi: base (stored in rs1) + offset (stored in
instruction) → base addressing, a special case of displacement addressing.

❑ Make common cases fast


➢ lh rd, imm(rs1) # rd ← sign-extend(Mem[rs1+imm][15:0]), load a halfword
➢ lhu rd, imm(rs1) # rd ← zero-extend(Mem[rs1+imm][15:0]), load an unsigned halfword
➢ lb rd, imm(rs1) # rd ← sign-extend(Mem[rs1+imm][7:0]), load a byte
➢ lbu rd, imm(rs1) # rd ← zero-extend(Mem[rs1+imm][7:0]), load an unsigned byte

❑ Two variations of base addressing:


➢ If rs1 = zero/x0 = 0, address = imm (absolute addressing)
➢ If imm = 0, address = rs1 (register indirect addressing)
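The load variants can be illustrated with a small little-endian memory model in Python. The memory contents and the helper names (load, lw, lh, lhu, lb, lbu as Python functions) are assumptions made for this sketch; base stands for the value held in rs1.

```python
MASK32 = 0xFFFFFFFF
memory = bytearray(16)                              # byte-addressable memory
memory[0:4] = (0x8001FF9C).to_bytes(4, "little")    # one word stored at address 0


def load(base, imm, size, signed):
    """Read `size` bytes at address base + imm and extend to 32 bits."""
    addr = base + imm                               # base (displacement) addressing
    value = int.from_bytes(memory[addr:addr + size], "little", signed=signed)
    return value & MASK32


def lw(base, imm):  return load(base, imm, 4, False)
def lh(base, imm):  return load(base, imm, 2, True)    # sign-extends
def lhu(base, imm): return load(base, imm, 2, False)   # zero-extends
def lb(base, imm):  return load(base, imm, 1, True)    # sign-extends
def lbu(base, imm): return load(base, imm, 1, False)   # zero-extends


print(hex(lb(0, 0)), hex(lbu(0, 0)), hex(lh(0, 2)), hex(lhu(0, 2)))
```

With base = 0 the address is just imm (absolute addressing); with imm = 0 it is just the base (register indirect addressing).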
Store instructions
❑ Assembly: sw rs2, imm(rs1) #store a word in Memory
➢ Semantics: rs2 → Mem[rs1+imm]. Needs to read rs1 (for base register) &
rs2 (for the value to be stored); there’s no write to register file → no rd.
➢ Address calculation is like addi instruction

❑ Encoding: S-type
➢ Keep rs1 and rs2 fields in same place as register names are more critical
than immediate bits in hardware design.
7 bits      5 bits   5 bits   3 bits    5 bits     7 bits
imm[11:5]   rs2      rs1      funct3    imm[4:0]   op
                              010: sw              0100011
❑ Make common cases fast
➢ sh rs2, imm(rs1) # rs2[15:0] → Mem[rs1+imm], store a halfword
➢ sb rs2, imm(rs1) # rs2[7:0] → Mem[rs1+imm], store a byte
➢ Why are there no unsigned store instructions shu, sbu?
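The split immediate of the S-type can be made concrete with a Python sketch. The helper name encode_s is hypothetical, and the register choices in the example (x14 stored at an offset from x2) are arbitrary values picked for illustration.

```python
def encode_s(imm, rs2, rs1, funct3, opcode=0b0100011):
    """Pack S-type fields: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | op."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    imm &= 0xFFF
    return (((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | ((imm & 0x1F) << 7) | opcode)


sw_pos = encode_s(8, 14, 2, 0b010)     # sw x14, 8(x2)
sw_neg = encode_s(-4, 14, 2, 0b010)    # sw x14, -4(x2)
print(f"{sw_pos:#010x} {sw_neg:#010x}")
```

The rs1 and rs2 fields sit at the same bit positions as in the R-type; only the immediate is split around them.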
A load byte example
❑ Suppose x10 initially contains 0x23456789. After the following
code runs, what is the value of x14
sw x10, 0(x0)
lb x14, 1(x0)
a. in a big-endian system?
b. in a little-endian system?

❑ Solution:
a. 0x00000045
b. 0x00000067

Word at address 0:
                 MSB           LSB
Data Value       23   45   67   89
Big-Endian       byte address   0    1    2    3
Little-Endian    byte address   3    2    1    0
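The two answers can be reproduced in Python with int.to_bytes, which lays out a word in either byte order. The function name store_word_then_load_byte is a hypothetical helper that mimics the sw / lb pair from the example.

```python
def store_word_then_load_byte(word, byte_offset, byteorder):
    """Mimic `sw` of a 32-bit word followed by `lb` at the given byte offset."""
    mem = word.to_bytes(4, byteorder)      # bytes as they sit in memory
    byte = mem[byte_offset]
    if byte & 0x80:                        # lb sign-extends the loaded byte
        byte -= 0x100
    return byte & 0xFFFFFFFF


x10 = 0x23456789
big = store_word_then_load_byte(x10, 1, "big")
little = store_word_then_load_byte(x10, 1, "little")
print(f"{big:#010x} {little:#010x}")
```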
Instructions for 32-bit constants
❑ Most constants are small
➢ 12-bit immediate is sufficient
➢ Occasionally need 32-bit constants → upper-immediate instructions

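❑ Encoding: U-type
20 bits      5 bits   7 bits
imm[31:12]   rd       op

➢ lui rd, constant # rd ← constant << 12 (Load upper immediate)
➢ Can create any 32-bit value in a register using two instructions, e.g., lui
and addi (to set the low 12 bits). Because addi sign-extends its 12-bit
immediate, add 1 to the lui constant whenever bit 11 of the value is 1.

❑ Example: write assembly code to load the constant below into x8

0000 0000 0111 1101 0000 1001 0000 0000

lui x8, 2001          0000 0000 0111 1101 0001 0000 0000 0000
addi x8, x8, -1792    0000 0000 0111 1101 0000 1001 0000 0000

➢ The low 12 bits (1001 0000 0000 = 2304) have bit 11 set, so they are encoded
as 2304 - 4096 = -1792 and the upper 20 bits become 2000 + 1 = 2001.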
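One subtlety when building a 32-bit constant: I-type immediates are sign-extended, and the valid range is -2048 to 2047, so a low part such as 2304 cannot be written directly. A common approach is lui followed by addi with an adjusted upper part. The Python sketch below shows that split; split_constant and rebuild are hypothetical helper names, not assembler features.

```python
def split_constant(value):
    """Split a 32-bit constant into (hi20, lo12) for `lui` + `addi`."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:                 # bit 11 set: addi would see a negative number
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF   # compensates by adding 1 when lo < 0
    return hi, lo


def rebuild(hi, lo):
    """What lui hi followed by addi lo leaves in the register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF


hi, lo = split_constant(0x007D0900)
print(hi, lo, hex(rebuild(hi, lo)))
```

For the constant in the example this yields lui with 2001 and addi with -1792, which add up to the intended bit pattern.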
Branch Instructions
❑ Change control flow depending on outcome of comparison
➢ E.g., beq x1, x2, Label # branch to Label if x1 equals x2
➢ Branches read two registers but don’t write a register (similar to stores)

❑ Encoding: B-type (formerly called SB-type)

1 bit     6 bits      5 bits   5 bits   3 bits   4 bits     1 bit     7 bits
imm[12]   imm[10:5]   rs2      rs1      funct3   imm[4:1]   imm[11]   op
➢ Encoding the label: use imm as a 2’s-complement offset to PC (PC-relative
addressing) → can reach about ±2^11 half-words (-4096 to +4094 bytes) from the PC.
➢ Branch offset scaling: let imm be the number of half-words to jump, either
forward (+) or backward (-) → imm[0] is always 0, thus is not stored.
➢ Why do they put imm[11] in such an unusual position?

❑ Semantics
➢ Label ← PC + sign-extend(imm[12:1]) × 2
if [rs1] == [rs2] then PC ← Label else PC ← PC + 4
➢ Variations: bne, blt, bge, bltu, bgeu. Why isn’t there a ble or bgt?
Branch encoding example
Loop: beq x19,x10,End
      add x18,x18,x10      # 1  (count instructions from the branch)
      addi x19,x19,-1      # 2
      j Loop               # 3
End:  # target instruction # 4

❑ Branch offset = 4 × 32-bit instructions = 16 bytes
➢ imm[12:1] is the offset in half-words → imm[12:1] = 8.
➢ 13-bit immediate, imm[12:0], with value 16:

imm[12]   imm[11]   imm[10:5]   imm[4:1]   imm[0]
0         0         000000      1000       0        (imm[0] is always zero and is discarded)

imm       rs2=10   rs1=19   BEQ   imm     BRANCH
???????   01010    10011    000   ?????   1100011
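The missing immediate fields can be worked out with a Python sketch of the B-type layout. The helper name encode_b is hypothetical; the offset is given in bytes and must be even, matching the half-word scaling described for branches.

```python
def encode_b(offset, rs2, rs1, funct3, opcode=0b1100011):
    """Pack B-type fields: imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | op."""
    if offset % 2 or not -4096 <= offset <= 4094:
        raise ValueError("offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF                      # 13-bit two's-complement pattern
    return ((((imm >> 12) & 0x1) << 31)
            | (((imm >> 5) & 0x3F) << 25)
            | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
            | (((imm >> 1) & 0xF) << 8)
            | (((imm >> 11) & 0x1) << 7)
            | opcode)


# beq x19, x10, End  with End 16 bytes after the branch
beq_word = encode_b(16, rs2=10, rs1=19, funct3=0b000)
print(f"{beq_word:032b}")
```

For an offset of 16 the upper seven bits come out as 0000000 and the five bits before the opcode as 10000 (imm[4:1] = 1000 followed by imm[11] = 0).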
Jump instructions
❑ Assembly: jal rd, offset # jump and link
➢ Semantics: rd ← PC+4; PC ← PC + offset # PC-relative jump

❑ Encoding: J-type
1 bit     10 bits     1 bit     8 bits       5 bits   7 bits
imm[20]   imm[10:1]   imm[11]   imm[19:12]   rd       op

➢ The immediate encoding is optimized similarly to that of branch
instructions to reduce hardware cost.
➢ Branch-like immediate encoding: offset = signed immediate × 2
➢ How far can we jump?

❑ Pseudo instruction: j offset #jump to PC+offset


➢ uses jal but sets rd=x0 to discard return address
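A Python sketch of the J-type layout, which also makes the jump range visible in its range check. The helper name encode_j is hypothetical, and the jal opcode 1101111 is taken from the RISC-V specification rather than from the slide.

```python
def encode_j(offset, rd, opcode=0b1101111):
    """Pack J-type fields: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | op."""
    if offset % 2 or not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset must be even and fit in 21 signed bits")
    imm = offset & 0x1FFFFF                    # 21-bit two's-complement pattern
    return ((((imm >> 20) & 0x1) << 31)
            | (((imm >> 1) & 0x3FF) << 21)
            | (((imm >> 11) & 0x1) << 20)
            | (((imm >> 12) & 0xFF) << 12)
            | (rd << 7) | opcode)


j_back = encode_j(-12, rd=0)    # j with offset -12: jal x0, -12 (return address discarded)
call = encode_j(8, rd=1)        # jal x1, 8: return address saved in x1
print(f"{j_back:#010x} {call:#010x}")
```

The 21-bit signed offset gives a reach of roughly ±1 MiB around the PC.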
Other RV32I instructions
❑ auipc rd, imm # add upper immediate to PC
➢ Semantics: rd ← PC + sign-extend(imm[31:12]<<12)
➢ Encoding: U-type

❑ jalr rd, imm(rs1) # jump and link register


➢ Semantics: writes PC+4 to rd (return address), then sets PC = rs1 + sign-
extend(imm) → uses same immediates as arithmetic and loads.
➢ Encoding: I-type
➢ No multiplication by 2 before adding to rs1 to form the new PC → byte offset
NOT halfword offset as in branches and jal.

❑ Performing very long jumps: use a sequence of instructions


➢ Jump to any 32-bit address:
lui x1, <hi20bits> # write high bits to a temp. register
jalr ra, <lo12bits>(x1) # add low bits & jump to the sum
➢ Jump PC-relative with 32-bit offset:
auipc x1, <hi20bits> # add high bits to PC, sum saved to x1
jalr x0, <lo12bits>(x1) # add low bits to x1 & jump to the sum
Summary (1)
❑ Applications of ISA design principles to 5 aspects of RISC-V
➢ Data Storage: 32 GPRs with load/store design for optimal support for compiler
➢ Addressing modes:
Displacement addressing (RV64I)
Summary (2)
❑ Operations in the Instruction Set: 3 types - Arithmetic and Logical,
Data transfer, and Control.
❑ Encoding the Instruction Set:

➢ Self-study: RISC-V green card (uploaded to the course website).


➢ Self-practice: Venus Online RISC-V Simulator https://venus.cs61c.org/

❑ Next lecture: RISC-V ISA’s support for high level programming
