L03
James C. Hoe
Department of ECE
Carnegie Mellon University
• “Architecture”
conceptual
– a clock has an hour hand and a minute hand, .....
physical
• Microarchitecture (think blueprint)
– a particular clockwork has a certain set of gears
arranged in a certain configuration
• Realization
– machined alloy gears vs stamped sheet metal
18-447-S24-L03-S4, James C. Hoe, CMU/ECE/CALCM, ©2024 [Computer Architecture, Blaauw and Brooks, 1997]
How to specify a computer design?
• “Architecture”
conceptual
– a computer does ….????….
Computer Architecture
physical
• Microarchitecture (think blueprint)
– a particular computer design has a certain
datapath and a certain control logic
• Realization
– CMOS vs ECL vs vacuum tubes
[Computer Architecture, Blaauw and Brooks, 1997]
Stored Program Architecture
a.k.a. von Neumann
• Memory holds both program and data
– instructions and data in a linear memory array
– instructions can be modified as data
• Sequential instruction processing
1. program counter (PC) identifies current instruction
2. fetch instruction from memory
3. update state (e.g. PC and memory) as a function of
current state according to instruction
4. repeat
[Figure: program counter pointing into a linear memory array with addresses 0 1 2 3 4 5...]
Dominant paradigm since its conception
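The sequential processing steps above can be sketched as a toy interpreter. This is only an illustration: the tuple instruction format and the opcode names (ADD, STORE_INSTR, HALT) are hypothetical and not part of any real ISA. It shows one memory holding both instructions and data, and an instruction being modified as data before it is fetched.

```python
# Toy stored-program machine: one memory list holds instructions and data.
# The tuple-based instruction format is hypothetical, not a real ISA.

def run(memory, pc=0, max_steps=100):
    """Repeat fetch/execute until a HALT instruction or max_steps."""
    for _ in range(max_steps):
        instr = memory[pc]              # steps 1-2: PC selects the instruction to fetch
        op = instr[0]
        if op == "HALT":
            break
        if op == "ADD":                 # memory[dst] = memory[a] + memory[b]
            _, dst, a, b = instr
            memory[dst] = memory[a] + memory[b]
        elif op == "STORE_INSTR":       # overwrite an instruction as if it were data
            _, dst, new_instr = instr
            memory[dst] = new_instr
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1                         # step 3: update state, including the PC
    return memory

program = [
    ("ADD", 5, 4, 4),                       # address 0: mem[5] = mem[4] + mem[4]
    ("STORE_INSTR", 2, ("ADD", 5, 5, 4)),   # address 1: rewrite the instruction at 2
    ("HALT",),                              # address 2: replaced before it is fetched
    ("HALT",),                              # address 3
    7,                                      # address 4: data
    0,                                      # address 5: data
]
result = run(list(program))
```

Because the instruction at address 2 is rewritten before the PC reaches it, the machine executes the new ADD rather than the original HALT.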
Instruction Set Architecture (ISA):
A Concrete Specification
• Assembly Code
– suppose f, g, h, i, j are in rf, rg, rh, ri, rj
– suppose rtemp is a free register
add rtemp rg rh # rtemp = g+h
add rf ri rj # rf = i+j
sub rf rtemp rf # f = rtemp - rf
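To check what the three-instruction sequence above computes, here is a minimal sketch that models the register file as a Python dict. The helper names, the example values, and the 32-bit wraparound are assumptions made for illustration.

```python
# Hypothetical register file as a dict; register names follow the slide.
def add(regs, rd, rs1, rs2):
    regs[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF   # wrap to 32 bits (assumed width)

def sub(regs, rd, rs1, rs2):
    regs[rd] = (regs[rs1] - regs[rs2]) & 0xFFFFFFFF

# Example values (arbitrary): g = 10, h = 20, i = 3, j = 4
regs = {"rf": 0, "rg": 10, "rh": 20, "ri": 3, "rj": 4, "rtemp": 0}

add(regs, "rtemp", "rg", "rh")   # rtemp = g + h
add(regs, "rf", "ri", "rj")      # rf = i + j
sub(regs, "rf", "rtemp", "rf")   # rf = rtemp - rf = (g + h) - (i + j)
```

With these inputs, rf ends up holding (g + h) - (i + j), and the source registers rg, rh, ri, rj are left unchanged.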
Reg-Immediate ALU Instructions
• Assembly (e.g., reg-immediate additions)
ADDI rd, rs1, imm12
• Machine encoding: I-type
    imm[11:0] | rs1   | 000   | rd    | 0010011
    12-bit    | 5-bit | 3-bit | 5-bit | 7-bit
• Semantics
– GPR[rd] ← GPR[rs1] + sign-extend(imm)
– PC ← PC + 4
• Exceptions: none (ignore carry and overflow)
• Variations
– Arithmetic: {ADDI, SUBI} (SUBI is not in RV32I; use ADDI with a negative immediate)
– Compare: {signed, unsigned} set if less than {SLTI, SLTIU}
– Logical: {ANDI, ORI, XORI}
– Shifts by unsigned imm[4:0]: {SLLI, SRLI, SRAI}
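The I-type encoding and the ADDI semantics above can be sketched in a few lines. The function names are hypothetical; the field positions follow the I-type layout on this slide, and the register file is modeled as a plain list of 32 integers.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def encode_addi(rd, rs1, imm):
    """Pack ADDI rd, rs1, imm into the I-type layout shown above."""
    assert -2048 <= imm <= 2047, "imm must fit in 12 bits"
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0b0010011

def exec_addi(gpr, pc, word):
    """Apply the ADDI semantics to a list of 32 registers; returns the new PC."""
    rd, rs1 = (word >> 7) & 0x1F, (word >> 15) & 0x1F
    imm = sign_extend(word >> 20, 12)
    if rd != 0:                                    # x0 is hardwired to zero in RV32I
        gpr[rd] = (gpr[rs1] + imm) & 0xFFFFFFFF    # carry and overflow are ignored
    return pc + 4

word = encode_addi(rd=1, rs1=2, imm=-1)   # ADDI x1, x2, -1
gpr = [0] * 32
gpr[2] = 5
next_pc = exec_addi(gpr, 0, word)
```

Note how subtracting 1 is expressed as ADDI with imm = -1: the 12-bit field holds 0xFFF and sign extension restores the negative value.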
I-Type Reg-Immediate ALU Inst. Encodings
sign-extended immediate
• On a byte-addressable machine . . .
              Big Endian                             Little Endian
  MSB                          LSB       MSB                          LSB
  byte 0   byte 1   byte 2   byte 3      byte 3   byte 2   byte 1   byte 0
  byte 4   byte 5   byte 6   byte 7      byte 7   byte 6   byte 5   byte 4
  byte 8   byte 9   byte 10  byte 11     byte 11  byte 10  byte 9   byte 8
  byte 12  byte 13  byte 14  byte 15     byte 15  byte 14  byte 13  byte 12
  byte 16  byte 17  byte 18  byte 19     byte 19  byte 18  byte 17  byte 16
  pointer points to the big end          pointer points to the little end
• What difference does it make?
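One way to see the difference is to lay out the same 32-bit value in both byte orders and then read it back with the wrong one. This is a small sketch using Python's built-in `int.to_bytes` and `int.from_bytes`; the example value is arbitrary.

```python
value = 0x0A0B0C0D   # arbitrary 32-bit example value

big = value.to_bytes(4, byteorder="big")        # MSB stored at the lowest address
little = value.to_bytes(4, byteorder="little")  # LSB stored at the lowest address

# Reading the same 4 bytes back with the wrong byte order gives a different number.
misread = int.from_bytes(big, byteorder="little")
```

The difference matters whenever multi-byte data crosses a boundary where the two sides disagree on byte order, or when code reads individual bytes out of a word.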
Load/Store Data Alignment
  MSB  byte-3  byte-2  byte-1  byte-0  LSB
       byte-7  byte-6  byte-5  byte-4
• Access granularity is not the same as addressing granularity
– physical implementations of memory and memory
interface optimize for natural alignment boundaries
(i.e., return an aligned 4-byte word per access)
– unaligned loads or stores would require 2 separate
accesses to memory
• Common for RISC ISAs to disallow misaligned
loads/stores; if necessary, use a code sequence of
aligned loads/stores and shifts
• RV32I (until v20191213) allowed misaligned loads/stores
but warned that they could be very slow; if necessary, . . .
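The "code sequence of aligned loads/stores and shifts" mentioned above might look like the following for a misaligned 4-byte load. This is a sketch under stated assumptions: memory is a Python `bytes` object, byte order is little-endian, and the function names are hypothetical.

```python
def load_word_aligned(mem, addr):
    """Return the 4-byte word at a naturally aligned address (little-endian)."""
    assert addr % 4 == 0, "aligned access only"
    return int.from_bytes(mem[addr:addr + 4], "little")

def load_word_misaligned(mem, addr):
    """Build a 32-bit load at any byte address from at most 2 aligned accesses."""
    base = addr & ~3                  # aligned word containing the first byte
    shift = (addr & 3) * 8            # bit offset of the first byte in that word
    lo = load_word_aligned(mem, base)
    if shift == 0:
        return lo                     # already aligned: one access is enough
    hi = load_word_aligned(mem, base + 4)   # second access for the remaining bytes
    return ((lo >> shift) | (hi << (32 - shift))) & 0xFFFFFFFF

mem = bytes(range(16))                # byte at address k holds the value k
w_aligned = load_word_misaligned(mem, 4)
w_unaligned = load_word_misaligned(mem, 1)
```

The unaligned case touches two aligned words and merges them with shifts, which is why a misaligned access costs extra memory accesses.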
Store Instructions
• Assembly (e.g., store 4-byte word)
SW rs2, offset12(base)
• Machine encoding: S-type
    offset[11:5] | rs2   | base  | 010   | offset[4:0] | 0100011
    7-bit        | 5-bit | 5-bit | 3-bit | 5-bit       | 7-bit
• Semantics
– byte_address32 = sign-extend(offset12) + GPR[base]
– MEM32[byte_address] ← GPR[rs2]
– PC ← PC + 4
• Exceptions: none for now
• Variations: SW, SH, SB
e.g., SB: MEM8[byte_address] ← (GPR[rs2])[7:0]
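The S-type layout above splits the 12-bit offset across two fields. The following sketch packs SW and then recovers the sign-extended offset from the encoded word; the function names are hypothetical, and the field positions follow the table on this slide.

```python
def encode_sw(rs2, base, offset):
    """Pack SW rs2, offset(base) into the S-type layout shown above."""
    assert -2048 <= offset <= 2047, "offset must fit in 12 bits"
    imm = offset & 0xFFF
    return (((imm >> 5) << 25)        # offset[11:5]
            | (rs2 << 20)
            | (base << 15)
            | (0b010 << 12)           # funct3 for a 4-byte word store
            | ((imm & 0x1F) << 7)     # offset[4:0]
            | 0b0100011)

def decode_store_offset(word):
    """Reassemble and sign-extend the split 12-bit offset of a 32-bit S-type word."""
    imm = ((word >> 25) << 5) | ((word >> 7) & 0x1F)
    return imm - 4096 if imm & 0x800 else imm

w1 = encode_sw(rs2=5, base=6, offset=32)   # SW x5, 32(x6)
w2 = encode_sw(rs2=1, base=2, offset=-4)   # SW x1, -4(x2)
```

Splitting the immediate keeps rs1 (base) and rs2 in the same bit positions as in the other formats, at the cost of reassembling the offset during decode.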
Assembly Programming 201
• E.g. High-level Code
A[ 8 ] = h + A[ 0 ]
where A is an array of integers (4 bytes each)
• Assembly Code
– suppose &A, h are in rA, rh
– suppose rtemp is a free register
LW rtemp 0(rA) # rtemp = A[0]
add rtemp rh rtemp # rtemp = h + A[0]
SW rtemp 32(rA) # A[8] = rtemp
# note A[8] is 32 bytes
# from A[0]
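The load/add/store sequence above can be traced against a small byte-addressable memory. This is a sketch only: the memory size, the base address of A, the initial value of A[0], and little-endian byte order are assumptions chosen for the demo.

```python
memory = bytearray(64)            # byte-addressable memory; size is arbitrary

def lw(mem, addr):
    """Load a 4-byte word (little-endian assumed)."""
    return int.from_bytes(mem[addr:addr + 4], "little")

def sw(mem, addr, value):
    """Store the low 32 bits of value as a 4-byte word."""
    mem[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

rA, rh = 16, 5                    # hypothetical: &A = 16, h = 5
sw(memory, rA + 0, 100)           # initialize A[0] = 100 for the demo

rtemp = lw(memory, rA + 0)            # LW  rtemp 0(rA)
rtemp = (rh + rtemp) & 0xFFFFFFFF     # add rtemp rh rtemp
sw(memory, rA + 32, rtemp)            # SW  rtemp 32(rA), 32 = 8 elements * 4 bytes

a8 = lw(memory, rA + 8 * 4)       # read A[8] back by index to confirm the offset
```

The offset in the SW is a byte offset, so index 8 of a 4-byte-element array becomes 32.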
Load/Store Encodings
• Both need 2 register operands and one 12-bit
immediate
[Figure: if/else and goto control flow over code blocks B, C, and D; diagram not recoverable]
[Figure: code at label F followed by BEQ x0, x0, A+4; diagram not recoverable]
[Figure: memory layout; labels from top to low address: grow down, dynamic data, static data, binary executable, text, reserved]
Basic Calling Convention
1. caller saves caller-saved registers
2. caller loads arguments into a0~a7 (x10~x17)
3. caller jumps to callee using JAL x1
prologue
4. callee allocates space on the stack (decrements the stack pointer)
5. callee saves callee-saved registers to the stack
function body
epilogue
7. callee restores saved register values
8. callee returns using JALR x0, x1
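The call in step 3 and the return in step 8 can be sketched as follows. The addresses and function names are hypothetical, and the sketch models only the link register x1 and the PC, not the stack or saved registers.

```python
def jal(regs, pc, rd, target):
    """JAL rd, target: save the return address in rd, then jump."""
    if rd != 0:                            # writes to x0 are discarded
        regs[rd] = pc + 4
    return target

def jalr(regs, pc, rd, rs1, offset=0):
    """JALR rd, offset(rs1): jump to GPR[rs1] + offset, linking into rd."""
    target = (regs[rs1] + offset) & ~1     # RISC-V clears the lowest target bit
    if rd != 0:
        regs[rd] = pc + 4
    return target

regs = [0] * 32
CALL_SITE, CALLEE = 0x100, 0x400           # hypothetical addresses

pc = jal(regs, CALL_SITE, 1, CALLEE)       # step 3: JAL x1 from the caller
entered_at = pc
pc = jalr(regs, pc + 8, 0, 1)              # step 8: JALR x0, x1 from inside the callee
```

Using x0 as the destination in the return discards the link value, so the JALR acts as a plain indirect jump back to the instruction after the call.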