KTMT
Chapter 1
Computer Abstractions and
Technology
Binh Tran-Thanh
[email protected]
The Computer Revolution
▪ Progress in computer technology
▪ Underpinned by Moore’s Law
▪ Makes novel applications feasible
▪ Computers in automobiles
▪ Cell phones
▪ Human genome project
▪ World Wide Web
▪ Search Engines
▪ Computers are pervasive
15-Aug-23 Faculty of Computer Science and Engineering 7
Classes of Computers
▪ Personal computers
▪ General purpose, variety of software
▪ Subject to cost/performance trade-off
▪ Embedded computers
▪ Hidden as components of systems
▪ Stringent power/performance/cost constraints
Classes of Computers
▪ Server computers
▪ Network based
▪ High capacity, performance, reliability
▪ Range from small servers to building sized
▪ Supercomputers
▪ High end scientific and engineering calculations
▪ Highest capability but represent a small fraction of
the overall computer market
Computer board
Networks
▪ Communication, resource sharing, nonlocal
access
▪ Local area network (LAN): Ethernet
▪ Wide area network (WAN): the Internet
▪ Wireless network: WiFi, Bluetooth
[Figure data: airplane comparison]
Airplane          Passenger capacity  Cruising range (miles)  Cruising speed (mph)  Passengers × mph
BAC/Sud Concorde  132                 4000                    1350                  178200
Douglas DC-8-50   146                 8720                    544                   79424
[Figure: clocking: data transfer and computation, then update state]
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
so A is faster by this much.
CPI in More Detail
▪ If different instruction classes take different numbers of cycles:
$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$
▪ CPI varies between programs on a given CPU
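The per-class sum above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the class names and the instruction mix are hypothetical numbers chosen only to show the arithmetic.

```python
def total_clock_cycles(cpi_by_class, count_by_class):
    """Clock Cycles = sum over classes of CPI_i * Instruction Count_i."""
    return sum(cpi_by_class[c] * count_by_class[c] for c in count_by_class)


def average_cpi(cpi_by_class, count_by_class):
    """Average CPI = total clock cycles / total instruction count."""
    total_instructions = sum(count_by_class.values())
    return total_clock_cycles(cpi_by_class, count_by_class) / total_instructions


# Hypothetical instruction classes and counts (illustrative only).
cpi = {"A": 1, "B": 2, "C": 3}
mix = {"A": 2, "B": 1, "C": 2}

cycles = total_clock_cycles(cpi, mix)
avg = average_cpi(cpi, mix)
print(cycles, avg)
```

A different mix of the same three classes gives a different average CPI, which is why CPI varies between programs on one CPU.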
Computer Architecture
Faculty of Computer Science & Engineering - HCMUT
Chapter 2
Instructions: Language of the
Computer
Binh Tran-Thanh
Objectives
C source:
    swap(int v[], int k){
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
Compiler output (MIPS assembly):
    swap: multi $2, $5, 4
          add   $2, $4, $2
          lw    $15, 0($2)
          lw    $16, 4($2)
          sw    $16, 0($2)
          sw    $15, 4($2)
          jr    $31
Assembler output (machine code):
    00000000101000100000000100011000
    00000000100000100001000000100001
    10001101111000100000000000000000
    10001110000100100000000000000100
    10101110000100100000000000000000
    10101101111000100000000000000100
    00000011111000000000000000001000
[Figure: instruction memory, data memory, control unit, ALU, and input/output (I/O module), connected through the registers and buffers listed below]
PC: Program Counter
IR: Instruction Register
MAR: Memory Address Register
MBR: Memory Buffer Register
I/O AR: Input/Output Address Register
I/O BR: Input/Output Buffer Register
8/13/2021 Faculty of Computer Science and Engineering 6
Instruction execution process
Fetch Stage Execute Stage
Smaller is faster
▪ Example: negate +2
▪ +2 = 0000 0000 … 0010₂
▪ –2 = 1111 1111 … 1101₂ + 1
     = 1111 1111 … 1110₂
Sign Extension
▪ Representing a number using more bits
▪ Preserve the numeric value
▪ In MIPS instruction set
▪ addi: extend immediate value
▪ lb, lh: extend loaded byte/halfword
▪ beq, bne: extend the displacement
▪ Replicate the sign bit to the left
▪ c.f. unsigned values: extend with 0s (ZERO extend)
▪ Examples: extend 8-bit to 16-bit for signed number
▪ +2: 0000 0010 => 0000 0000 0000 0010
▪ –2: 1111 1110 => 1111 1111 1111 1110
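The sign-extension and zero-extension rules, together with two's-complement negation (invert the bits, then add 1), can be checked with a small sketch. The function names are illustrative and not part of the MIPS toolchain; values are modeled as plain non-negative Python integers holding the bit pattern.

```python
def sign_extend(value, from_bits, to_bits):
    """Replicate the sign bit of a from_bits-wide value up to to_bits."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value & (1 << (from_bits - 1)):            # sign bit set
        value |= ((1 << to_bits) - 1) & ~low_mask  # fill upper bits with 1s
    return value


def zero_extend(value, from_bits):
    """Unsigned extension: upper bits are filled with 0s."""
    return value & ((1 << from_bits) - 1)


def negate(value, bits):
    """Two's-complement negation: invert all bits, then add 1."""
    return ((~value) + 1) & ((1 << bits) - 1)


print(format(sign_extend(0b00000010, 8, 16), "016b"))
print(format(sign_extend(0b11111110, 8, 16), "016b"))
print(hex(negate(2, 32)))
```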
Exercise(1/2)
Given a piece of MIPS code as below:
.data
int_a: .word 0xCA002021
.text
la $s0, int_a # load address
lb $t1, 0($s0)
lbu $t2, 0($s0)
lb $t3, 3($s0)
lbu $t4, 3($s0)
What are the values of $t1, $t2, $t3, and $t4?
How do the results change on a little-endian machine?
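One way to check an answer to this exercise is to model the four bytes of the word in memory. The sketch below is an illustration only: `load_byte` is a hypothetical helper that mimics `lb` (sign-extending) and `lbu` (zero-extending) for a chosen byte order.

```python
def load_byte(word, offset, byteorder, signed):
    """Model lb (signed=True) or lbu (signed=False) on one 32-bit word."""
    memory = word.to_bytes(4, byteorder)   # byte at the lowest address first
    byte = memory[offset]
    if signed and byte & 0x80:
        byte |= 0xFFFFFF00                 # sign-extend to 32 bits
    return byte


INT_A = 0xCA002021
for order in ("big", "little"):
    t1 = load_byte(INT_A, 0, order, signed=True)
    t2 = load_byte(INT_A, 0, order, signed=False)
    t3 = load_byte(INT_A, 3, order, signed=True)
    t4 = load_byte(INT_A, 3, order, signed=False)
    print(order, [format(v, "08X") for v in (t1, t2, t3, t4)])
```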
Exercise
Given a piece of MIPS code as below:
.data
var_A: .byte 0xCA
var_B: .half 0xBEEF
var_C: .word 0xBAD0BABE
.text
la $s0, var_A
la $s1, var_B
la $s2, var_C
Assume that the .data segment begins at address 0x40000000.
What are the values of $s0, $s1, and $s2?
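The addresses can be worked out by laying the data out in order. The sketch below assumes the assembler automatically aligns each `.half` to a multiple of 2 and each `.word` to a multiple of 4; that is an assumption about the tool, so check the behavior of the assembler actually used. `assign_addresses` is a hypothetical helper.

```python
SIZES = {".byte": 1, ".half": 2, ".word": 4}


def assign_addresses(base, declarations):
    """Place each datum at the next address aligned to its own size."""
    addresses = {}
    address = base
    for name, directive in declarations:
        size = SIZES[directive]
        address = (address + size - 1) // size * size  # round up to alignment
        addresses[name] = address
        address += size
    return addresses


layout = assign_addresses(
    0x40000000,
    [("var_A", ".byte"), ("var_B", ".half"), ("var_C", ".word")],
)
for name, address in layout.items():
    print(name, hex(address))
```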
▪ Instruction fields
▪ op: operation code (opcode)
▪ rs: first source register number
▪ rt: second source register number
▪ rd: destination register number
▪ shamt: shift amount (00000 for now)
▪ funct: function code (extends opcode)
R-format Example
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
0 17 18 8 0 32
00000010001100100100000000100000₂ = 02324020₁₆
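The packing of the six fields into one 32-bit word can be reproduced with shifts and ORs. This is a minimal sketch; `encode_r_format` is an illustrative name, and the field values are the ones from the example row above.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


word = encode_r_format(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(format(word, "032b"))
print(format(word, "08X"))
```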
▪ Programs can operate on programs
▪ e.g., compilers, linkers, …
▪ Binary compatibility allows compiled programs to work on different computers
▪ Standardized ISAs
[Figure: memory holding a C compiler (machine code), payroll data, book text, and source code in C for an editor program]
[Figure: stack frame between $fp and $sp, from high to low address: saved argument registers (if any), saved return address, saved saved registers (if any), local arrays and structures (if any)]
5. Pseudo-direct addressing
op address
Memory
PC Word
Synchronization
▪ Two processors sharing an area of memory
▪ P1 writes, then P2 reads
▪ Data race if P1 and P2 don’t synchronize
▪ Result depends on the order of accesses
▪ Hardware support required
▪ Atomic read/write memory operation
▪ No other access to the location allowed between the read
and write
▪ Could be a single instruction
▪ E.g., atomic swap between a register and memory
▪ Or an atomic pair of instructions
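The "atomic pair of instructions" idea can be illustrated with a toy model of a load-linked / store-conditional pair (MIPS provides `ll` and `sc` for this). The class and the `interfere` hook below are hypothetical, single-threaded stand-ins for two processors; this is a sketch of the retry logic, not real hardware behavior.

```python
class LinkedMemory:
    """Toy one-location memory supporting load-linked / store-conditional."""

    def __init__(self, value=0):
        self.value = value
        self.link_valid = False

    def load_linked(self):
        self.link_valid = True
        return self.value

    def store(self, value):
        # An ordinary store (e.g. by another processor) breaks the link.
        self.value = value
        self.link_valid = False

    def store_conditional(self, value):
        # Succeeds only if nothing wrote the location since load_linked.
        if not self.link_valid:
            return False
        self.value = value
        self.link_valid = False
        return True


def atomic_swap(memory, new_value, interfere=None):
    """Retry the ll/sc pair until the store-conditional succeeds."""
    while True:
        old = memory.load_linked()
        if interfere is not None:
            interfere()        # simulate another processor writing in between
            interfere = None   # interfere only on the first attempt
        if memory.store_conditional(new_value):
            return old


m = LinkedMemory(5)
print(atomic_swap(m, 9), m.value)
```

If another write lands between the two instructions, the store-conditional fails and the swap is retried, so the read and write appear as one indivisible step.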
[Figure: linking: static linking; indirection table, linker/loader code, and dynamically mapped code]
[Figure: Java execution: the interpreter interprets bytecodes; the just-in-time compiler compiles bytecodes of “hot” methods into native code for the host machine]
[Figure: three bar charts over C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT; the plotted values are not recoverable]
Chapter 3
Arithmetic for Computers
Binh Tran-Thanh
Arithmetic for Computers
▪ Operations on integers
▪ Addition and subtraction
▪ Multiplication and division
▪ Dealing with overflow
▪ Floating-point real numbers
▪ Representation and operations
[Figure: multiplication hardware with a 64-bit ALU, and the optimized version with a 32-bit ALU; Multiplier register (32 bits, shift right)]
Recoverable flowchart steps:
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
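The shift steps above belong to the classic shift-and-add multiplier. The sketch below fills in the first step (test the low bit of the Multiplier and, if it is 1, add the Multiplicand to the Product) from the standard textbook algorithm, since that step did not survive in the slide text. It handles unsigned operands only, and `multiply` is an illustrative name.

```python
def multiply(multiplicand, multiplier, bits=32):
    """Unsigned shift-and-add multiplication producing a 2*bits product."""
    mask = (1 << bits) - 1
    multiplicand &= mask
    multiplier &= mask
    product = 0
    for _ in range(bits):
        if multiplier & 1:           # step 1 (assumed): test Multiplier bit 0
            product += multiplicand  # step 1a (assumed): add to Product
        multiplicand <<= 1           # step 2: shift Multiplicand left 1 bit
        multiplier >>= 1             # step 3: shift Multiplier right 1 bit
    return product & ((1 << (2 * bits)) - 1)


print(multiply(2, 3))
```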
2a. Shift the Quotient register to the left, setting the new rightmost bit to 1
2b. Restore the original value by adding the Divisor register to the Remainder register and placing the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0
3. Shift the Divisor register right 1 bit
33rd repetition? No: < 33 repetitions (repeat). Yes: 33 repetitions (Done)
[Figure: division hardware: 64-bit ALU; Quotient register (32 bits, shift left); Remainder register (64 bits, write, initially holds the dividend); Control test]
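The flowchart steps for division can be sketched as a restoring divider. The first step (subtract the Divisor from the Remainder and test the sign) is filled in from the standard textbook algorithm because it is missing from the slide text. Operands are unsigned, and `divide` is an illustrative name.

```python
def divide(dividend, divisor, bits=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    mask = (1 << bits) - 1
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend & mask        # Remainder initially holds the dividend
    divisor_reg = divisor << bits      # Divisor starts in the upper half
    quotient = 0
    for _ in range(bits + 1):          # 33 repetitions for 32-bit operands
        remainder -= divisor_reg       # step 1 (assumed): subtract and test
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # step 2a
        else:
            remainder += divisor_reg         # step 2b: restore
            quotient = quotient << 1
        divisor_reg >>= 1              # step 3
    return quotient & mask, remainder


print(divide(7, 2))
```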
Optimized Divider
▪ One cycle per partial-remainder subtraction
▪ Looks a lot like a multiplier!
▪ Same hardware can be used for both
[Figure: optimized divider hardware: Divisor register (32 bits) and 32-bit ALU]
[Figure: floating-point adder hardware: a small ALU compares the exponents (exponent difference), control shifts the smaller number right, and a big ALU adds]
[Figure: CPU overview: PC, instruction memory, registers, ALU, and data memory, shown first without and then with multiplexers and control signals (ALU operation, MemWrite, MemRead, RegWrite, Zero)]
[Figure: register with write control: D, Q, Write, clock cycle]
op      rs      rt      constant or address
6 bits  5 bits  5 bits  16 bits
op      address
6 bits  26 bits
▪ Note:
▪ All MIPS instructions are 32 bits wide
▪ All MIPS instructions contain a 6-bit opcode (op) in the most significant bits
Instruction Fetch
[Figure: instruction fetch: the PC supplies the read address to instruction memory, and an adder increments the PC by 4 for the next instruction]
    add  $t1, $s0, $t0
    lbu  $t2, 0($t1)
    add  $t3, $s0, $a0
    sb   $t2, 0($t3)
    beq  $t2, $zero, exit
    addi $s0, $s0, 1
    j    loop
    ...
[Figure: a. Registers; b. ALU]
[Figure: a. Data memory unit (Address, Write data, Read data, MemRead); b. Sign extension unit (16 → 32 bits)]
Branch Instructions
▪ Read register operands
▪ Compare operands
▪ Use ALU, subtract and check Zero output
▪ Calculate target address
▪ Sign-extend displacement
▪ Shift left 2 places (word displacement)
▪ Add to PC + 4
▪ Already calculated by instruction fetch
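The target-address steps listed above (sign-extend the 16-bit displacement, shift left 2, add to PC + 4) can be sketched as follows. The helper names are illustrative, and `next_pc` models only a `beq`-style comparison via the ALU Zero output.

```python
def branch_target(pc, immediate16):
    """Target = (PC + 4) + (sign-extended displacement << 2), 32-bit wrap."""
    offset = immediate16 & 0xFFFF
    if offset & 0x8000:            # sign-extend the 16-bit displacement
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def next_pc(pc, rs_value, rt_value, immediate16):
    """beq: subtract in the ALU and branch when the Zero output is set."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, immediate16) if zero else (pc + 4) & 0xFFFFFFFF


print(hex(branch_target(0x00400000, 3)))
```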
[Figure: sign-extend unit (16 → 32 bits); the sign-bit wire is replicated]
Composing the Elements
▪ First-cut data path does an instruction in
one clock cycle
▪ Each datapath element can only do one
function at a time
▪ Hence, we need separate instruction and data
memories
▪ Use multiplexers where alternate data
sources are used for different instructions
R-Type/Load/Store Datapath
▪ add $s0, $a0, $t0
[Figure: R-type/load/store datapath: register file (Read register 1, Read register 2, Write register, Write data, RegWrite), ALU (ALU operation, Zero, ALU result), data memory (Address, Write data, Read data, MemRead, MemWrite), sign-extend unit (16 → 32 bits), and multiplexers controlled by ALUSrc and MemtoReg]
[Figure: full datapath, adding the PC, instruction memory, the PC + 4 adder, and the branch-target adder fed by Shift left 2]
Ainvert  Binvert  Operation  Function
0        0        01         OR
0        0        10         ADD
0        1        10         SUB
0        1        11         SLT
1        1        00         NOR
[Figure: 1-bit ALU without SLT implementation (Result, CarryOut)]
[Figure: 1-bit ALU with SLT implementation (Less input; Set and Overflow detection outputs), shown for 1-bit ALU [0] and 1-bit ALU [31]]
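The control encodings in the table above can be exercised with a small 32-bit ALU model. The sketch reads the three control columns as Ainvert, Binvert, and a 2-bit Operation (00 = AND, 01 = OR, 10 = add, 11 = set-on-less-than), which is an assumption based on the standard MIPS ALU design; Binvert also serves as the carry-in. `alu` is an illustrative name.

```python
MASK = 0xFFFFFFFF


def alu(ainvert, binvert, operation, a, b):
    """32-bit ALU model driven by Ainvert, Binvert, and a 2-bit Operation."""
    a_in = (~a & MASK) if ainvert else (a & MASK)
    b_in = (~b & MASK) if binvert else (b & MASK)
    carry_in = 1 if binvert else 0          # a - b = a + ~b + 1
    total = (a_in + b_in + carry_in) & MASK
    if operation == 0b00:
        return a_in & b_in                  # AND (NOR when both inverted)
    if operation == 0b01:
        return a_in | b_in                  # OR
    if operation == 0b10:
        return total                        # ADD, or SUB when Binvert = 1
    if operation == 0b11:
        sign_a, sign_b, sign_t = a_in >> 31, b_in >> 31, total >> 31
        overflow = 1 if (sign_a == sign_b and sign_t != sign_a) else 0
        return sign_t ^ overflow            # SLT: corrected sign of a - b
    raise ValueError("operation must be a 2-bit value")


print(alu(0, 1, 0b10, 5, 7) == 0xFFFFFFFE)
```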
[Figure: Instruction [15–0] feeds the sign-extend unit (16 → 32 bits) and the ALU control]
▪ Non-stop:
$$\text{Speedup} = \frac{2n}{0.5n + 1.5} \approx 4 = \text{number of stages}$$
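The speedup expression can be evaluated for a few task counts to see it approach the number of stages. The sketch reads the formula as n tasks that each take 2 time units sequentially and pass through four 0.5-unit stages when pipelined; that reading, and the name `pipeline_speedup`, are assumptions.

```python
def pipeline_speedup(n):
    """Speedup = sequential time / pipelined time = 2n / (0.5n + 1.5)."""
    sequential_time = 2 * n
    pipelined_time = 0.5 * n + 1.5
    return sequential_time / pipelined_time


for n in (1, 4, 100, 10**6):
    print(n, round(pipeline_speedup(n), 3))
```

For a single task there is no gain, and the speedup only approaches 4 when the pipeline is kept full for a long run.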
MIPS Pipeline
▪ Five stages, one step per stage
▪ IF: Instruction fetch from memory
▪ ID: Instruction decode & register read
▪ EX: Execute operation or calculate address
▪ MEM: Access memory operand
▪ WB: Write result back to register
[Figure: instruction timing in program execution order, single-cycle (800 ps per instruction) versus pipelined (200 ps per stage), for lw $2, 200($0) passing through Instruction fetch, Reg, ALU, Data access, Reg; time axis 200 to 1400]
[Figure: branch timing with add $4, $5, $6 and beq $1, $2, 40 when the prediction is incorrect]
lw   IF ID E M WB
sw   IF ID E M WB
add  IF ID E M WB
Single clock cycle: 3 cycles, cycle time = 5 s
lw   IF ID E M WB
sw   IF ID E M
add  IF ID E WB
Multi clock cycle: 5 + 4 + 4 = 13 cycles, cycle time = 1 s
lw   IF ID E  M  WB
sw      IF ID E  M  WB
add        IF ID E  M  WB
Pipeline: 7 cycles, cycle time = 1 s
Multiple clock cycle
Instruction #cycles
Load 5 IF ID EXE MEM WB
Store 4 IF ID EXE MEM
Branch 3 IF ID EXE
Arithmetic/logical 4 IF ID EXE WB
Jump 2 IF ID
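The cycle counts for the lw / sw / add sequence under the three implementations can be recomputed from the per-class stage table above. This is a sketch: the class keys are illustrative, and the pipeline count assumes an ideal five-stage pipeline with no stalls.

```python
STAGES = {
    "load": ["IF", "ID", "EXE", "MEM", "WB"],
    "store": ["IF", "ID", "EXE", "MEM"],
    "branch": ["IF", "ID", "EXE"],
    "arith": ["IF", "ID", "EXE", "WB"],
    "jump": ["IF", "ID"],
}


def multicycle_cycles(program):
    """Each instruction takes only as many cycles as it has stages."""
    return sum(len(STAGES[kind]) for kind in program)


def pipelined_cycles(program, depth=5):
    """Ideal pipeline: fill time plus one cycle per additional instruction."""
    return (depth + len(program) - 1) if program else 0


program = ["load", "store", "arith"]     # lw, sw, add
single_cycle_time = len(program) * 5     # 3 cycles of 5 time units each
multi_cycle_time = multicycle_cycles(program) * 1
pipeline_time = pipelined_cycles(program) * 1
print(single_cycle_time, multi_cycle_time, pipeline_time)
```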
[Figure: MIPS pipelined datapath; the right-to-left flows in the MEM and WB stages lead to hazards]
[Figures: the pipelined datapath, repeated once per slide with the active stage highlighted]
Instruction fetch
Instruction decode
Execution
Memory
Write back (the figure is annotated “Wrong register number”)
Execution
Memory
Write back
lw $13, 24($1)    IM REG ALU DM REG
[Figure: pipelined datapath]
[Figure: pipelined datapath with control signals RegWrite, MemWrite, MemtoReg, ALUSrc, RegDst, and Branch; the Control unit produces EX, M, and WB control groups that travel through the ID/EX, EX/MEM, and MEM/WB pipeline registers after IF/ID]
[Figure a. No forwarding: Registers, ALU, Data memory]
[Figure b. With forwarding: multiplexers controlled by ForwardA and ForwardB in front of the ALU; the Forwarding unit takes Rs, Rt, Rd, EX/MEM.RegisterRd, and MEM/WB.RegisterRd]
Forwarding Conditions
▪ EX hazard
▪ if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
▪ if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10
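The two EX-hazard conditions above translate directly into a small function. This sketch covers only the EX hazard listed here; treating 00 as the default "no forwarding" selection is an assumption, and `forward_controls` is an illustrative name.

```python
def forward_controls(ex_mem_reg_write, ex_mem_rd, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) for the EX hazard; 0b00 means no forwarding."""
    forward_a = 0b00
    forward_b = 0b00
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        forward_a = 0b10   # first ALU operand comes from the EX/MEM register
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rt:
        forward_b = 0b10   # second ALU operand comes from the EX/MEM register
    return forward_a, forward_b


print(forward_controls(True, 2, 2, 5))
```

The `≠ 0` test matters because register $zero is never really written, so a result "destined" for it must not be forwarded.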
[Figure: pipelined datapath with the forwarding unit; signals labeled IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, and MEM/WB.RegisterRd]
lw  $2, 20($1)    IM REG ALU DM REG
and $4, $2, $5       IM REG ALU DM REG
Need to stall for one cycle
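The need to stall when an instruction uses the result of the immediately preceding load can be detected with one comparison. The condition below is the standard load-use check (ID/EX.MemRead set and the load's destination Rt matching either source register of the instruction in IF/ID); it is supplied here as an assumption because the slide text does not spell it out, and `must_stall` is an illustrative name.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard: the instruction in ID needs the value a load in EX will produce."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 20($1) is in EX (destination Rt = 2); and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
print(must_stall(True, 2, 2, 5))
```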
[Figure: pipelined datapath with the hazard detection unit (PCWrite, IF/ID.Write, ID/EXE.RegisterRt) and the forwarding unit]
[Figures: two snapshots of branch handling in the pipeline (PC values 44, 48, 72, 76), with the register comparator (=), Shift Left 2, and sign-extend units in the ID stage]
… IF ID EX MEM WB
beq stalled IF ID
beq stalled IF ID
beq stalled ID
[Figure: 2-bit branch predictor state diagram: two “Predict taken” states and two “Predict not taken” states, with transitions labeled Taken and Not taken]
Calculating the Branch Target
▪ Even with predictor, still need to calculate
the target address
▪ 1-cycle penalty for a taken branch
▪ Branch target buffer
▪ Cache of target addresses
▪ Indexed by PC when instruction fetched
▪ If hit and instruction is branch predicted taken,
can fetch target immediately
[Figure: pipelined datapath with exception handling: IF.Flush and ID.Flush signals, Cause and EPC registers, and handler address 80000180 as a PC input]
[Figure: dynamically scheduled CPU: reservation stations hold pending operands]
11/13/2023 Faculty of Computer Science and Engineering 9
Static RAM (SRAM) cell
Source: https://commons.wikimedia.org/wiki/File:SRAM_Cell_(6_Transistors).svg
DRAM Technology
▪ Data stored as a charge in a capacitor
▪ Single transistor used to access the charge
▪ Must periodically be refreshed
▪ Read contents and write back
▪ Performed on a DRAM “row”
[Figure: DRAM organization: Bank, Row, Column; commands Act, Pre, Rd/Wr]
Advanced DRAM Organization
▪ Bits in a DRAM are organized as a rectangular
array
▪ DRAM accesses an entire row
▪ Burst mode: supply successive words from a row
with reduced latency
▪ Double data rate (DDR) DRAM
▪ Transfer on rising and falling clock edges
▪ Quad data rate (QDR) DRAM
▪ Separate DDR inputs and outputs
▪ Cache memory
… present?
[Figure: cache contents (X1, X3, …) and a direct-mapped cache with indices 000 to 111]
▪ Direct mapped: only one choice
▪ (Block address) modulo (#Blocks in cache)
▪ #Blocks is a power of 2
▪ Use low-order address bits
Given 8-byte blocks (lines):
→ offset of x = 2
→ offset of a = 5
[Figure: MEMORY blocks labeled Idx 0 … Idx 7, containing x and a]
Addressing - Tag
Address bits 31 … 0: Tag (# bits) | Index (# bits) | Offset (# bits)
▪ Tag
▪ Determines which block is stored at a specific index in the cache
[Figure: cache holding x and a, with tags Tag 1 and Tag 0]
Miss/Hit ratio
▪ After executing the given piece of C code, what is the miss/hit ratio?
▪ a = b + c;
▪ d = a + i;
▪ g = h + z;
▪ b = a + c;
▪ d = i + h;
[Figure: cache (Idx 0 … Idx 7) and MEMORY blocks labeled by tag (Tag 0, Tag 1, …, Tag n) and index (Idx 0 … Idx 7), holding the variables a, z, d, c, i, b, g, h; the exact layout is not recoverable]
▪ Access sequence (word addresses): 22, 26, 22, 26, 16, 3, 16, 18

After the access to 22 (binary 10 110):
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

After the access to 26 (binary 11 010):
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
22         10 110       Hit       110
26         11 010       Hit       010
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N

Word addr  Binary addr  Hit/miss  Cache block
16         10 000       Miss      000
3          00 011       Miss      011
16         10 000       Hit       000
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N

After the access to 18 (binary 10 010), which replaces the block at index 010:
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
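The access trace can be replayed with a short simulation. The sketch assumes the 8-entry direct-mapped cache with one word per block that the tables above suggest, using (block address) modulo (#blocks) for the index and the remaining high-order bits as the tag. `simulate_direct_mapped` is an illustrative name.

```python
def simulate_direct_mapped(addresses, num_blocks=8):
    """Replay word addresses; return (hit/miss list, final tag per index)."""
    cache = [None] * num_blocks          # None marks an invalid entry (V = N)
    results = []
    for address in addresses:
        index = address % num_blocks     # low-order address bits
        tag = address // num_blocks      # remaining high-order bits
        if cache[index] == tag:
            results.append("hit")
        else:
            results.append("miss")
            cache[index] = tag           # fetch the block, replacing any old one
    return results, cache


results, final_tags = simulate_direct_mapped([22, 26, 22, 26, 16, 3, 16, 18])
print(results)
print([None if t is None else format(t, "02b") for t in final_tags])
```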
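[Figure: direct-mapped cache with entries … 1021, 1022, 1023: 20-bit tag, 32-bit data]
[Figure: cache with multiword blocks: address split into an 18-bit tag, 8-bit index, 4-bit block offset, and byte offset; 256 entries, each with V, an 18-bit tag, and 512 bits of data; a tag comparator (=) signals Hit and a Mux selects one 32-bit word]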
Measuring Cache Performance
▪ Components of CPU time
▪ Program execution cycles
▪ Includes cache hit time
▪ Memory stall cycles
▪ Mainly from cache misses
▪ With simplifying assumptions:
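$$\text{Memory stall cycles} = \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty}$$
$$= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}$$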
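The stall-cycle formula can be applied per instruction to see how misses inflate CPI. All numbers below (miss rates, miss penalty, base CPI, fraction of loads and stores) are illustrative assumptions and do not come from the slides; the function name is hypothetical.

```python
def stall_cycles_per_instruction(accesses_per_instruction, miss_rate, miss_penalty):
    """Memory accesses per instruction * miss rate * miss penalty."""
    return accesses_per_instruction * miss_rate * miss_penalty


# Illustrative assumptions: every instruction is fetched once (I-cache),
# and 36% of instructions are loads or stores (D-cache).
BASE_CPI = 2.0
MISS_PENALTY = 100
i_stalls = stall_cycles_per_instruction(1.0, 0.02, MISS_PENALTY)
d_stalls = stall_cycles_per_instruction(0.36, 0.04, MISS_PENALTY)
actual_cpi = BASE_CPI + i_stalls + d_stalls
print(round(i_stalls, 2), round(d_stalls, 2), round(actual_cpi, 2))
```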
I-cache, D-cache
[Figure: pipeline stages IM, REG, ALU, DM, REG with separate instruction memory and data memory; a miss inserts Stall cycles before DM and REG complete]
[Figure: where a block with Tag 2 can be placed in direct-mapped, set-associative, and fully associative caches]
[Figure: an eight-block cache organized by sets of Tag/Data pairs, up to eight-way set associative (fully associative)]
Associativity Example
▪ Compare 4-block caches
▪ Direct mapped, 2-way set associative,
fully associative
▪ Block access sequence: 0, 8, 0, 6, 8
▪ Direct mapped (100% Miss)
Block    Cache  Hit/miss  Cache content after access
address  index            0       1       2       3
0        0      miss      Mem[0]
8        0      miss      Mem[8]
0        0      miss      Mem[0]
6        2      miss      Mem[0]          Mem[6]
8        0      miss      Mem[8]          Mem[6]
Associativity Example
▪ 2-way set associative (80% Miss)
Block    Cache  Hit/miss  Cache content after access
address  index            Set 0             Set 1
0        0      miss      Mem[0]
8        0      miss      Mem[0]  Mem[8]
0        0      hit       Mem[0]  Mem[8]
6        0      miss      Mem[0]  Mem[6]
8        0      miss      Mem[8]  Mem[6]
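The three organizations can be compared on the same block sequence with a small set-associative model. The sketch assumes least-recently-used replacement within a set; the fully associative count is what this model computes, since that table is not shown in the surviving slide text. `count_misses` is an illustrative name.

```python
def count_misses(block_addresses, num_blocks, ways):
    """Count misses in a cache of num_blocks blocks with the given associativity (LRU)."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each list is ordered from LRU to MRU
    misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]
        if block in lines:
            lines.remove(block)            # hit: will be re-appended as MRU
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)               # evict the least recently used block
        lines.append(block)
    return misses


sequence = [0, 8, 0, 6, 8]
for label, ways in (("direct mapped", 1), ("2-way", 2), ("fully associative", 4)):
    print(label, count_misses(sequence, num_blocks=4, ways=ways))
```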
[Figure: four-way set-associative cache organization: 22-bit tag and 8-bit index; sets 0 to 255, each way holding V, Tag, and Data; four comparators (=) and a 4-to-1 multiplexor produce Hit and 32-bit Data]
[Figure: applications run on guest operating systems (Linux, Windows), which run on the VMware hypervisor on top of the physical hardware]
Virtual address (bits 31 … 0): Virtual page number (20 bits) | Page offset (12 bits)
Page table entry: Valid | Physical page number (18 bits)
If Valid is 0 then the page is not present in memory
Physical address (bits 29 … 0): Physical page number (18 bits) | Page offset (12 bits)
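The address split and table lookup can be sketched with a dictionary standing in for the page table. The page-table contents below are hypothetical, and `translate` and `PageFault` are illustrative names; the 12-bit page offset matches the field widths shown above.

```python
PAGE_OFFSET_BITS = 12


class PageFault(Exception):
    """Raised when the valid bit of the page table entry is 0."""


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address through the page table."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table.get(vpn, (0, None))
    if not valid:
        raise PageFault(hex(vpn))
    return (ppn << PAGE_OFFSET_BITS) | offset   # the offset passes through unchanged


# Hypothetical entries: virtual page number -> (valid, physical page number).
page_table = {0x00012: (1, 0x00345), 0x00013: (0, None)}
print(hex(translate(0x00012ABC, page_table)))
```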
Mapping Pages to Storage
[Figure: the page table, indexed by virtual page number, holds a Valid bit and a physical page or disk address; valid entries point into physical memory and the others into disk storage]
[Figure: entries with Valid, Dirty, and Ref bits in front of the page table, whose entries hold a physical page or disk address pointing into physical memory or disk storage]
TLB Misses
▪ If page is in memory
▪ Load the PTE from memory and retry
▪ Could be handled in hardware
▪ Can get complex for more complicated page table
structures
▪ Or in software
▪ Raise a special exception, with optimized handler
▪ If page is not in memory (page fault)
▪ OS handles fetching the page and updating the page
table
▪ Then restart the faulting instruction
TLB Miss Handler
▪ TLB miss indicates
▪ Page present, but PTE not in TLB
▪ Page not present
▪ Must recognize TLB miss before destination
register overwritten
▪ Raise exception
▪ Handler copies PTE from memory to TLB
▪ Then restarts instruction
▪ If page not present, page fault will occur
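The handler behavior described above can be sketched with two dictionaries. This is a simplified model under stated assumptions: the TLB has unlimited capacity, the page table maps a virtual page number to a (valid, physical page number) pair, and `lookup` and `PageFault` are illustrative names.

```python
class PageFault(Exception):
    """Raised when the page is not present in memory."""


def lookup(vpn, tlb, page_table):
    """Return (physical page number, event) for a virtual page number."""
    if vpn in tlb:
        return tlb[vpn], "tlb hit"
    # TLB miss: consult the page table entry (PTE) in memory.
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(vpn)    # the OS would fetch the page, then restart
    tlb[vpn] = ppn              # handler copies the PTE from memory to the TLB
    return ppn, "tlb miss"


tlb = {}
page_table = {1: (True, 7), 2: (False, None)}   # hypothetical contents
print(lookup(1, tlb, page_table))
print(lookup(1, tlb, page_table))
```

The second lookup of the same page hits in the TLB because the handler installed the entry on the first miss.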
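▪ Need to translate
▪ Complications due to aliasing
[Figure: TLB and cache interaction: virtual address with a 20-bit virtual page number and 12-bit page offset; TLB entries with Valid, Dirty, Tag, and Physical page number; the physical address is split into an 18-bit tag, 8-bit index, 4-bit block offset, and 2-bit byte offset for the cache lookup, producing Cache hit and 32-bit Data]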
Memory Protection
▪ Different tasks can share parts of their virtual
address spaces
▪ But need to protect against errant access
▪ Requires OS assistance
▪ Hardware support for OS protection
▪ Privileged supervisor mode (aka kernel mode)
▪ Privileged instructions
▪ Page tables and other state information only
accessible in supervisor mode
▪ System call exception (e.g., syscall in MIPS)
[Figure: Ready signals; multiple cycles per access]