12-caches-notes
12-caches-notes
CS 3410
Computer System Organization & Programming
These slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer.
Programs 101
C Code MIPS Assembly
int main (int argc, char* main: addiu $sp,$sp,-
argv[ ]) { 48
int i; sw
int m = n; $31,44($sp)
sw
int sum = 0; $fp,40($sp)
for (i = 1; i <= m; i++) { move $fp,$sp
sum += i; sw
} $4,48($fp)
printf (“...”, n, sum); sw
} $5,52($fp)
la $2,n
Load/Store Architectures: lw $2,0($2)
sw
• Read data from memory $2,28($fp)
sw
(put in registers) $0,32($fp)
li $2,1
• Manipulate it
sw
Instructions
$2,24($fp) that read
from
• Store it back to memory $L2: lw
$2,24($fp)
or write to memory…
2
1 Cycle Per Stage: the Biggest Lie (So Far)
Code Stored in Memory
(also, data and stack)
compute
jump/branch
targets
A
memory
register
D
ALU
file
B
+4
addr
PC
inst
din dout
M
control
B
memory
imm
extend
new
forward
pc detect unit
Stack, Data, Code
hazard Stored in Memory
ctrl
ctrl
Fetch Decode Execute Memory Back
5
The Need for Speed
CPU Pipeline
Instruction speeds:
• add,sub,shift: 1 cycle
• mult: 3 cycles
• load/store: 100 cycles
off-chip 50(-70)ns
2(-3) GHz processor 0.5 ns clock
6
The Need for Speed
CPU Pipeline
7
What’s the solution?
Caches !
Level 1
Data $
Level 2 $
Level 1
Insn $
9
What’s the solution?
Caches !
Level 1
Data $
Level 2 $
Level 1
Insn $
What lucky
data gets to go
here?
Intel Pentium 3, 1999 10
Locality Locality Locality
total += a[i];
return total;
11
Clicker Questions
This highlights the 1 total = 0;
4 total += a[i];
4 total += a[i];
4 total += a[i];
4 total += a[i];
Last Called
Speed Dial
Favorites
Contacts
Google/Facebook/email
16
Your life is full of Locality
17
The Memory Hierarchy
Small, Fast
1 cycle,
Registers 128 bytes
4 cycles,
L1 Caches 64 KB
12 cycles,
L2 Cache 256 KB
36 cycles,
L3 Cache 2-20 MB
50-70 ns,
Main Memory 512 MB – 4 GB
Big, Slow
5-20 ms
Disk 16GB – 4 TB,
L1 Caches
I$ D$
L2 Cache
L2
L3 Cache
Main
Main Memory
Memory
Disk
Disk
21
Multi-Core Memory Hierarchy
ON CHIP
I$ D$ I$ D$ I$ D$ I$ D$
L2 L2 L2 L2
L3
Main Memory
Disk
22
Memory Hierarchy by the Numbers
CPU clock rates ~0.33ns – 2ns (3GHz-500MHz)
24
MEMORY
16 Byte Memory addr
0000
data
A
0001 B
0010 C
0011 D
load 1100 r1 0100 E
0101 F
0110 G
0111 H
1000 J
• Byte-addressable memory 1001 K
• 4 address bits 16 bytes total 1010 L
1011 M
• b addr bits 2b bytes in memory 1100 N
1101 O
1110 P
1111 Q 25
MEMORY
4-Byte, Direct Mapped Cache addr data
0000 A
CACHE 0001 B
index index data 0010 C
XXXX 00 A Cache entry 0011 D
01 B 0100 E
= row
10 C 0101 F
= (cache) line
11 D 0110 G
= (cache) block
0111 H
Block Size: 1 byte
1000 J
1001 K
Direct mapped: 1010 L
• Each address maps to 1 cache block 1011 M
• 4 entries 2 index bits (2n n bits) 1100 N
1101 O
Index with LSB: 1110 P
• Supports spatial locality 1111 Q 26
Analogy to a Spice Rack
Spice Rack Spice Wall
(Cache) (Memory)
index spice
A
B
C
D
E
F
…
Z
…
Z
28
MEMORY
4-Byte, Direct Mapped addr data
Cache 0000 A
0001 B
tag|index 0010 C
CACHE 0011 D
XXXX
index tag data 0100 E
00 00 A 0101 F
01 00 B 0110 G
10 00 C 0111 H
11 00 D 1000 J
1001 K
Tag: minimalist label/address 1010 L
address = tag + index 1011 M
1100 N
1101 O
1110 P
1111 Q 29
MEMORY
4-Byte, Direct Mapped Cache addr
0000
data
A
0001 B
0010 C
CACHE 0011 D
index V tag data 0100 E
00 0 00 X 0101 F
01 0 00 X 0110 G
10 0 00 X 0111 H
11 0 00 X 1000 J
1001 K
One last tweak: valid bit 1010 L
1011 M
1100 N
1101 O
1110 P
1111 Q 30
MEMORY
Simulation #1 addr data
of a 4-byte, DM Cache 0000 A
0001 B
tag|index 0010 C
CACHE 0011 D
XXXX
index V tag data 0100 E
00 0 11 X 0101 F
01 0 11 X 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
• Index into $ 1011 M
1100 N
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 31
MEMORY
Simulation #1 addr data
of a 4-byte, DM Cache 0000 A
0001 B
tag|index 0010 C
CACHE 0011 D
XXXX
index V tag data 0100 E
00 1 11 N 0101 F
01 0 xx X 0110 G
10 0 xx X 0111 H
11 0 xx X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
• Index into $ 1011 M
1100 N
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 32
MEMORY
Simulation #1 addr data
of a 4-byte, DM Cache 0000 A
0001 B
tag|index 0010 C
CACHE 0011 D
XXXX
index V tag data 0100 E
00 1 11 N 0101 F
01 0 11 X 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
... • Index into $ 1011 M
load 1100 Hit! 1100 N
• Check tag
1101 O
Awesome! • Check valid bit 1110 P
1111 Q 33
Block Diagram
4-entry, direct mapped Cache
tag|index
CACHE
1101
V tag data
2 2 1 00 1111 0000
1 11 1010 0101
0 01 1010 1010 Great!
1 11 0000 0000
Are we done?
2 8
= 1010 0101
data
Hit!
36
MEMORY
Simulation #2: addr data
0000 A
4-byte, DM Cache 0001 B
Clicker: CACHE
0010 C
A) Hit 0011 D
B) Miss index V tag data 0100 E
00 0 11 X 0101 F
01 0 11 X 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 • Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 37
MEMORY
Simulation #2: addr data
4-byte, DM Cache 0000 A
0001 B
0010 C
CACHE 0011 D
index V tag data 0100 E
00 1 11 N 0101 F
01 0 11 X 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 • Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 38
MEMORY
Simulation #2: addr data
4-byte, DM Cache 0000 A
0001 B
Clicker: CACHE
0010 C
0011 D
A) Hit
index V tag data 0100 E
B) Miss 00 1 11 N 0101 F
01 0 11 X 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Miss
• Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 39
MEMORY
Simulation #2: addr data
4-byte, DM Cache 0000 A
0001 B
tag|index 0010 C
CACHE 0011 D
XXXX
index V tag data 0100 E
00 1 11 N 0101 F
01 1 11 O 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Miss
• Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 40
MEMORY
Simulation #2: addr data
4-byte, DM Cache 0000 A
0001 B
Clicker: CACHE
0010 C
0011 D
A) Hit
index V tag data 0100 E
B) Miss 00 1 11 N 0101 F
01 1 11 O 0110 G
10 0 xx X 0111 H
11 0 xx X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Miss
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 41
MEMORY
Simulation #2: addr data
4-byte, DM Cache 0000 A
0001 B
0010 C
CACHE 0011 D
index V tag data 0100 E
00 1 01 E 0101 F
01 1 11 O 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Miss
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 42
MEMORY
Simulation #2: addr data
4-byte, DM Cache 0000 A
0001 B
Clicker: CACHE
0010 C
0011 D
A) Hit
index V tag data 0100 E
B) Miss 00 1 01 E 0101 F
01 1 11 O 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Miss
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100
• Check tag
Miss 1101 O
• Check valid bit 1110 P
1111 Q 43
MEMORY
Simulation #2: addr data
4-byte, DM Cache 0000 A
0001 B
tag|index 0010 C
CACHE 0011 D
XXXX
index V tag data 0100 E
00 1 11 N 0101 F
01 1 11 O 0110 G
10 0 11 X 0111 H
11 0 11 X 1000 J
1001 K
load 1100 Miss cold 1010 L
Disappointed!
load 1101 Miss cold 1011 M
load
load
0100
1100
Miss
Miss
cold
1100
1101
N
O
1110 P
1111 Q 44
Reducing Cold Misses
by Increasing Block Size
Leveraging Spatial Locality
45
MEMORY
Increasing Block Size addr
0000
data
A
0001 B
CACHE
offset 0010 C
index V tag data 0011 D
XXXX
00 0 x A | B 0100 E
01 0 x C | D 0101 F
10 0 x E | F 0110 G
11 0 x G | H 0111 H
1000 J
• Block Size: 2 bytes 1001 K
1010 L
• Block Offset: least significant bits 1011 M
indicate where you live in the block 1100 N
1101 O
• Which bits are the index? tag?
1110 P
1111 Q 46
MEMORY
Simulation #3: addr data
index 8-byte, DM Cache 0000 A
0001 B
CACHE
tag| |offset 0010 C
index V tag data 0011 D
XXXX
00 0 x X | X 0100 E
01 0 x X | X 0101 F
10 0 x X | X 0110 G
11 0 x X | X 0111 H
1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 • Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 47
MEMORY
Simulation #3: addr data
index 8-byte, DM Cache 0000 A
0001 B
CACHE
tag| |offset 0010 C
index V tag data 0011 D
XXXX
00 0 x X | X 0100 E
01 0 x X | X 0101 F
10 1 1 N | O 0110 G
11 0 x X | X 0111 H
1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 • Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 48
MEMORY
Simulation #3: addr data
index 8-byte, DM Cache 0000 A
0001 B
CACHE
tag| |offset 0010 C
index V tag data 0011 D
XXXX
00 0 x X | X 0100 E
01 0 x X | X 0101 F
10 1 1 N | O 0110 G
11 0 x X | X 0111 H
1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Hit!
• Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag
1101 O
• Check valid bit 1110 P
1111 Q 49
MEMORY
Simulation #3: addr data
index 8-byte, DM Cache 0000 A
0001 B
CACHE
tag| |offset 0010 C
index V tag data 0011 D
XXXX
00 0 x X | X 0100 E
01 0 x X | X 0101 F
10 1 1 N | O 0110 G
11 0 x X | X 0111 H
1000 J
1001 K
Miss
load 1100 Lookup: 1010 L
load 1101 Hit!
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 50
MEMORY
Simulation #3: addr data
index 8-byte, DM Cache 0000 A
0001 B
CACHE
tag| |offset 0010 C
index V tag data 0011 D
XXXX
00 0 x X | X 0100 E
01 0 x X | X 0101 F
10 1 0 E | F 0110 G
11 0 x X | X 0111 H
1000 J
1001 K
Miss
load 1100 Lookup: 1010 L
load 1101 Hit!
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
1111 Q 51
MEMORY
Simulation #3: addr data
index 8-byte, DM Cache 0000 A
0001 B
CACHE
tag| |offset 0010 C
index V tag data 0011 D
XXXX
00 0 x X | X 0100 E
01 0 x X | X 0101 F
10 1 0 E | F 0110 G
11 0 x X | X 0111 H
1000 J
1001 K
Miss
load 1100 Lookup: 1010 L
load 1101 Hit!
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100
• Check tag
Miss 1101 O
• Check valid bit 1110 P
1111 Q 52
MEMORY
Simulation #3: addr data
8-byte, DM Cache 0000 A
0001 B
CACHE
0010 C
index V tag data 0011 D
00 0 x X | X 0100 E
01 0 x X | X 0101 F
10 1 0 E | F 0110 G
11 0 x X | X 0111 H
1000 J
1001 K
load 1100 Miss cold 1010 L
1 hit, 3 misses
load 1101 Hit!
3 bytes don’t fit in 1011 M
load 0100 Miss cold
a 4 entry cache? 1100 N
load 1100 Miss conflict 1101 O
1110 P
1111 Q 53
Removing Conflict Misses
with Fully-Associative Caches
54
MEMORY
8 byte, fully-associative Cache addr
0000
data
A
0001 B
XXXX 0010 C
tag|offset
offset 0011 D
CACHE 0100 E
0101 F
V tag data V tag data V tag data V tag data 0110 G
0 xxx X | X 0 xxx X | X 0 xxx X | X 0 xxx X | X 0111 H
1000 J
What should the offset be? Clicker: 1001 K
= 9 cycles = 9 cycles 60
Pros & Cons
Direct Mapped Fully Associative
Tag Size Smaller Larger
SRAM Overhead Less More
Controller Logic Less More
Speed Faster Slower
Price Less More
Scalability Very Not Very
# of conflict misses Lots Zero
Hit Rate Low High
Pathological Cases Common ?
Reducing Conflict Misses
with Set-Associative Caches
Not too conflict-y. Not too slow.
… Just Right!
62
MEMORY
8 byte, 2-way addr data
set associative Cache 0000 A
0001 B
XXXX 0010 C
tag||offset
offset
index
0011 D
CACHE 0100 E
index V tag data V tag data 0101 F
0 0 xx E | F 0 xx N | O 0110 G
1 0 xx C | D 0 xx P | Q 0111 H
1000 J
1001 K
What should the offset be?
1010 L
What should the index be? 1011 M
1100 N
What should the tag be? 1101 O
1110 P
1111 Q 63
Clicker Question
5 bit address
XXXXX
2 byte block size
24 byte, 3-Way Set Associative CACHE
index V tag data V tag data V tag data
00 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
01 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
10 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
11 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
A) 0
B) 1
How many tag bits?
C) 2
D) 3
64
E) 4
Clicker Question
5 bit address
XXXXX
2 byte block size
24 byte, 3-Way Set Associative CACHE
index V tag data V tag data V tag data
00 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
01 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
10 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
11 0 ? X | Y 0 ? X’ | Y’ 0 ? X’’ | Y’’
A) 0
B) 1
How many tag bits?
C) 2
D) 3
65
E) 4
MEMORY
8 byte, 2-way addr data
set associative Cache 0000 A
0001 B
XXXX 0010 C
tag||offset
index
0011 D
CACHE 0100 E
index V tag data V tag data 0101 F
0 0 xx X | X 0 xx X | X 0110 G
1 0 xx X | X 0 xx X | X 0111 H
1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 • Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
LRU Pointer 1111 Q 66
MEMORY
8 byte, 2-way addr data
set associative Cache 0000 A
0001 B
XXXX 0010 C
tag||offset
index
0011 D
CACHE 0100 E
index V tag data V tag data 0101 F
0 1 11 N | O 0 xx X | X 0110 G
1 0 xx X | X 0 xx X | X 0111 H
1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Hit!
• Index into $ 1011 M
load 0100 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
LRU Pointer 1111 Q 67
MEMORY
8 byte, 2-way addr data
set associative Cache 0000 A
0001 B
XXXX 0010 C
tag||offset
index
0011 D
CACHE 0100 E
index V tag data V tag data 0101 F
0 1 11 N | O 0 xx X | X 0110 G
1 0 xx X | X 0 xx X | X 0111 H
1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Hit!
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100
• Check tag 1101 O
• Check valid bit 1110 P
LRU Pointer 1111 Q 68
MEMORY
8 byte, 2-way addr data
set associative Cache 0000 A
0001 B
XXXX 0010 C
tag||offset
index
0011 D
CACHE 0100 E
index V tag data V tag data 0101 F
0 1 11 N | O 1 01 E | F 0110 G
1 0 xx X | X 0 xx X | X 0111 H
1000 J
1001 K
load 1100 Miss Lookup: 1010 L
load 1101 Hit!
• Index into $ 1011 M
load 0100 Miss 1100 N
load 1100 Hit! • Check tag 1101 O
• Check valid bit 1110 P
LRU Pointer 1111 Q 69
Eviction Policies
71
Miss Rate vs. Block Size
72
Block Size Tradeoffs
• For a given total cache size,
Larger block sizes mean….
– fewer lines
– so fewer tags, less overhead
– and fewer cold misses (within-block “prefetching”)
• But also…
– fewer blocks available (for scattered accesses!)
– so more conflicts
– can decrease performance if working set can’t fit in $
– and larger miss penalty (time to fetch block)
Miss Rate vs. Associativity
74
Clicker Question
75
Clicker Question
76
ABCs of Caches
tavg = thit + %miss* tmiss
+ Associativity:
⬇conflict misses
⬆hit time
+ Block Size:
⬇cold misses
⬆conflict misses
+ Capacity:
⬇capacity misses
⬆hit time
77
Which caches get what properties?
tavg = thit + %miss* tmiss
Design with
Fast
speed in mind
L1 Caches
More Associative
L2 Cache Bigger Block Sizes
Larger Capacity
L3 Cache
Design with miss
Big rate in mind
78
Roadmap
• Things we have covered:
– The Need for Speed
– Locality to the Rescue!
– Calculating average memory access time
– $ Misses: Cold, Conflict, Capacity
– $ Characteristics: Associativity, Block Size, Capacity
• Things we will now cover:
– Cache Figures
– Cache Performance Examples
– Writes
79
2-Way Set Associative Cache (Reading)
= =
line select
64bytes
word select
32bits 80
hit? data
3-Way Set Associative Cache (Reading)
= = =
line select
64bytes
word select
32bits 81
hit? data
How Big is the Cache?
Tag Index Offset
Memory
Instructions:
0 78
LB $1 M[ 1 ] 1 29
LB $2 M[ 7 ] 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V tag data 4 71
LB $2 M[ 10 ] 1 0 5 150
6 162
SB $1 M[ 5 ] 0 0 7 173
SB $1 M[ 10 ] 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 12 19
Misses: 0 13 200
$2
$3 Hits: 0 14 210
Reads: 0 15 225
Writes: 0
Write-Through (REF 1)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V tag data 4 71
LB $2 M[ 10 ] 0 1 000 78 5 150
29 6 162
SB $1 M[ 5 ] 1 0 7 173
SB $1 M[ 10 ] 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
Misses: 1 13 200
$2
$3 Hits: 0 14 210
Reads: 2 15 225
Writes: 0
Write-Through (REF 2)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V tag data 4 71
LB $2 M[ 10 ] 0 1 000 78 5 150
29 6 162
SB $1 M[ 5 ] 1 0 7 173
SB $1 M[ 10 ] 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
Misses: 1 13 200
$2
$3 Hits: 0 14 210
Reads: 2 15 225
Writes: 0
Write-Through (REF 2)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V tag data 4 71
LB $2 M[ 10 ] 0 1 000 78 5 150
29 6 162
SB $1 M[ 5 ] 1 1 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
173
Misses: 2 13 200
$2
$3 Hits: 0 14 210
Reads: 4 15 225
Writes: 0
Write-Through (REF 3)
CLICKER:
(A) HIT Memory
Instructions:
LB $1 M[ 1 ] M (B) MISS 0 78
1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V tag data 4 71
LB $2 M[ 10 ] 1 1 000 78 5 150
29 6 162
SB $1 M[ 5 ] 0 1 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
173
Misses: 2 13 200
$2
$3 Hits: 0 14 210
Reads: 4 15 225
Writes: 0
Write-Through (REF 3)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] lru V tag data 4 71
LB $2 M[ 10 ] 0 1 000 173 5 150
29 6 162
SB $1 M[ 5 ] 1 1 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
173
Misses: 2 13 200
$2
$3 Hits: 1 14 210
Reads: 4 15 225
Writes: 1
Write-Through (REF 4)
write-allocate
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 0 1 000 173 5 150
29 6 162
SB $1 M[ 5 ] 1 1 011
010 7
162
71 173
SB $1 M[ 10 ] 150
173 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
173
Misses: 2 13 200
$2
$3 Hits: 1 14 210
Reads: 4 15 225
Writes: 1
Write-Through (REF 4)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 1 1 000 173 5 150
29
29 6 162
SB $1 M[ 5 ] 0 1 010 71 7 173
SB $1 M[ 10 ] 150
29
150 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
173
Misses: 3 13 200
$2
$3 Hits: 1 14 210
Reads: 6 15 225
Writes: 2
Write-Through (REF 5)
CLICKER:
(A) HIT Memory
Instructions:
LB $1 M[ 1 ] (B) MISS 0 173
M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 1 1 000 173 5 29
29 6 162
SB $1 M[ 5 ] 0 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
173
Misses: 3 13 200
$2
$3 Hits: 1 14 210
Reads: 6 15 225
Writes: 2
Write-Through (REF 5)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 0 1 101 33 5 29
M
28 6 162
SB $1 M[ 5 ] 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
33
Misses: 4 13 200
$2
$3 Hits: 1 14 210
Reads: 8 15 225
Writes: 2
Write-Through (REF 6)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 0 1 101 33 5 29
M
28 6 162
SB $1 M[ 5 ] 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
33
Misses: 4 13 200
$2
$3 Hits: 1 14 210
Reads: 8 15 225
Writes: 2
Write-Through (REF 6)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 0 1 101 33 5 29
M
28 6 162
SB $1 M[ 5 ] Hit 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
33
Misses: 4 13 200
$2
$3 Hits: 2 14 210
Reads: 8 15 225
Writes: 3
Write-Through (REF 7)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 0 1 101 33 5 29
M
28 6 162
SB $1 M[ 5 ] Hit 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
$1 29 12 19
33
Misses: 4 13 200
$2
$3 Hits: 2 14 210
Reads: 8 15 225
Writes: 3
Write-Through (REF 7)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V tag data 4 71
LB $2 M[ 10 ] 0 1 101 29
33 5 29
M
28 6 162
SB $1 M[ 5 ] Hit 1 1 010 71 7 173
SB $1 M[ 10 ] Hit 29 8 18
9 21
Register File Cache 10 29
33
$0 11 28
$1 29 12 19
33
Misses: 4 13 200
$2
$3 Hits: 3 14 210
Reads: 8 15 225
Writes: 4
Summary: Write Through
Memory
Instructions:
0 78
LB $1 M[ 1 ] 1 29
LB $2 M[ 7 ] 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 1 0 5 150
6 162
SB $1 M[ 5 ] 0 0 7 173
SB $1 M[ 10 ] 8 18
9 21
Register File Cache 10 33
$0 11 28
Misses: 0 12 19
$1
$2 Hits: 0 13 200
$3 Reads: 0 14 210
Writes: 0 15 225
Write-Back (REF 1)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 0 1 0 000 78 5 150
29 6 162
SB $1 M[ 5 ] 1 0 7 173
SB $1 M[ 10 ] 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 1 12 19
$1
$2 Hits: 0 13 200
$3 Reads: 2 14 210
Writes: 0 15 225
Write-Back (REF 2)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 0 1 0 000 78 5 150
29 6 162
SB $1 M[ 5 ] 1 0 7 173
SB $1 M[ 10 ] 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 1 12 19
$1
$2 Hits: 0 13 200
$3 Reads: 2 14 210
Writes: 0 15 225
Write-Back (REF 2)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 1 1 0 000 78 5 150
29 6 162
SB $1 M[ 5 ] 0 1 0 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 2 12 19
$1
$2 173 Hits: 0 13 200
$3 Reads: 4 14 210
Writes: 0 15 225
Write-Back (REF 3)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 1 1 0 000 78 5 150
29 6 162
SB $1 M[ 5 ] 0 1 0 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 2 12 19
$1
$2 173 Hits: 0 13 200
$3 Reads: 4 14 210
Writes: 0 15 225
Write-Back (REF 3)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 0 1 1 000 173 5 150
29 6 162
SB $1 M[ 5 ] 1 1 0 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 2 12 19
$1
$2 173 Hits: 1 13 200
$3 Reads: 4 14 210
Writes: 0 15 225
Write-Back (REF 4)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 0 1 1 000 173 5 150
29 6 162
SB $1 M[ 5 ] 1 1 0 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 2 12 19
$1
$2 173 Hits: 1 13 200
$3 Reads: 4 14 210
Writes: 0 15 225
Write-Back (REF 4)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] lru V d tag data 4 71
LB $2 M[ 10 ] 0 1 1 000 173 5 150
29 6 162
SB $1 M[ 5 ] 1 1 0 011 162 7 173
SB $1 M[ 10 ] 173 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 3 12 19
$1
$2 173 Hits: 1 13 200
$3 Reads: 6 14 210
Writes: 0 15 225
Write-Back (REF 4)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V d tag data 4 71
LB $2 M[ 10 ] 1 1 1 000 173 5 150
29 6 162
SB $1 M[ 5 ] 0 1 1 010 71 7 173
SB $1 M[ 10 ] 150
29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 3 12 19
$1
$2 173 Hits: 1 13 200
$3 Reads: 6 14 210
Writes: 0 15 225
Write-Back (REF 5)
Memory
Instructions:
0 78
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V d tag data 4 71
LB $2 M[ 10 ] 1 1 1 000 173 5 150
29 6 162
SB $1 M[ 5 ] 0 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 3 12 19
$1
$2 173 Hits: 1 13 200
$3 Reads: 6 14 210
Writes: 0 15 225
Write-Back (REF 5)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V d tag data 4 71
LB $2 M[ 10 ] M 0 1 0 101 33 5 150
28 6 162
SB $1 M[ 5 ] 1 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 4 12 19
$1
$2 33 Hits: 1 13 200
$3 Reads: 8 14 210
Writes: 2 15 225
Write-Back (REF 6)
CLICKER:
(A) HIT Memory
Instructions:
LB $1 M[ 1 ] M (B) MISS 0 173
1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V d tag data 4 71
LB $2 M[ 10 ] M 0 1 0 101 33 5 150
28 6 162
SB $1 M[ 5 ] 1 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 4 12 19
$1
$2 33 Hits: 1 13 200
$3 Reads: 8 14 210
Writes: 2 15 225
Write-Back (REF 6)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V d tag data 4 71
LB $2 M[ 10 ] M 1 1 0 101 33 5 150
28 6 162
SB $1 M[ 5 ] Hit 0 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 4 12 19
$1
$2 33 Hits: 2 13 200
$3 Reads: 8 14 210
Writes: 2 15 225
Write-Back (REF 7)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V d tag data 4 71
LB $2 M[ 10 ] M 1 1 0 101 33 5 150
28 6 162
SB $1 M[ 5 ] Hit 0 1 1 010 71 7 173
SB $1 M[ 10 ] 29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 4 12 19
$1
$2 33 Hits: 2 13 200
$3 Reads: 8 14 210
Writes: 2 15 225
Write-Back (REF 7)
Memory
Instructions:
0 173
LB $1 M[ 1 ] M 1 29
LB $2 M[ 7 ] M 2 120
SB $2 M[ 0 ] Hit 3 123
SB $1 M[ 5 ] M lru V d tag data 4 71
LB $2 M[ 10 ] M 0 1 1 101 29 5 150
28 6 162
SB $1 M[ 5 ] Hit 1 1 1 010 71 7 173
SB $1 M[ 10 ] Hit 29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 4 12 19
$1
$2 33 Hits: 3 13 200
$3 Reads: 8 14 210
Writes: 2 15 225
Write-Back (REF 8,9)
M
M Memory
Instructions:
0 173
... Hit
1 29
SB $1 M[ 5 ] M 2 120
LB $2 M[ 10 ] M 3 123
SB $1 M[ 5 ] Hit lru V d tag data 4 71
SB $1 M[ 10 ] Hit 0 1 1 101 29 5 150
28 6 162
SB $1 M[ 5 ] Hit
1 1 1 010 71 7 173
SB $1 M[ 10 ] Hit 29 8 18
9 21
Register File Cache 10 33
$0 11 28
29 Misses: 4 12 19
$1
$2 33 Hits: 3 13 200
$3 Reads: 8 14 210
Writes: 2 15 225
How Many Memory References?
Write-back performance
• How many reads?
– Each miss (read or write) reads a block from mem
– 4 misses 8 mem reads
• How many writes?
– Some evictions write a block to mem
– 1 dirty eviction 2 mem writes
– (+ 2 dirty evictions later +4 mem writes)
Write-back vs. Write-through Example
Assume: large associative cache, 16-byte lines
N 4-byte words
• Write-through is slower
– But simpler (memory always consistent)
YOUR CACHE
H
MIND
137
Clicker Question
Choose the best block size for your cache among the
choices given. Assume that integers and pointers are all 4
bytes each and that the scores array is 4-byte aligned.
(a) 1 byte (b) 4 bytes (c) 8 bytes (d) 16 bytes (e) 32 bytes
138
Clicker Question
Choose the best block size for your cache among the
choices given. Assume integers and pointers are 4 bytes.
(a) 1 byte (b) 4 bytes (c) 8 bytes (d) 16 bytes (e) 32 bytes
typedef struct item_t {
int value;
struct item_t *next;
char *name;
} item_t;
int sum = 0;
item_t *curr = list_head;
while (curr != NULL) {
sum += curr->value;
curr = curr->next;
} 139
Clicker Question
Choose the best block size for your cache among the
choices given. Assume integers and pointers are 4 bytes.
(a) 1 byte (b) 4 bytes (c) 8 bytes (d) 16 bytes (e) 32 bytes
typedef struct item_t {
int value;
struct item_t *next;
char *name;
} item_t;
int sum = 0;
item_t *curr = list_head;
while (curr != NULL) {
sum += curr->value;
curr = curr->next;
} 140
By the end of the cache lectures…
•
•
> dmidecode -t cache A Real Example
Cache Information
• Socket Designation: L1 Cache Microsoft Surfacebook
• Configuration: Enabled, Not Socketed,
Level 1 Dual core
• Operational Mode: Write Back
• Location: Internal Intel i7-6600 CPU @ 2.6 GHz
•
•
Installed Size: 128 kB
Maximum Size: 128 kB (purchased in 2016)
Cache Information
• Supported SRAM Types: Socket Designation: L3 Cache
• Synchronous Configuration: Enabled, Not Socketed,
• Installed SRAM Type: Synchronous Level 3
• Speed: Unknown Operational Mode: Write Back
• Error Correction Type: Parity Location: Internal
Installed Size: 4096 kB
• System Type: Unified Maximum Size: 4096 kB
• Associativity: 8-way Set-associative
Supported SRAM Types:
Synchronous
• Cache Information Installed SRAM Type: Synchronous
• Socket Designation: L2 Cache Speed: Unknown
Error Correction Type: Multi-bit ECC
• Configuration: Enabled, Not Socketed,
System Type: Unified
• Level 2 Associativity: 16-way Set-associative
• Operational Mode: Write Back
• Location: Internal
• Installed Size: 512 kB
• Maximum Size: 512 kB
• Supported SRAM Types:
• Synchronous
• Installed SRAM Type: Synchronous
• Speed: Unknown
• Error Correction Type: Single-bit ECC
• System Type: Unified
• Associativity: 4-way Set-associative
•
A Real Example
> sudo dmidecode -t cache
• Cache Information
•
•
Dual-core 3.16GHz Intel
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
• Installed Size: 128 KB (purchased in 2011)
• Error Correction Type: None
• Cache Information
• Configuration: Enabled, Not Socketed, Level 2
• Operational Mode: Varies With Memory Address
• Installed Size: 6144 KB
• Error Correction Type: Single-bit ECC
• > cd /sys/devices/system/cpu/cpu0; grep cache/*/*
• cache/index0/level:1
• cache/index0/type:Data
• cache/index0/ways_of_associativity:8
• cache/index0/number_of_sets:64
• cache/index0/coherency_line_size:64
• cache/index0/size:32K
• cache/index1/level:1
• cache/index1/type:Instruction
• cache/index1/ways_of_associativity:8
• cache/index1/number_of_sets:64
• cache/index1/coherency_line_size:64
• cache/index1/size:32K
• cache/index2/level:2
• cache/index2/type:Unified
• cache/index2/shared_cpu_list:0-1
• cache/index2/ways_of_associativity:24
• cache/index2/number_of_sets:4096
• cache/index2/coherency_line_size:64
• cache/index2/size:6144K
A Real Example