11_caches
CIS 5710
Computer Organization and Design
Readings
• P&H Chapter 5
• 5.1-5.3, 5.5
• Appendix C.9
Caches
• Temporal locality
• Spatial locality
int sum = 0;
int X[1000] = {0};
for (int c = 0; c < 1000; c++) {
    sum += X[c];  // sum, c: temporal locality; X[c]: spatial locality
}
• Each block can hold a 4-byte word
• Physical cache implementation
• 1K (1024-entry) by 4B SRAM
• Called the data array
• 10-bit address input
• 32-bit data input/output
CIS 5710 | Prof Joseph Devietti
Looking Up A Block
• Which 10 of the 32 address bits to use?
• Use bits [11:2]
• 2 least significant (LS) bits [1:0] are the offset bits
• Locate byte within word
• Don't need these to locate word
• Next 10 LS bits [11:2] are the index bits
• These locate the word
• These bits work best in practice
• Why?
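[Figure: tag check. The address 0000 0000 0000 1100 0001 | 0100 1011 10 | 00 splits into tag [31:12], index [11:2], and offset [1:0]. The indexed frame holds valid bit 1 and tag 0000 0000 0000 1100 0001, which is compared (==) against the address tag.]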
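The lookup described above can be sketched in C. This is a minimal illustration, assuming the 1K-entry, 4B-block direct-mapped organization from the previous slide; the names frame_t, addr_tag, addr_index, addr_offset, and cache_lookup are hypothetical and not from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field widths assumed from the slides: offset = bits [1:0], index = bits [11:2]. */
enum { OFFSET_BITS = 2, INDEX_BITS = 10, NUM_FRAMES = 1 << INDEX_BITS };

/* One cache frame: valid bit, tag, and a single 4-byte word of data. */
typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t data;
} frame_t;

/* Byte within the word: bits [1:0]. */
static uint32_t addr_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1u);
}

/* Which frame to look in: bits [11:2]. */
static uint32_t addr_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
}

/* Remaining upper bits [31:12], stored alongside the data. */
static uint32_t addr_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}

/* Returns true on a hit and copies the cached word to *word_out. */
static bool cache_lookup(const frame_t *frames, uint32_t addr, uint32_t *word_out) {
    assert(frames != NULL && word_out != NULL);
    const frame_t *f = &frames[addr_index(addr)];
    if (f->valid && f->tag == addr_tag(addr)) {
        *word_out = f->data;
        return true;
    }
    return false;
}
```

For the example address 0x000C14B8 (the bit pattern shown in the figure), addr_tag gives 0xC1, addr_index gives 0x12E, and addr_offset gives 0.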
• 4B cache, 1B blocks
• Figure out number of sets: 4 (capacity / block-size)
• Figure out how address splits into offset/index/tag bits
• Offset: least-significant log2(block-size) = log2(1) = 0
• Index: next log2(number-of-sets) = log2(4) = 2 → bits [1:0] of the 8-bit address
• Tag: rest = 8 – 2 = 6 → bits [7:2]
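The same arithmetic can be written as a small helper. This is a sketch for a direct-mapped cache only (one frame per set), assuming power-of-two sizes; addr_split_t, log2_exact, and split_for are illustrative names, not from the slides.

```c
#include <assert.h>

/* Number of address bits assigned to each field. */
typedef struct {
    unsigned offset_bits;
    unsigned index_bits;
    unsigned tag_bits;
} addr_split_t;

/* log2 of a power of two; asserts that x really is a power of two. */
static unsigned log2_exact(unsigned x) {
    assert(x != 0 && (x & (x - 1u)) == 0);
    unsigned n = 0;
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Direct-mapped split: sets = capacity / block size, tag = whatever is left. */
static addr_split_t split_for(unsigned capacity_bytes, unsigned block_bytes,
                              unsigned addr_bits) {
    assert(block_bytes != 0 && capacity_bytes >= block_bytes);
    addr_split_t s;
    unsigned num_sets = capacity_bytes / block_bytes;
    s.offset_bits = log2_exact(block_bytes);
    s.index_bits  = log2_exact(num_sets);
    assert(s.offset_bits + s.index_bits <= addr_bits);
    s.tag_bits = addr_bits - s.index_bits - s.offset_bits;
    return s;
}
```

Calling split_for(4, 1, 8) reproduces the slide's example (0 offset bits, 2 index bits, 6 tag bits), and split_for(4096, 4, 32) gives the 2/10/20 split used for the 1K-entry data array.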
[Figure: pipeline datapath with Regfile, I$, and D$]
Cache Capacity
• For a given capacity, adjust %miss via organization
Block Size
• Given capacity, manipulate %miss by changing organization
• One option: increase block size
• Exploit spatial locality
• Boundary between index and offset changes
[Figure: 512*512bit SRAM]
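To make the effect of block size concrete, the sketch below counts misses for a sequential walk over an int array (like the sum loop earlier) in a direct-mapped cache of fixed capacity. It is a simplified model under stated assumptions: the cache starts empty, only the array accesses are modeled, ints are 4 bytes, and count_misses and MAX_FRAMES are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { MAX_FRAMES = 4096 };

/* Misses for reading n consecutive 4-byte ints starting at byte address base,
   in a direct-mapped cache that starts empty. */
static unsigned count_misses(unsigned capacity_bytes, unsigned block_bytes,
                             uint32_t base, unsigned n) {
    static bool     valid[MAX_FRAMES];
    static uint32_t tags[MAX_FRAMES];

    assert(block_bytes > 0 && capacity_bytes >= block_bytes);
    unsigned num_frames = capacity_bytes / block_bytes;
    assert(num_frames <= MAX_FRAMES);

    memset(valid, 0, sizeof valid);  /* empty cache */
    unsigned misses = 0;
    for (unsigned c = 0; c < n; c++) {
        uint32_t addr      = base + 4u * c;
        uint32_t block_num = addr / block_bytes;     /* drops the offset bits */
        uint32_t index     = block_num % num_frames; /* index bits */
        uint32_t tag       = block_num / num_frames; /* tag bits */
        if (!valid[index] || tags[index] != tag) {
            misses++;                 /* fill the frame on a miss */
            valid[index] = true;
            tags[index]  = tag;
        }
    }
    return misses;
}
```

With a 4KB cache and a 1000-element array starting at address 0, 4B blocks miss on every access (1000 misses), while 32B blocks miss once per block (125 misses): larger blocks turn spatial locality into hits.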
Associativity
[Figure: cache access datapath with address fields [31:15], [14:5], and [4:0], tag comparators (==), and a write-enable (WE) signal]
frame
block/line
valid bit
tag
set/row
way
block offset/displacement
address: tag index
                      dirty bit   valid bit   tag   block
initial state         -           I           -     -
after lw r0 <= [A]    C           V           A     0x01
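The terminology above (frame, valid bit, dirty bit, tag, set, way, block offset) maps naturally onto a data structure. The sketch below assumes the [31:15] / [14:5] / [4:0] address split shown earlier (32B blocks, 1024 sets) and, as an assumption not stated in the slides, 2 ways per set; all type and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OFFSET_BITS = 5,                 /* bits [4:0]: byte within the block */
    INDEX_BITS  = 10,                /* bits [14:5]: which set */
    BLOCK_BYTES = 1 << OFFSET_BITS,
    NUM_SETS    = 1 << INDEX_BITS,
    NUM_WAYS    = 2                  /* assumed associativity */
};

/* A frame holds one block plus its bookkeeping bits. */
typedef struct {
    bool     valid;
    bool     dirty;
    uint32_t tag;
    uint8_t  block[BLOCK_BYTES];
} frame_t;

typedef struct {
    frame_t sets[NUM_SETS][NUM_WAYS];
} cache_t;

static uint32_t set_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & (uint32_t)(NUM_SETS - 1);
}

/* Compares the tag against every way of the set; returns the way or -1. */
static int find_way(const cache_t *c, uint32_t addr) {
    assert(c != NULL);
    uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int w = 0; w < NUM_WAYS; w++) {
        const frame_t *f = &c->sets[set_index(addr)][w];
        if (f->valid && f->tag == tag) {
            return w;
        }
    }
    return -1;
}

/* On a write hit, update the byte and mark the block dirty. */
static bool store_byte(cache_t *c, uint32_t addr, uint8_t value) {
    int w = find_way(c, addr);
    if (w < 0) {
        return false;  /* miss: fill and replacement are not modeled here */
    }
    frame_t *f = &c->sets[set_index(addr)][w];
    f->block[addr & (uint32_t)(BLOCK_BYTES - 1)] = value;
    f->dirty = true;
    return true;
}
```

Miss handling (choosing a victim way, writing back a dirty block) is deliberately left out to keep the sketch short.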
[Figure: block evicted from L2 to Memory. Annotations: block is cleansed; Memory has no need for dirty/valid bits or tag.]
[Figure: write-back buffer (WBB) between $ and Next-level-$, with steps numbered 1 to 4]
+ Better locality
+ Fewer taken branches
+ Better locality for code
Cache Hierarchies
tmiss-M3 = tavg-M4
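The relation above says that the miss time of one level is the average access time of the level below it. Combined with the standard formula tavg = thit + %miss * tmiss, this gives a simple bottom-up computation. The sketch below assumes the last level always hits; level_t and t_avg are hypothetical names, and any latencies or miss rates passed in are the caller's own numbers, not measurements from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* One level of the hierarchy: hit latency and miss rate (0.0 to 1.0). */
typedef struct {
    double t_hit;
    double miss_rate;
} level_t;

/* Average access time of levels[0], using tmiss of level i = tavg of level i+1.
   The last level is assumed to always hit, so its miss_rate is ignored. */
static double t_avg(const level_t *levels, size_t n) {
    assert(levels != NULL && n > 0);
    double t = levels[n - 1].t_hit;
    for (size_t i = n - 1; i-- > 0; ) {
        t = levels[i].t_hit + levels[i].miss_rate * t;
    }
    return t;
}
```

For example, with three levels of hit time 1, 10, and 100 cycles and miss rates 0.25 and 0.5 for the first two, the second level averages 10 + 0.5 * 100 = 60 cycles and the first level averages 1 + 0.25 * 60 = 16 cycles.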