Computer Architecture-Cache Microarchitecture
Basic Optimizations
Cache Examples
Agenda
ECE 4750
Synchronous SRAMs
Processor-Cache Interaction
[Figure: processor pipeline datapath (PC, Fetch, Decode/Register, ALU) connected to a Primary Instruction Cache (addr, inst, hit?) and a Primary Data Cache (addr, wdata, rdata, we, hit?). Annotations: Stall entire CPU on data cache miss; To Memory Control; Cache Refill Data from Lower Levels of Memory Hierarchy.]
[Figure: pipelined processor datapath with control signals (cs_M, cs_W), imem (addr, rdata) and dmem (addr, wdata, rdata), and bypass paths from X, M, and W. Stage labels: Fetch (F), Execute (X), Memory (M), Writeback (W). Tag Check and Data Access are labeled alongside Memory (M).]
[Figure: pipelined processor datapath with val/rdy memory interfaces (memreq val/rdy, memresp val) to imem and dmem, stalling on !rdy, with control signals cs_M0, cs_M1, cs_W. Stage labels: Fetch (F0/F1), Execute (X), Memory (M0/M1), Writeback (W). Tag Check and Data Access are labeled alongside Memory (M0/M1).]
[Figure: pipelined processor datapath with val/rdy memory interfaces (memreq val/rdy, memresp val) to imem and dmem. Stage labels: Fetch (F), Execute (X), Memory (M), Writeback (W). Tag Check and Read Access are labeled alongside Memory (M); Write Access is labeled alongside Writeback (W). Annotation: New Hazards?]
Avg Mem Access Time = Hit Time + ( Miss Rate × Miss Penalty )
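As a worked instance of the formula above, here is a minimal C sketch. The function name `amat` and the numbers in the usage note are hypothetical, chosen only for illustration; all quantities are assumed to be in cycles.

```c
/* Average memory access time, assuming all latencies are in cycles.
 * miss_rate is a fraction in [0, 1]. The name amat is illustrative only. */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

For example, with an assumed 1-cycle hit time, 5% miss rate, and 100-cycle miss penalty, `amat(1.0, 0.05, 100.0)` gives 1 + 0.05 × 100 = 6 cycles. Each term of the formula is a separate target for optimization: hit time, miss rate, or miss penalty.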
Loop Interchange
/* inner loop over i walks down a column of x */
for (j = 0; j < N; j++) {
    for (i = 0; i < M; i++) {
        x[i][j] = 2 * x[i][j];
    }
}
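In the loop above, consecutive inner-loop iterations touch x[0][j], x[1][j], ..., which are a full row apart in C's row-major layout. The sketch below shows both orders side by side; the sizes `ROWS` and `COLS` (standing in for M and N) and the function names are assumptions made for illustration.

```c
enum { ROWS = 4, COLS = 3 };  /* hypothetical sizes standing in for M and N */

/* Loop order from the slide: j outer, i inner.
 * Each inner-loop step jumps COLS elements ahead in memory. */
void scale_j_outer(int x[ROWS][COLS])
{
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            x[i][j] = 2 * x[i][j];
}

/* Interchanged order: i outer, j inner.
 * Each inner-loop step moves to the adjacent element, so every word
 * of a fetched cache line is used before the line is left behind. */
void scale_i_outer(int x[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            x[i][j] = 2 * x[i][j];
}
```

Both functions compute the same result; interchange is legal here because each iteration touches only its own element. The benefit is better spatial locality, which targets the miss rate term.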
Loop Fusion
/* a[] and c[] are each traversed twice */
for (i = 0; i < N; i++)
    a[i] = b[i] * c[i];
for (i = 0; i < N; i++)
    d[i] = a[i] * c[i];
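The two loops above can be merged so that a[i] and c[i] are reused while they are still in the cache (or in registers). A minimal sketch follows; the function names and the pointer-based signature are assumptions for illustration, and the arrays are assumed not to overlap.

```c
/* Unfused version, as on the slide: two passes over the arrays. */
void mul_unfused(int n, int *a, const int *b, const int *c, int *d)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] * c[i];
    for (int i = 0; i < n; i++)
        d[i] = a[i] * c[i];
}

/* Fused version: one pass. a[i] and c[i] are consumed right after
 * they are produced or loaded, improving temporal locality. */
void mul_fused(int n, int *a, const int *b, const int *c, int *d)
{
    for (int i = 0; i < n; i++) {
        a[i] = b[i] * c[i];
        d[i] = a[i] * c[i];
    }
}
```

When the arrays are larger than the cache, the unfused version may evict the early elements of a[] and c[] before the second loop reads them again; the fused version avoids that second round of misses.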
[Figure: multi-level cache hierarchy: L1 Cache -> L2 Cache -> Main Memory. Case shown: L1 Miss -- L2 Hit.]
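With a second cache level, the L1 miss penalty is itself the average access time of the L2. A small sketch of that calculation follows, assuming latencies in cycles and an L2 miss rate that is local (measured over accesses that reach the L2). The function name and all numbers in the usage note are hypothetical.

```c
/* Two-level average memory access time, in cycles.
 * l2_local_miss_rate is the fraction of L2 accesses that miss in L2.
 * All parameter names are illustrative, not taken from the slides. */
double amat_two_level(double l1_hit_time, double l1_miss_rate,
                      double l2_hit_time, double l2_local_miss_rate,
                      double mem_penalty)
{
    /* Penalty seen by an L1 miss: L2 hit time plus expected trips to memory. */
    double l1_miss_penalty = l2_hit_time + l2_local_miss_rate * mem_penalty;
    return l1_hit_time + l1_miss_rate * l1_miss_penalty;
}
```

For example, `amat_two_level(1.0, 0.05, 10.0, 0.2, 100.0)` gives 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles, compared with 1 + 0.05 × 100 = 6 cycles if every L1 miss went straight to main memory.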
[Figure: Data Cache -> Write buffer -> Unified L2 Cache. The write buffer holds evicted dirty lines for writeback cache OR all writes in writethrough cache.]
Write buffer may hold updated value of location needed by read miss
  - On read miss, wait for write buffer to be empty
  - Check write buffer addresses and bypass
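The two options above can be sketched in C as follows. This is a simplified software model under stated assumptions, not a hardware description: the buffer is a small FIFO of word-sized entries (a real buffer would typically hold whole lines), and every type and function name here is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4  /* assumed depth, for illustration only */

typedef struct {
    uint32_t addr;
    uint32_t data;
} wb_entry_t;

typedef struct {
    wb_entry_t e[WB_ENTRIES];  /* e[0] is the oldest pending write */
    int count;                 /* number of pending writes */
} write_buffer_t;

/* Queue a write on its way to L2. Returns false if the buffer is full. */
bool wb_push(write_buffer_t *wb, uint32_t addr, uint32_t data)
{
    if (wb->count == WB_ENTRIES)
        return false;
    wb->e[wb->count].addr = addr;
    wb->e[wb->count].data = data;
    wb->count++;
    return true;
}

/* Option 1: a read miss must wait until the buffer has drained. */
bool wb_read_must_wait(const write_buffer_t *wb)
{
    return wb->count > 0;
}

/* Option 2: check buffered addresses and bypass on a match.
 * Searches newest to oldest so the most recent write wins. */
bool wb_bypass(const write_buffer_t *wb, uint32_t addr, uint32_t *data)
{
    for (int i = wb->count - 1; i >= 0; i--) {
        if (wb->e[i].addr == addr) {
            *data = wb->e[i].data;
            return true;
        }
    }
    return false;  /* no match: the read can safely go to L2 ahead of the writes */
}
```

Option 1 is simple but makes every read miss pay for all pending writes. Option 2 lets reads pass buffered writes at the cost of an address comparison per entry.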
[Table: summary of cache optimization techniques. Columns: Technique, Miss Rate, Miss Penalty, HW. Legible rows: Multi-level cache, Prioritize reads. HW column values in order: 0, 1, 0, 0, 1, 1, 0, 2, 1.]
On-Chip Caches
Level 1: 16KB, 4-way s.a., 64B line, quad-port (2 load+2 store), single cycle latency
Level 2: 256KB, 4-way s.a., 128B line, quad-port (4 load or 4 store), five cycle latency
Level 3: 3MB, 12-way s.a., 128B line, single 32B port, twelve cycle latency
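The parameters above determine how many sets each cache has and how an address splits into index and offset bits. A minimal sketch of that arithmetic follows; the struct and function names are hypothetical, and the code assumes the line size and resulting set count are powers of two.

```c
#include <stdint.h>

typedef struct {
    uint32_t sets;         /* size / (ways * line size) */
    uint32_t offset_bits;  /* log2(line size) */
    uint32_t index_bits;   /* log2(sets) */
} cache_geometry_t;

/* Integer log2 for powers of two. */
static uint32_t log2_u32(uint32_t x)
{
    uint32_t n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Derive set count and address field widths for a set-associative cache. */
cache_geometry_t cache_geometry(uint32_t size_bytes, uint32_t ways,
                                uint32_t line_bytes)
{
    cache_geometry_t g;
    g.sets = size_bytes / (ways * line_bytes);
    g.offset_bits = log2_u32(line_bytes);
    g.index_bits = log2_u32(g.sets);
    return g;
}
```

Applied to the three levels listed above: Level 1 has 16KB / (4 × 64B) = 64 sets (6 index bits, 6 offset bits), Level 2 has 256KB / (4 × 128B) = 512 sets (9 index bits, 7 offset bits), and Level 3 has 3MB / (12 × 128B) = 2048 sets (11 index bits, 7 offset bits).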
2/17/2009; February 9, 2010
Acknowledgements