Week 12 - Lecture 12 - Memory
[Figure: instruction supply and data supply feeding the pipeline (instruction execution)]
Memory Technology: DRAM
❑ Operations
➢ Write: turn on the access FET with the wordline & charge/discharge the storage capacitor through the bitline
➢ Read: more complicated & destructive → data must be rewritten after the read
❑ Capacitor leaks
➢ DRAM cell loses charge over time
➢ DRAM cell needs to be refreshed
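To make the refresh requirement concrete, here is a minimal C sketch of a toy retention/refresh model; the row count, retention window, and refresh interval are assumed illustrative numbers, not values from the lecture.

    /* Toy DRAM refresh model (ROWS and RETENTION_MS are assumed example
     * values). Every row must be read and rewritten before its capacitors
     * leak away the stored charge, so the controller walks the rows on a
     * fixed schedule. */
    #include <stdio.h>

    #define ROWS         8
    #define RETENTION_MS 64                    /* assumed retention window */
    #define STEP_MS      (RETENTION_MS / ROWS) /* refresh one row per step */

    int main(void) {
        int since_refresh_ms[ROWS] = {0};      /* time since each row was rewritten */

        for (int t = 0; t < 2 * RETENTION_MS; t += STEP_MS) {
            int row = (t / STEP_MS) % ROWS;    /* next row in the refresh rotation */
            since_refresh_ms[row] = 0;         /* read + rewrite restores the charge */
            for (int r = 0; r < ROWS; r++) {
                if (r != row)
                    since_refresh_ms[r] += STEP_MS;
                if (since_refresh_ms[r] >= RETENTION_MS)
                    printf("t=%3d ms: row %d exceeded retention -> data lost\n", t, r);
            }
        }
        printf("refresh schedule kept every row within %d ms\n", RETENTION_MS);
        return 0;
    }

With one row refreshed per step, no row ever waits longer than (ROWS - 1) * STEP_MS, which stays inside the retention window; skipping refreshes is what loses data.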
Memory Technology: SRAM
❑ Static random access memory
➢ 2 inverters wired in a positive feedback loop, forming a bistable element (2 stable states)
➢ 4 transistors for storage
[Figure: SRAM bit-cell — cross-coupled inverters between Vdd and GND storing a "1", accessed via the row select line and the bitline / bitline-bar pair]
❑ Read sequence
1. address decode
2. drive row select
3. selected bit-cells drive the bitlines (entire row is read together)
4. differential sensing and column select (data is ready)
5. precharge all bitlines (for the next read or write)
[Figure: 2^n-row x 2^m-column bit-cell array — the row decoder takes the n row bits of the n+m address bits; 2^m differential bitline pairs feed the sense amps and column mux, which take the m column bits; n ≈ m to minimize overall latency]
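As a rough illustration of steps 1, 2, and 4, the sketch below splits an (n+m)-bit address into the n row bits that feed the row decoder and the m column bits that feed the column mux; the particular n and m are assumed example values.

    /* Splitting an (n+m)-bit SRAM address into row and column indices
     * (n and m are assumed example values; any n ~= m split works the same way). */
    #include <stdio.h>

    #define N_ROW_BITS 4   /* 2^4 = 16 wordlines driven by the row decoder */
    #define M_COL_BITS 3   /* 2^3 = 8 columns selected by the column mux   */

    int main(void) {
        unsigned addr = 0x5A & ((1u << (N_ROW_BITS + M_COL_BITS)) - 1); /* 7-bit address */
        unsigned row  = addr >> M_COL_BITS;                /* upper n bits -> one wordline  */
        unsigned col  = addr & ((1u << M_COL_BITS) - 1);   /* lower m bits -> column select */
        printf("address 0x%02X -> row %u, column %u\n", addr, row, col);
        return 0;
    }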
Memory Technology: DRAM vs. SRAM
❑ DRAM
➢ Slower access (capacitor)
➢ Higher density (1T 1C cell)
➢ Lower cost
➢ Requires refresh (power, performance, circuitry)
➢ Manufacturing requires putting capacitor and logic together
❑ SRAM
➢ Faster access (no capacitor)
➢ Lower density (6T cell)
➢ Higher cost
➢ No need for refresh
➢ Manufacturing compatible with logic process (no capacitor)
Memory Technology: Non-volatile storage (flash)
The memory hierarchy
❑ Idea:
➢ Have multiple levels of storage (progressively bigger and slower as the levels are farther from the processor) and ensure most of the data the processor needs is kept in the fast(er) level(s)
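One way to see why this works is a back-of-the-envelope average access time: if most references are served by the fast level, the average stays close to the fast level's latency. The latencies and hit rate below are assumed illustrative numbers, not figures from the lecture.

    /* Average access time for a two-level hierarchy (all numbers assumed):
     * average = fast-level time + miss_rate * slow-level time. */
    #include <stdio.h>

    int main(void) {
        double fast_level_ns = 1.0;    /* e.g. on-chip cache (SRAM)      */
        double slow_level_ns = 100.0;  /* e.g. main memory (DRAM)        */
        double fast_hit_rate = 0.95;   /* fraction of accesses kept fast */

        double avg_ns = fast_level_ns + (1.0 - fast_hit_rate) * slow_level_ns;
        printf("average access time = %.1f ns (vs. %.1f ns without the fast level)\n",
               avg_ns, slow_level_ns);
        return 0;
    }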
On-Chip Components
[Figure: on-chip components — control, ITLB, DTLB, instruction cache and data cache (SRAM), a second-level cache, with main memory (DRAM) and secondary storage off-chip]
[Figure: multicore chip floorplan — CORE 0-3, a private L2 cache per core (L2 CACHE 0-3), a SHARED L3 CACHE, and a DRAM memory controller with DRAM interface connecting to the DRAM banks]
The memory locality principle
❑ One of the most important principles in computer design.
➢ A “typical” program has a lot of locality in memory references
▪ typical programs are composed of “loops”
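A small illustration of both kinds of locality in loop code (the array size is an arbitrary choice): the running sum is reused on every iteration (temporal locality), and the array is walked through consecutive addresses (spatial locality).

    /* Loop exhibiting the two kinds of locality (array size is arbitrary). */
    #include <stdio.h>

    #define N 1024

    int main(void) {
        static int a[N];
        long sum = 0;                 /* reused every iteration -> temporal locality */

        for (int i = 0; i < N; i++)
            a[i] = i;

        for (int i = 0; i < N; i++)
            sum += a[i];              /* consecutive addresses -> spatial locality */

        printf("sum = %ld\n", sum);   /* 0 + 1 + ... + 1023 = 523776 */
        return 0;
    }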
Tags and Valid Bits: example
Address   00 00   00 01   00 10   00 11   01 00   ...   11 11
Data          0       1       2       3       4   ...      15
➢ Start with an empty cache - all blocks initially marked as not valid
Tags and Valid Bits: example solution
[Figure: the main memory table above together with the resulting cache contents (valid bit, tag, and data per entry) — not reproduced]
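A minimal sketch of the lookup this example walks through, assuming the direct-mapped arrangement implied by the table: 4-bit word addresses, a 4-entry cache with one word per block, so the low 2 bits are the index and the high 2 bits are the tag; all valid bits start cleared. The reference stream is a hypothetical one drawn from the addresses in the table.

    /* Direct-mapped lookup for the 4-bit address example (assumed layout:
     * 2-bit index = low bits, 2-bit tag = high bits, one word per block). */
    #include <stdio.h>
    #include <stdbool.h>

    #define SETS 4

    struct line { bool valid; unsigned tag; unsigned data; };
    static struct line cache[SETS];           /* all valid bits start at 0 (empty cache) */

    static bool access_cache(unsigned addr) { /* addr is a 4-bit word address */
        unsigned index = addr & 0x3;          /* low 2 bits  */
        unsigned tag   = addr >> 2;           /* high 2 bits */
        bool hit = cache[index].valid && cache[index].tag == tag;
        if (!hit) {                           /* install the block on a miss */
            cache[index].valid = true;
            cache[index].tag   = tag;
            cache[index].data  = addr;        /* memory word i holds the value i here */
        }
        printf("addr %2u (tag %u, index %u): %s\n", addr, tag, index, hit ? "hit" : "miss");
        return hit;
    }

    int main(void) {
        unsigned refs[] = {0, 1, 2, 3, 4, 15};  /* hypothetical reference stream */
        for (unsigned i = 0; i < sizeof refs / sizeof refs[0]; i++)
            access_cache(refs[i]);
        return 0;
    }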
[Figure: processor connected to DRAM through a write buffer]
❑ Write buffer is just a FIFO between the cache and main memory
➢ Typical number of entries: 4
➢ Once the data has been written into the write buffer (and assuming a cache hit), the processor is done; the memory controller then moves the write buffer's contents to main memory behind the scenes.
➢ Works fine if store frequency (w.r.t. time) << 1/DRAM write cycle
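A minimal sketch of that FIFO behaviour, assuming the 4-entry depth mentioned above; when the buffer is full, the processor would have to stall until the memory controller drains an entry.

    /* 4-entry write buffer sketch: the processor enqueues stores and keeps
     * going; the memory controller drains them to DRAM in the background.
     * (Sizes and the stall policy are illustrative.) */
    #include <stdio.h>
    #include <stdbool.h>

    #define WB_ENTRIES 4

    struct wb_entry { unsigned addr, data; };
    static struct wb_entry buf[WB_ENTRIES];
    static int head, tail, count;

    static bool wb_enqueue(unsigned addr, unsigned data) {   /* called on a store */
        if (count == WB_ENTRIES)
            return false;                 /* buffer full -> processor must stall */
        buf[tail] = (struct wb_entry){addr, data};
        tail = (tail + 1) % WB_ENTRIES;
        count++;
        return true;                      /* processor is done with the store */
    }

    static void wb_drain_one(void) {      /* memory controller, behind the scenes */
        if (count == 0) return;
        printf("DRAM write: [0x%04X] = %u\n", buf[head].addr, buf[head].data);
        head = (head + 1) % WB_ENTRIES;
        count--;
    }

    int main(void) {
        for (unsigned i = 0; i < 6; i++)
            if (!wb_enqueue(0x1000 + 4 * i, i))
                wb_drain_one(), i--;      /* stall: drain an entry, retry the store */
        while (count) wb_drain_one();
        return 0;
    }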
❑ Conflict (collision):
➢ Multiple memory locations mapped to the same cache location
➢ Solution 1: increase cache size
➢ Solution 2: increase associativity (next lecture)
❑ Capacity:
➢ Cache cannot contain all blocks accessed by the program
➢ Solution: increase cache size
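To make the conflict case above concrete, here is a sketch with assumed example sizes: two addresses that differ only above the index bits land in the same set of a direct-mapped cache and evict each other, while a larger cache (more index bits) separates them.

    /* Two addresses mapping to the same direct-mapped set (example sizes). */
    #include <stdio.h>

    static unsigned set_index(unsigned addr, unsigned block_bytes, unsigned num_sets) {
        return (addr / block_bytes) % num_sets;
    }

    int main(void) {
        unsigned a = 0x0040, b = 0x0440;   /* differ only in high-order bits */

        /* Small cache: 16 sets of 64-byte blocks -> both map to set 1 (conflict). */
        printf("16 sets : a -> set %u, b -> set %u\n",
               set_index(a, 64, 16), set_index(b, 64, 16));

        /* Bigger cache: 64 sets -> the two addresses now land in different sets. */
        printf("64 sets : a -> set %u, b -> set %u\n",
               set_index(a, 64, 64), set_index(b, 64, 64));
        return 0;
    }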
Handling Cache Misses (Single Word Blocks)
❑ Read misses (I$ and D$)
➢ Stall the pipeline, fetch the block from the next level in the memory
hierarchy, install it in the cache and send the requested word to the
processor, then let the pipeline resume.
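A sketch of that miss path for single-word blocks; the cache geometry and the next_level_read helper are assumptions for illustration, not the lecture's design. On a miss, the controller fetches the word from the next level, installs it with its tag, marks the entry valid, and returns the requested word.

    /* Read-miss handling for a direct-mapped cache with single-word blocks
     * (geometry and the next-level model are illustrative assumptions). */
    #include <stdio.h>
    #include <stdbool.h>

    #define SETS 256

    struct line { bool valid; unsigned tag; unsigned word; };
    static struct line cache[SETS];

    static unsigned next_level_read(unsigned addr) {   /* stand-in for the next level */
        return addr * 3;                               /* fake memory contents        */
    }

    static unsigned read_word(unsigned addr) {
        unsigned index = addr % SETS, tag = addr / SETS;
        if (!(cache[index].valid && cache[index].tag == tag)) {
            /* Miss: the pipeline would stall here; fetch the block from the
             * next level, install it in the cache, then resume. */
            cache[index].word  = next_level_read(addr);
            cache[index].tag   = tag;
            cache[index].valid = true;
        }
        return cache[index].word;          /* requested word sent to the processor */
    }

    int main(void) {
        printf("%u %u\n", read_word(42), read_word(42));  /* miss, then hit */
        return 0;
    }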
[Worked example (cache contents after a hit on reference 4 and a miss on reference 15) — not reproduced]