Large and Fast: Exploiting Memory Hierarchy

This document describes the memory hierarchy and the caching techniques used to improve memory performance. Memory is organized into a hierarchy in which smaller, faster levels close to the CPU cache data from larger, slower levels farther away. Caches exploit temporal and spatial locality by keeping recently accessed data, and data near it, close to the processor, and they use tags and valid bits to track which block is stored in each location. Write policies such as write-through and write-back balance consistency against performance, and block size, replacement policy, and write-miss handling all affect cache performance.


Chapter 5
Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
Programmers want unlimited amounts of memory with low latency
Fast memory technology is more expensive per bit than slower memory
Solution: organize the memory system into a hierarchy
Entire addressable memory space available in the largest, slowest memory
Incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor
Temporal and spatial locality ensure that nearly all references can be found in the smaller memories
Gives the illusion of a large, fast memory being presented to the processor

Memory Technology
Static RAM (SRAM)
0.5ns – 2.5ns, $2000 – $5000 per GB
Dynamic RAM (DRAM)
50ns – 70ns, $20 – $75 per GB
Magnetic disk
5ms – 20ms, $0.20 – $2 per GB
Ideal memory
Access time of SRAM
Capacity and cost/GB of disk

Principle of Locality
Programs access a small proportion of
their address space at any time
Temporal locality
Items accessed recently are likely to be
accessed again soon
e.g., instructions in a loop, induction variables
Spatial locality
Items near those accessed recently are likely
to be accessed soon
e.g., sequential instruction access, array data (see the sketch below)
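A short C sketch of both kinds of locality (illustrative only): the accumulator is touched on every iteration (temporal), while the array is walked sequentially, so consecutive elements fall in the same cache block (spatial).

```c
#include <stdio.h>

int main(void) {
    int a[1024];
    int sum = 0;                    /* reused every iteration: temporal locality */

    for (int i = 0; i < 1024; i++)  /* sequential accesses: spatial locality */
        a[i] = i;

    for (int i = 0; i < 1024; i++)
        sum += a[i];                /* a[i] and a[i+1] share cache blocks */

    printf("sum = %d\n", sum);
    return 0;
}
```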
Taking Advantage of Locality
Memory hierarchy
Store everything on disk
Copy recently accessed (and nearby)
items from disk to smaller DRAM memory
Main memory
Copy more recently accessed (and
nearby) items from DRAM to smaller
SRAM memory
Cache memory attached to CPU

Memory Hierarchy Levels
Block (aka line): unit of copying
May be multiple words
If accessed data is present in
upper level
Hit: access satisfied by upper level
Hit ratio: hits/accesses
If accessed data is absent
Miss: block copied from lower level
Time taken: miss penalty
Miss ratio: misses/accesses = 1 − hit ratio
Then accessed data supplied from
upper level

5.2 The Basics of Caches
Cache Memory
Cache memory
The level of the memory hierarchy closest to
the CPU
Given accesses X1, …, Xn−1, Xn

How do we know if
the data is present?
Where do we look?

Direct Mapped Cache
Location determined by address
Direct mapped: only one choice
(Block address) modulo (#Blocks in cache)

#Blocks is a
power of 2
Use low-order
address bits
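Because #Blocks is a power of 2, the modulo reduces to masking the low-order bits of the block address. A minimal sketch (constants chosen to match the 8-block example that follows):

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_BLOCKS 8u   /* must be a power of 2 */

/* Direct mapped: each block address has exactly one possible cache index. */
static inline uint32_t cache_index(uint32_t block_addr) {
    /* block_addr % NUM_BLOCKS, computed by masking the low-order bits */
    return block_addr & (NUM_BLOCKS - 1u);
}

int main(void) {
    printf("%u\n", cache_index(22));  /* 22 % 8 = 6, i.e. binary index 110 */
    return 0;
}
```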

Tags and Valid Bits
How do we know which particular block is
stored in a cache location?
Store block address as well as the data
Actually, only need the high-order bits
Called the tag
What if there is no data in a location?
Valid bit: 1 = present, 0 = not present
Initially 0
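In hardware the tag and valid bit are extra bits stored alongside each data block; a software model of one cache entry might look like this (field names are ours):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;  /* 1 = entry holds real data; all entries start at 0 */
    uint32_t tag;    /* high-order address bits of the block stored here */
    uint32_t data;   /* the cached word (one word per block in this example) */
} cache_entry;
```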

Cache Example
8 blocks, 1 word/block, direct mapped
Initial state

Index V Tag Data


000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N

Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Miss 110

Index V Tag Data


000 N
001 N
010 N
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

Cache Example
Word addr Binary addr Hit/miss Cache block
26 11 010 Miss 010

Index V Tag Data


000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Hit 110
26 11 010 Hit 010

Index V Tag Data


000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

Cache Example
Word addr Binary addr Hit/miss Cache block
16 10 000 Miss 000
3 00 011 Miss 011
16 10 000 Hit 000

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 11 Mem[11010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N

Cache Example
Word addr Binary addr Hit/miss Cache block
18 10 010 Miss 010

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 10 Mem[10010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N
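The whole trace above (22, 26, 22, 26, 16, 3, 16, 18) can be replayed in a few lines of C. This illustrative simulator reproduces every hit and miss in the tables, including the replacement of tag 11 by tag 10 at index 010:

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS 8   /* 8 blocks, 1 word/block, direct mapped */

typedef struct { bool valid; uint32_t tag; uint32_t addr; } line;

int main(void) {
    line cache[NUM_BLOCKS] = {0};              /* all valid bits start at 0 */
    uint32_t trace[] = {22, 26, 22, 26, 16, 3, 16, 18};
    for (int i = 0; i < 8; i++) {
        uint32_t a = trace[i];
        uint32_t index = a % NUM_BLOCKS;       /* low-order 3 bits */
        uint32_t tag = a / NUM_BLOCKS;         /* remaining high-order bits */
        bool hit = cache[index].valid && cache[index].tag == tag;
        printf("addr %2u -> index %u: %s\n", a, index, hit ? "hit" : "miss");
        if (!hit) {                            /* miss: fetch block, evict old one */
            cache[index].valid = true;
            cache[index].tag = tag;
            cache[index].addr = a;             /* stands in for Mem[a] */
        }
    }
    return 0;
}
```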

Address Subdivision

[Figure: a direct-mapped cache, showing the address divided into tag, index, and byte-offset fields]
Example: Larger Block Size
Consider a cache with 64 blocks and a
block size of 16 bytes. What block number
does byte address 1200 map to?

Block address = 1200/16 = 75


Block number = 75 modulo 64 = 11
1200 = (4B0)16 = 0100 1011 0000 in binary
Address fields: Tag = bits 31–10 (22 bits), Index = bits 9–4 (6 bits), Offset = bits 3–0 (4 bits)
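The same answer falls out of shifts and masks. With 4 offset bits (16-byte blocks) and 6 index bits (64 blocks), byte address 1200 lands in block 11:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t addr = 1200;
    uint32_t offset = addr & 0xF;          /* bits 3-0:  1200 % 16 = 0  */
    uint32_t index  = (addr >> 4) & 0x3F;  /* bits 9-4:  75 % 64  = 11  */
    uint32_t tag    = addr >> 10;          /* bits 31-10: 1200/1024 = 1 */
    printf("tag=%u index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```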

Block Size Considerations
Larger blocks should reduce miss rate
Due to spatial locality
But in a fixed-sized cache
Larger blocks ⇒ fewer of them
More competition ⇒ increased miss rate
Larger blocks ⇒ pollution
Larger miss penalty
Can override benefit of reduced miss rate
Early restart and critical-word-first can help

Cache Misses
On cache hit, CPU proceeds normally
On cache miss
Stall the CPU pipeline
Fetch block from next level of hierarchy
Instruction cache miss
Restart instruction fetch
Data cache miss
Complete data access

Write-Through
On data-write hit, could just update the block in
cache
But then cache and memory would be inconsistent
Write through: also update memory
But makes writes take longer
e.g., if base CPI = 1, 10% of instructions are stores,
write to memory takes 100 cycles
Effective CPI = 1 + 0.1 × 100 = 11
Solution: write buffer
Holds data waiting to be written to memory
CPU continues immediately
Only stalls on write if write buffer is already full
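The arithmetic above in executable form (parameters taken from the slide; with a write buffer that rarely fills, the store cost mostly disappears and CPI returns to near the base value):

```c
#include <stdio.h>

int main(void) {
    double base_cpi   = 1.0;
    double store_freq = 0.10;   /* 10% of instructions are stores */
    double write_time = 100.0;  /* cycles for each write to memory */

    /* Without a write buffer, every store stalls for the full write. */
    printf("effective CPI = %.0f\n", base_cpi + store_freq * write_time);  /* 11 */
    return 0;
}
```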

Write-Back
Alternative: On data-write hit, just update
the block in cache
Keep track of whether each block is dirty
When a dirty block is replaced
Write it back to memory
Can use a write buffer to allow replacing block
to be read first
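A sketch of write-back behavior with a toy one-word-per-block memory (structure and names are ours, not from the slides): a write hit only dirties the cache entry, and the write to memory happens when a dirty block is evicted.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { bool valid, dirty; uint32_t tag, data; } wb_entry;

#define NUM_BLOCKS 8u
static uint32_t memory[64];            /* toy backing store, one word per block */

/* On a write hit, just update the cache and mark the entry dirty. */
void write_hit(wb_entry *e, uint32_t value) {
    e->data  = value;
    e->dirty = true;                   /* memory is now stale until eviction */
}

/* On a miss, write the victim back only if it is dirty, then fetch. */
void replace_block(wb_entry *e, uint32_t index, uint32_t new_tag) {
    if (e->valid && e->dirty)
        memory[e->tag * NUM_BLOCKS + index] = e->data;   /* write-back */
    e->data  = memory[new_tag * NUM_BLOCKS + index];     /* fetch new block */
    e->tag   = new_tag;
    e->valid = true;
    e->dirty = false;                  /* fresh copy matches memory */
}

int main(void) {
    wb_entry e = {0};
    replace_block(&e, 2, 3);           /* clean miss: no write-back needed */
    write_hit(&e, 42);                 /* dirty the block in cache only */
    replace_block(&e, 2, 1);           /* dirty victim written back first */
    printf("memory[26] = %u\n", memory[26]);   /* 3*8+2 = 26 -> prints 42 */
    return 0;
}
```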

Write Allocation
What should happen on a write miss?
Alternatives for write-through
Allocate on miss: fetch the block
Write around: don't fetch the block
Since programs often write a whole block before
reading it (e.g., initialization)
For write-back
Usually fetch the block
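The write-through alternatives as a branch, with stub helpers standing in for the real cache and memory machinery (all names hypothetical):

```c
#include <stdio.h>
#include <stdint.h>

/* Stubs for illustration only. */
static void fetch_block(uint32_t addr)              { printf("fetch block of %u\n", addr); }
static void cache_write(uint32_t addr, uint32_t v)  { printf("cache[%u] = %u\n", addr, v); }
static void memory_write(uint32_t addr, uint32_t v) { printf("mem[%u] = %u\n", addr, v); }

/* Write miss in a write-through cache. */
void handle_write_miss(int write_allocate, uint32_t addr, uint32_t val) {
    if (write_allocate) {            /* allocate on miss: fetch, then write */
        fetch_block(addr);
        cache_write(addr, val);
    }                                /* write around: skip the fetch entirely */
    memory_write(addr, val);         /* write-through always updates memory */
}

int main(void) {
    handle_write_miss(1, 100, 7);    /* allocate on miss */
    handle_write_miss(0, 200, 9);    /* write around */
    return 0;
}
```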

Example: Intrinsity FastMATH
Embedded MIPS processor
12-stage pipeline
Instruction and data access on each cycle
Split cache: separate I-cache and D-cache
Each 16KB: 256 blocks × 16 words/block
D-cache: write-through or write-back
SPEC2000 miss rates
I-cache: 0.4%
D-cache: 11.4%
Weighted average: 3.2%
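The slides do not give the instruction/data mix behind the 3.2% figure, but it is consistent with data references making up roughly a quarter of all memory accesses (about one load or store per three instruction fetches): 0.75 × 0.4% + 0.25 × 11.4% ≈ 3.2%.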

Example: Intrinsity FastMATH

[Figure: the Intrinsity FastMATH cache organization]
