
Computer Architecture

ELE 475 / COS 475


Slide Deck 3: Cache Review
David Wentzlaff
Department of Electrical Engineering
Princeton University

1
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance

2
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance

3
Naive Register File

[Figure: register file schematic with write data, write address, read data, read address, clock, and an address decoder]
4
Memory Arrays: Register File

5
Memory Arrays: SRAM

6
Memory Arrays: DRAM

7
Relative Memory Sizes of
SRAM vs. DRAM

[Figure: relative cell sizes of on-chip SRAM on a logic chip vs. DRAM on a memory chip]

[From Foss, R.C., "Implementing Application-Specific Memory", ISSCC 1996]

8
Memory Technology Trade-offs
From top to bottom, capacity and latency increase while bandwidth decreases:

Latches/Registers: low capacity, low latency, high bandwidth (more and wider ports)
Register File
SRAM
DRAM: high capacity, high latency, low bandwidth
9
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance

10
CPU-Memory Bottleneck
[Figure: processor connected to main memory]

• Performance of high-speed computers is usually limited by memory bandwidth and latency
• Latency is the time for a single access
– Main memory latency is usually >> processor cycle time
• Bandwidth is the number of accesses per unit time
– If a fraction m of instructions are loads/stores, there are 1 + m memory accesses per instruction, so CPI = 1 requires at least 1 + m memory accesses per cycle
• Bandwidth-Delay Product is the amount of data that can be in flight at the same time (Little's Law); a worked example follows this slide
11
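A quick numeric sketch of the bandwidth-delay product mentioned above (Little's Law). The latency and bandwidth values below are assumptions chosen for illustration, not figures from the slides:

# Bandwidth-delay product: data that must be in flight to keep memory busy.
latency_s = 100e-9              # one access takes 100 ns (assumed)
bandwidth_bytes_per_s = 16e9    # 16 GB/s sustained bandwidth (assumed)
bdp_bytes = latency_s * bandwidth_bytes_per_s
print(bdp_bytes)                # 1600.0 bytes, i.e. about 25 outstanding 64 B lines

If fewer than roughly 25 cache lines' worth of requests are outstanding at once under these assumptions, the available bandwidth cannot be saturated.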
Processor-DRAM Latency Gap

[Hennessy &
Patterson 2011]

• Four-issue 2 GHz superscalar accessing 100 ns DRAM could execute 800 instructions during the time for one memory access! (see the arithmetic below)
• Long latencies mean large bandwidth-delay products, which can be difficult to saturate, meaning bandwidth is wasted
12
From Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
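Working through the slide's own numbers: a 100 ns access at 2 GHz is 100 ns × 2 cycles/ns = 200 cycles, and a four-issue machine could issue up to 200 × 4 = 800 instructions in that time.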
Physical Size Affects Latency
[Figure: a processor next to a small memory and a processor next to a big memory]

• Signals have further to travel
• Fan out to more locations

13
Memory Hierarchy
[Figure: processor connected to a small fast memory (RF, SRAM), which is backed by a big slow memory (DRAM)]

• Capacity: Register << SRAM << DRAM


• Latency: Register << SRAM << DRAM
• Bandwidth: on-chip >> off-chip
• On a data access:
– if data is in fast memory -> low-latency access to SRAM
– if data is not in fast memory -> long-latency access to DRAM
• Memory hierarchies only work if the small, fast memory actually stores data that is reused by the processor

14
Common And Predictable Memory
Reference Patterns
[Figure: memory address vs. time, showing instruction fetches (n loop iterations, subroutine call and return), stack accesses (argument access), and data accesses (scalar accesses)]

Temporal Locality: if a location is referenced, it is likely to be referenced again in the near future.

Spatial Locality: if a location is referenced, it is likely that locations near it will be referenced in the near future.
15
Real Memory Reference Patterns
[Figure: memory address vs. time, one dot per access to that address at that time; regions show spatial locality, temporal locality, and combined temporal and spatial locality]

[From Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)]

16
Caches Exploit Both Types of Locality
[Figure: processor connected to a small fast memory (RF, SRAM) and a big slow memory (DRAM)]

• Exploit temporal locality by remembering the contents of recently accessed locations
• Exploit spatial locality by fetching blocks of data around recently accessed locations

17
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance

18
Inside a Cache
[Figure: processor, cache, and main memory connected by address and data buses. The cache holds copies of main memory locations (e.g., locations 100 and 101) grouped into a line of data bytes; each line stores a tag derived from the address (e.g., address 6848 with tag 416) alongside its data block]
19
Basic Cache Algorithm for a Load

20
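The flowchart on this slide did not survive the export. What follows is a minimal Python sketch of the usual steps for handling a load in a direct-mapped, write-back cache; the parameters and data structures are assumptions for illustration, not taken from the slide.

BLOCK_SIZE = 8   # bytes per block (assumed)
NUM_LINES = 4    # cache lines (assumed)

cache = [{'valid': False, 'dirty': False, 'tag': 0, 'data': bytearray(BLOCK_SIZE)}
         for _ in range(NUM_LINES)]
memory = bytearray(1 << 16)   # 64 KB backing store (assumed)

def load_byte(addr):
    offset = addr % BLOCK_SIZE
    index = (addr // BLOCK_SIZE) % NUM_LINES
    tag = addr // (BLOCK_SIZE * NUM_LINES)
    line = cache[index]
    if line['valid'] and line['tag'] == tag:      # hit: return data from the cache
        return line['data'][offset]
    if line['valid'] and line['dirty']:           # miss: write back a dirty victim first
        victim_base = (line['tag'] * NUM_LINES + index) * BLOCK_SIZE
        memory[victim_base:victim_base + BLOCK_SIZE] = line['data']
    base = addr - offset                          # fetch the missing block from memory
    line['data'] = bytearray(memory[base:base + BLOCK_SIZE])
    line['valid'], line['dirty'], line['tag'] = True, False, tag
    return line['data'][offset]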
Classifying Caches
[Figure: processor, cache, and main memory connected by address and data buses]

• Block Placement: Where can a block be placed in the cache?
• Block Identification: How is a block found if it is in the cache?
• Block Replacement: Which block should be replaced on a miss?
• Write Strategy: What happens on a write?

21
Block Placement:
Where Place Block in Cache?
[Figure: memory blocks 0-31 above a cache with 8 entries (set numbers 0-3 in the set-associative view, block numbers 0-7 in the direct-mapped view)]

Block 12 can be placed:
• Fully Associative: anywhere
• (2-way) Set Associative: anywhere in set 0 (12 mod 4)
• Direct Mapped: only into block 4 (12 mod 8)
22
Block Placement:
Where Place Block in Cache?
[Figure: memory blocks 0-31 above a cache with 8 entries (set numbers 0-3 in the set-associative view, block numbers 0-7 in the direct-mapped view)]

Block 12 can be placed (the mod arithmetic is written out after this slide):
• Fully Associative: anywhere
• (2-way) Set Associative: anywhere in set 0 (12 mod 4)
• Direct Mapped: only into block 4 (12 mod 8)
23
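The placements above, written as a tiny sketch (the cache sizes come straight from the slide):

block = 12
direct_mapped_entry = block % 8   # 8-entry direct-mapped cache: only entry 4
two_way_set = block % 4           # 4 sets of 2 ways: anywhere within set 0
# fully associative: any of the 8 entries may hold block 12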
Block Identification: How to find block
in cache?

• Cache uses index and offset to find a potential match, then checks the tag
• Tag check only includes the higher-order bits
• In this example (direct-mapped, 8 B block, 4-line cache); the address bit split is shown after this slide
24
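A sketch of that address split for the slide's parameters (3 offset bits for an 8 B block, 2 index bits for 4 lines); the example address is an arbitrary assumption:

addr = 100                   # example address (assumed), i.e. 0b1100100
offset = addr & 0b111        # bits [2:0] -> 4, the byte within the block
index = (addr >> 3) & 0b11   # bits [4:3] -> 0, which cache line to check
tag = addr >> 5              # remaining bits -> 3, compared against the stored tag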
Block Identification: How to find block
in cache?

• Cache checks all potential blocks with a parallel tag check
• In this example (2-way associative, 8 B block, 4-line cache); a lookup sketch follows this slide

25
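A minimal sketch of that parallel check for the slide's 2-way example (4 lines form 2 sets of 2 ways; the data structures and names are assumed for illustration):

NUM_SETS = 2
sets = [[{'valid': False, 'tag': 0, 'data': bytearray(8)} for _ in range(2)]
        for _ in range(NUM_SETS)]

def lookup(addr):
    offset = addr & 0b111                  # 3 offset bits for an 8 B block
    set_ix = (addr >> 3) & (NUM_SETS - 1)  # 1 index bit for 2 sets
    tag = addr >> 4
    for way in sets[set_ix]:               # hardware compares both ways in parallel
        if way['valid'] and way['tag'] == tag:
            return way['data'][offset]     # hit in this way
    return None                            # miss: the block must be fetched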
Block Replacement: Which block to
replace?
• No choice in a direct mapped cache
• In an associative cache, which block from the set should be evicted when the set becomes full?
• Random
• Least Recently Used (LRU)
– LRU cache state must be updated on every access
– True implementation only feasible for small sets (2-way)
– Pseudo-LRU binary tree often used for 4-8 way (sketched after this slide)
• First In, First Out (FIFO) aka Round-Robin
– Used in highly associative caches
• Not Most Recently Used (NMRU)
– FIFO with exception for most recently used block(s)
26
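A sketch of the pseudo-LRU binary tree mentioned above, for one 4-way set: three bits approximate LRU, and each bit points toward the half of the set that should be victimized next. The names and encoding are illustrative assumptions.

plru = {'root': 0, 'left': 0, 'right': 0}   # state for one 4-way set

def touch(way):                  # update on every access (hit or fill)
    if way in (0, 1):
        plru['root'] = 1         # point the root away from the {0,1} pair
        plru['left'] = 1 - way   # point within the pair away from 'way'
    else:
        plru['root'] = 0
        plru['right'] = 3 - way  # accessed way 2 -> 1, way 3 -> 0

def victim():                    # follow the bits to choose the block to evict
    return plru['left'] if plru['root'] == 0 else 2 + plru['right']

True LRU for the same set would need a full ordering of all four ways, which is why the slide notes that a true implementation is only feasible for small sets.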
Write Strategy: How are writes
handled?
• Cache Hit
– Write Through – write both cache and memory,
generally higher traffic but simpler to design
– Write Back – write cache only, memory is written
when evicted, dirty bit per block avoids unnecessary
write backs, more complicated
• Cache Miss
– No Write Allocate – only write to main memory
– Write Allocate – fetch block into cache, then write
• Common Combinations (a store sketch follows this slide)
– Write Through & No Write Allocate
– Write Back & Write Allocate
27
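A minimal sketch contrasting the two common combinations on a single store. The one-line "cache" and dict-backed memory are deliberately oversimplified assumptions, just to show where each policy writes:

line = {'valid': False, 'tag': None, 'dirty': False, 'data': {}}
memory = {}
BLOCK = 8   # bytes per block (assumed)

def store_write_back_allocate(addr, value):
    tag = addr // BLOCK
    if not (line['valid'] and line['tag'] == tag):
        # write allocate: fetch the whole block into the cache before writing
        line.update(valid=True, tag=tag, dirty=False,
                    data={a: memory.get(a, 0) for a in range(tag * BLOCK, (tag + 1) * BLOCK)})
    line['data'][addr] = value
    line['dirty'] = True             # memory is only updated when the line is evicted

def store_write_through_no_allocate(addr, value):
    tag = addr // BLOCK
    if line['valid'] and line['tag'] == tag:
        line['data'][addr] = value   # hit: keep the cached copy consistent
    memory[addr] = value             # always write memory; no allocation on a miss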
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance

28
Average Memory Access Time
[Figure: processor, cache, and main memory; hits are serviced by the cache, misses go on to main memory]

• Average Memory Access Time = Hit Time + ( Miss Rate * Miss Penalty )

29
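Plugging assumed numbers into the formula above (none of these values come from the slide):

hit_time = 1          # cycles for a cache hit (assumed)
miss_rate = 0.05      # 5% of accesses miss (assumed)
miss_penalty = 100    # cycles to reach main memory (assumed)
amat = hit_time + miss_rate * miss_penalty
print(amat)           # 6.0 cycles per access on average

Even a 5% miss rate multiplies the effective access time several times over, which is why the following slides focus on reducing hit time and miss rate.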
Categorizing Misses: The Three C’s

• Compulsory – first reference to a block; these misses occur even with an infinite cache
• Capacity – cache is too small to hold all data needed by the program; these occur even under a perfect replacement policy (loop over 5 cache lines)
• Conflict – misses that occur because of collisions due to less-than-full associativity (loop over 3 cache lines)

30
Reduce Hit Time: Small & Simple
Caches

Plot from Hennessy and Patterson Ed. 4


Image Copyright © 2007-2012 Elsevier Inc. All rights Reserved.
Reduce Miss Rate: Large Block Size

Advantages:
• Less tag overhead
• Exploit fast burst transfers from DRAM
• Exploit fast burst transfers over wide on-chip busses

Disadvantages:
• Can waste bandwidth if data is not used
• Fewer blocks -> more conflicts

Plot from Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
Reduce Miss Rate: Large Cache Size

Empirical Rule of Thumb: if the cache size is doubled, the miss rate usually drops by about a factor of √2 (worked example below)

Plot from Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
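For instance, with an assumed 4% miss rate at 32 KB, doubling to 64 KB would be expected to bring the miss rate down to roughly 4% / 1.41 ≈ 2.8%.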
Reduce Miss Rate: High Associativity

Empirical Rule of Thumb: a direct-mapped cache of size N has about the same miss rate as a two-way set-associative cache of size N/2
Plot from Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
Reduce Miss Rate: High Associativity

Empirical Rule of Thumb: a direct-mapped cache of size N has about the same miss rate as a two-way set-associative cache of size N/2
Plot from Hennessy and Patterson Ed. 5 Image Copyright © 2011, Elsevier Inc. All rights Reserved.
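As a concrete instance of the rule above: a 64 KB direct-mapped cache would be expected to miss at about the same rate as a 32 KB two-way set-associative cache, trading associativity for capacity.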
Agenda
• Memory Technology
• Motivation for Caches
• Classifying Caches
• Cache Performance

36
Acknowledgements
• These slides contain material developed and copyright by:
– Arvind (MIT)
– Krste Asanovic (MIT/UCB)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)
– Christopher Batten (Cornell)

• MIT material derived from course 6.823


• UCB material derived from course CS252 & CS152
• Cornell material derived from course ECE 4750

37
Copyright © 2013 David Wentzlaff

38
