Memory 2
Memory Hierarchy
o Motivation
m Exploiting locality to provide a large, fast, and inexpensive memory
Cache Basics
o A cache is a high-speed buffer between the CPU and main memory
o Memory is divided into blocks
m Q1: Where can a block be placed in the upper
level? (Block placement)
m Q2: How is a block found if it is in the upper
level? (Block identification)
m Q3: Which block should be replaced on a
miss? (Block replacement)
m Q4: What happens on a write? (Write strategy)
Q1: Block Placement
o Fully associative, direct mapped, or set associative
m Example: block 12 placed in an 8-block cache:
n Mapping = block number modulo number of sets
n Direct mapped: (12 mod 8) = block 4
n 2-way set associative: (12 mod 4) = set 0
n Fully associative: any of the 8 cache blocks
[Figure: block 12 of a 32-block memory mapped into an 8-block cache under each organization]
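A minimal sketch of the mapping arithmetic above (illustrative code, not from the slide; variable names are made up):

/* Where may memory block 12 go in an 8-block cache? */
void placement_example(void) {
    int block = 12;
    int direct_mapped_index = block % 8;  /* 8 sets of 1 block:  12 mod 8 = block 4 */
    int two_way_set_index   = block % 4;  /* 4 sets of 2 blocks: 12 mod 4 = set 0   */
    /* fully associative: a single set of 8 blocks, so block 12 may go anywhere */
}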
Q2: Block Identification
o Tag on each block
m No need to check index or block offset
o Increasing associativity ⇒ shrinks the index ⇒ expands the tag
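A minimal sketch of the address breakdown (illustrative; the 64-byte block and 64-set sizes are assumptions, not from the slide):

/* Address split into offset | index | tag for a 32-bit address,
   64-byte blocks (6 offset bits) and 64 sets (6 index bits). */
void identify_example(unsigned int addr) {
    unsigned int offset = addr & 0x3F;         /* bits 5..0                        */
    unsigned int index  = (addr >> 6) & 0x3F;  /* bits 11..6, selects the set      */
    unsigned int tag    = addr >> 12;          /* bits 31..12, stored with the block */
    /* Doubling the associativity halves the number of sets: the index
       loses one bit and the tag gains one. */
}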
Q3: Block Replacement
o Easy for direct-mapped caches
o Set associative or fully associative:
m Random
n Easy to implement
m LRU (Least Recently Used)
n Relies on the past to predict the future; harder to implement
m FIFO
n Roughly approximates LRU
m Not Recently Used (NRU)
n Maintain reference and dirty bits; clear the reference bits periodically;
divide blocks into four categories and choose a victim from the lowest
non-empty category
m Optimal replacement?
n Label each block in the cache with the number of instructions to be
executed before that block is next referenced, then replace the block
with the highest label
n Unrealizable: it requires knowing the future!
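A minimal sketch of LRU bookkeeping for one set (illustrative only; real caches usually approximate this with a few bits per block rather than full counters):

/* LRU victim selection for a 4-way set: last_used[i] holds the value of a
   global access counter at the block's most recent use; the victim is the
   block with the smallest (oldest) value. */
#define WAYS 4
unsigned long last_used[WAYS];

int lru_victim(void) {
    int victim = 0;
    for (int i = 1; i < WAYS; i++)
        if (last_used[i] < last_used[victim])
            victim = i;
    return victim;
}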
Q4: Write Strategy
o Write-Through
m Policy: data written to the cache block is also written to lower-level memory
m Implementation: easy
m Do read misses produce writes? No
m Do repeated writes make it to the lower level? Yes
o Write-Back
m Policy: write data only to the cache; update the lower level when the
block falls out of (is evicted from) the cache
m Implementation: hard
m Do read misses produce writes? Yes (a dirty victim must be written back)
m Do repeated writes make it to the lower level? No
Write Buffers
[Figure: the processor writes into the cache and into a write buffer; the buffer drains to lower-level memory so the processor does not wait for the slower level]
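A rough sketch contrasting the two policies on a store hit (illustrative names; write_buffer_enqueue is a hypothetical helper, not a real API):

/* Store-hit handling under the two write policies (sketch only). */
struct block { int dirty; /* plus data, tag, valid ... */ };

void write_buffer_enqueue(void);   /* hypothetical: queue the write; the buffer
                                      drains to lower-level memory on its own */

void store_hit_write_back(struct block *b) {
    /* ... update the cached data ... */
    b->dirty = 1;            /* lower level updated only when the block is evicted */
}

void store_hit_write_through(struct block *b) {
    /* ... update the cached data ... */
    write_buffer_enqueue();  /* lower level updated on every write, but the
                                processor does not stall for it */
}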
Performance Example
o Two data caches (assume one clock cycle for hit)
m I: 8KB, 44% miss rate, 1ns hit time
m II: 64KB, 37% miss rate, 2ns hit time
m Miss penalty: 60ns, 30% memory accesses
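m A rough check with AMAT = Hit time + Miss rate × Miss penalty (charging the full 60ns penalty to every miss):
n Cache I: 1ns + 0.44 × 60ns = 27.4ns
n Cache II: 2ns + 0.37 × 60ns = 24.2ns
n So the larger cache II has the lower average access time despite its slower hit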
Miss Penalty in OOO Environment
o In processors with out-of-order execution
m Memory accesses can overlap with other
computation
m Latency of memory accesses is not always
fully exposed
Cache Performance Optimizations
o Performance formulas
m AMAT = T_hit + Miss rate × T_miss_penalty
o Reducing miss rate
m Change cache configurations, compiler optimizations
o Reducing hit time
m Simple cache, fast access and address translation
o Reducing miss penalty
m Multilevel caches, read and write policies
o Taking advantage of parallelism
m Cache serving multiple requests simultaneously
m Prefetching
Cache Miss Rate
o Three C’s
o Compulsory misses (cold misses)
m The first access to a block: miss regardless of cache
size
o Capacity misses
m Cache too small to hold all data needed
o Conflict misses
m More blocks mapped to a set than the associativity
o Reducing miss rate
m Larger block size (compulsory)
m Larger cache size (capacity, conflict)
m Higher associativity (conflict)
m Compiler optimizations (all three)
Miss Rate vs. Block Size
Reducing Cache Miss Penalty
o A difficult decision is
m whether to make the cache hit time fast, to keep pace with the high
clock rate of processors,
m or to make the cache large to reduce the gap between the
processor accesses and main memory accesses.
o Solution:
m Use multi-level cache:
n The first-level cache can be small enough to match a fast clock
cycle time.
n The second-level (and lower) caches can be large enough to capture many
accesses that would otherwise go to main memory.
n Multilevel caches are also more power-efficient than a single large cache.
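n One common way to quantify this (not on the slide): AMAT = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)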
Compiler Optimizations for Cache
o Increasing locality of programs
m Temporal locality, spatial locality
o Rearrange code
m Targeting instruction cache directly
m Reorder instructions based on the set of data accessed
o Reorganize data
m Padding to eliminate conflicts (see the sketch after this list):
n Change the addresses of two variables so that they do not map to
the same cache location
n Change the size of an array via padding
m Group data that tend to be accessed together in one block
o Example optimizations
m Merging arrays, loop interchange, loop fusion
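A minimal padding sketch (illustrative sizes, not from the slide): when rows are a power of two apart, a column walk can keep hitting the same cache sets; a small pad shifts successive rows to different sets.

/* Before: rows are a power-of-two number of bytes apart */
int a[1024][1024];
/* After: the pad of 8 elements is an arbitrary illustrative choice */
int a_padded[1024][1024 + 8];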
Merging Arrays
/* Before: 2 sequential arrays */
int val[SIZE];
int key[SIZE];
/* After: 1 array of structures */
struct merge {
    int val;
    int key;
};
struct merge merged_array[SIZE];
o Improves spatial locality
m If val[i] and key[i] tend to be accessed together
o Reduces conflicts between val and key
Loop Interchange
o Idea: switch the nesting order of two or more loops so that the data
are accessed in the order they are stored in memory
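The slide gives no code, so here is a typical before/after sketch (illustrative, assuming a row-major array x): interchanging the loops turns a stride-N column walk into a stride-1 row walk, improving spatial locality.

/* Before: inner loop walks down a column (stride N in memory) */
for (j = 0; j < N; j = j+1)
    for (i = 0; i < N; i = i+1)
        x[i][j] = 2 * x[i][j];
/* After: inner loop walks along a row (stride 1 in memory) */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1)
        x[i][j] = 2 * x[i][j];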
Loop Fusion
o Takes multiple compatible loop nests and
combines their bodies into one loop nest
m Is legal if no data dependences are reversed
o Improves locality directly by merging accesses to
the same cache line into one loop iteration
/* Before */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1)
        a[i][j] = 1/b[i][j] * c[i][j];
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1)
        d[i][j] = a[i][j] + c[i][j];

/* After */
for (i = 0; i < N; i = i+1)
    for (j = 0; j < N; j = j+1){
        a[i][j] = 1/b[i][j] * c[i][j];
        d[i][j] = a[i][j] + c[i][j];
    }
Seminar
o Pipelining Cache
o Prefetching Cache
Main Memory Background
o Main memory performance
m Latency: cache miss penalty
n Access time: time between the request and the arrival of the word
n Cycle time: minimum time between successive requests
m Bandwidth: matters for multiprocessors, I/O, and the miss penalty of
large blocks
o Main memory technology
m Memory is DRAM: Dynamic Random Access Memory
n Dynamic: must be refreshed periodically
n Reads are destructive, so data must be written back after being read
n Concerned with cost per bit and capacity
m Cache is SRAM: Static Random Access Memory
n Concerned with speed and capacity
Memory vs. Virtual Memory
o Analogy to cache
m Size: cache << memory << address space
m Both provide the illusion of a big, fast memory by exploiting locality
Four Memory Hierarchy Questions
o Where can a block be placed in main memory?
m OS allows block to be placed anywhere: fully
associative
n No conflict misses
o Which block should be replaced?
m An approximation of LRU: true LRU too costly and
adds little benefit
n A reference bit is set when a page is accessed
n The bit is periodically shifted into a history register
n When replacing, choose the page with the smallest value in its history
register (see the sketch after this list)
o What happens on a write?
m Write back: write through is prohibitively expensive
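A minimal sketch of the aging scheme described above (illustrative; the 8-bit history register and page count are assumptions):

/* Aging approximation of LRU for page replacement. */
#define NPAGES 4
unsigned char history[NPAGES];   /* 8-bit history register per page    */
unsigned char ref_bit[NPAGES];   /* set by hardware when page is used  */

void age_tick(void) {                         /* run once per period */
    for (int p = 0; p < NPAGES; p++) {
        history[p] = (history[p] >> 1) | (ref_bit[p] << 7);
        ref_bit[p] = 0;                       /* clear for the next period */
    }
}

int pick_victim(void) {                       /* smallest history = least recently used */
    int victim = 0;
    for (int p = 1; p < NPAGES; p++)
        if (history[p] < history[victim])
            victim = p;
    return victim;
}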
Four Memory Hierarchy Questions
o How is a block found in main memory?
m Use page table to translate virtual address into
physical address
• 32-bit virtual address, 4KB page size, 4 bytes per page table entry: how large is the page table?
• (2^32 / 2^12) × 2^2 = 2^22 bytes, or 4MB
Fast Address Translation
o Motivation
m Page table is too large to be stored in cache
n May even span multiple pages itself
m Multiple page table levels
o Solution: exploit locality and cache recent
translations
m Example: four page table levels (see the index sketch below)
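A rough sketch of the index arithmetic for a four-level table (illustrative; the 4KB page and 9-bit per-level indexes are assumed, x86-64-style, and are not specified on the slide):

/* Split of a virtual address for a four-level page-table walk. */
void split_va(unsigned long va) {
    unsigned long offset = va & 0xFFF;          /* bits 11..0          */
    unsigned long l1     = (va >> 12) & 0x1FF;  /* lowest-level index  */
    unsigned long l2     = (va >> 21) & 0x1FF;
    unsigned long l3     = (va >> 30) & 0x1FF;
    unsigned long l4     = (va >> 39) & 0x1FF;  /* top-level index     */
    /* Each index selects one of 512 entries at its level; the final entry
       holds the physical page frame number, combined with the offset. */
}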
Fast Address Translation
o TLB: translation look-aside buffer
m A special fully-associative cache for recent translations
m Tag: virtual address
m Data: physical page frame number, protection field,
valid bit, use bit, dirty bit
o Translation
m Send the virtual page number to all tags
m Check for protection violations
m The matching tag sends out the physical page frame number
m Combine it with the page offset to form the full physical address
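A minimal sketch of that lookup (illustrative structure, not a real TLB implementation; the 4KB page size and 32 entries are assumptions):

/* Fully-associative TLB lookup. */
#define TLB_ENTRIES 32

struct tlb_entry {
    unsigned long vpn;     /* tag: virtual page number          */
    unsigned long pfn;     /* data: physical page frame number  */
    int valid;             /* plus protection, use, dirty bits  */
};

struct tlb_entry tlb[TLB_ENTRIES];

/* Returns 1 on a hit and fills *pa; 0 means a miss (walk the page table). */
int tlb_lookup(unsigned long va, unsigned long *pa) {
    unsigned long vpn    = va >> 12;
    unsigned long offset = va & 0xFFF;
    for (int i = 0; i < TLB_ENTRIES; i++)   /* hardware compares all tags in parallel */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].pfn << 12) | offset;
            return 1;
        }
    return 0;
}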
Virtual Memory and Cache
o Physical cache: index cache using physical
address
m Address translation must always happen before the cache access
m Simple to implement, but translation adds to the hit time (a performance issue)
Virtual Memory and Cache
o Physical cache (PIPT)
[Figure: the processor core sends the virtual address (VA) to the TLB; the resulting physical address (PA) indexes the physical cache; a hit returns the cache line, a miss goes to main memory]
• Slow: address translation sits on the critical path of every cache access
Advantages of Virtual Memory
o Translation
m Program can be given a consistent view of memory,
even though physical memory is scrambled
m Only the most important part of the program (its “working set”) must
be in physical memory
o Protection
m Different threads/processes protected from each other
m Different pages can be given special behavior
n Read only, invisible to user programs, etc.
m Kernel data protected from user programs
m Very important for protection from malicious programs
o Sharing
m Can map same physical page to multiple users