
Computer Architecture

A Quantitative Approach, Fifth Edition

Chapter 2
Memory Hierarchy Design

Copyright © 2012, Elsevier Inc. All rights reserved. 1


Introduction
Introduction
 Programmers want unlimited amounts of memory with
low latency
 Fast memory technology is more expensive per bit than
slower memory
 Solution: organize memory system into a hierarchy
 Entire addressable memory space available in largest, slowest
memory
 Incrementally smaller and faster memories, each containing a
subset of the memory below it, proceed in steps up toward the
processor
 Temporal and spatial locality ensures that nearly all
references can be found in smaller memories
 Gives the illusion of a large, fast memory being presented to the
processor

Copyright © 2012, Elsevier Inc. All rights reserved. 2


Introduction
Memory Hierarchy

Copyright © 2012, Elsevier Inc. All rights reserved. 3


Introduction
Memory Performance Gap

Copyright © 2012, Elsevier Inc. All rights reserved. 4


Introduction
Memory Hierarchy Design
 Memory hierarchy design becomes more crucial
with recent multi-core processors:
 Aggregate peak bandwidth grows with # cores:
 Intel Core i7 can generate two references per core per clock
 Four cores and 3.2 GHz clock
 25.6 billion 64-bit data references/second +
 12.8 billion 128-bit instruction references/second
 = 409.6 GB/s!
 DRAM bandwidth is only 6% of this (25 GB/s)


 Requires:
 Multi-port, pipelined caches
 Two levels of cache per core
 Shared third-level cache on chip

Copyright © 2012, Elsevier Inc. All rights reserved. 5
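To make the bandwidth arithmetic on this slide concrete, here is a small C check of the numbers; the core count, clock rate, and reference widths are the slide's, and the 25 GB/s DRAM figure is taken from the last bullet. Treat it as a rough sketch, not a measurement.

/* Back-of-envelope check of the Core i7 bandwidth figures above. */
#include <stdio.h>

int main(void) {
    double cores = 4, clock_ghz = 3.2;
    double data_refs = cores * 2 * clock_ghz * 1e9;    /* 25.6e9 64-bit refs/s  */
    double inst_refs = cores * 1 * clock_ghz * 1e9;    /* 12.8e9 128-bit refs/s */
    double bw_bytes  = data_refs * 8 + inst_refs * 16; /* bytes per second      */
    double dram_bw   = 25e9;                           /* 25 GB/s               */

    printf("Peak demand: %.1f GB/s\n", bw_bytes / 1e9);           /* 409.6      */
    printf("DRAM covers: %.1f%% of demand\n", 100 * dram_bw / bw_bytes);
    return 0;
}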


Introduction
Performance and Power
 High-end microprocessors have >10 MB on-chip
cache
 Consumes large amount of area and power budget

Copyright © 2012, Elsevier Inc. All rights reserved. 6


Introduction
Memory Hierarchy Basics
 When a word is not found in the cache, a miss
occurs:
 Fetch word from lower level in hierarchy, requiring a
higher latency reference
 Lower level may be another cache or the main
memory
 Also fetch the other words contained within the block
 Takes advantage of spatial locality
 Place block into cache in any location within its set,
determined by address
 block address MOD number of sets

Copyright © 2012, Elsevier Inc. All rights reserved. 7
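A minimal C sketch of the placement rule above ("block address MOD number of sets"); the 64-byte block size and 8 sets are illustrative values, not taken from the slide.

/* Strip the block offset, then map the block address onto a set. */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64u   /* bytes per block (assumed) */
#define NUM_SETS    8u   /* sets in the cache (assumed) */

int main(void) {
    uint32_t addr = 0x1234;                    /* example byte address   */
    uint32_t block_addr = addr / BLOCK_SIZE;   /* block address          */
    uint32_t set = block_addr % NUM_SETS;      /* set that may hold it   */
    printf("address 0x%x -> block %u -> set %u\n", addr, block_addr, set);
    return 0;
}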


Memory Hierarchy Basics
• Hit: data appears in some block in the upper level
(example: Block X)
– Hit Rate: the fraction of memory accesses found in the upper level
– Hit Time: time to access the upper level, which consists of
RAM access time + time to determine hit/miss
• Miss: data needs to be retrieved from a block in the
lower level (Block Y)
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty: time to replace a block in the upper level +
time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on the 21264!)

[Figure: the processor exchanges Blk X with the upper-level memory, which exchanges Blk Y with the lower-level memory]
Introduction
Memory Hierarchy Basics
 n blocks per set => n-way set associative
 Direct-mapped cache => one block per set (one-way)
 Fully associative => one set
 Place block into cache in any location within its set,
determined by address
 block address MOD number of sets
 Writing to cache: two strategies
 Write-through
 Immediately update lower levels of hierarchy
 Write-back
 Only update lower levels of hierarchy when an updated block
is replaced
 Both strategies use a write buffer to make writes
asynchronous

Copyright © 2012, Elsevier Inc. All rights reserved. 9


Q4: What happens on a write?
                          Write-Through                      Write-Back
Policy                    Data written to the cache          Write data only to the cache;
                          block is also written to           update the lower level when a
                          lower-level memory                 block falls out of the cache
Debug                     Easy                               Hard
Do read misses
produce writes?           No                                 Yes
Do repeated writes
make it to lower level?   Yes                                No

Additional option (on a write miss to an un-cached address):
allocate a new cache line ("write-allocate").
CSCE 430/830, Memory
Hierarchy Introduction
Write Buffers for Write-Through Caches

[Figure: processor writes go into the Cache and a Write Buffer, which drains to the Lower-Level Memory]

Holds data awaiting write-through to
lower-level memory

Q. Why a write buffer?  A. So the CPU doesn't stall.
Q. Why a buffer, why not just one register?  A. Bursts of writes are common.
Q. Are Read After Write (RAW) hazards an issue for the write buffer?
   A. Yes! Drain the buffer before the next read, or send the read first
   after checking the write buffer.
CSCE 430/830, Memory
Hierarchy Introduction
Memory Hierarchy Basics
• Hit rate: fraction found in that level
– So high that we usually talk about the miss rate
– Miss rate fallacy: miss rate can mislead about memory performance,
just as MIPS can mislead about CPU performance; what matters is
average memory access time
• Average memory-access time
= Hit time + Miss rate x Miss penalty
(ns or clocks)
• Miss penalty: time to replace a block from the
lower level, including the time to deliver it to the CPU
– access time: time to reach the lower level
= f(latency to lower level)
– transfer time: time to transfer the block
= f(BW between upper & lower levels, block size)

CSCE 430/830, Memory


Hierarchy Introduction
Introduction
Memory Hierarchy Basics
 Miss rate
 Fraction of cache accesses that result in a miss

 Causes of misses
 Compulsory
 First reference to a block, also called “cold miss”
 Capacity
 Blocks discarded (lack of space) and later retrieved
 Conflict
 Program makes repeated references to multiple addresses
from different blocks that map to the same location in the
cache

Copyright © 2012, Elsevier Inc. All rights reserved. 13


Introduction
Memory Hierarchy Basics

 Note that speculative and multithreaded


processors may execute other instructions
during a miss
 Reduces performance impact of misses

Copyright © 2012, Elsevier Inc. All rights reserved. 14


Improve Cache Performance
To improve cache and memory access times:

Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty

Reducing each of these!


Simultaneously?

CPU time = IC * (CPI_execution + (Memory accesses / Instruction) * Miss Rate * Miss Penalty) * Clock Cycle Time

 Improve performance by:


1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.
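As a sketch of the CPU-time equation above, the following C fragment plugs in made-up example values (the instruction count, CPI, accesses per instruction, miss rate, penalty, and cycle time are illustrative, not from the slides):

#include <stdio.h>

int main(void) {
    double IC = 1e9;              /* instruction count                 */
    double cpi_exec = 1.0;        /* base CPI ignoring memory stalls   */
    double mem_per_instr = 1.5;   /* memory accesses per instruction   */
    double miss_rate = 0.02;
    double miss_penalty = 100.0;  /* cycles                            */
    double cycle_time = 0.5e-9;   /* seconds                           */

    double cpu_time = IC * (cpi_exec + mem_per_instr * miss_rate * miss_penalty)
                         * cycle_time;
    printf("CPU time = %.3f s\n", cpu_time);
    return 0;
}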

CSCE 430/830, Memory


Hierarchy Introduction
Introduction
Memory Hierarchy Basics
 Six basic cache optimizations:
 Larger block size
 Reduces compulsory misses
 Increases capacity and conflict misses, increases miss penalty
 Larger total cache capacity to reduce miss rate
 Increases hit time, increases power consumption
 Higher associativity
 Reduces conflict misses
 Increases hit time, increases power consumption
 Higher number of cache levels
 Reduces overall memory access time
 Giving priority to read misses over writes
 Reduces miss penalty
 Avoiding address translation in cache indexing
 Reduces hit time

Copyright © 2012, Elsevier Inc. All rights reserved. 16


The Limits of Physical Addressing
“Physical addresses” of memory locations

[Figure: the CPU is wired directly to Memory by address lines A0-A31 and data lines D0-D31]

• All programs share one address space:
the physical address space
• Machine language programs must be
aware of the machine organization
• No way to prevent a program from
accessing any machine resource
CSCE 430/830, Memory
Hierarchy Introduction
Solution: Add a Layer of Indirection
[Figure: the CPU issues "Virtual Addresses" (A0-A31); address translation maps them to "Physical Addresses" before they reach Memory over data lines D0-D31]
• User programs run in a standardized
virtual address space
• Address Translation hardware, managed
by the operating system (OS), maps
virtual address to physical memory
• Hardware supports “modern” OS features:
Protection, Translation, Sharing
CSCE 430/830, Memory
Hierarchy Introduction
Three Advantages of Virtual Memory
• Translation:
– Program can be given consistent view of memory, even though physical
memory is scrambled
– Makes multithreading reasonable (now used a lot!)
– Only the most important part of program (“Working Set”) must be in
physical memory.
– Contiguous structures (like stacks) use only as much physical memory
as necessary yet still grow later.
• Protection:
– Different threads (or processes) protected from each other.
– Different pages can be given special behavior
» (Read Only, Invisible to user programs, etc).
– Kernel data protected from User programs
– Very important for protection from malicious programs
• Sharing:
– Can map same physical page to multiple users
(“Shared memory”)

CSCE 430/830, Memory


Hierarchy Introduction
Page tables encode virtual address spaces
[Figure: a page table maps pages of a virtual address space onto frames of the physical memory space]

A virtual address space is divided into blocks of memory called pages.
A machine usually supports pages of a few sizes (MIPS R4000).
A page table is indexed by a virtual address.
A valid page table entry codes the physical memory "frame" address for the page.
The OS manages the page table for each ASID (Addr. Space ID).
An Example of Page Table
[Figure: an example page table mapping virtual memory pages to physical memory frames]

CSCE 430/830, Memory


Hierarchy Introduction
Dividing the address space by a page size
[Figure: the virtual and physical address spaces, each divided into pages]
Page size: 4KB
CSCE 430/830, Memory
Hierarchy Introduction
Virtual Page & Physical Page
[Figure: virtual pages V.P. 0-5 of the virtual memory map onto physical pages P.P. 0-3 of the physical memory]
Page size: 4KB
CSCE 430/830, Memory
Hierarchy Introduction
Addressing
[Figure: a virtual address is split into a Virtual Page No. and a Page Offset; the translated Physical Address is a Physical Page No. with the same Page Offset]
Page size: 4KB
CSCE 430/830, Memory
Hierarchy Introduction
Addressing
[Figure: the Virtual Page No. selects a Page Table Entry, which supplies the Physical Page No.; combined with the Page Offset it forms the Physical Address]

CSCE 430/830, Memory


Hierarchy Introduction
Addressing
[Figure: the same translation path, with the Page Table Entry fields shown as V P R D + Physical Page No.]

Valid/Present Bit
If set, the page pointed to is resident in memory;
otherwise it is on disk or not allocated.

CSCE 430/830, Memory


Hierarchy Introduction
Addressing
[Figure: same page-table-entry diagram as above]

Protection Bits
Restrict access: read-only, read/write, or system-only access.

CSCE 430/830, Memory


Hierarchy Introduction
Addressing
[Figure: same page-table-entry diagram as above]

Reference Bit
Needed by replacement policies.
If set, the page has been referenced.

CSCE 430/830, Memory


Hierarchy Introduction
Page Table Entry
[Figure: same page-table-entry diagram as above]

Dirty Bit
If set, at least one word in the page has been modified.

CSCE 430/830, Memory


Hierarchy Introduction
Page Table Entry
[Figure: the page table as an array of entries, each holding V P R D + Physical Page No.; the selected entry's Physical Page No. plus the Page Offset form the Physical Address]

CSCE 430/830, Memory
Hierarchy Introduction
Page Table Lookup
[Figure: page table lookup – the Virtual Page No. of the virtual address indexes the page table; the selected entry's Physical Page No., concatenated with the Page Offset, gives the Physical Address]
CSCE 430/830, Memory
Hierarchy Introduction
Details of Page Table
[Figure: the Page Table Base Register plus the virtual page number index into the page table, which itself lives in physical memory; each entry holds a valid bit, access rights, and the physical page number, which is concatenated with the 12-bit page offset to form the physical address of a frame]

• Page table maps virtual page numbers to physical
frames ("PTE" = Page Table Entry)
• Virtual memory => treat main memory as a cache for disk
• 4 fundamental questions: placement, identification,
replacement, and write policy
4 Fundamental Questions
• Placement
– Operating systems allow blocks to be
placed anywhere in main memory
• Identification
– Page Table, Inverted Page Table
• Replacement
– Almost all operating systems try to use
LRU
• Write Policies
– Always write back

CSCE 430/830, Memory


Hierarchy Introduction
Latency
• Since Page Table is located in main memory, it
takes one memory access latency to finish an
address translation;
• As a result, a load/store operation from/to main
memory needs two memory access latencies in
total;
• Considering the expensive memory access
latency, the overhead of page table lookup
should be optimized;
• How?
– Principle of Locality
– Caching
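A minimal C sketch of a one-level page-table translation with 4 KB pages, to make the "one extra memory access per translation" point concrete; the table size and field layout are illustrative, not any particular machine's.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                 /* 4 KB pages            */
#define NUM_PAGES  16                 /* tiny toy address space */

typedef struct { int valid; uint32_t frame; } pte_t;

static pte_t page_table[NUM_PAGES];

/* Returns 1 and fills *pa on success; 0 means page fault (V = 0). */
int translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn    = va >> PAGE_SHIFT;            /* virtual page number */
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);
    if (vpn >= NUM_PAGES || !page_table[vpn].valid)
        return 0;                                  /* page fault          */
    *pa = (page_table[vpn].frame << PAGE_SHIFT) | offset;
    return 1;
}

int main(void) {
    page_table[2] = (pte_t){ .valid = 1, .frame = 5 };
    uint32_t pa;
    if (translate(0x2ABC, &pa))                    /* VPN 2, offset 0xABC */
        printf("PA = 0x%x\n", pa);                 /* prints 0x5ABC       */
    return 0;
}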

CSCE 430/830, Memory


Hierarchy Introduction
MIPS Address Translation: How does it work?
[Figure: the CPU's virtual addresses pass through the Translation Look-Aside Buffer (TLB), which supplies the physical addresses sent to Memory]

Translation Look-Aside Buffer (TLB):
a small fully-associative cache of
mappings from virtual to physical addresses.
The TLB also contains
protection bits for the virtual address.
Fast common case: the virtual address is in the TLB and the
process has permission to read/write it.
(What is the table of mappings that it caches?)
The TLB caches page table entries
Physical and virtual pages must be the same size!
The TLB caches page table entries.

[Figure: the TLB maps (virtual page, ASID) pairs to physical frames; on a miss the page table in memory supplies the entry]

MIPS handles TLB misses in software (random
replacement); other machines use hardware.
V=0 pages either reside on disk or have not yet been
allocated; the OS handles V=0 as a "page fault".
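A minimal C sketch of a tiny fully-associative TLB acting as a cache of page-table entries, as described above; the entry count is illustrative, and the miss/refill (page-walk) path is left out.

#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 8
#define PAGE_SHIFT  12

typedef struct { int valid; uint32_t vpn; uint32_t frame; } tlb_entry_t;
static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns 1 on a TLB hit and fills *pa; 0 means walk the page table. */
int tlb_lookup(uint32_t va, uint32_t *pa) {
    uint32_t vpn    = va >> PAGE_SHIFT;
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++)          /* compare all entries */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].frame << PAGE_SHIFT) | offset;
            return 1;
        }
    return 0;                                      /* TLB miss            */
}

int main(void) {
    tlb[0] = (tlb_entry_t){ .valid = 1, .vpn = 2, .frame = 5 };
    uint32_t pa;
    if (tlb_lookup(0x2ABC, &pa))
        printf("TLB hit: PA = 0x%x\n", pa);
    else
        printf("TLB miss: walk the page table\n");
    return 0;
}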
Can TLB and caching be overlapped?
[Figure: the page-offset bits of the virtual address supply the cache Index and Byte Select while the TLB translates the Virtual Page Number; the resulting physical page number is compared with the Cache Tag to produce the Hit signal and the data out]

This works, but ...
Q. What is the downside?
A. Inflexibility. The size of the cache is
limited by the page size.
CSCE 430/830, Memory
Hierarchy Introduction
[Figure: example memory hierarchy parameters]
VA: 64 bits
PA: 40 bits
Page size: 16 KB
TLB: 2-way set associative, 256 entries
Cache block: 64 B
L1: direct-mapped, 16 KB
L2: 4-way set associative, 4 MB
CSCE 430/830, Memory
Hierarchy Introduction
Advanced Optimizations
Ten Advanced Optimizations
 Small and simple first level caches
 Critical timing path:
 addressing tag memory, then
 comparing tags, then
 selecting correct set
 Direct-mapped caches can overlap tag compare and
transmission of data
 Lower associativity reduces power because fewer
cache lines are accessed

Copyright © 2012, Elsevier Inc. All rights reserved. 43


Advanced Optimizations
L1 Size and Associativity

Access time vs. size and associativity

Copyright © 2012, Elsevier Inc. All rights reserved. 44


Advanced Optimizations
L1 Size and Associativity

Energy per read vs. size and associativity

Copyright © 2012, Elsevier Inc. All rights reserved. 45


Advanced Optimizations
Way Prediction
 To improve hit time, predict the way to pre-set
mux
 Mis-prediction gives longer hit time
 Prediction accuracy
 > 90% for two-way
 > 80% for four-way
 I-cache has better accuracy than D-cache
 First used on MIPS R10000 in mid-90s
 Used on ARM Cortex-A8
 Extend to predict block as well
 “Way selection”
 Increases mis-prediction penalty

Copyright © 2012, Elsevier Inc. All rights reserved. 46


Advanced Optimizations
Pipelining Cache
 Pipeline cache access to improve bandwidth
 Examples:
 Pentium: 1 cycle
 Pentium Pro – Pentium III: 2 cycles
 Pentium 4 – Core i7: 4 cycles

 Increases branch mis-prediction penalty


 Makes it easier to increase associativity

Copyright © 2012, Elsevier Inc. All rights reserved. 47


Advanced Optimizations
Nonblocking Caches
 Allow hits before
previous misses
complete
 “Hit under miss”
 “Hit under multiple
miss”
 L2 must support this
 In general,
processors can hide
L1 miss penalty but
not L2 miss penalty

Copyright © 2012, Elsevier Inc. All rights reserved. 48


Advanced Optimizations
Multibanked Caches
 Organize cache as independent banks to
support simultaneous access
 ARM Cortex-A8 supports 1-4 banks for L2
 Intel i7 supports 4 banks for L1 and 8 banks for L2

 Interleave banks according to block address

Copyright © 2012, Elsevier Inc. All rights reserved. 49


Advanced Optimizations
Critical Word First, Early Restart
 Critical word first
 Request missed word from memory first
 Send it to the processor as soon as it arrives
 Early restart
 Request words in normal order
 Send missed word to the processor as soon as it
arrives

 Effectiveness of these strategies depends on


block size and likelihood of another access to
the portion of the block that has not yet been
fetched
Copyright © 2012, Elsevier Inc. All rights reserved. 50
Advanced Optimizations
Merging Write Buffer
 When storing to a block that is already pending in the
write buffer, update the write buffer
 Reduces stalls due to a full write buffer
 Does not apply to I/O addresses

[Figure: write buffer contents without merging vs. with merging]

Copyright © 2012, Elsevier Inc. All rights reserved. 51
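A minimal C sketch of the merging idea: a store whose block is already pending merges into that entry instead of allocating a new one. The entry count, block size, and byte-granularity dirty mask are illustrative choices, not from the slide.

#include <stdint.h>
#include <stdio.h>

#define ENTRIES    4
#define BLOCK_SIZE 16u   /* bytes per buffer entry */

typedef struct {
    int      valid;
    uint32_t block_addr;          /* which block this entry holds      */
    uint8_t  data[BLOCK_SIZE];
    uint16_t dirty_mask;          /* which bytes have been written     */
} wb_entry_t;

static wb_entry_t buf[ENTRIES];

/* Returns 1 if the write was buffered (merged or new entry), 0 if full. */
int buffer_write(uint32_t addr, uint8_t byte) {
    uint32_t block = addr / BLOCK_SIZE, off = addr % BLOCK_SIZE;
    for (int i = 0; i < ENTRIES; i++)              /* try to merge first  */
        if (buf[i].valid && buf[i].block_addr == block) {
            buf[i].data[off] = byte;
            buf[i].dirty_mask |= 1u << off;
            return 1;
        }
    for (int i = 0; i < ENTRIES; i++)              /* else take a free slot */
        if (!buf[i].valid) {
            buf[i] = (wb_entry_t){ .valid = 1, .block_addr = block };
            buf[i].data[off] = byte;
            buf[i].dirty_mask = 1u << off;
            return 1;
        }
    return 0;                                      /* buffer full: stall  */
}

int main(void) {
    buffer_write(0x100, 0xAA);
    buffer_write(0x104, 0xBB);   /* same 16-byte block: merges           */
    printf("entry 0 dirty mask = 0x%x\n", buf[0].dirty_mask);  /* 0x11   */
    return 0;
}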


In-Class Exercises
• In systems with a write-through L1 cache backed by a
write-back L2 cache instead of main memory, a merging
write buffer can be simplified. Explain how this can be
done. Are there situations where having a full write buffer
(instead of the simple version you’ve just proposed) could
be helpful?
– The merging buffer links the CPU to the L2 cache. Two CPU
writes cannot merge if they are to different sets in L2. So, for
each new entry into the buffer a quick check on only those
address bits that determine the L2 set number need be
performed at first. If there is no match, then the new entry is
not merged. Otherwise, all address bits can be checked for
a definitive result.
– As the associativity of L2 increases, the rate of false positive
matches from the simplified check will increase, reducing
performance.

CSCE 430/830, Memory


Hierarchy Introduction
Advanced Optimizations
Compiler Optimizations
 Loop Interchange
 Swap nested loops to access memory in
sequential order

 Blocking
 Instead of accessing entire rows or columns,
subdivide matrices into blocks
 Requires more memory accesses but improves
locality of accesses
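A minimal C sketch of loop interchange as described above: swapping the nested loops makes the inner loop walk the array in row-major (sequential) order, which is how C lays it out. The array size is arbitrary.

#define N 512
double x[N][N];

void column_major_walk(void) {        /* poor spatial locality in C   */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            x[i][j] = 2.0 * x[i][j];
}

void row_major_walk(void) {           /* after loop interchange       */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 2.0 * x[i][j];
}

int main(void) { row_major_walk(); return 0; }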

Copyright © 2012, Elsevier Inc. All rights reserved. 53


Reducing Cache Misses:
5. Compiler Optimizations
• Blocking: improve temporal and spatial locality
a) multiple arrays are accessed in both ways (i.e., row-major and column-major), namely,
orthogonal accesses that cannot be helped by the earlier methods
b) concentrate on submatrices, or blocks
c) All N*N elements of Y and Z are accessed N times and each element of X is accessed once.
Thus, there are N^3 operations and 2N^3 + N^2 reads! Capacity misses are a function of N and
cache size in this case.

CSCE 430/830, Memory


Hierarchy Introduction
Reducing Cache Misses:
5. Compiler Optimizations (cont’d)
• Blocking: improve temporal and spatial locality
a) To ensure that the elements being accessed can fit in the cache, the original code is
changed to compute on submatrices of size B*B, where B is called the blocking factor.
b) The total number of memory words accessed is 2N^3/B + N^2
c) Blocking exploits a combination of spatial (Y) and temporal (Z) locality.
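A minimal C sketch of the blocked computation discussed above, using the usual blocked matrix multiply with blocking factor B; the sizes are illustrative and B is assumed to divide N.

#define N 256
#define B 32
double x[N][N], y[N][N], z[N][N];   /* globals start zeroed in C */

void blocked_matmul(void) {
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++) {
                    double r = 0.0;
                    for (int k = kk; k < kk + B; k++)
                        r += y[i][k] * z[k][j];   /* B x B tiles of y and z stay hot */
                    x[i][j] += r;
                }
}

int main(void) { blocked_matmul(); return 0; }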

CSCE 430/830, Memory


Hierarchy Introduction
Advanced Optimizations
Hardware Prefetching
 Fetch two blocks on a miss (include the next sequential block):
overlap memory access with execution by fetching data
items before the processor requests them.

Pentium 4 Pre-fetching

Copyright © 2012, Elsevier Inc. All rights reserved. 58


Advanced Optimizations
Compiler Prefetching
 Insert prefetch instructions before data is
needed
 Non-faulting: prefetch doesn’t cause
exceptions

 Register prefetch
 Loads data into register
 Cache prefetch
 Loads data into cache

 Combine with loop unrolling and software


pipelining
Copyright © 2012, Elsevier Inc. All rights reserved. 59
Reducing Cache Miss Penalty:
Compiler-Controlled Prefetching
 Compiler inserts prefetch instructions
 An Example
for(i:=0; i<3; i:=i+1)
for(j:=0; j<100; j:=j+1)
a[i][j] := b[j][0] * b[j+1][0]
 16-byte blocks, 8KB cache, 1-way write back, 8-byte
elements; What kind of locality, if any, exists for a and b?
a. 3 100-element rows (100 columns) visited; spatial locality:
even-indexed elements miss and odd-indexed elements hit,
leading to 3*100/2 = 150 misses
b. 101 rows and 3 columns visited; no spatial locality, but
there is temporal locality: same element is used in ith and (i
+ 1)st iterations and the same element is accessed in each i
iteration (outer loop). 100 misses for b[j+1][0] when i = 0
and 1 miss for j = 0 for a total of 101 misses
 Assuming large penalty (100 cycles and at least 7
iterations must be prefetched). Splitting the loop into
two, we have
CSCE 430/830, Memory
Hierarchy Introduction
Reducing Cache Miss Penalty:
3. Compiler-Controlled Prefetching
 Assuming that each iteration of the pre-split loop
consumes 7 cycles and no conflict and capacity misses,
then it consumes a total of 7*300 iteration cycles +
251*100 cache miss cycles = 27,200 cycles;
 With prefetching instructions inserted:
for(j:=0; j<100; j:=j+1){
prefetch(b[j+7][0]);
prefetch(a[0][j+7]);
a[0][j] := b[j][0] * b[j+1][0];};
for(i:=1; i<3; i:=i+1)
for(j:=0; j<100; j:=j+1){
prefetch(a[i][j+7]);
a[i][j] := b[j][0] * b[j+1][0];}
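The same split loops can be written in C using the GCC/Clang __builtin_prefetch intrinsic in place of the slide's prefetch() pseudo-instruction. As an assumption made here only for tidiness, the arrays are padded by 7 elements so the look-ahead addresses stay in bounds (a detail the pseudocode glosses over, since prefetches are only hints).

double a[3][107], b[108][3];          /* padded by 7 for the look-ahead */

void prefetched_loops(void) {
    for (int j = 0; j < 100; j++) {                /* i = 0 handled alone */
        __builtin_prefetch(&b[j + 7][0]);
        __builtin_prefetch(&a[0][j + 7]);
        a[0][j] = b[j][0] * b[j + 1][0];
    }
    for (int i = 1; i < 3; i++)                    /* i = 1, 2            */
        for (int j = 0; j < 100; j++) {
            __builtin_prefetch(&a[i][j + 7]);
            a[i][j] = b[j][0] * b[j + 1][0];
        }
}

int main(void) { prefetched_loops(); return 0; }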
CSCE 430/830, Memory
Hierarchy Introduction
Reducing Cache Miss Penalty:
3. Compiler-Controlled Prefetching (cont’d)
 An Example (continued)
 the first loop consumes 9 cycles per iteration (due to the two
prefetch instruction) and iterates 100 times for a total of 900
cycles,
 the second loop consumes 8 cycles per iteration (due to the
single prefetch instruction) and iterates 200 times for a total of
1,600 cycles,
 during the first 7 iterations of the first loop array a incurs 4
cache misses, array b incurs 7 cache misses, for a total of
(4+7)*100=1,100 cache miss cycles,
 during the first 7 iterations of the second loop for i = 1 and i = 2
array a incurs 4 cache misses each, for total of (4+4)*100=800
cache miss cycles; array b does not incur any cache miss in the
second split!
 Total cycles consumed: 900+1600+1100+800 = 4,400
 Prefetching improves performance: 27,200/4,400 = 6.2-fold!

CSCE 430/830, Memory


Hierarchy Introduction
In-Class Exercises
• As caches increase in size, blocks often increase
in sizes as well.
1. If a large instruction cache has larger data blocks, is there still a
need for prefetching? Explain the interaction between
prefetching and increased block size in instruction caches.
– Program basic blocks are often short (<10 instr.), and thus
program execution does not continue to follow sequential
locations for very long. As the block gets larger, the program is more
likely not to execute all instructions in the block but to branch out
early, making instruction prefetching less attractive.
2. Is there a need for data prefetch instructions when data blocks
get larger?
– Data structures often comprise lengthy sequences of memory
addresses, and program access of a data structure often takes the
form of a sequential sweep. Large data blocks work well with such
access patterns, and prefetching is likely still of value to the
highly sequential access patterns.

CSCE 430/830, Memory


Hierarchy Introduction
Advanced Optimizations
Summary

Copyright © 2012, Elsevier Inc. All rights reserved. 64


Homework 2
 A typo in the solution template
 Please answer all the 3 questions in Problem
4 (P2.12)
 Problem 2 (P2.8)
 c. Assuming 64 byte cache blocks and a
single bank.
 A trick here.

Copyright © 2012, Elsevier Inc. All rights reserved. 65


Example
 In an L2 cache, a cache hit takes 0.8ns and a
cache miss takes 5.1ns on average. The
cache hit ratio is 95% while the cache miss
ratio is 5%. Assuming a cycle time is 0.5ns,
compute average memory access time.
 A cache hit takes 0.8/0.5 = 2 cycles, and a
cache miss takes 5.1/0.5 = 11 cycles
 Average memory access cycles =
0.95*2+0.05*11 = 2.45 cycles
 Average memory access time = 2.45*0.5 =
1.225ns
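The same arithmetic, spelled out as a small C program; hit and miss times are rounded up to whole cycles, as in the slide.

#include <math.h>
#include <stdio.h>

int main(void) {
    double cycle_ns = 0.5, hit_ns = 0.8, miss_ns = 5.1;
    double hit_rate = 0.95, miss_rate = 0.05;

    double hit_cycles  = ceil(hit_ns  / cycle_ns);   /*  2 cycles */
    double miss_cycles = ceil(miss_ns / cycle_ns);   /* 11 cycles */
    double amat_cycles = hit_rate * hit_cycles + miss_rate * miss_cycles;
    printf("AMAT = %.2f cycles = %.3f ns\n",
           amat_cycles, amat_cycles * cycle_ns);     /* 2.45 cycles, 1.225 ns */
    return 0;
}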
Copyright © 2012, Elsevier Inc. All rights reserved. 66
Homework 2
 Problem 3 (P2.11)
 a. How many cycles would it take to service
an L2 cache miss with and without critical
word first and early restart?
 Problem 4 (P2.12)
 b. “Zeroing memory” => sequential stores

Copyright © 2012, Elsevier Inc. All rights reserved. 67


Computer Memory Hierarchy

http://www.bit-tech.net/hardware/memory/2007/11/15/the_secrets_of_pc_memory_part_1/3

Copyright © 2012, Elsevier Inc. All rights reserved. 68


Memory Technology
Memory Technology

 Performance metrics
 Latency is the concern of caches
 Bandwidth is the concern of multiprocessors and I/O
 External approach (e.g., multi-bank memory)
 Internal approach (e.g., SDRAM, DDR)
 Memory latency
 Access time (AT): time between read request and when
desired word arrives
 Cycle time (CT): minimum time between unrelated
requests to memory
 DRAM used for main memory
 Dynamic: must write after read, must refresh: AT < CT
 SRAM used for cache
 Static: no refresh or read followed by write: AT ~ CT

Copyright © 2012, Elsevier Inc. All rights reserved. 69


Memory Technology
Memory Technology
 SRAM
 Requires low power to retain bit
 Requires 6 transistors/bit
 DRAM
 Must be re-written after being read
 Must also be periodically refreshed
 Every ~ 8 ms
 Each row can be refreshed simultaneously
 One transistor/bit
 Address lines are multiplexed:
 Upper half of address: row access strobe (RAS)
 Lower half of address: column access strobe (CAS)
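A minimal C sketch of the multiplexed addressing just described: the upper address bits form the row (sent with RAS) and the lower bits the column (sent with CAS). The 14-bit row / 10-bit column split is illustrative, not a specific device's geometry.

#include <stdint.h>
#include <stdio.h>

#define COL_BITS 10
#define ROW_BITS 14

int main(void) {
    uint32_t dram_addr = 0x00ABCDEF & ((1u << (ROW_BITS + COL_BITS)) - 1);
    uint32_t col = dram_addr & ((1u << COL_BITS) - 1);   /* sent on CAS */
    uint32_t row = dram_addr >> COL_BITS;                /* sent on RAS */
    printf("row = 0x%x (RAS), column = 0x%x (CAS)\n", row, col);
    return 0;
}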

Copyright © 2012, Elsevier Inc. All rights reserved. 70


A SRAM Example

http://en.wikipedia.org/wiki/Static_random_access_memory

Copyright © 2012, Elsevier Inc. All rights reserved. 71


A DRAM Example

http://en.wikipedia.org/wiki/Dynamic_random-access_memory

Copyright © 2012, Elsevier Inc. All rights reserved. 72


Memory Technology
Memory Technology
 Amdahl:
 Memory capacity should grow linearly with processor speed
 Unfortunately, memory capacity and speed have not kept
pace with processors

 Some optimizations:
 Multiple accesses to same row
 Synchronous DRAM
 Added clock to DRAM interface and enables pipelining
 Burst mode with critical word first
 Wider interfaces
 Double data rate (DDR)
 Multiple banks on each DRAM device

Copyright © 2012, Elsevier Inc. All rights reserved. 73


Memory Technology
Memory Optimizations

Copyright © 2012, Elsevier Inc. All rights reserved. 74


Memory Technology
Memory Optimizations

Copyright © 2012, Elsevier Inc. All rights reserved. 75


DIMM Dual Inline Memory Module

http://en.wikipedia.org/wiki/DIMM

Copyright © 2012, Elsevier Inc. All rights reserved. 76


Memory Technology
Memory Optimizations
 DDR:
 DDR2
 Lower power (2.5 V -> 1.8 V)
 Higher clock rates (266 MHz, 333 MHz, 400 MHz)
 DDR3
 1.5 V
 800 MHz
 DDR4
 1-1.2 V
 1600 MHz

 GDDR5 is graphics memory based on DDR3

Copyright © 2012, Elsevier Inc. All rights reserved. 77


Memory Technology
Memory Optimizations
 Graphics memory:
 Achieve 2-5 X bandwidth per DRAM vs. DDR3
 Wider interfaces (32 vs. 16 bit)
 Higher clock rate
 Possible because they are attached via soldering instead of
socketed DIMM modules

 Reducing power in SDRAMs:


 Lower voltage
 Low power mode (ignores clock, continues to
refresh)

Copyright © 2012, Elsevier Inc. All rights reserved. 78


Memory Technology
Memory Power Consumption

Copyright © 2012, Elsevier Inc. All rights reserved. 79


Memory Technology
Flash Memory
 Type of EEPROM
 Must be erased (in blocks) before being
overwritten
 Non volatile
 Limited number of write cycles
 Cheaper than SDRAM, more expensive than
disk
 Slower than SRAM, faster than disk

Copyright © 2012, Elsevier Inc. All rights reserved. 80


Solid State Drive (nowadays)

81
Comparison

Attribute              SSD                        HDD
Random access time     0.1 ms                     5-10 ms
Bandwidth              100-500 MB/s               100 MB/s sequential
Price/GB               $0.9-$2                    $0.1
Size                   Up to 2TB, 250GB common    4TB
Power consumption      5 watts                    Up to 20 watts
Read/write symmetry    No                         Yes
Noise                  No                         Yes (spin, rotate)

82
NAND Flash Memory
 Main storage component of Solid State Drive (SSD)
 USB Drive, cell phone, touch pad…

83
 Advantages of NAND flash
 Fast random read (25 us)
 Energy efficiency
 High reliability (no moving parts) compared to hard disks
 Widely deployed in high-end laptops
 Macbook air, ThinkPad X series, touch pad…
 Increasingly deployed in enterprise environment either as a
secondary cache or main storage

84
 Disadvantages of SSD
 Garbage collection (GC) problem of SSD
 Stemmed from the out-of-place update characteristics
 Update requests invalidate old version of pages and then write new
version of these pages to a new place
 Copy valid data to somewhere else (increasing number of IOs)
 Garbage collection is periodically started to erase victim blocks and copy
valid pages to the free blocks (slow erase: 10xW,100xR)
 Blocks in the SSD have a limited number of erase cycles
 100,000 for Single-Level Cell (SLC), 5,000-10,000 for Multi-Level Cell
(MLC); can be as low as 3,000
 May be quickly worn out in enterprise environment
 Performance is very unpredictable
 Due to unpredictable triggering of the time-consuming GC process

85
Hybrid Main Memory System
 DRAM + Flash Memory
 Uses small DRAM as a cache to buffer writes
and cache reads by leveraging access locality
 Uses large flash memory to store cold data
 Advantages
 Similar performance as DRAM
 Low power consumption
 Low costs

Copyright © 2012, Elsevier Inc. All rights reserved. 86


Memory Technology
Memory Dependability
 Memory is susceptible to cosmic rays
 Soft errors: dynamic errors
 Detected and fixed by error correcting codes
(ECC)
 Hard errors: permanent errors
 Use spare rows to replace defective rows

 Chipkill: a RAID-like error recovery technique

Copyright © 2012, Elsevier Inc. All rights reserved. 87


Chipkill

Copyright © 2012, Elsevier Inc. All rights reserved. 88


Chipkill
 A Redundant Array of Inexpensive DRAM (RAID)
processor chip is directly placed on the memory
DIMM.
 The RAID chip calculates an ECC checksum for
the contents of the entire set of chips for each
memory access and stores the result in extra
memory space on the protected DIMM.
 Thus, when a memory chip on the DIMM fails,
the RAID result can be used to "back up" the lost
data.

Copyright © 2012, Elsevier Inc. All rights reserved. 89


Virtual Memory and Virtual Machines
Virtual Memory
 Protection via virtual memory
 Keeps processes in their own memory space

 Role of architecture:
 Provide user mode and supervisor mode
 Protect certain aspects of CPU state
 Provide mechanisms for switching between user
mode and supervisor mode
 Provide mechanisms to limit memory accesses
 Provide TLB to translate addresses

Copyright © 2012, Elsevier Inc. All rights reserved. 90


Virtual Memory and Virtual Machines
Virtual Machines
 Supports isolation and security
 Sharing a computer among many unrelated users
 Enabled by raw speed of processors, making the
overhead more acceptable

 Allows different ISAs and operating systems to be


presented to user programs
 “System Virtual Machines”
 SVM software is called “virtual machine monitor” or
“hypervisor”
 Individual virtual machines run under the monitor are called
“guest VMs”

Copyright © 2012, Elsevier Inc. All rights reserved. 91


Virtual Machine Monitors (VMMs)
• Virtual machine monitor (VMM) or hypervisor is software
that supports VMs
• VMM determines how to map virtual resources to physical
resources
• Physical resource may be time-shared, partitioned, or
emulated in software
• VMM is much smaller than a traditional OS;
– isolation portion of a VMM is ~10,000 lines of code

CSCE 430/830, Advanced


02/26/2021 92
Memory Hierarchy
Virtual Machine Monitors (VMMs)

CSCE 430/830, Advanced


02/26/2021 93
Memory Hierarchy
Virtual Machine Monitors (VMMs)

CSCE 430/830, Advanced


02/26/2021 94
Memory Hierarchy
Virtual Memory and Virtual Machines
Impact of VMs on Virtual Memory
 Each guest OS maintains its own set of page
tables
 VMM adds a level of memory between physical
and virtual memory called “real memory”
 VMM maintains shadow page table that maps
guest virtual addresses to physical addresses
 Requires VMM to detect guest’s changes to its own page
table
 Occurs naturally if accessing the page table pointer is a
privileged operation

Copyright © 2012, Elsevier Inc. All rights reserved. 95


VMM Overhead?
• Depends on the workload
• User-level processor-bound programs (e.g.,
SPEC) have zero-virtualization overhead
– Runs at native speeds since OS rarely invoked
• I/O-intensive workloads => OS-intensive
=> execute many system calls and privileged
instructions
=> can result in high virtualization overhead
– For System VMs, goal of architecture and VMM is to run
almost all instructions directly on native hardware
• If I/O-intensive workload is also I/O-bound
=> low processor utilization since waiting for I/O
=> processor virtualization can be hidden
=> low virtualization overhead

CSCE 430/830, Advanced


02/26/2021 96
Memory Hierarchy
Q1: Where can a block be placed in the upper level?

• Block 12 placed in an 8-block cache:
– Fully associative, direct mapped, 2-way set associative
– S.A. Mapping = (Block Number) Modulo (Number of Sets)

[Figure: memory holds blocks 0-31 and the cache holds 8 blocks; block 12 may go anywhere in a fully associative cache, only in block (12 mod 8) = 4 of a directly mapped (1-way associative) cache, and anywhere within set (12 mod 4) = 0 of a 2-way set-associative cache]

CSCE 430/830, Memory


Hierarchy Introduction
Direct Mapped Block Placement

[Figure: direct-mapped placement – each memory block address maps to exactly one cache block:
location = (block address MOD # blocks in cache)]
Fully Associative Block Placement

[Figure: fully associative placement – arbitrary block mapping, a memory block may go in any cache block:
location = any]
Set-Associative Block Placement

[Figure: set-associative placement – a memory block address maps to one set (Set 0-3) but may occupy any block within that set:
location = (block address MOD # sets in cache), arbitrary location in set]
Q2: How is a block found if it is in the upper level?

• Tag on each block


– No need to check index or block offset
• Increasing associativity shrinks index, expands
tag

Address fields:  | Tag | Index | Block Offset |   (Tag + Index = Block Address)

CSCE 430/830, Memory


Hierarchy Introduction
Direct-Mapped Cache Design
[Figure: direct-mapped cache datapath – the address is split into Tag, Index, and Byte Offset; the Index selects one SRAM entry (valid bit, tag, data), the stored tag is compared with the address Tag, and HIT = 1 when they match and the entry is valid]
Set Associative Cache Design

• Key idea:
– Divide cache into sets
– Allow block anywhere in a set
• Advantages:
– Better hit rate
• Disadvantages:
– More tag bits
– More hardware
– Higher access time

[Figure: a four-way set-associative cache – address bits 31-10 form the 22-bit tag and bits 9-2 the 8-bit index; the index selects one of 256 sets, the four ways' tags are compared in parallel, and a 4-to-1 multiplexor delivers the hit data]

A Four-Way Set-Associative Cache


Fully Associative Cache Design
• Key idea: set size of one block
– 1 comparator required for each block
– No address decoding
– Practical only for small caches due to
hardware demands
[Figure: fully associative lookup – the incoming tag is compared by a separate comparator against every stored tag in parallel, and the matching block's data is driven out]
In-Class Exercise
• Given the following requirements for cache design for
a 32-bit-address computer: (1) cache contains 16KB
of data, and (2) each cache block contains 16 words.
(3) Placement policy is 4-way set-associative.
– What are the lengths (in bits) of the block offset
field and the index field in the address?

– What are the lengths (in bits) of the index field and
the tag field in the address if the placement is 1-
way set-associative?
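One way to work out the field widths asked for above, assuming 4-byte words (the word size is not stated in the exercise); treat this as a sketch, not the official solution.

#include <stdio.h>

static int log2i(int x) {             /* x assumed to be a power of two */
    int n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    int addr_bits   = 32;
    int cache_bytes = 16 * 1024;       /* 16 KB of data               */
    int block_bytes = 16 * 4;          /* 16 words x 4 bytes = 64 B   */
    int ways_list[] = { 4, 1 };        /* 4-way, then 1-way           */

    for (int w = 0; w < 2; w++) {
        int sets   = (cache_bytes / block_bytes) / ways_list[w];
        int offset = log2i(block_bytes);
        int index  = log2i(sets);
        printf("%d-way: offset = %d, index = %d, tag = %d bits\n",
               ways_list[w], offset, index, addr_bits - index - offset);
    }
    return 0;
}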

CSCE 430/830, Memory


Hierarchy Introduction
Q3: Which block should be replaced on a miss?

• Easy for Direct Mapped


• Set Associative or Fully Associative:
– Random
– LRU (Least Recently Used)

Assoc:     2-way           4-way           8-way
Size       LRU     Ran     LRU     Ran     LRU     Ran
16 KB      5.2%    5.7%    4.7%    5.3%    4.4%    5.0%
64 KB      1.9%    2.0%    1.5%    1.7%    1.4%    1.5%
256 KB     1.15%   1.17%   1.13%   1.13%   1.12%   1.12%

CSCE 430/830, Memory


Hierarchy Introduction
Q3: After a cache read miss, if there are no empty
cache blocks, which block should be removed from
the cache?

The Least Recently Used (LRU) block? Appealing,
but hard to implement for high associativity.
A randomly chosen block? Easy to implement,
but how well does it work?

Miss Rate for a 2-way Set-Associative Cache
Size       Random   LRU
16 KB      5.7%     5.2%
64 KB      2.0%     1.9%
256 KB     1.17%    1.15%

Also, try other LRU approximations.

CSCE 430/830, Memory


Hierarchy Introduction
Reducing Cache Misses: 1. Larger Block Size

Using the principle of locality. The larger the block, the greater the chance parts
of it will be used again.

[Figure: miss rate (0-25%) vs. block size (16-256 bytes) for cache sizes of 1K, 4K, 16K, 64K, and 256K – miss rate falls as block size grows, then turns back up for the smaller caches]
CSCE 430/830, Memory
Hierarchy Introduction
Increasing Block Size
• One way to reduce the miss rate is to increase
the block size
– Take advantage of spatial locality
– Decreases compulsory misses
• However, larger blocks have disadvantages
– May increase the miss penalty (need to get more
data)
– May increase hit time (need to read more data from
cache and larger mux)
– May increase the miss rate, since conflict misses can increase
• Increasing the block size can help, but don’t
overdo it.

CSCE 430/830, Memory


Hierarchy Introduction
Block Size vs. Cache Measures
Increasing Block Size generally increases
Miss Penalty and decreases Miss Rate
As the block size increases the AMAT starts
to decrease, but eventually increases
Hit Time + Miss Rate X Miss Penalty = Avg. Memory Access Time

[Figure: Hit Time, Miss Rate, Miss Penalty, and AMAT each sketched as a function of Block Size]
CSCE 430/830, Memory
Hierarchy Introduction
Reducing Cache Misses: 2. Higher Associativity

• Increasing associativity helps reduce conflict


misses
• 2:1 Cache Rule:
– The miss rate of a direct mapped cache of size N is
about equal to the miss rate of a 2-way set
associative cache of size N/2
– For example, the miss rate of a 32 Kbyte direct
mapped cache is about equal to the miss rate of a
16 Kbyte 2-way set associative cache
• Disadvantages of higher associativity
– Need to do large number of comparisons
– Need n-to-1 multiplexor for n-way set associative
– Could increase hit time
– Consume more power
CSCE 430/830, Memory
Hierarchy Introduction
AMAT vs. Associativity

Cache Size Associativity


(KB) 1-way 2-way 4-way 8-way
1 7.65 6.60 6.22 5.44
2 5.90 4.90 4.62 4.09
4 4.60 3.95 3.57 3.19
8 3.30 3.00 2.87 2.59
16 2.45 2.20 2.12 2.04
32 2.00 1.80 1.77 1.79
64 1.70 1.60 1.57 1.59
128 1.50 1.45 1.42 1.44
Red means A.M.A.T. not improved by more associativity
Does not take into account effect of slower clock on rest of program

CSCE 430/830, Memory


Hierarchy Introduction
