Week 12 - Lecture 12 - Memory

The lecture discusses computer memory architecture, focusing on the performance gap between processors and memory, and the need for large and fast memory solutions. It covers various memory technologies such as DRAM and SRAM, their operations, and the concept of memory hierarchy to balance speed and capacity. Additionally, it explains cache management, including hit/miss rates, cache organization, and handling cache misses.


ELT3047 Computer Architecture

Lecture 12: Memory

Hoang Gia Hung


Faculty of Electronics and Telecommunications
University of Engineering and Technology, VNU Hanoi
Introduction
❑ A computer = Processor + Memory + Devices
➢ Processor (CPU, active, the “brain”): Control + Datapath
➢ Memory (passive): where programs & data live when running
➢ Devices: Input & Output

❑ Users’ need: large and fast memory
❑ Reality:
➢ Physical memory size is limited
➢ Processor vs. memory speed disparity continues to grow (the Processor-Memory Performance Gap grows 50% / year)
⇒ Processor-Memory: an unbalanced system

❑ Life’s easier for programmers, harder for architects


The ideal memory
❑ Instruction supply to the pipeline (instruction execution):
➢ Zero-cycle latency
➢ Infinite capacity
➢ Perfect control flow
➢ Zero cost
❑ Data supply to the pipeline:
➢ Zero-cycle latency
➢ Infinite capacity
➢ Infinite bandwidth
➢ Zero cost

❑ The problem: ideal memory’s requirements oppose each other


➢ Bigger is slower
▪ Bigger → Takes longer to determine the location
➢ Faster is more expensive
▪ Technologies: SRAM vs. DRAM vs. Disk vs. Tape
➢ Higher bandwidth is more expensive
▪ Need more banks, more ports, higher frequency, or faster technology
Memory Technology: DRAM
❑ Dynamic random access memory
❑ Capacitor charge state indicates stored value
➢ Whether the capacitor is charged or discharged indicates storage of 1 or 0
➢ 1 storage capacitor
➢ 1 access FET → select which bits will be affected by read/write operations

❑ Operations
➢ Write: turn on access FET with the wordline & charge/discharge storage
capacitor through the bitline.
➢ Read: more complicated & destructive
→ data rewritten after read.

❑ Capacitor leaks
➢ DRAM cell loses charge over time
➢ DRAM cell needs to be refreshed
Memory Technology: SRAM
❑ Static random access memory
❑ 2 cross-coupled inverters store a single bit
➢ 2 inverters wired in a positive feedback loop forming a bistable element (2 stable states)
➢ 4 transistors for storage
➢ 2 transistors for access
[Cell diagram: row select gates two access transistors connecting the cross-coupled inverters to the bitline and its complement; the two stable states hold “0”/“1” between Vdd and GND]

❑ Read sequence
1. address decode
2. drive row select
3. selected bit-cells drive bitlines (entire row is read together)
4. differential sensing and column select (data is ready)
5. precharge all bitlines (for next read or write)
[Array diagram: an (n+m)-bit address is split into n row bits and m column bits; the row decoder selects one of the 2^n rows of a 2^n-row × 2^m-column bit-cell array, and 2^m differential pairs feed the sense amps and column mux (n ≈ m to minimize overall latency)]
Memory Technology: DRAM vs. SRAM
❑ DRAM
➢ Slower access (capacitor)
➢ Higher density (1T 1C cell)
➢ Lower cost
➢ Requires refresh (power, performance, circuitry)
➢ Manufacturing requires putting capacitor and logic together

❑ SRAM
➢ Faster access (no capacitor)
➢ Lower density (6T cell)
➢ Higher cost
➢ No need for refresh
➢ Manufacturing compatible with logic process (no capacitor)
Memory Technology: Non-volatile
storage (flash)

❑ Use floating gate transistors to store charge


➢ Very dense: multiple bits/transistor, read/written in blocks
➢ Slower than DRAM (especially on writes)
➢ Limited number of writes: charging/discharging the floating gate requires large
voltages that damage the transistor
➢ Long-time technology of choice for non-volatile storage: a higher-performance
but higher-cost replacement for HDDs.
Memory hierarchy: the idea
❑ The problem:
➢ Bigger is slower
➢ Faster is more expensive (dollars and chip area)

❑ We want both fast and large


➢ But we cannot achieve both with a single level of memory

❑ Idea:
➢ Have multiple levels of storage (progressively bigger and slower as the
levels are farther from the processor) and ensure most of the data the
processor needs is kept in the fast(er) level(s)

❑ Why Does it Work?


➢ Locality of memory reference: if there’s an access to address 𝑋 at time 𝑡, it’s
very probable that the program will access 𝑋 again, or a nearby location, in the
near future.
A Typical Memory Hierarchy
❑ Presents the user with as much memory as is available in the
cheapest technology at the speed offered by the fastest one.
➢ Store everything on disk
➢ Copy recently accessed items from disk to smaller DRAM memory
➢ Copy more recently accessed items from DRAM to smaller SRAM memory

[Hierarchy diagram: on-chip components (Control, Datapath with RegFile, ITLB/DTLB, and the L1 instruction & data caches, all SRAM) connect to a second-level cache (SRAM), then to main memory (DRAM), then to secondary memory (disk)]

Speed (cycles):  ½’s      1’s      10’s     100’s    10,000’s
Size (bytes):    100’s    10K’s    M’s      G’s      T’s
Cost:            highest  ────────────────────────→  lowest
Memory in a Modern System

[Die photo: four cores (CORE 0–3), each with a private L2 cache (L2 CACHE 0–3), a SHARED L3 CACHE, and a DRAM memory controller / DRAM interface connecting to off-chip DRAM banks]
The memory locality principle
❑ One of the most important principles in computer design.
➢ A “typical” program has a lot of locality in memory references
▪ typical programs are composed of “loops”

❑ Temporal Locality (locality in time)


➢ A program tends to reference the same memory location many times and all
within a small window of time
➢ E.g., instructions in a loop, induction variables
⇒ Keep most recently accessed data items closer to the processor

❑ Spatial Locality (locality in space)


➢ A program tends to reference a cluster of memory locations at a time
➢ E.g., sequential instruction access, array data
⇒ Move blocks consisting of contiguous words closer to the processor
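
❑ Example (an illustrative sketch in C, not from the lecture): summing an array exhibits both kinds of locality.

#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];   /* assume a[] is filled with data elsewhere */
    int sum = 0;
    /* Temporal locality: sum and i are re-referenced on every iteration.
       Spatial locality: a[0], a[1], ... are consecutive addresses, so each
       cache block fetched for a[] is fully used before it is evicted.     */
    for (int i = 0; i < N; i++)
        sum += a[i];
    printf("sum = %d\n", sum);
    return 0;
}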
Characteristics of the Memory
Hierarchy
❑ The data is similarly hierarchical
➢ Inclusive: a level closer to the processor is
generally a subset of any level further away
➢ Block (or line): the minimum unit of
information in a cache (may be multiple words)

❑ If the data the processor wants is found in the upper level → a hit
➢ Hit rate (aka hit ratio) = #hits / #accesses
➢ Hit time: time to access the block + time to determine hit/miss

❑ If the required data is absent → a miss
➢ Miss rate = #misses / #accesses = 1 − Hit rate
➢ Miss penalty: time taken to copy the missed block from the lower level → much greater than the hit time
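➢ Worked example (illustrative numbers, not from the slides): 1000 accesses with 950 hits → hit rate = 950/1000 = 0.95, miss rate = 1 − 0.95 = 0.05; each of the 50 misses pays the miss penalty on top of the hit time.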
How is the hierarchy managed?
❑ registers ↔ memory
➢ by compiler/programmer
❑ cache ↔ main memory
➢ by the cache controller hardware
❑ main memory ↔ disks
➢ by the operating system (virtual memory)
➢ virtual to physical address mapping
assisted by the hardware (TLB)
➢ by the programmer (files)
Cache Basics
❑ Two questions to answer (in hardware):
➢ Q1: How do we know if a data item is in the
cache?
➢ Q2: If it is, how do we find it?

❑ Q2 simplest answer: direct mapped


➢ Location in the cache determined by address
in memory
➢ Location mapping = (Block address) modulo
(#Blocks in cache)
➢ #Blocks in cache is usually a power of 2
➢ Use low-order address bits

❑ Example: an 8-block cache


➢ 8 = 2³ → use the three lowest bits of the block address as the cache index
➢ many lower-level (memory) blocks must share each block in the cache
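
❑ A minimal sketch in C of the placement rule for the 8-block example (identifiers are my own, not the lecture’s), showing that the modulo reduces to keeping the low-order bits when the block count is a power of 2:

#include <stdio.h>
#include <assert.h>

#define NUM_BLOCKS 8              /* must be a power of 2 */

/* Direct-mapped placement: index = (block address) mod (#blocks in cache). */
unsigned cache_index(unsigned block_addr) {
    unsigned idx = block_addr % NUM_BLOCKS;
    assert(idx == (block_addr & (NUM_BLOCKS - 1)));   /* same as the 3 low bits */
    return idx;
}

int main(void) {
    /* Memory blocks 1, 9, 17, 25 all share cache block 1. */
    for (unsigned b = 1; b < 32; b += NUM_BLOCKS)
        printf("memory block %2u -> cache index %u\n", b, cache_index(b));
    return 0;
}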
Tags and Valid Bits
❑ [Q1] How do we determine if a requested word is in the cache or
not?
➢ Have a tag associated with each cache block that contains the address
information (the upper portion of the address).

❑ What if there is no data in a location?


➢ Add a valid bit to indicate that the associated block in the hierarchy contains
valid data
➢ If valid bit = 0 → there cannot be a match for this block.

❑ Example: Consider the main memory word reference string


0 1 2 3 4 3 4 15
➢ Data memory allocation is given below

Address (tag | index):  00 00   00 01   00 10   00 11   01 00   11 11
Data:                       0       1       2       3       4      15

➢ Start with an empty cache - all blocks initially marked as not valid
Tags and Valid Bits: example solution
Main memory
Address (tag | index):  00 00   00 01   00 10   00 11   01 00   11 11
Data:                       0       1       2       3       4      15

Reference string: 0 1 2 3 4 3 4 15 → 8 requests, 6 misses

➢ 0 miss, 1 miss, 2 miss, 3 miss: the first four references are compulsory misses; Mem(0)–Mem(3) are installed at indices 00–11, all with tag 00 and valid = 1.
➢ 4 miss: address 01 00 maps to index 00 → Mem(0) is evicted and replaced by Mem(4) (tag 01).
➢ 3 hit, 4 hit: both blocks are now resident with matching tags.
➢ 15 miss: address 11 11 maps to index 11 → Mem(3) is evicted and replaced by Mem(15) (tag 11).
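
❑ The 8-requests/6-misses trace above can be reproduced with a small simulator sketch (assuming, as in the example, a direct-mapped cache of 4 one-word blocks; the code and names are illustrative, not part of the lecture):

#include <stdio.h>
#include <stdbool.h>

#define NUM_BLOCKS 4

struct line { bool valid; unsigned tag; };

int main(void) {
    struct line cache[NUM_BLOCKS] = {0};            /* all valid bits start at 0 */
    unsigned refs[] = {0, 1, 2, 3, 4, 3, 4, 15};    /* word-address reference string */
    int misses = 0;

    for (int i = 0; i < 8; i++) {
        unsigned idx = refs[i] % NUM_BLOCKS;        /* 2 low-order bits = index */
        unsigned tag = refs[i] / NUM_BLOCKS;        /* remaining upper bits = tag */
        bool hit = cache[idx].valid && cache[idx].tag == tag;
        if (!hit) {                                 /* miss: install the block */
            cache[idx].valid = true;
            cache[idx].tag   = tag;
            misses++;
        }
        printf("%2u: %s\n", refs[i], hit ? "hit" : "miss");
    }
    printf("%d misses / 8 requests\n", misses);     /* prints 6, matching the slide */
    return 0;
}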
Direct Mapped: MIPS Address
Subdivision
❑ A memory address contains
➢ Block address → block in memory
➢ Block offset → bytes within a block

❑ E.g., one-word blocks, cache size = 1K words
➢ 2 LSB’s of the address = byte offset
➢ Cache size = 1K word → the next
10 bits of the address = cache index
➢ The remaining upper 20 bits of the
address will be stored as cache tag.
➢ Index is used to access cache
block, then address tag is compared
against stored tag - if equal & cache
block is valid → hit; otherwise, miss.
➢ What kind of locality are we taking
advantage of in this example?
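
❑ A hedged sketch in C of the field extraction for this configuration (32-bit byte address, one-word blocks, 1K-word cache → 2-bit byte offset, 10-bit index, 20-bit tag); the function names are illustrative:

#include <stdio.h>
#include <stdint.h>

/* 32-bit byte address layout for this cache:
   [31 ............. 12][11 ........ 2][1 0]
        20-bit tag         10-bit index  byte offset            */
static uint32_t byte_offset(uint32_t addr) { return addr & 0x3; }
static uint32_t cache_index(uint32_t addr) { return (addr >> 2) & 0x3FF; }
static uint32_t cache_tag(uint32_t addr)   { return addr >> 12; }

int main(void) {
    uint32_t addr = 0x1234ABCC;     /* an arbitrary example address */
    printf("tag = 0x%05X, index = %u, offset = %u\n",
           (unsigned)cache_tag(addr), (unsigned)cache_index(addr),
           (unsigned)byte_offset(addr));
    return 0;
}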
Handling Cache Hits
❑ Read hits (I$ and D$)
➢ Trivial

❑ Write hits (D$ only)


➢ Write Through: always writing the data into both the cache block and the
next level in the memory hierarchy.
▪ ensures the cache and memory are consistent
▪ slow (run at the speed of the next level in the hierarchy) → use write
buffer & stall only if the write buffer is full → a write-through can be done
in one cycle if there is room in the write buffer.
➢ Write Back: write the new data only into the cache block, then write-back
the cache contents to the memory when that cache block is evicted.
▪ allows the cache and memory to be (temporarily) inconsistent
▪ need a dirty bit for each data cache block to tell if it needs to be written
back to memory when it is evicted.
▪ more complex to implement than write-through.
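
❑ A minimal sketch in C contrasting the two write-hit policies (the structure and function names are assumptions for illustration, not the lecture’s implementation):

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

struct line { bool valid, dirty; uint32_t tag, data; };

/* Write-through hit: update the cache and also send the write toward memory
   (in practice into a write buffer, so the processor need not stall).       */
static void write_through_hit(struct line *l, uint32_t data) {
    l->data = data;
    /* a real design would enqueue (address, data) into the write buffer here */
}

/* Write-back hit: update only the cache and set the dirty bit; the block is
   written back to memory only when it is eventually evicted.                */
static void write_back_hit(struct line *l, uint32_t data) {
    l->data = data;
    l->dirty = true;
}

int main(void) {
    struct line wt = { .valid = true }, wb = { .valid = true };
    write_through_hit(&wt, 42);
    write_back_hit(&wb, 42);
    printf("write-through dirty bit: %d, write-back dirty bit: %d\n", wt.dirty, wb.dirty);
    return 0;
}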
Write Buffer for Write-Through Caching

[Diagram: Processor ↔ Cache; writes are placed in a write buffer that drains to DRAM]
❑ Write buffer is just a FIFO between the cache and main memory
➢ Typical number of entries: 4
➢ Once the data has been written into the write buffer (and, assuming a cache hit, into
the cache), the processor is done; the memory controller then moves the write buffer’s
contents to the real memory behind the scenes.
➢ Works fine if store frequency (w.r.t. time) << 1/DRAM write cycle

❑ Memory system designer’s nightmare


➢ When the store frequency ≈ 1/DRAM write cycle → write buffer saturation
➢ Solutions: use a write-back cache; or use an L2 cache
Direct mapped: conflict miss
❑ Consider the main memory word reference string:
0 4 0 4 0 4 0 4
➢ Start with an empty cache - all blocks initially marked as not valid.

0 miss, 4 miss, 0 miss, 4 miss, 0 miss, 4 miss, 0 miss, 4 miss → 8 requests, 8 misses
➢ Words 0 (address 00 00) and 4 (address 01 00) both map to cache index 00 with different tags, so every reference evicts the other: the block alternates between Mem(0) (tag 00) and Mem(4) (tag 01).

❑ Ping pong effect due to conflict misses - two memory locations


that map into the same cache block
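
➢ Note: rerunning the earlier direct-mapped simulator sketch with the reference string 0 4 0 4 0 4 0 4 shows this effect directly: word addresses 0 and 4 have different tags but the same index (0 mod 4 = 4 mod 4 = 0), so all 8 requests miss.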
Sources of Cache Misses
❑ Compulsory (cold start or process migration, first reference):
➢ First access to a block, “cold” fact of life, not a whole lot you can do about it
➢ If you are going to run “millions” of instructions, compulsory misses are
insignificant

❑ Conflict (collision):
➢ Multiple memory locations mapped to the same cache location
➢ Solution 1: increase cache size
➢ Solution 2: increase associativity (next lecture)

❑ Capacity:
➢ Cache cannot contain all blocks accessed by the program
➢ Solution: increase cache size
Handling Cache Misses (Single Word
Blocks)
❑ Read misses (I$ and D$)
➢ Stall the pipeline, fetch the block from the next level in the memory
hierarchy, install it in the cache and send the requested word to the
processor, then let the pipeline resume.

❑ Write misses (D$ only)


1. Stall the pipeline, fetch the block from next level in the memory hierarchy,
install it in the cache (which may involve having to evict a dirty block if using
a write-back cache), write the word from the processor to the cache, then let
the pipeline resume;
or (normally used in write-back caches)
2. Write allocate: just write the word into the cache (updating the tag too), no
need to check for a cache hit, no need to stall; or
3. No-write allocate: skip the cache write (but must invalidate that cache
block since it will now hold stale data) and just write the word to the write
buffer (and eventually to the next memory level), no need to stall if the write
buffer isn’t full.
Design trade off: Miss Rate vs Cache Size
❑ Small cache size
➢ doesn’t exploit temporal locality well → increases miss rate
➢ useful data replaced often
❑ Large cache size
➢ can exploit temporal locality better → improves miss rate
➢ not ALWAYS better
❑ Too large a cache
➢ adversely affects hit and miss latency: bigger is slower → access time may degrade the critical path
❑ Working set: the whole set of data the executing application references within a time interval
[Figure: hit rate vs. cache size, with the “working set” size marked on the cache-size axis]
Direct Mapped: Multiword Block Cache
❑ FastMATH (embedded MIPS processor)
➢ 16KB cache = 256 blocks × 16 words/block
➢ What kind of locality are we taking advantage of?
Multiword Block Cache: Taking
Advantage of Spatial Locality
❑ Let’s retake the previous reference string example, with the cache block now
holding two words.
Reference String : 0 1 2 3 4 3 4 15
0 miss, 1 hit, 2 miss, 3 hit: word 0 misses and brings in the block Mem(1)|Mem(0) (tag 00), so word 1 hits; word 2 misses and brings in Mem(3)|Mem(2) (tag 00), so word 3 hits.
4 miss, 3 hit, 4 hit: word 4 misses and replaces Mem(1)|Mem(0) with Mem(5)|Mem(4) (tag 01); words 3 and 4 then hit.
15 miss: word 15 replaces Mem(3)|Mem(2) with Mem(15)|Mem(14) (tag 11).

➢ 8 requests, 4 misses vs 6 misses in the one word blocks example.
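➢ The earlier simulator sketch can be adapted to this case by computing block address = word address / 2 and index = block address mod 2 (two blocks of two words each, the same 4-word capacity); it reproduces the 4-miss count, and the new hits on words 1 and 3 come from spatial locality: those words are brought in by the misses on words 0 and 2.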


Design trade off: Miss Rate vs Block
Size
[Figure: three plots vs. block size: Miss Rate (falls at first as spatial locality is exploited, then rises once fewer blocks compromise temporal locality), Miss Penalty (increases with block size), and Average Access Time (the combination of the two)]

❑ Larger blocks should reduce miss rate (due to spatial locality)


➢ But: larger blocks (block size ≈ a significant fraction of cache size) → fewer
of them → more competition → increased miss rate.

❑ Larger block size means larger miss penalty


➢ Bigger is slower → takes longer to transfer the block into the cache

❑ Average Memory Access Time (AMAT) = Hit Time + Miss Rate × Miss Penalty
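➢ Worked example (illustrative numbers, not from the slides): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 × 100 = 6 cycles, so even a small miss rate dominates the average access time.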
Today’s lecture summary
❑ The Principle of Locality:
➢ Programs are likely to access a relatively small portion of the address space at
any instant of time.
▪ Temporal Locality: Locality in Time
▪ Spatial Locality: Locality in Space

❑ Three major categories of cache misses:


1. Compulsory misses: sad facts of life. Example: cold start misses
2. Conflict misses: multiple memory locations being mapped to the same
cache location. Nightmare scenario: the ping pong effect.
3. Capacity misses: the cache is not big enough to contain all the cache
blocks required by the program. Solution: increase cache size.

❑ Cache design space:


➢ total size, block size
➢ write-hit policy (write-through, write-back)
➢ write-miss policy (write allocate, write buffers)
