6.Module 2_Part 2
ARCHITECTURE
PCC-CS 402
(Module 2 - Part 2)
• Read hit:
The requested data is read directly from the cache.
• Write hit:
The cache holds a copy of the contents of the main memory, and there are two ways
to handle the write.
The contents of the cache and the main memory may be updated simultaneously.
This is the write-through protocol.
Alternatively, only the contents of the cache are updated, and the block is marked as
modified by setting a bit known as the dirty bit or modified bit. The contents of the
main memory are updated later, when this block is replaced. This is the write-back
or copy-back protocol.
CACHE MISS
• If the data is not present in the cache, then a Read miss or Write miss occurs.
• Read miss:
The block of words containing the requested word is transferred from the memory.
After the block is transferred, the desired word is forwarded to the processor.
Alternatively, the desired word may be forwarded to the processor as soon as it
arrives, without waiting for the entire block to be transferred. This is called
load-through or early restart.
• Write miss:
If the write-through protocol is used, the contents of the main memory are updated
directly.
If the write-back protocol is used, the block containing the addressed word is first
brought into the cache, and the desired word in the cache is then overwritten with
the new information.
Mapping Functions
• Mapping functions determine how main memory blocks are placed in
the cache
• A simple processor example:
− Cache consisting of 128 blocks of 16 words, total 2048 (2K) words
− Main memory is addressable by 16-bit address
− Main memory has 64K words, organized as 4096 blocks of 16 words each
• Three mapping functions:
− Direct mapping
− Associative mapping
− Set-associative mapping
Direct Mapping
[Figure: direct mapping of main memory blocks 0–4095 onto cache blocks 0–127, each
cache block stored with its tag; main memory address split as Tag (5), Block (7),
Word (4)]
• Block j of the main memory maps to block (j modulo 128) of the cache; block 0
maps to 0, block 129 maps to 1
• Each memory block can be placed in only one position in the cache
• More than one memory block can be mapped onto the same position in the cache
• Memory address is divided into three fields:
− Low-order 4 bits determine one of the 16 words in a block
− Next 7 bits determine the location of the cache block
− High-order 5 bits determine which of the 32 possible blocks is currently present
in the cache; these are the Tag bits, which are stored along with the cache block
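The three-field split described above (Tag 5, Block 7, Word 4 for a 16-bit address) can be checked with a short sketch; this is illustrative code, not from the slides.

```python
# Sketch of the 16-bit address split used in this direct-mapped example:
# Tag (5 bits) | Block (7 bits) | Word (4 bits)

def split_address(addr):
    word  = addr & 0xF            # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F    # next 7 bits: cache block position
    tag   = (addr >> 11) & 0x1F   # high-order 5 bits: tag
    return tag, block, word

# Memory block j maps to cache block j % 128, so the first word of memory
# block 129 falls in cache block 1, as in the slide.
assert split_address(129 * 16)[1] == 129 % 128
assert split_address(0xFFFF) == (31, 127, 15)
```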
Direct Mapping
[Figure: the same direct-mapping diagram; main memory address split as Tag (5),
Block (7), Word (4)]
• Locating an address in the cache:
− The cache block number is derived from the Block field of the main memory
address
− The upper 5 bits of the address are matched against the Tag of that specific
cache block
− If they match, it is a cache hit; otherwise, it is a cache miss
• Advantages:
− Simple to implement
− Replacement method is also simple
• Disadvantages:
− Cache hit ratio is not high
− Not very flexible
Associative Mapping
[Figure: any main memory block (0–4095) may be placed in any cache block (0–127);
main memory address split as Tag (12), Word (4)]
• A main memory block can be placed into any cache position
• Memory address is divided into two fields:
− Low-order 4 bits identify the word within a block
− High-order 12 bits, the tag bits, identify a memory block when it is resident
in the cache
• Advantages:
− Flexible, and uses cache space efficiently
• Disadvantages:
− More complex, as all tags must be checked to locate a memory block in the cache
− Requires associative memory access
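The associative lookup described above can be sketched as a linear scan over all tags; real hardware compares all tags in parallel, but the bookkeeping is the same. Illustrative code, not from the slides.

```python
# Sketch of an associative lookup: the 12-bit tag of the address must be
# compared with the tag of every occupied cache block.

TAG_BITS, WORD_BITS = 12, 4

def lookup(cache_tags, addr):
    """cache_tags: list of tags (or None) for the 128 cache blocks."""
    tag = addr >> WORD_BITS             # high-order 12 bits
    word = addr & ((1 << WORD_BITS) - 1)
    for position, t in enumerate(cache_tags):
        if t == tag:                    # every tag must be checked
            return ("hit", position, word)
    return ("miss", None, word)

tags = [None] * 128
tags[37] = 0x0AB                        # this block may reside anywhere
assert lookup(tags, 0x0AB7) == ("hit", 37, 7)
assert lookup(tags, 0x0AC0)[0] == "miss"
```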
Set-Associative Mapping
[Figure: cache blocks grouped into 64 sets of two blocks each, each block stored
with its tag; main memory blocks 0–4095; main memory address split as Tag (6),
Set (6), Word (4)]
• Blocks of the cache are grouped into sets; the mapping function allows a block of
the main memory to reside in any block of a specific set
• Hence, the mapping is associative among all blocks in the same set
(set-associative)
• In this example, the cache is divided into 64 sets, with two blocks per set; this
is called 2-way set-associative
• Memory blocks 0, 64, 128, etc. map to set 0, and each can occupy either of the
two positions in that set
• Other possible combinations are 32 sets with 4 blocks each (4-way) or 16 sets
with 8 blocks each (8-way)
• A k-way set-associative cache has k blocks per set
Set-Associative Mapping
[Figure: the same 2-way set-associative diagram; main memory address split as
Tag (6), Set (6), Word (4)]
• Memory address is divided into three parts:
− Low-order 4 bits identify the word within a block
− Next 6 bits determine the set number
− High-order 6 bits are the tag bits, compared against the tags of both blocks
in the set
• Set-associative mapping is a combination of direct and associative mapping
• It reduces the block conflicts of direct mapping and the complex tag search of
associative mapping
• The number of blocks per set is a design parameter
• If all blocks are in one set, it is the same as associative mapping
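The set/tag arithmetic for this 2-way example can be sketched directly from the field widths; illustrative code, not from the slides.

```python
# Sketch of the 2-way set-associative split: Tag (6) | Set (6) | Word (4).
# With 64 sets, main memory block j maps to set j % 64 with tag j // 64.

def set_and_tag(block_number):
    """Which set a main-memory block maps to, and its tag within that set."""
    return block_number % 64, block_number // 64

# Blocks 0, 64 and 128 all map to set 0, with distinct tags,
# so any two of them can coexist in the cache.
assert [set_and_tag(b) for b in (0, 64, 128)] == [(0, 0), (0, 1), (0, 2)]
assert set_and_tag(129) == (1, 2)    # block 129 maps to set 1
```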
PERFORMANCE CONSIDERATIONS
A key design objective of a computer system is to achieve the best
possible performance at the lowest possible cost.
Price/performance ratio is a common measure of success.
Performance of a processor depends on:
How fast machine instructions can be brought into the processor for
execution.
How fast the instructions can be executed.
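A standard figure of merit that ties these two factors together (not stated on the slide, but the usual textbook measure) is the average memory access time for a single cache level: t_avg = h·C + (1 − h)·M, where h is the hit rate, C the cache access time and M the miss penalty.

```python
# Average memory access time for one cache level (standard formula,
# not from the slides): t_avg = h*C + (1 - h)*M.

def average_access_time(hit_rate, cache_time, miss_penalty):
    return hit_rate * cache_time + (1 - hit_rate) * miss_penalty

# e.g. 95% hits, 1-cycle cache access, 17-cycle miss penalty
# (cycle counts are illustrative, not from the lecture):
t_avg = average_access_time(0.95, 1, 17)
assert abs(t_avg - 1.8) < 1e-9       # well below the miss penalty
```

Even a modest hit rate keeps the average close to the cache access time, which is why the techniques on the following slides focus on raising h and reducing M.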
PERFORMANCE OF CACHE MEMORY
TECHNIQUES TO IMPROVE THE CACHE MEMORY PERFORMANCE
Solution:
INTERLEAVING
Interleaving divides the memory system into a number of memory modules, each
with its own address buffer register (ABR) and data buffer register (DBR).
Addressing is arranged so that successive words in the address space are placed
in different modules.
When requests for memory access involve consecutive addresses, the accesses go
to different modules.
Since these modules can be accessed in parallel, the average rate of fetching
words from the main memory can be increased.
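The addressing arrangement above (low-order interleaving) can be sketched in a couple of lines; the module count of 4 is an illustrative choice, not from the slides.

```python
# Sketch of low-order interleaving across 4 modules: consecutive word
# addresses land in different modules, so a sequential burst of accesses
# can proceed in parallel.

MODULES = 4

def module_and_offset(addr):
    """Which module holds a word, and its position within that module."""
    return addr % MODULES, addr // MODULES

# Four consecutive addresses go to four different modules.
assert [module_and_offset(a)[0] for a in range(4)] == [0, 1, 2, 3]
assert module_and_offset(9) == (1, 2)   # address 9 -> module 1, word 2
```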
OTHER PERFORMANCE ENHANCEMENTS
Prefetching
• Normally, new data are brought into the processor only when they are first
needed, so the processor has to wait until the data transfer is complete.
• Prefetching brings the data into the cache before they are actually needed,
i.e., before a Read miss occurs.
• Prefetching can be accomplished in software by including a special prefetch
instruction in the machine language of the processor.
Inclusion of prefetch instructions increases the length of the programs.
• Prefetching can also be accomplished in hardware:
Circuitry attempts to discover patterns in memory references and then
prefetches according to this pattern.
OTHER PERFORMANCE ENHANCEMENTS (CONTD.)
Lockup-Free Cache
• A prefetching scheme does not work if it stops other accesses to the cache
until the prefetch is completed.
• A cache of this type is said to be “locked” while it services a miss.
• A cache structure that supports multiple outstanding misses is called a
lockup-free cache.
• Since more than one miss can be serviced at a time, a lockup-free cache must
include circuits that keep track of all the outstanding misses.
• Special registers may hold the necessary information about these misses.
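A toy sketch of that bookkeeping follows. In modern designs the special registers are commonly called miss status holding registers (MSHRs); the class below only models the tracking, not the actual memory transfers, and the register count is an illustrative assumption.

```python
# Toy sketch of tracking multiple outstanding misses in a lockup-free
# cache. Only the bookkeeping is modeled, not the memory transfers.

class LockupFreeCache:
    def __init__(self, max_outstanding=4):
        self.max_outstanding = max_outstanding   # number of tracking registers
        self.outstanding = {}                    # block number -> waiting words

    def miss(self, block, word):
        """Record a miss; the cache stays usable while it is outstanding."""
        if block in self.outstanding:            # merge with an existing miss
            self.outstanding[block].append(word)
            return True
        if len(self.outstanding) < self.max_outstanding:
            self.outstanding[block] = [word]
            return True
        return False                             # all tracking registers busy

    def fill(self, block):
        """Memory returned the block: release its tracking register."""
        return self.outstanding.pop(block)

c = LockupFreeCache(max_outstanding=2)
assert c.miss(10, 3) and c.miss(11, 0) and c.miss(10, 7)
assert not c.miss(12, 1)                         # both registers in use
assert c.fill(10) == [3, 7]
```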
OTHER PERFORMANCE ENHANCEMENTS (CONTD.)
Write Buffer
• Each write operation involves writing to the main memory.
• If the processor has to wait for the write operation to complete, it is
slowed down.
• The processor does not usually depend on the result of a write operation,
so a write buffer can be included for temporary storage of write requests.
• The processor places each write request into the buffer and continues
execution.
• If a subsequent Read request references data that are still in the write
buffer, the data are taken from the write buffer.
• This applies to both write-through and write-back techniques.
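The buffering and read-forwarding behavior above can be sketched with a simple FIFO; this is an illustrative model, not from the lecture.

```python
# Sketch of a write buffer with read forwarding: writes are queued instead
# of stalling the processor, and a later read checks the buffer first.
from collections import deque

write_buffer = deque()            # queued (address, value) write requests
memory = {0x20: 1}

def buffered_write(addr, value):
    write_buffer.append((addr, value))   # processor continues immediately

def read(addr):
    # A newer entry for the same address wins, so scan newest-first.
    for a, v in reversed(write_buffer):
        if a == addr:
            return v                     # forwarded from the write buffer
    return memory[addr]

def drain():
    while write_buffer:
        a, v = write_buffer.popleft()
        memory[a] = v                    # writes retire to memory in order

buffered_write(0x20, 7)
assert read(0x20) == 7 and memory[0x20] == 1   # forwarded; memory stale
drain()
assert memory[0x20] == 7
```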