6.Module 2_Part 2
ARCHITECTURE
PCC-CS 402
(Module 2 - Part 2)
• Read hit:
The requested data is read directly from the cache.
• Write hit:
The cache holds a copy of the contents of the main memory, and there are two ways
to handle the write.
The contents of the cache and the main memory may be updated simultaneously.
This is the write-through protocol.
Alternatively, only the contents of the cache are updated, and the block is marked as
modified by setting a bit known as the dirty bit or modified bit. The contents of the
main memory are updated later, when this block is replaced. This is the write-back
or copy-back protocol.
CACHE MISS
• If the data is not present in the cache, then a Read miss or Write miss occurs.
• Read miss:
The block of words containing the requested word is transferred from the memory.
After the block is transferred, the desired word is forwarded to the processor.
Alternatively, the desired word may be forwarded to the processor as soon as it
arrives, without waiting for the entire block to be transferred. This is called
load-through or early restart.
• Write miss:
If the write-through protocol is used, the contents of the main memory are updated
directly.
If the write-back protocol is used, the block containing the addressed word is first
brought into the cache, and the desired word in the cache is then overwritten with
the new information.
Mapping Functions
• Mapping functions determine how main memory blocks are placed in
the cache
• A simple processor example:
− Cache consisting of 128 blocks of 16 words, total 2048 (2K) words
− Main memory is addressable by 16-bit address
− Main memory has 64K words, organized as 4096 blocks of 16 words each
• Three mapping functions:
− Direct mapping
− Associative mapping
− Set-associative mapping
Direct Mapping
[Figure: direct mapping of main memory blocks 0–4095 onto cache blocks 0–127, each
cache block stored with its tag; main memory address split as Tag (5), Block (7),
Word (4)]
• Block j of the main memory maps to block (j modulo 128) of the cache; block 0
maps to 0, block 129 maps to 1
• Each memory block can be placed in only one position in the cache
• More than one memory block can be mapped onto the same position in the cache
• Memory address is divided into three fields:
− Low-order 4 bits determine one of the 16 words in a block
− Next 7 bits determine the location of the cache block
− High-order 5 bits determine which of the 32 possible blocks is currently present
in the cache; these are the Tag bits, which are stored along with the cache block
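The three-field split described above (Tag 5, Block 7, Word 4 for a 16-bit address) can be checked with a short sketch; this is illustrative code, not from the slides.

```python
# Sketch of the 16-bit address split used in this direct-mapped example:
# Tag (5 bits) | Block (7 bits) | Word (4 bits)

def split_address(addr):
    word  = addr & 0xF            # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F    # next 7 bits: cache block position
    tag   = (addr >> 11) & 0x1F   # high-order 5 bits: tag
    return tag, block, word

# Memory block j maps to cache block j % 128, so the first word of memory
# block 129 falls in cache block 1, as in the slide.
assert split_address(129 * 16)[1] == 129 % 128
assert split_address(0xFFFF) == (31, 127, 15)
```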
Direct Mapping
[Figure: the same direct-mapping diagram; main memory address split as Tag (5),
Block (7), Word (4)]
• Locating an address in the cache:
− The cache block number is derived from the Block field of the main memory
address
− The upper 5 bits of the address are matched against the Tag of that specific
cache block
− If they match, it is a cache hit; otherwise, it is a cache miss
• Advantages:
− Simple to implement
− Replacement method is also simple
• Disadvantages:
− Cache hit ratio is not high
− Not very flexible
Associative Mapping
[Figure: any main memory block (0–4095) may be placed in any cache block (0–127);
main memory address split as Tag (12), Word (4)]
• A main memory block can be placed into any cache position
• Memory address is divided into two fields:
− Low-order 4 bits identify the word within a block
− High-order 12 bits, the tag bits, identify a memory block when it is resident
in the cache
• Advantages:
− Flexible, and uses cache space efficiently
• Disadvantages:
− More complex, as all tags must be checked to locate a memory block in the cache
− Requires associative memory access
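The associative lookup described above can be sketched as a linear scan over all tags; real hardware compares all tags in parallel, but the bookkeeping is the same. Illustrative code, not from the slides.

```python
# Sketch of an associative lookup: the 12-bit tag of the address must be
# compared with the tag of every occupied cache block.

TAG_BITS, WORD_BITS = 12, 4

def lookup(cache_tags, addr):
    """cache_tags: list of tags (or None) for the 128 cache blocks."""
    tag = addr >> WORD_BITS             # high-order 12 bits
    word = addr & ((1 << WORD_BITS) - 1)
    for position, t in enumerate(cache_tags):
        if t == tag:                    # every tag must be checked
            return ("hit", position, word)
    return ("miss", None, word)

tags = [None] * 128
tags[37] = 0x0AB                        # this block may reside anywhere
assert lookup(tags, 0x0AB7) == ("hit", 37, 7)
assert lookup(tags, 0x0AC0)[0] == "miss"
```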
Set-Associative Mapping
[Figure: cache blocks grouped into 64 sets of two blocks each, each block stored
with its tag; main memory blocks 0–4095; main memory address split as Tag (6),
Set (6), Word (4)]
• Blocks of the cache are grouped into sets; the mapping function allows a block of
the main memory to reside in any block of a specific set
• Hence, the mapping is associative among all blocks in the same set
(set-associative)
• In this example, the cache is divided into 64 sets, with two blocks per set; this
is called 2-way set-associative
• Memory blocks 0, 64, 128, etc. map to set 0, and each can occupy either of the
two positions in that set
• Other possible combinations are 32 sets with 4 blocks each (4-way) or 16 sets
with 8 blocks each (8-way)
• A k-way set-associative cache has k blocks per set
Set-Associative Mapping
[Figure: the same 2-way set-associative diagram; main memory address split as
Tag (6), Set (6), Word (4)]
• Memory address is divided into three parts:
− Low-order 4 bits identify the word within a block
− Next 6 bits determine the set number
− High-order 6 bits are the tag bits, compared against the tags of both blocks
in the set
• Set-associative mapping is a combination of direct and associative mapping
• It reduces the block conflicts of direct mapping and the complex tag search of
associative mapping
• The number of blocks per set is a design parameter
• If all blocks are in one set, it is the same as associative mapping
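The set/tag arithmetic for this 2-way example can be sketched directly from the field widths; illustrative code, not from the slides.

```python
# Sketch of the 2-way set-associative split: Tag (6) | Set (6) | Word (4).
# With 64 sets, main memory block j maps to set j % 64 with tag j // 64.

def set_and_tag(block_number):
    """Which set a main-memory block maps to, and its tag within that set."""
    return block_number % 64, block_number // 64

# Blocks 0, 64 and 128 all map to set 0, with distinct tags,
# so any two of them can coexist in the cache.
assert [set_and_tag(b) for b in (0, 64, 128)] == [(0, 0), (0, 1), (0, 2)]
assert set_and_tag(129) == (1, 2)    # block 129 maps to set 1
```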
PERFORMANCE CONSIDERATIONS
A key design objective of a computer system is to achieve the best
possible performance at the lowest possible cost.
Price/performance ratio is a common measure of success.
Performance of a processor depends on:
How fast machine instructions can be brought into the processor for
execution.
How fast the instructions can be executed.
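A standard figure of merit that ties these two factors together (not stated on the slide, but the usual textbook measure) is the average memory access time for a single cache level: t_avg = h·C + (1 − h)·M, where h is the hit rate, C the cache access time and M the miss penalty.

```python
# Average memory access time for one cache level (standard formula,
# not from the slides): t_avg = h*C + (1 - h)*M.

def average_access_time(hit_rate, cache_time, miss_penalty):
    return hit_rate * cache_time + (1 - hit_rate) * miss_penalty

# e.g. 95% hits, 1-cycle cache access, 17-cycle miss penalty
# (cycle counts are illustrative, not from the lecture):
t_avg = average_access_time(0.95, 1, 17)
assert abs(t_avg - 1.8) < 1e-9       # well below the miss penalty
```

Even a modest hit rate keeps the average close to the cache access time, which is why the techniques on the following slides focus on raising h and reducing M.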
PERFORMANCE OF CACHE MEMORY
TECHNIQUES TO IMPROVE THE CACHE MEMORY PERFORMANCE
Solution:
INTERLEAVING
Interleaving divides the memory system into a number of memory modules, each
with its own address buffer register (ABR) and data buffer register (DBR).
Addressing is arranged so that successive words in the address space are placed
in different modules.
When requests for memory access involve consecutive addresses, the accesses go
to different modules.
Since these modules can be accessed in parallel, the average rate of fetching
words from the main memory can be increased.
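The addressing arrangement above (low-order interleaving) can be sketched in a couple of lines; the module count of 4 is an illustrative choice, not from the slides.

```python
# Sketch of low-order interleaving across 4 modules: consecutive word
# addresses land in different modules, so a sequential burst of accesses
# can proceed in parallel.

MODULES = 4

def module_and_offset(addr):
    """Which module holds a word, and its position within that module."""
    return addr % MODULES, addr // MODULES

# Four consecutive addresses go to four different modules.
assert [module_and_offset(a)[0] for a in range(4)] == [0, 1, 2, 3]
assert module_and_offset(9) == (1, 2)   # address 9 -> module 1, word 2
```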
OTHER PERFORMANCE ENHANCEMENTS
Prefetching
• Normally, new data are brought into the processor only when they are first
needed, so the processor has to wait until the data transfer is complete.
• Prefetching brings the data into the cache before they are actually needed,
i.e., before a Read miss occurs.
• Prefetching can be accomplished in software by including a special prefetch
instruction in the machine language of the processor.
Inclusion of prefetch instructions increases the length of the programs.
• Prefetching can also be accomplished in hardware:
Circuitry attempts to discover patterns in memory references and then
prefetches according to this pattern.
OTHER PERFORMANCE ENHANCEMENTS (CONTD.)
Lockup-Free Cache
• A prefetching scheme does not work if it stops other accesses to the cache
until the prefetch is completed.
• A cache of this type is said to be “locked” while it services a miss.
• A cache structure that supports multiple outstanding misses is called a
lockup-free cache.
• Since more than one miss can be serviced at a time, a lockup-free cache must
include circuits that keep track of all the outstanding misses.
• Special registers may hold the necessary information about these misses.
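A toy sketch of that bookkeeping follows. In modern designs the special registers are commonly called miss status holding registers (MSHRs); the class below only models the tracking, not the actual memory transfers, and the register count is an illustrative assumption.

```python
# Toy sketch of tracking multiple outstanding misses in a lockup-free
# cache. Only the bookkeeping is modeled, not the memory transfers.

class LockupFreeCache:
    def __init__(self, max_outstanding=4):
        self.max_outstanding = max_outstanding   # number of tracking registers
        self.outstanding = {}                    # block number -> waiting words

    def miss(self, block, word):
        """Record a miss; the cache stays usable while it is outstanding."""
        if block in self.outstanding:            # merge with an existing miss
            self.outstanding[block].append(word)
            return True
        if len(self.outstanding) < self.max_outstanding:
            self.outstanding[block] = [word]
            return True
        return False                             # all tracking registers busy

    def fill(self, block):
        """Memory returned the block: release its tracking register."""
        return self.outstanding.pop(block)

c = LockupFreeCache(max_outstanding=2)
assert c.miss(10, 3) and c.miss(11, 0) and c.miss(10, 7)
assert not c.miss(12, 1)                         # both registers in use
assert c.fill(10) == [3, 7]
```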
OTHER PERFORMANCE ENHANCEMENTS (CONTD.)
Write Buffer
• Each write operation involves writing to the main memory.
• If the processor has to wait for the write operation to complete, it is
slowed down.
• The processor does not usually depend on the result of a write operation,
so a write buffer can be included for temporary storage of write requests.
• The processor places each write request into the buffer and continues
execution.
• If a subsequent Read request references data that are still in the write
buffer, the data are taken from the write buffer.
• This applies to both write-through and write-back techniques.
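The buffering and read-forwarding behavior above can be sketched with a simple FIFO; this is an illustrative model, not from the lecture.

```python
# Sketch of a write buffer with read forwarding: writes are queued instead
# of stalling the processor, and a later read checks the buffer first.
from collections import deque

write_buffer = deque()            # queued (address, value) write requests
memory = {0x20: 1}

def buffered_write(addr, value):
    write_buffer.append((addr, value))   # processor continues immediately

def read(addr):
    # A newer entry for the same address wins, so scan newest-first.
    for a, v in reversed(write_buffer):
        if a == addr:
            return v                     # forwarded from the write buffer
    return memory[addr]

def drain():
    while write_buffer:
        a, v = write_buffer.popleft()
        memory[a] = v                    # writes retire to memory in order

buffered_write(0x20, 7)
assert read(0x20) == 7 and memory[0x20] == 1   # forwarded; memory stale
drain()
assert memory[0x20] == 7
```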