Lec 4

The document introduces cache memory: its role in bridging the processor-memory performance gap by keeping frequently accessed memory blocks close to the CPU; the principles of locality that caches exploit; the organization and addressing of direct-mapped caches, including block placement, identification, and replacement policies; and an example 1 KB direct-mapped cache layout.


Multicore Computer Architecture - Storage and Interconnects

Lecture 4
Introduction to Cache Memory

Dr. John Jose


Assistant Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati, Assam.
Processor Memory Performance Gap

Relationship of Caches and Pipeline
[Figure: classic five-stage pipeline (IF/ID, ID/EX, EX/MEM, MEM/WB latches, ALU, register file) with a separate instruction cache (I-$) feeding instruction fetch and a data cache (D-$) serving the memory stage, both backed by memory.]
Role of memory
 Programmers want an unlimited amount of fast memory
 Create the illusion of a very large and fast memory
 Implement the memory of a computer as a hierarchy
 Multiple levels of memory with different speeds and sizes
 Entire addressable memory space available in the largest, slowest memory
 Keep the smaller, faster memories close to the processor and the slower, larger memory below that
Memory Hierarchy
Cache Memory - Introduction
 Cache is a small, fast buffer between processor and memory
 Old values will be removed from cache to make space for new
values
 Principle of Locality : Programs access a relatively small portion
of their address space at any instant of time
 Temporal Locality : If an item is referenced, it will tend to be
referenced again soon
 Spatial Locality : If an item is referenced, items whose addresses
are close by will tend to be referenced soon
Access Patterns

Ref: MISSISSIPPI (repeated references to I and S illustrate temporal locality)

Ref: ABCDEFAGHI (the mostly sequential references illustrate spatial locality; the repeated A illustrates temporal locality)
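The two reference strings above can be replayed through a toy cache to see the effect of locality. The sketch below is illustrative, not from the slides: it counts hits for a tiny fully-associative LRU cache with an assumed capacity of 4 entries.

```python
from collections import OrderedDict

def simulate_lru(refs, capacity):
    """Count hits for a tiny fully-associative LRU cache of `capacity` entries."""
    cache = OrderedDict()
    hits = 0
    for r in refs:
        if r in cache:
            hits += 1
            cache.move_to_end(r)           # refresh: mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[r] = True
    return hits

# MISSISSIPPI re-references I and S heavily: strong temporal locality
print(simulate_lru("MISSISSIPPI", 4))  # 7 of 11 accesses hit
# ABCDEFAGHI touches mostly distinct items: the 4-entry cache never hits
print(simulate_lru("ABCDEFAGHI", 4))   # 0
```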
Cache Fundamentals
 Block/Line : Minimum unit of information that can be either present or
not present in a cache level
 Hit : An access where the data requested by the processor is present in
the cache
 Miss : An access where the data requested by the processor is not
present in the cache
 Hit Time : Time to access the cache memory block and return the data
to the processor.
 Hit Rate / Miss Rate : Fraction of memory accesses found (not found) in
the cache
 Miss Penalty : Time to replace a block in the cache with the
corresponding block from the next level.
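These definitions combine into the standard average memory access time formula, AMAT = hit time + miss rate x miss penalty. The formula is standard background, not stated on the slide, and the numbers below are purely illustrative.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the hit time;
    a miss_rate fraction of accesses additionally pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: 1-cycle hit, 5% miss rate, 100-cycle miss penalty
print(amat(1, 0.05, 100))  # 6.0 cycles
```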
CPU – Cache Interaction
 The tiny, very fast CPU register file has room for four 4-byte words
 The transfer unit between the CPU register file and the cache is a 4-byte word
 The small fast L1 cache has room for two 4-word blocks (line 0 and line 1)
 The transfer unit between the cache and main memory is a 4-word block (16 B)
 The big slow main memory has room for many 4-word blocks (e.g., block 10: abcd, block 21: pqrs, block 30: wxyz)
General Organization of a Cache
 Cache is an array of S = 2^s sets
 Each set contains one or more lines (E lines per set)
 Each line holds a block of B = 2^b bytes of data, plus 1 valid bit and t tag bits per line
[Figure: set 0 through set S-1; each set holds E lines, and each line has a valid bit, a tag, and data bytes 0 ... B-1.]
 Cache size: C = B x E x S data bytes
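The relationship C = B x E x S can be checked with a short sketch (the function name and example parameters are illustrative, not from the slides):

```python
def cache_params(b_bits, e_lines, s_bits, addr_bits=32):
    """Derive cache geometry from the address-field widths."""
    B = 2 ** b_bits                  # block size in bytes
    S = 2 ** s_bits                  # number of sets
    C = B * e_lines * S              # total data bytes: C = B x E x S
    t = addr_bits - s_bits - b_bits  # remaining address bits form the tag
    return B, S, C, t

# e.g., 32 B blocks (b=5), direct mapped (E=1), 32 sets (s=5): a 1 KB cache
print(cache_params(5, 1, 5))  # (32, 32, 1024, 22)
```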
Addressing Caches
 CPU wants the word at address A (m bits): <tag> (t bits) | <set index> (s bits) | <block offset> (b bits)
 The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag>
 The word contents begin at offset <block offset> bytes from the beginning of the block
Addressing Caches
 CPU wants the word at address A: <tag> (t bits) | <set index> (s bits) | <block offset> (b bits)
 Locate the set based on <set index>
 Locate the line in the set based on <tag>
 Check that the line is valid
 Locate the data in the line based on <block offset>
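The lookup steps above start from splitting the address into its three fields, which can be sketched as follows (the function name and the example field widths are illustrative):

```python
def split_address(addr, t_bits, s_bits, b_bits):
    """Split an address into (tag, set index, block offset) fields."""
    offset = addr & ((1 << b_bits) - 1)                # lowest b bits
    set_index = (addr >> b_bits) & ((1 << s_bits) - 1) # next s bits
    tag = addr >> (b_bits + s_bits)                    # remaining high bits
    return tag, set_index, offset

# Example: tag 0110, set index 101, block offset 100 (t=4, s=3, b=3)
addr = (0b0110 << 6) | (0b101 << 3) | 0b100
print(split_address(addr, 4, 3, 3))  # (6, 5, 4)
```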
Four cache memory design choices
 Where can a block be placed in the cache?
– Block Placement
 How is a block found if it is in the upper level?
– Block Identification
 Which block should be replaced on a miss?
– Block Replacement
 What happens on a write?
– Write Strategy
Block Placement
Cache Mapping / Block Placement
 Direct mapped
 Block can be placed in only one location
 (Block Number) Modulo (Number of blocks in cache)
 Set associative
 Block can be placed in one among a list of locations
 (Block Number) Modulo (Number of sets)
 Fully associative
 Block can be placed anywhere
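The modulo placement rules above can be sketched directly (function names and the example sizes are illustrative):

```python
def direct_mapped_slot(block_number, num_blocks):
    """Direct mapped: exactly one candidate location in the cache."""
    return block_number % num_blocks

def set_associative_set(block_number, num_sets):
    """Set associative: one candidate set; the block may go in any line of it."""
    return block_number % num_sets

# 8-block direct-mapped cache: block 12 maps to slot 4.
# Same cache as 2-way set associative (4 sets): block 12 maps to set 0.
print(direct_mapped_slot(12, 8), set_associative_set(12, 4))  # 4 0
```

For a fully associative cache there is no index to compute: any block may occupy any line, so identification relies on the tag alone.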
Direct-Mapped Cache
 Simplest kind of cache, easy to build
 Only 1 tag compare required per access
 Characterized by exactly one line per set (E = 1 line per set)
[Figure: set 0 through set S-1, each holding a single valid bit, tag, and cache block.]
 Cache size: C = B x S data bytes
Accessing Direct-Mapped Caches
 Set selection is done by the set index bits
[Figure: the address splits into tag (t bits), set index (s bits, e.g., 00001), and block offset (b bits); the set index selects one of the sets 0 through S-1.]
Accessing Direct-Mapped Caches
 Block matching: Find a valid line in the selected set with a matching tag
 Word selection: Then extract the word
[Figure: for the selected set i, (1) the valid bit must be set, and (2) the tag bits in the cache line must match the tag bits in the address; if (1) and (2), then cache hit, and the block offset selects the starting byte (example address: tag 0110, set index i, block offset 100, selecting bytes b0-b3 within the 8-byte block).]
Block Identification – Direct Mapped
Direct Mapped Cache
Eg: 1 KB direct-mapped cache with 32 B cache lines (32 lines)
 Address split: Cache Tag = bits 31-10 (Example: 0x50), Cache Index = bits 9-5 (Ex: 0x01), Byte Select = bits 4-0 (Ex: 0x00)
 The Valid Bit and Cache Tag are stored as part of the cache "state"
[Figure: cache lines 0-31, each with a valid bit, tag, and 32 data bytes; line 0 holds bytes 0-31 of its block, line 1 (tag 0x50) holds bytes 32-63, and line 31 holds bytes 992-1023.]
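For this 1 KB example, the field boundaries work out to 5 byte-select bits (32 B lines), 5 index bits (32 lines), and a 22-bit tag for a 32-bit address. A sketch of the split (the function name is illustrative):

```python
def split_1kb_dm(addr):
    """1 KB direct-mapped cache with 32 B lines: 5 offset bits, 5 index bits,
    and the remaining 22 bits of a 32-bit address form the tag."""
    byte_select = addr & 0x1F         # bits 4-0
    cache_index = (addr >> 5) & 0x1F  # bits 9-5
    cache_tag = addr >> 10            # bits 31-10
    return cache_tag, cache_index, byte_select

# The slide's example: tag 0x50, index 0x01, byte select 0x00
addr = (0x50 << 10) | (0x01 << 5) | 0x00
print([hex(f) for f in split_1kb_dm(addr)])  # ['0x50', '0x1', '0x0']
```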
Set Associative Cache
 Characterized by more than one line per set
[Figure: a 2-way associative cache, E = 2 lines per set; each of sets 0 through S-1 contains two lines with a valid bit, tag, and cache block.]
Accessing Set Associative Caches
 Set selection is identical to direct-mapped cache
[Figure: the address splits into tag (t bits), set index (s bits, e.g., 0001), and block offset (b bits); the set index selects one set, each of which holds two lines.]
Accessing Set Associative Caches
 Block matching is done by comparing the tag in each valid line in the selected set
[Figure: (1) the valid bit must be set, and (2) the tag bits in one of the cache lines must match the tag bits in the address; if (1) and (2), then cache hit (example: the selected set i holds tags 1001 and 0110; address tag 0110 matches the second line, whose block holds bytes b0-b3).]
Accessing Set Associative Caches
 Word selection is done the same way as in a direct-mapped cache, but only from the line that produced the hit
[Figure: (1) the valid bit must be set; (2) the tag bits in one of the cache lines must match the tag bits in the address; (3) if cache hit, the block offset selects the starting byte (example address: tag 0110, set index i, block offset 100; the selected set holds tags 1001 and 0110).]
Block Identification – Set Associative
Set Associative Cache
 2-way set associative: 2 direct-mapped caches in parallel
 Cache Index selects a set from the cache
 The two tags in the set are compared to the input tag in parallel
 Data is selected based on the tag comparison result
[Figure: two ways, each with valid bit, cache tag, and cache data; two comparators check the address tag (Adr Tag) against both ways, the results are ORed to produce Hit, and a mux (Sel1/Sel0) selects the matching way's Cache Block.]
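A software sketch of one 2-way set's lookup (the hardware compares both tags simultaneously; this loop models the same valid-and-match check; class and field names are illustrative):

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache."""
    def __init__(self):
        self.ways = [{"valid": False, "tag": None, "data": None}
                     for _ in range(2)]

    def lookup(self, tag):
        # Hit requires valid bit set AND tag match in either way;
        # the per-way results are effectively ORed together.
        for way in self.ways:
            if way["valid"] and way["tag"] == tag:
                return True, way["data"]
        return False, None

s = TwoWaySet()
s.ways[1] = {"valid": True, "tag": 0b0110, "data": "b0 b1 b2 b3"}
print(s.lookup(0b0110))  # (True, 'b0 b1 b2 b3')
print(s.lookup(0b1001))  # (False, None)  -- way 0 is invalid
```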
Block Identification – Fully Associative
Cache Indexing
 Address A: tag (t bits) | set index (s bits) | block offset (b bits)
 Decoders are used for indexing
 Indexing time depends on decoder size (s : 2^s)
 A smaller number of sets means less indexing time
Why Use Middle Bits as Index?
 High-Order Bit Indexing
 Adjacent memory lines would map to the same cache entry
 Poor use of spatial locality
[Figure: 16 memory lines (0000-1111) mapped to a 4-line cache (lines 00-11) using the high-order two address bits; each contiguous quarter of memory maps to a single cache line.]
Why Use Middle Bits as Index?
 Middle-Order Bit Indexing
 Consecutive memory lines map to different cache lines
 Better use of spatial locality without replacement
[Figure: the same 16 memory lines (0000-1111) indexed by the middle two address bits; consecutive memory lines map to cache lines 00-11 in rotation, so sequential accesses do not evict one another.]
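The difference between the two schemes shows up when a small working set of consecutive memory lines is mapped into the 4-line cache. In the sketch below (names are illustrative; note that the "middle" bits of the full address are the low-order bits of the line number, since the block offset sits below them):

```python
def distinct_slots(index_fn, lines):
    """How many distinct cache lines does this working set occupy?"""
    return len({index_fn(line) for line in lines})

# Working set: 4 consecutive memory lines (0000-0011) in a 4-line cache.
working_set = range(4)

# High-order indexing: top 2 bits of the 4-bit line number -> all collide.
print(distinct_slots(lambda line: line >> 2, working_set))    # 1

# Middle-order indexing: low 2 bits of the line number -> all four fit.
print(distinct_slots(lambda line: line & 0b11, working_set))  # 4
```

With high-order indexing the whole working set fights over one cache line; with middle-order indexing it occupies all four lines, so sequential scans proceed without replacement.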
[email protected]
http://www.iitg.ac.in/johnjose/
