
LECTURE 3: Memory Hierarchy, Part 1

MEMORY HIERARCHY
When it comes to memory, there are two universally desirable properties:
• Large Size: ideally, we want to never have to worry about running out of memory.
• Speed of Access: we want the process of accessing memory to take as little time as
possible.

But we cannot optimize both of these properties at the same time. As our memory size
increases, the time to find a memory location and access it grows as well.

The goal of designing a memory hierarchy is to simulate having an unlimited amount of fast memory.
LOCALITY
To simulate these properties, we can take advantage of two forms of locality.
• Temporal Locality: if an item is referenced, it will tend to be referenced again soon.
(TIME)
• The location of the variable counter may be accessed frequently in a loop.
• A branching instruction may be accessed repeatedly in a given period of time.

• Spatial Locality: if an item is referenced, items whose addresses are close by will
tend to be referenced soon. (ADDRESS)
• If we access the location of A[0], we will probably also be accessing A[1], A[2], etc.
• Sequential instruction access also exhibits spatial locality.
MEMORY HIERARCHY
A memory hierarchy, consisting of multiple levels of memory with varying speed and
size, exploits these principles of locality.
• Faster memory is more expensive per bit, so we use it in smaller quantities.
• Slower memory is much cheaper so we can afford to use a lot of it.

The goal is to keep referenced data in the fastest memory whenever possible, while
also minimizing our overall memory cost.
MEMORY HIERARCHY
All data in a level is typically also found in the next larger level.

We keep the smallest, fastest memory unit closest to the processor.

The idea is that our access time during a running program is determined primarily by the access time of the level 1 unit, but our memory capacity is as large as the level n unit.
MEMORY HIERARCHY
[Figure: the processor at the top, with Level 1, Level 2, and Level 3 memory units beneath it.]

The unit of data that is transferred between two levels is fixed in size and is called a block, or a line.
MEMORY HIERARCHY

Speed     Size       Cost ($/bit)   Technology
Fastest   Smallest   Highest        SRAM
                                    DRAM
Slowest   Biggest    Lowest         Magnetic Disk
MEMORY HIERARCHY
There are four technologies that are used in a memory hierarchy:
• SRAM (Static Random Access Memory): fastest memory available. Used in memory
units close to the processor, called caches. Volatile.
• DRAM (Dynamic Random Access Memory): mid-range. Used in main memory. Volatile.
• Flash: falls between DRAM and disk in cost and speed. Used as non-volatile memory
in personal mobile devices.
• Magnetic Disk: slowest memory available. Used as non-volatile memory in a server
or PC.
MEMORY HIERARCHY

Technology      Typical Access Time        $ per GiB in 2016
SRAM            0.5-5 ns                   $400-$1000
DRAM            50-70 ns                   $3-$5
Flash           5,000-50,000 ns            $0.30-$0.50
Magnetic Disk   5,000,000-20,000,000 ns    $0.05-$0.10
MEMORY HIERARCHY TERMS
• Hit: item found in a specified level of the hierarchy.
• Miss: item not found in a specified level of the hierarchy.
• Hit time: time required to access the desired item in a specified level of the
hierarchy (includes the time to determine if the access is a hit or a miss).
• Miss penalty: the additional time required to service the miss.
• Hit rate: fraction of accesses that are in a specified level of the hierarchy.
• Miss rate: fraction of accesses that are not in a specified level of the hierarchy.
• Block: unit of information that is checked to reside in a specified level of the
hierarchy and is retrieved from the next lower level on a miss.
MEMORY HIERARCHY
The key points so far:
• Memory hierarchies take advantage of temporal locality (TIME) by keeping more
recently accessed data items closer to the processor. Memory hierarchies take
advantage of spatial locality (ADDRESS) by moving blocks consisting of multiple
contiguous words in memory to upper levels of the hierarchy.
• Memory hierarchy uses smaller and faster memory technologies close to the
processor. Accesses that hit in the highest level can be processed quickly. Accesses that
miss go to lower levels, which are larger but slower. If the hit rate is high enough, the
memory hierarchy has an effective access time close to that of the highest (and
fastest) level and a true size equal to that of the lowest (and largest) level.
• Memory is typically a true hierarchy, meaning that data cannot be present in level i
unless it is also present in level i+1.
CACHES
We’ll begin by looking at the most basic cache. Let’s say we’re running a program
that, so far, has referenced n - 1 words. These could be n - 1 independent integer
variables, for example.

At this point, our cache might look like this (assuming a block is simply 1 word). That is, every reference made so far has been moved into the cache to take advantage of temporal locality.

What happens when our program references X_n?
CACHES
A reference to X_n causes a miss, which forces the cache to fetch X_n from some lower
level of the memory hierarchy, presumably main memory.

Two Questions:
1. How do we know if an item is present in the cache?
2. How do we find the item in the cache?

One Answer: if each word can go in exactly one place in the cache, then we can easily find it in the cache.
DIRECT-MAPPED CACHES
The simplest way to assign a location in the cache for each word in memory is to
assign the cache location based on the address of the word in memory.
This creates a direct-mapped cache – every location in memory is mapped directly to
one location in the cache.
A typical direct-mapped cache uses the following mapping:

    Block Address % (Number of blocks in the cache)

Conveniently, for a cache with 2^n entries, finding a block's location means just looking at the lower n bits of the block address, as the sketch below shows.
DIRECT-MAPPED CACHE
Here is an example cache which contains 2^3 = 8 entries.

Blocks in memory are mapped to a particular cache index if the lower 3 bits of the block address match the index.

So, now we know where to find the data, but we still have to answer the following question: how do we know if the data we want is in the cache?
TAGS
To verify that a cache entry contains the data we’re looking for, and not data from
another memory address with the same lower bits, we use a tag.
A tag is a field in a table which corresponds to a cache entry and gives extra
information about the source of the data in the cache entry.

What is an obvious choice for the tag? The upper bits of the address of the block!
TAGS
For instance, in this particular example, let’s say the block at address 01101 is
held in the cache entry with index 101.

The tag for the cache entry with index 101 must then be 01, the upper bits of the address.

Therefore, when looking in the cache for the block at address 11101, we know that we have a miss because 11 != 01.
VALID BIT
Even if there is data in the cache entry and a tag associated with the entry, we may
not want to use the data. For instance, when a processor has first started up or when
switching processes, the cache entries and tag fields may be meaningless.
Generally speaking, a valid bit associated with the cache entry can be used to indicate
whether the entry contains valid data.
EXERCISE
Let’s assume we have an 8-entry cache with the initial state shown
to the right. Let’s fill in the cache according to the references listed in the table below.

Note that initially the valid-bit entries are all ‘N’ for not valid.
EXERCISE
The first reference is for the block at address 22, which uses the lower bits 110 to
index into the cache. The 110 cache entry is not valid, so this is a miss.

We need to retrieve the contents of the block at address 22 and place it in the cache entry.
EXERCISE
The block at address 22 is now placed in the data entry of the cache and the tag is
updated to the upper portion of the address, 10. Also, the valid bit is set to ‘Y’.

Now, we have a reference to the block at address 26. What happens here?
EXERCISE
We have a miss, so we retrieve the data from address 26 and place it in the cache
entry. We also update the tag and valid bit.

Now, we have a reference to the block at address 22 again. Now what happens?
EXERCISE
The correct data is already in the cache! We don’t have to update the contents or
fetch anything from main memory.

Similarly, we will have another reference to the block at address 26. We do not need to update the cache at all.
EXERCISE
Now, we have a reference to the block at address 16. Its associated
cache entry is invalid, so we will need to fetch the data from main
memory and update the entry.
EXERCISE
Now, we have a reference to the block at address 3. Its associated
cache entry is invalid, so we will need to fetch the data from main
memory and update the entry.
EXERCISE
A reference to the block at address 16 causes a hit (as we have
already pulled this data into the cache) so we do not have to make
any changes.
EXERCISE
Now, we get something interesting. We have a reference to the block
at address 18. The lower bits used to index into the cache are 010.
As these are also the lower bits of address 26, we have a valid entry
but it’s not the one we want. Comparing the tag of the entry with the
upper portion of 18’s binary representation tells us we have a miss.
EXERCISE
We fetch the data at address 18 and update the cache entry to hold
this data, as well as the correct tag. Note now that a reference to the
block at address 26 will result in a miss and we’ll have to fetch that
data again.
PHYSICAL ADDRESS TO CACHE
To the right is a figure showing how a typical physical address may be divided up to find the valid entry within the cache.

• The offset is used to indicate the first byte accessed within a block. Its size is
log2(number of bytes in block). For example, a block containing 4 bytes does not need
to consider the lower 2 bits of the address to index into the cache.
• The cache index, in this case, is a 10-bit wide lower portion of the physical
address (because there are 2^10 = 1024 entries).
• The tag is the upper 20 bits of the physical address.
BLOCKS IN A CACHE
We’ve mostly assumed so far that a block contains one word, or 4 bytes. In reality, a
block contains several words.

Assuming we are using 32-bit addresses, consider a direct-mapped cache which holds 2^n blocks, where each block contains 2^m words.
[Figure: example of a block in memory.]

How many bytes are in a block? 2^m * 4 = 2^m * 2^2 = 2^(m+2) bytes per block.

How big does a tag field need to be? 32 - (n + m + 2) bits. A block has a 32-bit
address. We do not consider the lower m + 2 bits because there are 2^(m+2) bytes in a
block. We need n bits to index into the cache and m bits to identify the word.
EXERCISE
How many total bits are required for a direct-mapped cache with 16 KB of data and
4-word blocks, assuming a 32-bit address?

We know that 16 KB is 4K words, which is 2^12 words, and, with a block size of 4
words (2^2), 2^10 blocks.

Each block contains 4 words, or 128 bits, of data. Each block also has a tag that is
32 - 10 - 2 - 2 bits long, as well as one valid bit. Therefore, the total cache size is

    2^10 × (128 + (32 - 10 - 2 - 2) + 1) = 2^10 × 147 = 147 Kbits

Or an 18.4 KB cache for 16 KB of data.
EXERCISE
Consider a cache with 64 blocks and a block size of 16 bytes (4 words). What block
number does byte address 1200 (0100 1011 0000 in binary) map to?

First of all, we know the entry into the cache is given by

    Block Address % (Number of blocks in the cache)

where the block address is given by Byte Address / (Number of bytes per block).

So, the block address is 1200 / 16 = 75. This corresponds to block number 75 % 64 = 11.
