Unit 4 Memory Hierarchy
In computer system design, a memory hierarchy is an enhancement that organizes memory so as to minimize access time. The memory hierarchy was developed based on a program behavior known as locality of reference. The figure below illustrates the different levels of the memory hierarchy.
A memory hierarchy is essential in computer memory design because it helps optimize the use of the memory available in the computer. There are multiple levels of memory, each with a different size, speed, and cost. Some types of memory, such as cache and main memory, are faster than the others but are smaller and more expensive, whereas other types offer larger capacity but are slower. Access time is therefore not the same across all types of memory: some provide faster access, others slower.
External Memory or Secondary Memory: comprises magnetic disk, optical disk, and magnetic tape, i.e. peripheral storage devices that are accessible by the processor via an I/O module.
Internal Memory or Primary Memory: comprises main memory, cache memory, and CPU registers. This is directly accessible by the processor.
Memory Hierarchy Design
1. Registers
Registers are small, high-speed memory units located in the CPU. They are used to store the most frequently used data and instructions. Registers have the fastest access time and the smallest storage capacity, each typically 16 to 64 bits wide.
2. Cache Memory
Cache memory is a small, fast memory unit located close to the CPU. It stores frequently used data and
instructions that have been recently accessed from the main memory. Cache memory is designed to minimize
the time it takes to access data by providing the CPU with quick access to frequently used data.
3. Main Memory
Main memory, also known as RAM (Random Access Memory), is the primary memory of a computer system.
It has a larger storage capacity than cache memory, but it is slower. Main memory is used to store data and
instructions that are currently in use by the CPU.
Static RAM: Static RAM stores binary information in flip-flops, and the information remains valid as long as power is supplied. It has a faster access time and is used to implement cache memory.
Dynamic RAM: Dynamic RAM stores binary information as a charge on a capacitor. It requires refresh circuitry to restore the charge on the capacitors every few milliseconds. It packs more memory cells per unit area than SRAM.
4. Secondary Storage
Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD), is a non-volatile memory
unit that has a larger storage capacity than main memory. It is used to store data and instructions that are not
currently in use by the CPU. Secondary storage has the slowest access time and is typically the least expensive
type of memory in the memory hierarchy.
5. Magnetic Disk
Magnetic disks are circular platters fabricated from metal, plastic, or a magnetized material. They rotate at high speed inside the computer and are frequently used.
6. Magnetic Tape
Magnetic tape is a magnetic recording medium covered with a plastic film. It is generally used for data backup. In the case of magnetic tape, the access time is slower, since some amount of time is required to position the tape strip.
Capacity: the total volume of information the memory can store. As we move from top to bottom in the hierarchy, the capacity increases.
Access Time: the time interval between a read/write request and the availability of the data. As we move from top to bottom in the hierarchy, the access time increases.
Performance: Earlier, when computer systems were designed without a memory hierarchy, the speed gap between the CPU registers and main memory grew because of the large difference in access time. This resulted in lower system performance, so an enhancement was required. That enhancement took the form of the memory hierarchy design, which increases system performance. One of the most significant ways to increase system performance is to minimize how far down the memory hierarchy one has to go to manipulate data.
Cost Per Bit: As we move from bottom to top in the Hierarchy, the cost per bit increases i.e. Internal
Memory is costlier than External Memory.
It helps in managing the memory in a better way.
It helps in spreading the data across the system's storage levels.
It saves the consumer cost and time.
According to the memory hierarchy, the system-supported memory standards are defined below:

Level             1                2                3                  4
Name              Register         Cache            Main Memory        Secondary Memory
Size              < 1 KB           < 16 MB          < 16 GB            > 100 GB
Implementation    Multi-port       On-chip/SRAM     DRAM (capacitor    Magnetic
                  memory                            memory)
Access Time       0.25 ns-0.5 ns   0.5 ns-25 ns     80 ns-250 ns       ~5,000,000 ns (5 ms)
Bandwidth (MB/s)  20,000-100,000   5,000-15,000     1,000-5,000        20-150
Managed by        Compiler         Hardware         Operating System   Operating System
Backed by         Cache            Main Memory      Secondary Memory   -
7 The memory hierarchy (1)
The CPU accesses memory for two purposes:
• to read instructions;
• to read/write data.
Memory is not uniformly accessed: addresses in some regions are accessed more often than others, and some addresses are accessed again shortly after the current access. In other words, programs tend to favor parts of the address space at any moment in time; this behavior is the locality of reference.
7.2 Finite memory latency and performance
The numerical expression of this principle is given by the 90/10 rule of thumb: 90% of the running time of a program is spent accessing 10% of that program's address space. If this is the case, then it is natural to think of a memory hierarchy: somehow map the most frequently used addresses to a fast memory that needs to represent only roughly 10% of the address space, and the program will run most of the time (90%) using that fast memory. The rest of the memory can be slower because it is accessed less often; it has to be larger, though. This is not that difficult, because slower memories are cheaper.
A memory hierarchy has several levels: the uppermost level is the closest to the CPU, and it is the fastest (to match the processor's speed) and the smallest; as we go down towards the bottom of the hierarchy, each level gets slower and larger than the previous one, but with a lower price per bit.
As for the data the different levels of the hierarchy hold, each level is a subset of the level below it, in that data in one level can also be found in the level immediately below it. Memory items (they may represent instructions or data) are brought into the higher level when they are referenced for the first time, because there is a good chance they will be accessed again soon, and migrate back to the lower level when room must be made for newcomers.
In the previous chapter we discussed the ideal CPI of the instructions in the instruction set. At that moment we assumed the memory is fast enough to deliver the item being accessed without introducing wait states. If we look at Figure 5.1, we see that in state Q1 the MemoryReady signal is tested to determine whether the memory cycle is complete; if not, the Control Unit returns to state Q1, continuing to assert the control lines necessary for a memory access. Every clock cycle in which MemoryReady = No increases the CPI for that instruction by one. The same is true for load or store instructions. The longer the memory's response time, the higher the real CPI for that instruction. Suppose an instruction has an ideal CPI of n (i.e. there is a sequence of n states in the state diagram corresponding to this instruction), and k of them are addressing cycles; k = 2 for load/store instructions, and k = 1 for all other ones in our instruction set. The ideal CPI for this instruction is:

CPI_ideal = n

If every addressing cycle introduces w wait clock cycles, the real CPI for this instruction is:

CPI_real = n + k * w
We want to calculate the real CPI for our instruction set; assume that the ideal CPI is 4 (computed with some accepted instruction mix). What is the real CPI if every memory access introduces one wait cycle? Loads and stores are 25% of the instructions being executed.
We get (f1 being the fraction of instructions with k = 1, and f2 the fraction of loads and stores, with k = 2):
CPI_real = 4 + (f1*1 + f2*2)*w
CPI_real = 4 + (0.75*1 + 0.25*2)*1
CPI_real = 4 + 1.25 = 5.25
A machine that had an ideal memory would run faster than the one in our problem by:

CPI_real / CPI_ideal - 1 = 5.25 / 4 - 1 = 0.3125 ≈ 31%
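The calculation can be sketched in Python; the function below merely evaluates the formula CPI_real = n + k*w averaged over the instruction mix (the fractions 0.75 and 0.25 are those of the example above):

```python
def real_cpi(ideal_cpi, wait_cycles, frac_other, frac_loadstore):
    """Real CPI when every addressing cycle adds `wait_cycles` wait states.

    Non-load/store instructions have k=1 addressing cycles (instruction
    fetch), loads/stores have k=2 (fetch plus the data access).
    """
    extra = (frac_other * 1 + frac_loadstore * 2) * wait_cycles
    return ideal_cpi + extra

cpi = real_cpi(ideal_cpi=4, wait_cycles=1, frac_other=0.75, frac_loadstore=0.25)
print(cpi)            # 5.25
print(cpi / 4 - 1)    # 0.3125, i.e. ~31% slower than the ideal-memory machine
```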
The above example should make clear how big the difference between the expectations and the reality is. We must have a really fast memory close to the CPU to take full advantage of the CPU's performance. As a final comment, reads and writes may behave differently, in that they require different numbers of clock cycles to complete; the formula giving CPI_real will then be slightly modified.
7.3 Some Definitions
FIGURE 7.1 Two levels (level i and level i+1) in a memory hierarchy. The unit of information that is transferred between levels of the hierarchy is called a block.
Fixed-size blocks are the most common; in this case the size of the memory is a multiple of the block size.
Note that blocks between different memory levels need not all have the same size. It is possible that transfers between levels i and i+1 use blocks of size bi, while transfers between levels i+1 and i+2 use a different block size bi+1. Generally the block size is a power-of-2 number of bytes, but there is no rule for this and, as a matter of fact, deciding the block size is a difficult problem, as we shall discuss soon.
The reason for having a memory hierarchy is that we want a memory that behaves like a very fast one yet is as cheap as a slower one. For this to happen, most memory accesses must be found in the upper level of the hierarchy. In this case we say we have a hit. Otherwise, if the addressed item is in a lower level of the hierarchy, we have a miss; it will take longer until the addressed item gets to the CPU.
The hit time is the time it takes to access an item in the upper level of the memory hierarchy; this time includes the time spent determining whether there is a hit or a miss. In the case of a miss there is a miss penalty, because the accessed item has to be brought from the lower level into the higher level, and the sought item then delivered to the caller (usually the CPU). The miss penalty includes two components:
• the access time for the first element of a block in the lower level of the hierarchy;
• the transfer time for the remaining parts of the block; in the case of a miss a whole block is replaced with a new one from the lower level.
The hit rate is the fraction of the memory accesses that hit. The miss rate is
the fraction of the memory accesses that miss:
miss rate = 1 - hit rate
The hit rate (or the miss rate, if you prefer) does not characterize the memory hierarchy alone; it depends both upon the memory organization and the program being run on the machine. For a given program and machine the hit rate can be determined experimentally as follows: run the program and count how many times memory is accessed, say this number is N, and how many of these accesses are hits, say this number is Nh; then the hit ratio (H) is given by:

H = Nh / N
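The counting procedure can be illustrated with a toy simulation; the trace and the small fully associative LRU "upper level" below are invented for this illustration, not part of the original text:

```python
from collections import OrderedDict

def hit_ratio(trace, capacity):
    """Count hits of an address trace against a toy fully associative
    LRU upper level that holds `capacity` blocks."""
    level = OrderedDict()   # block -> None, ordered by recency of use
    hits = 0
    for block in trace:
        if block in level:
            hits += 1
            level.move_to_end(block)       # refresh recency on a hit
        else:
            if len(level) == capacity:
                level.popitem(last=False)  # evict least recently used
            level[block] = None
    return hits / len(trace)               # H = Nh / N

# A trace favoring a few blocks, as locality of reference suggests
trace = [0, 1, 0, 2, 0, 1, 3, 0, 1, 0]
print(hit_ratio(trace, capacity=2))  # 0.3
```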
7.4 Defining the performance for a memory hierarchy
The cost of a memory hierarchy can be computed if we know the price per bit Ci and the capacity Si of every level in the hierarchy. Then the average cost per bit is given by:

C = (C1*S1 + C2*S2 + ... + Cn*Sn) / (S1 + S2 + ... + Sn)

The performance of the hierarchy is characterized by:

t_av = hit_time + miss_rate * miss_penalty

where t_av is the average memory access time. Do not forget that the hit time is basically the access time of the memory at the first level in the hierarchy, tA1, plus the time to detect whether it is a hit or a miss.
The hit time for a two-level memory hierarchy is 40 ns, the miss penalty is 400 ns, and the hit rate is 99%. What is the average access time for this memory?
Answer:
The miss rate is:
miss_rate = 1 - hit_rate
miss_rate = 1 - 0.99 = 0.01
and the average access time is:
t_av = 40 + 0.01 * 400 = 44 ns
The hit time, as well as the miss time, can be expressed as absolute time, as in the example above, or in clock cycles, as in the example 7.4.
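A minimal sketch of this computation, assuming all times are in nanoseconds:

```python
def avg_access_time(hit_time, miss_rate, miss_penalty):
    """t_av = hit_time + miss_rate * miss_penalty (all times in ns here)."""
    return hit_time + miss_rate * miss_penalty

print(avg_access_time(hit_time=40, miss_rate=0.01, miss_penalty=400))  # 44.0
```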
FIGURE 7.2 The relation between block size and miss penalty / miss rate.
The hit time for a memory is 1 clock cycle, and the miss penalty is 20 clock cycles. What should the hit rate be to obtain an average access time of 1.5 clock cycles?
Answer:

miss_rate = (t_av - hit_time) / miss_penalty
miss_rate = (1.5 - 1) / 20 = 0.025 = 2.5%

so the required hit rate is 1 - 0.025 = 0.975 = 97.5%.
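The same relation, solved for the hit rate, can be sketched as:

```python
def required_hit_rate(t_av, hit_time, miss_penalty):
    """Solve t_av = hit_time + miss_rate * miss_penalty for the hit rate.
    All times are in clock cycles here."""
    miss_rate = (t_av - hit_time) / miss_penalty
    return 1 - miss_rate

print(required_hit_rate(t_av=1.5, hit_time=1, miss_penalty=20))  # 97.5% (0.975)
```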
Figure 7.2 presents the general relation between the miss penalty and the block size, as well as the general shape of the relation between the miss rate and the block size, for a given two-level memory hierarchy. The minimum value of the miss penalty equals the access time of the memory in the lower level of the hierarchy; this happens when only one item is transferred from the lower level. As the block size increases, the miss penalty also increases, every supplementary item transferred taking the same amount of time.
On the other hand, the miss rate decreases for a while as the block size increases. This is due to spatial locality: a larger block size increases the probability that neighboring items will be found in the upper level of the memory. Above a certain block size, the miss rate starts to increase: as the block size grows, the upper level of the hierarchy can accommodate fewer and fewer blocks, and when the block being transferred contains more information than the spatial locality properties of the program require, time is spent on useless transfers, and blocks containing useful information, which could be accessed soon (temporal locality), are evicted from the upper level of the hierarchy.
As the goal of the memory hierarchy is to provide the best access time, the designer must find
the minimum of the product:
miss_rate * miss_penalty
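A sketch of how a designer might search for that minimum; the miss-rate curve below is a made-up model chosen only to exhibit the decrease-then-increase shape described above, not measured data:

```python
def miss_penalty(block_size, access_time=100, transfer_per_item=5):
    """Penalty in cycles: lower-level access time for the first item,
    then a fixed transfer time for every further item in the block."""
    return access_time + (block_size - 1) * transfer_per_item

def miss_rate(block_size):
    # Hypothetical U-shaped curve: spatial locality helps at first,
    # then over-large blocks crowd useful blocks out of the upper level.
    return 0.08 / block_size + 0.0005 * block_size

best = min((2 ** k for k in range(8)),
           key=lambda b: miss_rate(b) * miss_penalty(b))
print(best)  # block size minimizing miss_rate * miss_penalty for this model
```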
As we have already mentioned, the hit time includes the time necessary to determine whether the item being accessed is in the upper level of the memory hierarchy (a hit) or not (a miss). Because this decision must take as little time as possible, it has to be implemented in hardware.
A block transfer occurs at every miss. If the block transfer is short (tens of clock cycles), then it is handled in hardware. If the block transfer is long (hundreds to thousands of clock cycles), then it can be controlled in software. What could be the reason for such long-lasting transfers? Basically, this happens when the difference between the memory access times at two levels of the hierarchy is very large.
The typical access time for a hard disk is 10 ms. The CPU runs at a 50 MHz clock rate. How many clock cycles does the access time represent? How many clock cycles are necessary to transfer a 4 KB block at a rate of 10 MB/s?
Answer:
The clock cycle is given by:

T_ck [ns] = 1000 / clock_rate [MHz]
T_ck = 1000 / 50 = 20 ns

so the 10 ms disk access time represents 10*10^6 ns / 20 ns = 500,000 clock cycles. The transfer time for the 4 KB block is:

t = (4*10^3) / (10*10^6) = 0.4*10^-3 s = 0.4 ms

which represents 0.4*10^6 ns / 20 ns = 20,000 clock cycles.
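The same arithmetic in Python (using 1 KB = 1000 bytes and 1 MB = 10^6 bytes, as the figures above do):

```python
def to_clock_cycles(time_s, clock_rate_hz):
    """Express a duration as an equivalent number of CPU clock cycles."""
    return time_s * clock_rate_hz

CLOCK = 50e6  # 50 MHz, i.e. a 20 ns clock cycle

print(to_clock_cycles(10e-3, CLOCK))       # disk access: ~500,000 cycles
print(to_clock_cycles(4e3 / 10e6, CLOCK))  # 4 KB at 10 MB/s: ~20,000 cycles
```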
4.6 How Does Data Migrate Between the Hierarchy's Levels
This example clearly shows that a block transfer from the disk can be resolved in software, in the sense that it is the CPU that takes all the necessary actions to start the disk access; a few thousand clock cycles are around 1% of the disk access time.
When block transfers are short, up to tens of clock cycles, the CPU waits until the transfer is complete. On the other hand, for long transfers it would be a waste to let the CPU wait until the transfer is complete; in this case it is more appropriate to switch to another task (process) and work until an interrupt from the accessed device informs the CPU that the transfer is complete; then the instruction that caused the miss can be restarted (obviously there must be hardware and software support to restart instructions).
In the case of a miss, data has to be brought from a lower level in the hierarchy into a higher level; the question here is: where in the higher level can the block be placed?
Bringing a new block into the upper level means that it has to replace some other block there; the question is: which block should be replaced?
These questions may be asked for any two neighboring levels of the hierarchy, and they will help us take the proper design decisions.