
Unit 4

Memory Hierarchy Design

In computer system design, the memory hierarchy is an enhancement that organizes memory so as to minimize access time. The memory hierarchy was developed based on a program behavior known as locality of reference. The different levels of the memory hierarchy are described below.

Why is a Memory Hierarchy Required in the System?

A memory hierarchy is one of the most important aspects of computer memory, as it helps make the best use of the memory available in the computer. There are multiple levels in the memory system, each with a different size, cost, and speed. Some types of memory, such as cache and main memory, are faster than others, but they are smaller and more expensive; other types offer much larger capacity but are slower. Data access is therefore not uniform across the types of memory: some provide fast access, while others are slow.

Types of Memory Hierarchy

This memory hierarchy design is divided into two main types:

 External Memory or Secondary Memory: comprising magnetic disk, optical disk, and magnetic tape, i.e. peripheral storage devices that are accessible to the processor via an I/O module.
 Internal Memory or Primary Memory: comprising main memory, cache memory, and CPU registers. This is directly accessible by the processor.

Memory Hierarchy Design

1. Registers

Registers are small, high-speed memory units located in the CPU. They are used to store the most frequently
used data and instructions. Registers have the fastest access time and the smallest storage capacity, typically
ranging from 16 to 64 bits.

2. Cache Memory

Cache memory is a small, fast memory unit located close to the CPU. It stores frequently used data and
instructions that have been recently accessed from the main memory. Cache memory is designed to minimize
the time it takes to access data by providing the CPU with quick access to frequently used data.

3. Main Memory

Main memory, also known as RAM (Random Access Memory), is the primary memory of a computer system.
It has a larger storage capacity than cache memory, but it is slower. Main memory is used to store data and
instructions that are currently in use by the CPU.

Types of Main Memory

 Static RAM: Static RAM stores binary information in flip-flops, and the information remains valid as long as power is supplied. It has a faster access time and is used to implement cache memory.
 Dynamic RAM: Dynamic RAM stores binary information as charge on capacitors. It requires refresh circuitry to maintain the charge on the capacitors every few milliseconds. It contains more memory cells per unit area than SRAM.

4. Secondary Storage

Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD), is a non-volatile memory
unit that has a larger storage capacity than main memory. It is used to store data and instructions that are not
currently in use by the CPU. Secondary storage has the slowest access time and is typically the least expensive
type of memory in the memory hierarchy.

5. Magnetic Disk

Magnetic disks are simply circular plates fabricated from metal or plastic and coated with a magnetized material. Magnetic disks spin at high speed inside the computer and are frequently used.

6. Magnetic Tape

Magnetic tape is simply a magnetic recording medium coated on a plastic film. It is generally used for the backup of data. In the case of magnetic tape, the access time is slower, and it therefore takes some amount of time to access a strip of the tape.

Characteristics of Memory Hierarchy

 Capacity: It is the global volume of information the memory can store. As we move from top to
bottom in the Hierarchy, the capacity increases.
 Access Time: It is the time interval between the read/write request and the availability of the data.
As we move from top to bottom in the Hierarchy, the access time increases.
 Performance: When computer systems were designed without a memory hierarchy, the large difference in access time between the CPU registers and main memory created a widening speed gap that lowered system performance. The memory hierarchy was introduced as the enhancement that closes this gap, and one of the most significant ways to increase system performance remains minimizing how far down the memory hierarchy one has to go to manipulate data.
 Cost Per Bit: As we move from bottom to top in the hierarchy, the cost per bit increases, i.e. internal memory is costlier than external memory.

Advantages of Memory Hierarchy

 It helps manage the memory in a better way.

 It helps spread data across the computer system.
 It saves the user's money and time.

System-Supported Memory Standards

According to the memory hierarchy, the system-supported memory standards are summarized in the table below:

Level              1                   2                  3                         4
Name               Register            Cache              Main Memory               Secondary Memory
Size               < 1 KB              < 16 MB            < 16 GB                   > 100 GB
Implementation     Multi-port memory   On-chip SRAM       DRAM (capacitor memory)   Magnetic disk
Access Time        0.25 - 0.5 ns       0.5 - 25 ns        80 - 250 ns               ~5,000,000 ns
Bandwidth (MB/s)   20,000 - 100,000    5,000 - 15,000     1,000 - 5,000             20 - 150
Managed by         Compiler            Hardware           Operating System          Operating System
Backed from        Cache               Main Memory        Secondary Memory          -

Example 4.1 MEMORY CHIPS AND THEIR CAPACITY:

How many chips are necessary to implement a 4 MByte memory:

1) using 64 Kbit SRAM;
2) using 1 Mbit DRAM;
3) 64 KBytes using 64 Kbit SRAM and the rest using 1 Mbit DRAM?

Answer:
The number of chips is computed as:

number of chips = memory capacity (expressed in bits) / chip capacity (expressed in bits)
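
Worked through in Python (a minimal sketch of the arithmetic; the one assumption beyond the formula above is that a fractional chip count must be rounded up):

    import math

    KBIT = 1 << 10                            # 1 Kbit in bits
    MBIT = 1 << 20                            # 1 Mbit in bits

    def chips(memory_bits, chip_bits):
        # number of chips = memory capacity / chip capacity, rounded up
        return math.ceil(memory_bits / chip_bits)

    total = 4 * (1 << 20) * 8                 # 4 MBytes expressed in bits

    print(chips(total, 64 * KBIT))            # 1) 512 SRAM chips
    print(chips(total, MBIT))                 # 2) 32 DRAM chips

    sram_part = 64 * (1 << 10) * 8            # 3) 64 KBytes in bits
    print(chips(sram_part, 64 * KBIT))        #    8 SRAM chips, plus
    print(chips(total - sram_part, MBIT))     #    32 DRAM chips (31.5 rounded up)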


4.1 The Principle of Locality

In running a program, memory is accessed for two reasons:

• to read instructions;
• to read/write data.

Memory is not uniformly accessed; addresses in some regions are accessed more often than others, and some addresses are accessed again shortly after the current access. In other words, programs tend to favor parts of the address space at any moment in time.

• temporal locality: an access to a certain address tends to be repeated shortly thereafter;

• spatial locality: an access to a certain address tends to be followed by accesses to nearby addresses.
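
As an illustration, consider the address trace of a loop that sums an array (a minimal Python sketch; the addresses are invented for the purpose of the example):

    ARRAY_BASE = 100      # assume the array lives at addresses 100..199
    ACCUMULATOR = 50      # assume the running sum lives at address 50

    def trace_for_array_sum(n=100):
        # the sequence of addresses touched while summing an n-element array
        trace = []
        for i in range(n):
            trace.append(ARRAY_BASE + i)   # spatial locality: consecutive elements
            trace.append(ACCUMULATOR)      # temporal locality: same address reused
        return trace

    print(trace_for_array_sum()[:6])       # [100, 50, 101, 50, 102, 50]

Every second access goes to the same accumulator address (temporal locality), while the array accesses walk through consecutive addresses (spatial locality).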


The numerical expression of this principle is given by the 90/10 rule of thumb: 90% of the running time of a program is spent accessing 10% of the address space of that program. If this is the case, then it is natural to think of a memory hierarchy: map the most used addresses to a fast memory that needs to represent roughly only 10% of the address space, and the program will run most of the time (90%) using that fast memory. The rest of the memory can be slower because it is accessed less often, though it has to be larger. This is not that difficult, because slower memories are cheaper.

A memory hierarchy has several levels: the uppermost level is the closest to the CPU and is the fastest (to match the processor's speed) and the smallest; as we go downwards to the bottom of the hierarchy, each level gets slower and larger than the previous one, but with a lower price per bit.

As for the data the different levels of the hierarchy hold, each level is a subset of the level below it, in that data in one level can also be found in the level immediately below it. Memory items (they may represent instructions or data) are brought into the higher level when they are referenced for the first time, because there is a good chance they will be accessed again soon, and migrate back to the lower level when room must be made for newcomers.

4.2 Finite memory latency and performance

In the previous chapter we discussed the ideal CPI of the instructions in the instruction set. There we assumed the memory is fast enough to deliver the item being accessed without introducing wait states. If we look at Figure 5.1, we see that in state Q1 the MemoryReady signal is tested to determine whether the memory cycle is complete; if not, the Control Unit returns to state Q1, continuing to assert the control lines necessary for a memory access. Every clock cycle in which MemoryReady = No increases the CPI for that instruction by one. The same is true for load or store instructions. The longer the memory's response time, the higher the real CPI for that instruction. Suppose an instruction has an ideal CPI of n (i.e. there is a sequence of n states in the state diagram corresponding to this instruction), and k of them are addressing cycles; k = 2 for load/store instructions, and k = 1 for all other instructions in our instruction set. The ideal CPI for this instruction is:

CPIideal = n (clock cycles per instruction)

If every addressing cycle introduces w waiting clock cycles, we have the real CPI for this
instruction:


CPIreal = n + k * w (clock cycles per instruction)

Example 4.2 IDEAL AND REAL CPI:

We want to calculate the real CPI for our instruction set; assume that the ideal CPI is 4 (computed with some accepted instruction mix). What is the real CPI if every memory access introduces one wait cycle? Loads and stores are 25% of the instructions being executed.

Answer: Using the above formulae we have:

n = 4
w = 1
k = 1 in f1 = 75% of cases
k = 2 in f2 = 25% of cases (the loads and stores)

We get:
CPIreal = 4 + (f1*1 + f2*2)*w
CPIreal = 4 + (0.75*1 + 0.25*2)*1
CPIreal = 4 + 1.25 = 5.25

A machine that had an ideal memory would run faster than the one in our problem by:

CPIreal / CPIideal - 1 = 5.25 / 4 - 1 = 0.3125, i.e. about 31%
The above example should make clear how big the difference between the expectations and the reality is. We must have a really fast memory close to the CPU to take full advantage of the CPU's performance. As a final comment, it may happen that reads and writes behave differently, in that they require different numbers of clock cycles to conclude; the formula giving CPIreal must then be slightly modified.
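
The calculation generalizes to any instruction mix. A minimal Python sketch (the function name and the (fraction, k) representation of the mix are our own, not from the text):

    def real_cpi(n, w, mix):
        # CPIreal = n + (sum of f * k over the instruction mix) * w,
        # where mix is a list of (fraction f, addressing cycles k) pairs
        return n + sum(f * k for f, k in mix) * w

    cpi = real_cpi(4, 1, [(0.75, 1), (0.25, 2)])   # Example 4.2
    print(cpi)                                      # 5.25
    print(cpi / 4 - 1)                              # 0.3125, i.e. ~31% slower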


[FIGURE 4.1 Two levels in a memory hierarchy. Level i has a smaller access time, a smaller capacity, and a higher price per bit than level i+1. The unit of information that is transferred between levels of the hierarchy is called a block.]

4.3 Some Definitions

Information has to migrate between the levels of a hierarchy. The higher a level is in the hierarchy, the smaller its capacity: it can accommodate only a small part of the logical address space. Transfers between levels take place in amounts called blocks, as can be seen in Figure 4.1.

A block may be:

• fixed size;
• variable size.


Fixed-size blocks are the most common; in this case the size of the memory is a multiple of the block size.

Note that it is not necessary that blocks between different memory levels all have the same size. It is possible that transfers between levels i and i+1 are done with blocks of size bi, while transfers between levels i+1 and i+2 are done with a different block size bi+1. Generally the block size is a power-of-2 number of bytes, but there is no rule for this and, as a matter of fact, deciding the block size is a difficult problem, as we shall discuss soon.

The reason for having a memory hierarchy is that we want a memory that behaves like a very fast one yet is as cheap as a slower one. For this to happen, most memory accesses must be found in the upper level of the hierarchy. In this case we say we have a hit. Otherwise, if the addressed item is only in a lower level of the hierarchy, we have a miss; it will take longer until the addressed item gets to the CPU.

The hit time is the time it takes to access an item in the upper level of the memory hierarchy; this time includes the time spent to determine whether there is a hit or a miss. In the case of a miss there is a miss penalty, because the accessed item has to be brought from the lower level of the memory into the higher level, and the sought item then delivered to the caller (usually the CPU). The miss penalty includes two components:

• the access time for the first element of a block in the lower level of the hierarchy;

• the transfer time for the remaining parts of the block; in the case of a miss a whole block is replaced with a new one from the lower level.

The hit rate is the fraction of the memory accesses that hit. The miss rate is the fraction of the memory accesses that miss:

miss rate = 1 - hit rate

The hit rate (or the miss rate, if you prefer) does not characterize the memory hierarchy alone; it depends both upon the memory organization and upon the program being run on the machine. For a given program and machine the hit rate can be determined experimentally as follows: run the program and count how many times memory is accessed, say this number is N, and how many of those accesses are hits, say this number is Nh; then the hit ratio (H) is given by:

H = Nh / N
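
Such a measurement is easy to sketch in Python. This is a hypothetical simulation, assuming a small fully associative cache with LRU replacement and one item per block, which is not the only possible organization:

    from collections import OrderedDict

    def hit_ratio(trace, capacity):
        # count accesses (N) and hits (Nh) over an address trace,
        # simulating a fully associative LRU cache; return H = Nh / N
        cache = OrderedDict()
        n = n_h = 0
        for addr in trace:
            n += 1
            if addr in cache:
                n_h += 1
                cache.move_to_end(addr)        # mark as most recently used
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict the least recently used
                cache[addr] = True
        return n_h / n

    # the array-sum trace from section 4.1: the accumulator address almost
    # always hits, the array addresses never repeat and always miss
    trace = [a for i in range(100) for a in (100 + i, 50)]
    print(hit_ratio(trace, capacity=4))        # 0.495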


The cost of a memory hierarchy can be computed if we know the price per
bit Ci and the capacity Si of every level in the hierarchy. Then the average
cost per bit is given by:

C = (C1*S1 + C2*S2 + ... + Cn*Sn) / (S1 + S2 + ... + Sn)
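
For example, in Python (the prices and sizes below are invented for illustration):

    def average_cost_per_bit(levels):
        # levels is a list of (Ci, Si) pairs: price per bit and size in bits
        return sum(c * s for c, s in levels) / sum(s for _, s in levels)

    # a small, expensive upper level over a large, cheap lower level
    print(average_cost_per_bit([(1e-6, 8 * 2**20), (1e-9, 8 * 2**33)]))   # ~1.12e-9

The average stays close to the price per bit of the large, cheap level, which is the economic argument for the hierarchy.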

4.4 Defining the performance for a memory hierarchy

The goal of the designer is a machine as fast as possible. When it comes to the memory hierarchy, we want the average access time of the memory to be as small as possible. The average access time can't be smaller than the access time of the memory at the highest level of the hierarchy, tA1.

For a two level memory hierarchy we have:

tav = hit_time + miss_rate * miss_penalty

where tav is the average memory access time. Do not forget that the hit time
is basically the access time of the memory at the first level in the hierarchy,
tA1, plus the time to detect if it is a hit or a miss.

Example 4.3 HIT TIME AND ACCESS TIME:

The hit time for a two level memory hierarchy is 40ns, the miss penalty is 400ns, and the hit rate is 99%. What is the average access time for this memory?

Answer:
The miss rate is:

miss_rate = 1 - hit_rate
miss_rate = 1 - 0.99 = 0.01

The average access time is:

tav = 40 + 0.01 * 400 = 44ns

greater by 10% than the hit time.

The hit time, as well as the miss penalty, can be expressed as an absolute time, as in the example above, or in clock cycles, as in Example 4.4.

[FIGURE 4.2 The relation between block size and miss penalty / miss rate: the miss penalty starts at the lower level's access time and grows with block size, while the miss rate first falls and then rises as the block size increases.]


Example 4.4 HIT TIME AND ACCESS TIME:

The hit time for a memory is 1 clock cycle, and the miss penalty is 20 clock cycles. What should the hit rate be to achieve an average access time of 1.5 clock cycles?

Answer:

miss_rate = (tav - hit_time) / miss_penalty

miss_rate = (1.5 - 1) / 20 = 0.025 = 2.5%

hit_rate = 1 - miss_rate = 97.5%
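
Both examples follow directly from the formula for tav; a small Python sketch (the function names are our own):

    def t_av(hit_time, miss_rate, miss_penalty):
        # average access time for a two-level hierarchy
        return hit_time + miss_rate * miss_penalty

    def required_miss_rate(t_target, hit_time, miss_penalty):
        # solve t_av = hit_time + miss_rate * miss_penalty for miss_rate
        return (t_target - hit_time) / miss_penalty

    print(t_av(40, 0.01, 400))              # 44.0 ns (Example 4.3)
    print(required_miss_rate(1.5, 1, 20))   # 0.025, i.e. a 97.5% hit rate (Example 4.4)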

Figure 4.2 presents the general relation between the miss penalty and the block size, as well as the general shape of the relation between the miss rate and the block size, for a given two-level memory hierarchy. The minimum value of the miss penalty equals the access time of the memory in the lower level of the memory hierarchy; this happens if only one item is transferred from the lower level. As the block size increases, the miss penalty increases as well, every supplementary item transferred taking the same additional amount of time.

On the other hand, the miss rate decreases for a while as the block size increases. This is due to spatial locality: a larger block size increases the probability that neighboring items will be found in the upper level of the memory. Above a certain block size, however, the miss rate starts to increase: as the block size grows, the upper level of the hierarchy can accommodate fewer and fewer blocks, and when the block being transferred contains more information than the spatial locality properties of the program call for, time is spent on useless transfers, and blocks containing useful information, which could be accessed soon (temporal locality), are evicted from the upper level of the hierarchy.

As the goal of the memory hierarchy is to provide the best access time, the designer must find
the minimum of the product:

miss_rate * miss_penalty
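
A hedged sketch of this search in Python: the miss-penalty model follows the text (the access time for the first item plus a transfer time for every additional item), while the miss-rate curve is entirely hypothetical; any model with the U shape of Figure 4.2 would serve:

    ACCESS_TIME = 80           # ns, access time of the lower level (assumed)
    TRANSFER_PER_ITEM = 2      # ns per additional item of a block (assumed)

    def miss_penalty(block_size):
        # first item costs the access time, each further item a transfer time
        return ACCESS_TIME + (block_size - 1) * TRANSFER_PER_ITEM

    def miss_rate(block_size):
        # hypothetical U-shaped curve: spatial locality helps at first,
        # then pollution of the upper level makes large blocks hurt
        return 0.05 / block_size + 1e-4 * block_size

    best = min(range(1, 257), key=lambda b: miss_rate(b) * miss_penalty(b))
    print(best)                # the block size minimizing the product (16 here)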


4.5 Hardware/Software Support for a Memory Hierarchy

As we have already mentioned, the hit time includes the time necessary to
determine if the item being accessed is in the upper level of the memory
hierarchy (a hit) or not (a miss). Because this decision must take as little
time as possible, it has to be implemented in hardware.

A block transfer occurs at every miss. If the block transfer is short (tens of clock cycles), it is handled in hardware. If the block transfer is long (hundreds to thousands of clock cycles), it can be software controlled. What could be the reason for such long-lasting transfers? Basically this happens when the difference between the memory access times at two levels of the memory hierarchy is very large.

Example 4.5 ACCESS TIME AND CLOCK-RATE:

The typical access time for a hard-disk is 10ms. The CPU is running at a
50MHz clock rate. How many clock cycles does the access time represent?
How many clock cycles are necessary to transfer a 4KB block at a rate of
10MB/s?

Answer:
The clock cycle is given by:

Tck [ns] = 1000 / clock_rate [MHz]

Tck = 1000 / 50 = 20 ns

The number of clock cycles the access time represents is nA:

nA = tA / Tck = (10 * 10^6 ns) / (20 ns) = 500 000 clock cycles

The transfer time is tT:

tT [s] = block_size [Bytes] / transfer_rate [Bytes/s]

tT = (4 * 10^3) / (10 * 10^6) = 4 * 10^-4 s = 0.4 ms

The number of clock cycles the transfer represents is nT:

nT = tT / Tck = (4 * 10^5 ns) / (20 ns) = 20 000 clock cycles
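The same numbers can be checked with a few lines of Python:

    CLOCK_RATE_MHZ = 50
    T_CK_NS = 1000 / CLOCK_RATE_MHZ            # 20 ns clock cycle

    access_ns = 10 * 10**6                     # 10 ms disk access time, in ns
    print(access_ns / T_CK_NS)                 # 500000.0 clock cycles

    transfer_s = (4 * 10**3) / (10 * 10**6)    # 4 KB at 10 MB/s
    print(transfer_s)                          # 0.0004 s = 0.4 ms
    print(transfer_s * 10**9 / T_CK_NS)        # 20000.0 clock cycles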

This example clearly shows that a block transfer from the disk can be resolved in software, in the sense that it is the CPU that takes all the necessary actions to start the disk access; a few thousand clock cycles are around 1% of the disk access time.

When block transfers are short, up to tens of clock cycles, the CPU waits until the transfer is complete. On the other hand, for long transfers it would be a waste to let the CPU wait until the transfer is complete; in this case it is more appropriate to switch to another task (process) and work until an interrupt from the accessed device informs the CPU that the transfer is complete; then the instruction that caused the miss can be restarted (obviously there must be hardware and software support to restart instructions).

4.6 How Does Data Migrate Between the Hierarchy's Levels

At every memory access it must somehow be determined whether the access is a hit or a miss; the first question, then, is:

• how is a block identified as being, or not being, in the upper level of the hierarchy?

In the case of a miss, data has to be brought from a lower level of the hierarchy into a higher level; the question here is:

• where can the block be placed in the upper level?

Bringing a new block into the upper level means that it has to replace some other block already there; the question is:

• which block should be replaced on a miss?

As we mentioned, a lower level of the hierarchy contains the whole of the information in its upper level; for this to remain true we must know what happens when a write takes place in the upper level:

• what is the write strategy?

These questions may be asked for any two neighboring levels of the hierarchy, and they will help us make the proper design decisions.
