
LECTURE NOTES
ON

COMPUTER SYSTEM
ARCHITECTURE
COURSE CODE: 24MC1TCA105

MODULE-2

PREPARED BY

Dr. Pravash Ranjan Tripathy


Dean Academics

GEC AUTONOMOUS COLLEGE, BHUBANESWAR


MODULE-II
Hierarchical Memory Technology:
Computer memory can be divided into five major levels based on use as well as speed. The processor moves data between levels according to its requirements. These five levels in a system's memory are registers, cache memory, main memory, magnetic disk, and magnetic tape.

Design and Characteristics of Memory Hierarchy:

Memory Hierarchy, in computer system design, is an enhancement that organizes memory so as to minimize access time. The memory hierarchy design exploits a behavior of programs known as locality of reference. Here is a figure that demonstrates the various levels of the memory hierarchy:

Memory Hierarchy Design


This Hierarchy Design of Memory is divided into two main types. They are:

External or Secondary Memory:


It consists of Magnetic Tape, Optical Disk, Magnetic Disk, i.e. it includes peripheral storage
devices that are accessible by the system’s processor via I/O Module.

Internal Memory or Primary Memory:


It consists of CPU registers, Cache Memory, and Main Memory. It is accessible directly by the
processor.

Characteristics of Memory Hierarchy:


One can infer these characteristics of a Memory Hierarchy Design from the figure given above:

1. Capacity:
It refers to the total volume of data that a system’s memory can store. The capacity
increases moving from the top to the bottom in the Memory Hierarchy.
2. Access Time:
It refers to the time interval present between the request for read/write and the data
availability. The access time increases as we move from the top to the bottom in the
Memory Hierarchy.
3. Performance:
Earlier, computer systems were designed without a memory hierarchy, and the speed gap between the CPU registers and the main memory grew due to the large difference in their access times. This ultimately resulted in lower system performance, and thus an enhancement was required. That enhancement was introduced in the form of the Memory Hierarchy Design, and because of it, the system's performance increased. One of the primary ways to increase the performance of a system is to minimize how far down the memory hierarchy the processor has to go to manipulate data.
4. Cost per bit:
The cost per bit increases as one moves from the bottom to the top in the Memory
Hierarchy, i.e. External Memory is cheaper than Internal Memory.

A memory unit is an essential component in any digital computer since it is needed for storing
programs and data.

Typically, a memory unit can be classified into two categories:

1. The memory unit that establishes direct communication with the CPU is called Main
Memory. The main memory is often referred to as RAM (Random Access Memory).


2. The memory units that provide backup storage are called Auxiliary Memory. For
instance, magnetic disks and magnetic tapes are the most commonly used auxiliary
memories.

Apart from this basic classification of memory units, the memory hierarchy consists of all the storage devices available in a computer system, ranging from the slow but high-capacity auxiliary memory to the relatively faster main memory.

The following image illustrates the components in a typical memory hierarchy.

Auxiliary Memory:

Auxiliary memory is known as the lowest-cost, highest-capacity and slowest-access storage in a computer system. Auxiliary memory provides storage for programs and data that are kept for long-term storage or when not in immediate use. The most common examples of auxiliary memories are magnetic tapes and magnetic disks.

A magnetic disk is a digital computer memory that uses a magnetization process to write,
rewrite and access data. For example, hard drives, zip disks, and floppy disks.

Magnetic tape is a storage medium that allows for data archiving, collection, and backup for
different kinds of data.

Main Memory:

The main memory in a computer system is often referred to as Random Access Memory
(RAM). This memory unit communicates directly with the CPU and with auxiliary memory
devices through an I/O processor.


The programs that are not currently required in the main memory are transferred into auxiliary
memory to provide space for currently used programs and data.

I/O Processor:

The primary function of an I/O Processor is to manage the data transfers between auxiliary
memories and the main memory.

Cache Memory:

The data or contents of the main memory that are used frequently by the CPU are stored in the cache memory so that the processor can access that data in a shorter time. Whenever the CPU needs to access memory, it first checks for the required data in the cache memory. If the data is found in the cache memory, it is read from the fast memory. Otherwise, the CPU moves on to the main memory for the required data.

Main Memory:

The main memory acts as the central storage unit in a computer system. It is a relatively large
and fast memory which is used to store programs and data during the run time operations.

The primary technology used for the main memory is based on semiconductor integrated
circuits. The integrated circuits for the main memory are classified into two major units.

1. RAM (Random Access Memory) integrated circuit chips


2. ROM (Read Only Memory) integrated circuit chips

RAM integrated circuit chips:

The RAM integrated circuit chips are further classified into two possible operating
modes, static and dynamic.

A static RAM is composed primarily of flip-flops that store the binary information. The stored information is volatile, i.e. it remains valid only as long as power is applied to the system. Static RAM is easy to use and takes less time to perform read and write operations than dynamic RAM.

Dynamic RAM stores the binary information in the form of electric charges applied to capacitors. The capacitors are provided inside the chip by MOS transistors. Dynamic RAM consumes less power and provides larger storage capacity in a single memory chip.

RAM chips are available in a variety of sizes and are used as per the system requirement. The
following block diagram demonstrates the chip interconnection in a 128 * 8 RAM chip.


 A 128 * 8 RAM chip has a memory capacity of 128 words of eight bits (one byte) per
word. This requires a 7-bit address and an 8-bit bidirectional data bus.
 The 8-bit bidirectional data bus allows the transfer of data either from memory to
CPU during a read operation or from CPU to memory during a write operation.
 The read and write inputs specify the memory operation, and the two chip select
(CS) control inputs are for enabling the chip only when the microprocessor selects it.
 The bidirectional data bus is constructed using three-state buffers.
 The output generated by three-state buffers can be placed in one of the three
possible states which include a signal equivalent to logic 1, a signal equal to logic 0,
or a high-impedance state.

Note: The logic 1 and 0 are standard digital signals whereas the high-impedance state behaves like an open
circuit, which means that the output does not carry a signal and has no logic significance.

The following function table specifies the operations of a 128 * 8 RAM chip.


From the functional table, we can conclude that the unit is in operation only when CS1 = 1
and CS2 = 0. The bar on top of the second select variable indicates that this input is enabled
when it is equal to 0.
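As an illustration, the chip-select behavior from the function table can be sketched in C. This is a minimal software model, not from the notes; the names ram_access, cs2_n and the mem array are assumptions made for the example.

#include <stdint.h>
#include <stdbool.h>

#define WORDS 128           /* 7-bit address: 2^7 = 128 words */

static uint8_t mem[WORDS];  /* 8 bits (one byte) per word */

/* The chip responds only when CS1 = 1 and the active-low CS2 = 0;
   otherwise the data bus stays in the high-impedance state,
   modelled here as "no operation". */
bool ram_access(bool cs1, bool cs2_n, bool write,
                uint8_t addr7, uint8_t *data)
{
    if (!(cs1 && !cs2_n))
        return false;          /* chip not selected */
    addr7 &= 0x7F;             /* keep only the 7 address bits */
    if (write)
        mem[addr7] = *data;    /* CPU -> memory (write) */
    else
        *data = mem[addr7];    /* memory -> CPU (read) */
    return true;
}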

ROM integrated circuit:

The primary component of the main memory is RAM integrated circuit chips, but a portion of
memory may be constructed with ROM chips.

A ROM memory is used for keeping programs and data that are permanently resident in the
computer.

Apart from the permanent storage of data, the ROM portion of main memory is needed for
storing an initial program called a bootstrap loader. The primary function of the bootstrap
loader program is to start the computer software operating when power is turned on.

ROM chips are also available in a variety of sizes and are also used as per the system
requirement. The following block diagram demonstrates the chip interconnection in a 512 * 8
ROM chip.

 A ROM chip has a similar organization as a RAM chip. However, a ROM can only
perform read operation; the data bus can only operate in an output mode.
 The 9-bit address lines in the ROM chip specify any one of the 512 bytes stored in it.
 The value for chip select 1 and chip select 2 must be 1 and 0 for the unit to operate.
Otherwise, the data bus is said to be in a high-impedance state.

Auxiliary Memory:

An auxiliary memory is known as the lowest-cost, highest-capacity and slowest-access storage in a computer system. It is where programs and data are kept for long-term storage or when not in immediate use. The most common examples of auxiliary memories are magnetic tapes and magnetic disks.


Magnetic Disks:

A magnetic disk is a type of memory constructed using a circular plate of metal or plastic coated
with magnetized materials. Usually, both sides of the disks are used to carry out read/write
operations. However, several disks may be stacked on one spindle with read/write head
available on each surface.

The following image shows the structural representation for a magnetic disk.

 The memory bits are stored in the magnetized surface in spots along the concentric
circles called tracks.
 The concentric circles (tracks) are commonly divided into sections called sectors.

Magnetic Tape:

Magnetic tape is a storage medium that allows data archiving, collection, and backup for
different kinds of data. The magnetic tape is constructed using a plastic strip coated with a
magnetic recording medium.

The bits are recorded as magnetic spots on the tape along several tracks. Usually, seven or nine
bits are recorded simultaneously to form a character together with a parity bit.

Magnetic tape units can be halted, started to move forward or in reverse, or can be rewound.
However, they cannot be started or stopped fast enough between individual characters. For
this reason, information is recorded in blocks referred to as records.

Associative Memory:

An associative memory can be considered as a memory unit whose stored data can be
identified for access by the content of the data itself rather than by an address or memory
location.


Associative memory is often referred to as Content Addressable Memory (CAM).

When a write operation is performed on associative memory, no address or memory location is given to the word. The memory itself is capable of finding an empty unused location to store the word.

On the other hand, when the word is to be read from an associative memory, the content of
the word, or part of the word, is specified. The words which match the specified content are
located by the memory and are marked for reading.

The following diagram shows the block representation of an Associative memory.

From the block diagram, we can say that an associative memory consists of a memory array and
logic for 'm' words with 'n' bits per word.

The functional registers like the argument register A and key register K each have n bits, one for
each bit of a word. The match register M consists of m bits, one for each memory word.

The words which are kept in the memory are compared in parallel with the content of the
argument register.

The key register (K) provides a mask for choosing a particular field or key in the argument word.
If the key register contains a binary value of all 1's, then the entire argument is compared with
each memory word. Otherwise, only those bits in the argument that have 1's in their
corresponding position of the key register are compared. Thus, the key provides a mask for
identifying a piece of information which specifies how the reference to memory is made.

The following diagram can represent the relation between the memory array and the external
registers in an associative memory.


The cells present inside the memory array are marked by the letter C with two subscripts. The
first subscript gives the word number and the second specifies the bit position in the word. For
instance, the cell Cij is the cell for bit j in word i.

A bit Aj in the argument register is compared with all the bits in column j of the array provided
that Kj = 1. This process is done for all columns j = 1, 2, 3......, n.

If a match occurs between all the unmasked bits of the argument and the bits in word i, the
corresponding bit Mi in the match register is set to 1. If one or more unmasked bits of the
argument and the word do not match, Mi is cleared to 0.
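The match logic just described can be sketched in C as follows. This is a simplified software model of the parallel hardware comparison; words are packed into 16-bit integers, and the names cam_match, A, K and match are assumptions for the example.

#include <stdint.h>

#define M 8    /* number of memory words */
#define N 16   /* bits per word          */

/* For each word i, set match[i] = 1 only if every unmasked bit of
   the argument A (positions where K has a 1) equals the
   corresponding bit of word i. */
void cam_match(const uint16_t words[M], uint16_t A, uint16_t K,
               uint8_t match[M])
{
    for (int i = 0; i < M; i++) {
        /* XOR exposes differing bits; AND with K keeps only the
           unmasked positions. A zero result is a full match. */
        match[i] = (((words[i] ^ A) & K) == 0);
    }
}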

Design of Memory Hierarchy:


In computers, the memory hierarchy primarily includes the following:

1. Registers
Registers are usually built from static RAM (SRAM) inside the computer processor and are used to hold a data word, typically 64 bits or 128 bits. A majority of processors make use of a status word register and an accumulator. The accumulator is primarily used to hold data for arithmetic operations, and the status word register is primarily used for decision making.

2. Cache Memory
The cache holds chunks of information that are used frequently from the main memory. We can also find cache memory in the processor. A single-core processor rarely has multiple cache levels. Present multi-core processors typically have two or three cache levels for each individual core, with one of the levels shared among the cores.

3. Main Memory
In a computer, the main memory is the memory unit that communicates directly with the CPU. It is the primary storage unit of a computer system: a large and fast memory used for storing information throughout the computer's operations. This type of memory is made up of ROM as well as RAM.

4. Magnetic Disks
In a computer, magnetic disks are circular plates fabricated from plastic or metal and coated with magnetized material. Both faces of a disk are frequently used, and many disks can be stacked on a single spindle, with read/write heads available on every surface. The disks in a computer rotate together at high speed.

5. Magnetic Tape
Magnetic tape is a conventional magnetic recording medium: a slender magnetizable overlay covering an extended, thin strip of plastic film. It is used mainly to back up huge chunks of data. When a computer needs to access a tape, it first mounts the tape to access the information; once the information has been accessed, the tape is unmounted. Access time on magnetic tape is much slower than in main memory, and accessing a tape can take a few minutes.

Inclusion, Coherence and Locality Properties:

In various fields of computer science and information systems, properties like Inclusion,
Coherence, and Locality are foundational principles that ensure data consistency, efficiency,
and logical organization. Understanding these properties is crucial for designing effective
algorithms, databases, distributed systems, and more.

Inclusion Property:

Definition:

The inclusion property ensures that elements of a subset or substructure are included within a
larger set or structure.

Applications:

 Set Theory:
o Example: For sets A= {1,2} and B={1,2,3,4}, A⊆B because all elements of A are in
B.


 File Systems:
o Example: In a Unix-like OS, if a directory grants read/write permissions to a user
group, all files within should inherit these permissions unless overridden.
 Hierarchical Data Models:
o Example: In an organizational structure, if an employee is part of the
"Engineering Department," they are also part of the "Technology Division" by
inclusion.

Key Points:

 Inclusion helps maintain the logical hierarchy.


 It ensures consistency within nested or hierarchical structures.
 Inclusion is crucial in access control, data hierarchy, and permissions management.

The inclusion property typically refers to the idea that all elements of a subset or substructure
should be included within a larger set or structure. In the context of databases or data models,
this could mean ensuring that a record or an entity is properly included in its respective group
or category.

For example, in hierarchical data models or set theory:

 If you have a set A that is a subset of set B, the inclusion property ensures that every element of A is also an element of B. Mathematically, A ⊆ B.

In the context of access control:

 Inclusion might mean that if a user has access to a higher-level category, they should
also have access to all subcategories within it.

Coherence Property:

Definition:

 The coherence property ensures consistency and logical alignment of data or processes
within a system, so that all elements work together harmoniously.

Applications:

 Cache Coherence in Distributed Systems:


o Example: In a multi-core processor system, cache coherence protocols ensure all
cores have consistent views of shared data.
 Database Integrity:


o Example: In relational databases, referential integrity ensures that a record in one table correctly references a related record in another table.
 Software Design:
o Example: A coherent user interface in a software application ensures consistent
design and behavior across all screens, improving user experience.

Key Points:

 Coherence maintains logical consistency across related components.


 It prevents data conflicts, inconsistencies, and potential errors.
 Coherence is vital for system reliability, especially in distributed and multi-user
environments.

The coherence property is concerned with the consistency and logical alignment of data or
processes within a system. It ensures that the elements of a system work together in a
consistent and logically coherent manner.

In distributed systems or databases:

 Coherence might refer to cache coherence, where multiple copies of a cache must
remain consistent with each other.
 In data modeling, it ensures that the relationships between entities are logically sound
and that the data is consistent across the model.

In the context of logic and proof systems:

Coherence can refer to the logical consistency of a set of propositions or proofs. For instance,
all components of a proof or argument should support and not contradict each other.

Locality Property:

Definition: The locality property involves confining operations, computations, or data access to
local resources to reduce the need for extensive communication or data transfer across a
system.

Applications:

 Memory Management (Locality of Reference):


o Example: Programs often access the same memory locations repeatedly, so
caching frequently accessed data locally can greatly enhance performance.
 Distributed Computing:
o Example: In a distributed system, tasks should be executed close to the data they
operate on to minimize network latency and improve efficiency.
 Content Delivery Networks (CDNs):
o Example: CDNs cache content on servers geographically close to users, reducing load times and bandwidth usage.

Key Points:

 Locality improves system performance by minimizing latency and reducing the load on
network resources.
 It is essential in memory management, distributed systems, and networking.
 Properly leveraging locality can significantly enhance user experience and system
efficiency.

The locality property relates to the idea that operations, computations, or data should be
confined to or make use of local resources or data whenever possible. This reduces the need for
extensive communication or data transfer across a system, thereby improving efficiency.

In computer science:

 Locality of reference refers to the tendency of a processor to access the same set of
memory locations repetitively over a short period. This is key to cache memory
performance.

In distributed systems or parallel computing:

 The locality property ensures that processes or data are kept close to where they are
needed, minimizing the latency and overhead associated with remote access or
communication.

In networking:

 Locality can refer to the principle that local communications (within the same network
or geographic area) should be optimized, reducing the load on broader network
infrastructure.

These properties are foundational in ensuring that systems are designed efficiently, logically,
and in a manner that maximizes performance and consistency.

Summary:

 Inclusion ensures that substructures are correctly represented within a larger structure,
maintaining hierarchy and integrity.
 Coherence ensures that different components of a system work together logically and
consistently, which is crucial for reliability.
 Locality optimizes system performance by keeping operations local, thereby reducing
unnecessary overhead and improving efficiency.


Applications in Real-world Systems:

 Inclusion is widely used in hierarchical databases, access control systems, and nested
structures in programming.
 Coherence is critical in distributed computing, caching systems, and software design for
ensuring consistent operation across all components.
 Locality is leveraged in memory management, distributed computing, and content
delivery systems to enhance performance and reduce delays.

Cache Memory:

The data or contents of the main memory that are used frequently by the CPU are stored in the cache memory so that the processor can access that data in a shorter time. Whenever the CPU needs to access memory, it first checks the cache memory. If the data is not found in cache memory, then the CPU moves on to the main memory.

Cache memory is placed between the CPU and the main memory. The block diagram for a cache
memory can be represented as:

The cache is the fastest component in the memory hierarchy and approaches the speed of CPU
components.

Cache memory is organised as distinct sets of blocks, where each set contains a small fixed number of blocks.


As shown in the figure above, sets are represented by the rows. The example contains N sets, and each set contains four blocks. Whenever an access is made to the cache, the cache controller does not search the entire cache in order to look for a match. Rather, the controller maps the address to a particular set of the cache and therefore searches only that set for a match.

If the required block is not found in that set, the block is not present in the cache and the cache controller does not search further. This kind of cache organization is called set associative because the cache is divided into distinct sets of blocks. As each set contains four blocks, the cache is said to be four-way set associative.

The basic operation of a cache memory is as follows:

 When the CPU needs to access memory, the cache is examined. If the word is found in
the cache, it is read from the fast memory.
 If the word addressed by the CPU is not found in the cache, the main memory is
accessed to read the word.
 A block of words containing the one just accessed is then transferred from main memory to cache memory. The block size may vary from one word (the one just accessed) to about 16 words adjacent to the one just accessed.
 The performance of the cache memory is frequently measured in terms of a quantity
called hit ratio.
 When the CPU refers to memory and finds the word in cache, it is said to produce a hit.
 If the word is not found in the cache, it is in main memory and it counts as a miss.
 The ratio of the number of hits divided by the total CPU references to memory (hits plus
misses) is the hit ratio.
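As a quick worked sketch of this definition (the counts below are assumed sample values, not figures from the notes):

#include <stdio.h>

int main(void)
{
    long hits = 9500, misses = 500;    /* assumed reference counts */
    /* hit ratio = hits / (hits + misses) */
    double hit_ratio = (double)hits / (double)(hits + misses);
    printf("hit ratio = %.2f\n", hit_ratio);   /* prints 0.95 */
    return 0;
}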

Levels of memory:
Level 1
 Registers: the memory directly inside the CPU, where data is stored and acted upon immediately. Commonly used registers include the accumulator, program counter, address register, etc.
Level 2
 Cache memory: a very fast memory with a short access time, where data is temporarily stored for faster access.
Level 3
 Main memory: the memory on which the computer currently works. It is small in size, and once power is off the data no longer stays in this memory.
Level 4
 External memory: not as fast as main memory, but data stays permanently in this memory.


Cache Mapping:
As we know, the cache memory bridges the speed mismatch between the main memory and the processor. Whenever a cache hit occurs,

 The word that is required is present in the cache memory.

 The required word is delivered from the cache memory to the CPU.
And, whenever a cache miss occurs,

 The word that is required is not present in the cache memory.
 The block containing the required word has to be mapped in from the main memory.
 Such mapping can be performed using various techniques of cache mapping.
Let us discuss the different techniques of cache mapping below.

Process of Cache Mapping:


The process of cache mapping defines how a certain block present in the main memory gets mapped into the cache memory in the case of a cache miss.

In simpler words, cache mapping refers to the technique by which blocks of main memory are brought into the cache memory. Here is a diagram that illustrates the process of mapping:

Important Note:

 The main memory gets divided into multiple partitions of equal size, known as frames or blocks.
 The cache memory is divided into partitions of the same size as the blocks, known as lines.
 During cache mapping, the main memory block is simply copied to the cache; the block is not removed from the main memory.


Techniques of Cache Mapping:


One can perform the process of cache mapping using these three techniques given as follows:

1. Direct Mapping

2. Fully Associative Mapping

3. K-way Set Associative Mapping

1. Direct Mapping:
In the case of direct mapping, a certain block of the main memory would be able to map a
cache only up to a certain line of the cache. The total line numbers of cache to which any
distinct block can map are given by the following:

Cache line number = (Address of the Main Memory Block ) Modulo (Total number of lines in
Cache)
For example,

 Let us consider that a particular cache memory is divided into a total of 'n' lines.
 Then, block 'j' of the main memory can map only to line number (j mod n) of the cache.
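A small C sketch of this modulo rule follows (the 128-line cache size and the block numbers are assumed for illustration):

#include <stdio.h>

int main(void)
{
    int n = 128;                       /* assumed number of cache lines */
    int blocks[] = {0, 5, 128, 133};   /* example main-memory block numbers */
    /* Block j of main memory can be placed only in line (j mod n). */
    for (int i = 0; i < 4; i++)
        printf("block %3d -> cache line %d\n", blocks[i], blocks[i] % n);
    /* Note that blocks 0 and 128 (and 5 and 133) contend for the same
       line even if the rest of the cache is empty. */
    return 0;
}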


The Need for Replacement Algorithm


In the case of direct mapping,

 There is no requirement for a replacement algorithm.


 It is because a block of the main memory can map only to one certain line of the cache.
 Thus, the incoming (new) block always replaces the block that already exists, if any, in that particular line.
Division of Physical Address
In the case of direct mapping, the division of the physical address occurs as follows

Direct Mapping:

In direct mapping, the cache consists of normal high-speed random-access memory. Each location in the cache holds data identified by a specific address in the cache. This address is given by the lower significant bits of the main memory address, which enables the block to be selected directly from the lower significant bits of the memory address. The remaining higher significant bits of the address are stored in the cache with the data to complete the identification of the cached data.


As shown in the above figure, the address from the processor is divided into two fields: a tag and an index.

The tag consists of the higher significant bits of the address, and these bits are stored with the data in the cache. The index consists of the lower significant bits of the address. Whenever the memory is referenced, the following sequence of events occurs:

1. The index is first used to access a word in the cache.


2. The tag stored in the accessed word is read.
3. This tag is then compared with the tag in the address.
4. If the two tags are the same, this indicates a cache hit, and the required data is read from the cache word.
5. If the two tags are not the same, this indicates a cache miss. Then the reference is made to the main memory to find the required word.

For a memory read operation, the word is then transferred into the cache. It is possible to pass the information to the cache and the processor simultaneously.

In such a case, the main memory address consists of a tag, an index and a word within a line. All the words within a line in the cache have the same stored tag.

The index part of the address is used to access the cache, and the stored tag is compared with the required tag address.


For a read operation, if the tags are the same, the word within the block is selected for transfer to the processor. If the tags are not the same, the block containing the required word is first transferred to the cache. In direct mapping, blocks with the same index in the main memory will map into the same block in the cache, and hence only blocks with different indices can be in the cache at the same time. It is important that all words in the cache have different indices; the tags may be the same or different.

2. Fully Associative Mapping:


In the case of fully associative mapping,

 A main memory block is capable of mapping to any line of the cache that is freely available at that particular moment.
 This makes fully associative mapping more flexible than direct mapping.
For Example
Let us consider the scenario given as follows:

Here, we can see that,

 Every single line of the cache is freely available.

 Thus, any main memory block can map to any line of the cache.
 In case all the cache lines are occupied, one of the existing blocks needs to be replaced.
The Need for Replacement Algorithm:
In the case of fully associative mapping,

 The replacement algorithm is always required.


 The replacement algorithm selects the block that is to be replaced whenever all the cache lines happen to be occupied.
 So, replacement algorithms such as the LRU algorithm, FCFS algorithm, etc., are employed.
Division of Physical Address:
In the case of fully associative mapping, the division of the physical address occurs as follows:

3. K-way Set Associative Mapping


In the case of k-way set associative mapping,

 The cache lines are grouped into sets, where each set consists of k lines.
 Any given main memory block can map only to a particular cache set.
 However, within that set, the memory block can map to any cache line that is freely available.
 The cache set to which a certain main memory block can map is given as follows:
Cache set number = (Main Memory Block Address) Modulo (Total number of sets present in the Cache)

For Example
Let us consider the example given as follows of a two-way set-associative mapping:


In this case,

 k = 2 suggests that every set consists of two cache lines.

 Since the cache consists of 6 lines, the total number of sets present in the cache = 6 / 2 = 3 sets.
 Block 'j' of the main memory can map only to set number (j mod 3) of the cache.
 Within that set, block 'j' can map to any cache line that is freely available at that moment.
 In case all the available cache lines happen to be occupied, one of the existing blocks needs to be replaced.
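The same example can be sketched in C (a minimal illustration; the range of block numbers is assumed):

#include <stdio.h>

int main(void)
{
    int lines = 6, k = 2;
    int sets = lines / k;          /* 6 / 2 = 3 sets */
    /* Block j of main memory maps to set (j mod 3) and may then
       occupy either of the two lines of that set. */
    for (int j = 0; j < 8; j++)
        printf("block %d -> set %d\n", j, j % sets);
    return 0;
}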
The Need for Replacement Algorithm:
In the case of k-way set associative mapping,

 K-way set associative mapping is a combination of direct mapping and fully associative mapping.
 It makes use of fully associative mapping within each set.
 Therefore, k-way set associative mapping needs a replacement algorithm.
Division of Physical Address:
In the case of k-way set associative mapping, the division of the physical address occurs as follows:

Special Cases:

 In case k = 1, the k-way set associative mapping becomes direct mapping. Thus, Direct Mapping = one-way set associative mapping.
 In case k = the total number of lines present in the cache, the k-way set associative mapping becomes fully associative mapping.

Set Associative Mapping -

In set associative mapping, a cache is divided into sets of blocks. The number of blocks in a set is known as the associativity or set size. Each block in each set has a stored tag, which together with the index completely identifies the block.


Thus, set associative mapping allows a limited number of blocks with the same index and different tags. An example of a four-way set associative cache having four blocks in each set is shown in the following figure.

In this type of cache, the following steps are used to access the data from a cache:

1. The index of the address from the processor is used to access the set.
2. Then the comparators are used to compare all tags of the selected set with the incoming
tag.
3. If a match is found, the corresponding location is accessed.
4. If no match is found, an access is made to the main memory.

The tag address bits are always chosen to be the most significant bits of the full address, the
block address bits are the next significant bits and the word/byte address bits are the least
significant bits. The number of comparators required in the set associative cache is given by the
number of blocks in a set. The set can be selected quickly and all the blocks of the set can be
read out simultaneously with the tags before waiting for the tag comparisons to be made. After
a tag has been identified, the corresponding block can be selected.

Fully associative mapping:

In fully associative type of cache memory, each location in cache stores both memory address
as well as data.


Whenever data is requested, the incoming memory address is simultaneously compared with all stored addresses using the internal logic of the associative memory.

If a match is found, the corresponding data is read out. Otherwise, the main memory is accessed, since the address is not found in the cache.

This method is known as fully associative mapping approach because cached data is related to
the main memory by storing both memory address and data in the cache. In all organisations,
data can be more than one word as shown in the following figure.

A line constitutes four words, each word being 4 bytes. In such a case, the least significant part of the address selects the particular byte, the next part selects the word, and the remaining bits form the tag. These tag bits are compared with the addresses stored in the cache. The whole line can be transferred to and from the cache in one transaction if there are sufficient data paths between the main memory and the cache. With only one data-word path, the words of the line have to be transferred in separate transactions.

The main advantage of a fully associative mapped cache is that it provides the greatest flexibility in holding combinations of blocks in the cache, with minimum conflict for a given cache size.

It suffers from certain disadvantages:

1. It is an expensive method because of the high cost of associative memory.
2. It requires a replacement algorithm in order to select a block to be removed whenever a cache miss occurs.
3. Such an algorithm must be implemented in hardware to maintain a high speed of operation.

The fully associative mechanism is usually employed by microprocessors with small internal caches.

Techniques for Reducing Cache Misses:

Reducing cache misses is crucial for improving the performance of programs, especially in
systems where memory access time is a significant bottleneck. Cache misses occur when the
data needed by the CPU is not found in the cache, requiring the data to be fetched from a
slower level of memory. Here are some techniques for reducing cache misses:

1. Exploiting Locality of Reference

 Temporal Locality: This principle suggests that if a data item is accessed once, it is likely
to be accessed again soon. To exploit temporal locality:
o Reusing Data: Write programs that reuse recently accessed data, so it stays in
the cache longer.
o Loop Fusion: Combine loops that access the same data to enhance temporal
locality.
 Spatial Locality: This principle indicates that if a data item is accessed, nearby data
items are likely to be accessed soon. To exploit spatial locality:
o Data Structuring: Organize data structures (like arrays) so that related data is
stored contiguously in memory.
o Loop Interchange: Change the nesting order of loops to access data in memory
sequentially, improving spatial locality.

2. Blocking (Tiling)


 Technique: Break down large problems (e.g., matrix multiplication) into smaller blocks
or tiles that fit into the cache. Process these smaller blocks fully before moving to the
next, ensuring that data loaded into the cache is reused multiple times.
 Example: In matrix multiplication, rather than multiplying entire rows by columns, the
matrices are divided into smaller submatrices (blocks), and these blocks are multiplied,
reducing cache misses by keeping a small part of the matrices in cache.
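A minimal C sketch of blocked matrix multiplication follows (N and the tile size TS are assumed tuning parameters, not values from the notes; the code assumes N is a multiple of TS):

#define N  512
#define TS 64    /* tile size chosen so a few TS x TS tiles fit in cache */

/* Each TS x TS tile of A, B and C is processed completely before
   moving on, so data loaded into the cache is reused many times. */
void matmul_blocked(const double A[N][N], const double B[N][N],
                    double C[N][N])
{
    for (int ii = 0; ii < N; ii += TS)
        for (int jj = 0; jj < N; jj += TS)
            for (int kk = 0; kk < N; kk += TS)
                for (int i = ii; i < ii + TS; i++)
                    for (int j = jj; j < jj + TS; j++) {
                        double sum = C[i][j];
                        for (int k = kk; k < kk + TS; k++)
                            sum += A[i][k] * B[k][j];
                        C[i][j] = sum;
                    }
}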

3. Loop Optimizations

 Loop Interchange: Change the order of nested loops to access data in a cache-friendly
manner, i.e., row-major order if the array is stored in row-major format.
o Example: Instead of accessing matrix elements column-wise (which can cause
cache misses due to the way data is stored in memory), access them row-wise.
 Loop Fusion (Jamming): Combine two adjacent loops that operate on the same data set
into a single loop to reduce the cache misses.
o Example: If two separate loops both iterate over the same array, fusing them
into one loop can ensure that data remains in cache between operations.
 Loop Unrolling: Increase the number of operations performed within a single iteration
of the loop, reducing the overhead of loop control and improving cache usage.
o Example: Instead of processing one array element per loop iteration, process
multiple elements, reducing the number of iterations and increasing data reuse.
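For instance, here is a hedged C sketch of loop interchange (the array size N is assumed):

#define N 1024
static double a[N][N], x[N], y[N];

/* C stores arrays in row-major order, so iterating rows (i) in the
   outer loop and columns (j) in the inner loop walks memory
   sequentially and exploits spatial locality. */
void row_major(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)   /* touches a[i][0], a[i][1], ... */
            y[i] += a[i][j] * x[j];
}

/* The interchanged order (j outer, i inner) would touch a[0][j],
   a[1][j], ..., which lie N*8 bytes apart and are far more likely
   to miss in the cache. */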

4. Prefetching

 Hardware Prefetching: Modern processors often include hardware that automatically prefetches data that it anticipates will be needed soon, based on access patterns.
 Software Prefetching: Programmers can manually insert prefetch instructions in their
code to load data into the cache before it is actually needed.
o Example: In a loop, use the prefetch instruction to load data from memory into
the cache a few iterations ahead of when it will be used.
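A sketch of software prefetching using the GCC/Clang builtin __builtin_prefetch (the look-ahead distance AHEAD is an assumed tuning parameter):

#define AHEAD 8   /* assumed prefetch distance, tuned per machine */

double sum_with_prefetch(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + AHEAD < n)
            /* args: address, rw (0 = read), temporal locality (0-3) */
            __builtin_prefetch(&a[i + AHEAD], 0, 1);
        s += a[i];
    }
    return s;
}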

5. Data Alignment

 Align Data to Cache Line Boundaries: Aligning data structures to cache line boundaries
can reduce the number of cache lines required to store the data, thus reducing cache
misses.
o Example: Ensure that arrays or structures start at memory addresses that are
multiples of the cache line size, avoiding unnecessary cache line splits.
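A minimal C11 sketch of cache-line alignment (the 64-byte line size is an assumption; check the target machine):

#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

/* Place the buffer on a 64-byte boundary so its 8 doubles
   (8 * 8 = 64 bytes) occupy exactly one cache line. */
static alignas(64) double buffer[8];

int main(void)
{
    printf("aligned to 64: %s\n",
           ((uintptr_t)buffer % 64 == 0) ? "yes" : "no");
    return 0;
}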

6. Reducing Cache Conflict Misses

 Cache Associativity: Increase the associativity of the cache if possible. Higher associativity means that a single cache set can hold more blocks, reducing the likelihood of conflict misses.


 Array Padding: Add padding to data structures to avoid conflict misses due to multiple
frequently accessed data items mapping to the same cache set.
o Example: In some cases, adding a small amount of padding between array
elements or structure members can prevent multiple elements from mapping to
the same cache line.

7. Optimizing Data Structures

 Use of Smaller Data Types: Smaller data types occupy less space in the cache,
potentially allowing more data to fit within the cache and reducing misses.
o Example: Use float instead of double if precision is not a concern, or use packed
data structures.
 Compact Data Structures: Use data structures that minimize memory overhead (like
tightly packed arrays or custom data structures) to maximize cache efficiency.
o Example: Replace a linked list (which has poor spatial locality due to pointer-
based storage) with a contiguous array or vector for better cache performance.

8. Cache-Conscious Data Layouts

 Structure of Arrays (SoA) vs. Array of Structures (AoS): Depending on access patterns,
using SoA instead of AoS can be more cache-friendly, particularly when you need to
process one field across many structures.
o Example: If you frequently access the x field of a list of 3D points (where each
point has x, y, z coordinates), storing all x values contiguously (SoA) might reduce
cache misses compared to interleaving x, y, z (AoS).
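A brief C sketch of the two layouts (NPTS and the field names are assumed for illustration):

#define NPTS 100000

/* Array of Structures (AoS): x, y and z are interleaved in memory. */
struct point { float x, y, z; };
static struct point aos[NPTS];

/* Structure of Arrays (SoA): all x values are contiguous, so a loop
   reading only x touches roughly one-third as many cache lines. */
static struct { float x[NPTS], y[NPTS], z[NPTS]; } soa;

float sum_x(void)
{
    float s = 0.0f;
    for (int i = 0; i < NPTS; i++)
        s += soa.x[i];    /* sequential, cache-friendly access */
    return s;
}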

9. Reducing Working Set Size

 Optimize Algorithms: Choose or design algorithms that reduce the amount of data that
needs to be simultaneously in memory (working set), fitting more of the working set
into the cache.
o Example: Replace a brute-force algorithm that processes large datasets with a
more efficient algorithm that reduces the memory footprint.

10. Minimizing Cold Misses

 Initialize Data: Properly initialize data structures so that when they are first accessed,
they do not cause unnecessary cache misses.
 Use Warm-Up Phases: In some applications, a warm-up phase can be used to pre-load
important data into the cache before the actual computation starts.

11. Cache Partitioning


 Partitioning Cache: If the workload is known to access specific data patterns, cache
partitioning can be used to allocate certain parts of the cache to different data sets,
minimizing conflict misses.
o Example: In real-time systems, dedicating specific cache regions to critical tasks
can ensure consistent cache performance.

Virtual Memory Organization:


The Virtual Memory (VM) concept is similar to the concept of cache memory. While the cache addresses the speed requirements of memory access by the CPU, virtual memory addresses the capacity requirements of Main Memory (MM) through a mapping association with secondary memory, i.e. the hard disk. Both cache and virtual memory are based on the principle of locality of reference. Virtual memory provides an illusion of unlimited memory being available to processes and programmers.

In a VM implementation, a process looks at the resources with a logical view and the CPU looks
at it from a Physical or real view of resources. Every program or process begins with its starting
address as ‘0’ (Logical view). However, there is only one real '0' address in Main Memory.
Further, at any instant, many processes reside in Main Memory (Physical view). A Memory
Management Hardware provides the mapping between logical and physical view.

VM is a hardware implementation assisted by the OS's memory management task. The basic facts of VM are:

 All memory references by a process are logical and are dynamically translated by hardware into physical addresses.


 There is no need for the whole program code or data to be present in physical memory, and neither the data nor the program needs to be present in contiguous locations of physical main memory. Similarly, every process may be broken up into pieces and loaded as necessary.
 The storage in secondary memory need not be contiguous. (Remember that a single file may be stored in different sectors of the disk, which you may observe while doing a defrag.)
 However, the logical view is contiguous. The rest of the views are transparent to the user.

Virtual Memory Design factors:

Any VM design has to address the following factors, choosing among the available options.

 Type of implementation – Segmentation, Paging, Segmentation with Paging


 Address Translation – Logical to Physical
 Address Translation Type – Static or Dynamic Translation
o Static Translation – Few simpler programs are loaded once and may be executed
many times. During the lifetime of these programs, nothing much changes and
hence the Address Space can be fixed.
o Dynamic Translation – Complex user programs and System programs use a
stack, queue, pointers, etc., which require growing spaces at run time. Space is
allotted as the requirement comes up. In such cases, Dynamic Address
Translation is used. In this chapter, we discuss only Dynamic Address Translation
Methods.
 A Page/Segment table to be maintained as to what is available in MM
 Identification of the Information in MM as a Hit or Page / Segment Fault
 Page/Segment Fault handling Mechanism
 Protection of pages/ Segments in Memory and violation identification
 Allocation / Replacement Strategy for Page/Segment in MM – same as for cache memory. FIFO, LIFO, LRU and Random are a few examples.
Segmentation:

A Segment is a logically related contiguous allocation of words in MM. Segments vary in length. A segment corresponds to logical entities like a program, stack, data, etc. A word in a segment is addressed by specifying the base address of the segment and the offset within the segment, as in the figure.

A segment table has to be maintained with the details of the segments in MM and their status. The figure shows typical entries in a segment table. The segment table resides in the OS area in MM. The part of a segment that is sharable with other programs/processes is created as a separate segment, and the access rights for that segment are set accordingly. The presence bit indicates that the segment is available in MM. The change bit indicates that the content of the segment has been changed after it was loaded into MM and is no longer a copy of the disk version. (Recall that in a multilevel hierarchical memory, each lower level has to be kept coherent with the immediately higher level.) The address translation in a segmentation implementation is as shown in the figure: the virtual address generated by the program has to be converted into a physical address in MM, and the segment table helps achieve this translation.


Generally, a Segment size coincides with the natural size of the program/data. Although this is
an advantage on many occasions, there are two problems to be addressed in this regard.

1. Identifying a contiguous area in MM of the required segment size is a complex process.
2. As we have seen, chunks of memory are identified and allotted as required. There is a possibility that some gaps of memory remain in small chunks that are too small to be allotted to a new segment. At the same time, the sum of such gaps may become large enough to be undesirable. These gaps are called external fragmentation. External fragments are cleared by a special process, such as compaction, carried out by the OS.

Paging:

Paging is another implementation of virtual memory. The logical storage is marked as pages of some size, say 4KB. The MM is viewed and numbered as page frames, where each page frame equals the size of a page. The pages from the logical view are fitted into the empty page frames in MM. This is analogous to placing a book on a bookshelf, and the concept is similar to cache blocks and their placement. The figure explains how the pages of two programs are fitted into page frames in MM. As you can see, any page can be placed into any available page frame. Unallotted page frames are shown in white.

This mapping has to be maintained in a page table, and the mapping is used during address translation. Typically a page table entry contains the virtual page address, the corresponding physical frame number where the page is stored, a presence bit, a change bit and access rights (refer to the figure). The page table is consulted to check whether the desired page is available in MM. The page table resides in a part of MM. Thus every memory access requested by the CPU refers to memory twice – once for the page table and a second time to get the data from the accessed location. This is called the address translation process and is detailed in the figure.
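The translation step can be sketched in C as follows. This is a simplified single-level model under assumed parameters (4KB pages as in the text; the table size and names such as translate are hypothetical):

#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u   /* 4KB pages, as in the text */
#define NUM_PAGES 1024u   /* assumed size of the logical address space */

struct pte {              /* one page-table entry */
    uint32_t frame;       /* physical page-frame number */
    bool     present;     /* presence bit: page is in MM */
    bool     changed;     /* change bit: modified since load */
    uint8_t  rights;      /* access rights, e.g. read/write */
};

static struct pte page_table[NUM_PAGES];

/* Split the virtual address into page number and offset, then look
   up the frame. Returning false models a page fault, after which
   the OS would load the page from disk. */
bool translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    if (page >= NUM_PAGES || !page_table[page].present)
        return false;                          /* page fault */
    *paddr = page_table[page].frame * PAGE_SIZE + offset;
    return true;
}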


Page size determination is an important factor in obtaining maximum page hits and minimum thrashing. Thrashing is very costly in VM because it means getting data from disk, which is likely to be around 1000 times slower than MM.

In the paging mechanism, page frames of fixed size are allotted. There is a possibility that some of the pages have contents smaller than the page size, as we have in our printed books. This causes unutilized space (a fragment) in a page frame. This unutilized space cannot be used for any other purpose. Since these fragments are inside the allotted page frame, this is called internal fragmentation.

Additional Activities in Address Translation:

 In segmentation, the length of the segment mentioned in the segment table is compared with the offset. If the offset exceeds it, this is a segment violation and an error is generated to this effect.
 The control bits are meant to be used during Address Translation.
o The presence bit is verified to know that the requested segment/page is
available in the MM.


o The Change bit indicates that the segment/page in main memory is not a true
copy of that in Disk; if this segment/page is a candidate for replacement, it is to
be written onto the disk before replacement. This logic is part of the Address
Translation mechanism.
o Segment/Page access rights are checked to verify any access violation; e.g. a page with the read-only attribute cannot be allowed WRITE access.
 If the requested Segment/Page is not in the respective table, it is not available in MM and a Segment/Page Fault is generated. Subsequently, what happens is:
o The OS takes over to READ the segment/page from the disk.
o A segment needs to be allotted from the available free space in MM. In the case of paging, an empty page frame needs to be identified.
o In case the free space/page frame is unavailable, the page replacement algorithm plays its role to identify the candidate segment/page frame.
o The data from the disk is written onto the MM.
o The Segment/Page Table is updated with the information that a new block is available in MM.

Translation Look-aside Buffer (TLB):

Every Virtual address Translation requires two memory references,

 once to read the segment/page table and


 once more to read the requested memory word.

The TLB is a hardware feature designed to speed up the page table lookup by avoiding one extra access to MM. A TLB is a fully associative cache of the page table. The entries in the TLB correspond to the recently used translations. The TLB is sometimes referred to as an address cache. The TLB is part of the Memory Management Unit (MMU), and the MMU is present in the CPU block.

TLB entries are similar to those of the page table. With the inclusion of the TLB, every virtual address is first checked in the TLB for address translation. If it is a TLB miss, then the page table in MM is looked into. Thus, a TLB miss does not by itself cause a page fault; a page fault is generated only if the translation misses in the page table too. Since the TLB is an associative address cache in the CPU, a TLB hit provides the fastest possible address translation; the next best is a page hit in the page table; the worst is a page fault.
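A hedged C sketch of the TLB lookup step follows (the entry count and names are assumptions; a real TLB performs this comparison in parallel hardware, not in a loop):

#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16    /* assumed small, fully associative TLB */

struct tlb_entry {
    uint32_t page;        /* virtual page number   */
    uint32_t frame;       /* physical frame number */
    bool     valid;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* On a TLB hit the frame is returned immediately; on a miss the
   caller falls back to the page table in MM. Only a miss in the
   page table too becomes a page fault. */
bool tlb_lookup(uint32_t page, uint32_t *frame)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].page == page) {
            *frame = tlb[i].frame;   /* fastest translation path */
            return true;
        }
    }
    return false;                    /* TLB miss: consult page table */
}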

Having discussed the various individual address translation options, it should be understood that in a multilevel hierarchical memory all the functional structures coexist: TLB, page tables, segment tables, cache (multiple levels), main memory and disk. There can be many page tables, with many levels too, in which case a few page tables may reside on disk. In this scenario, what is the hierarchy of verification of tables for address translation and data service to the CPU?


Address Translation verification sequence starts from the lowest level i.e.

TLB -> Segment / Page Table Level 1 -> Segment / Page Table Level n

Once the address is translated into a physical address, then the data is serviced to CPU. Three
possibilities exist depending on where the data is.

Case 1 - TLB or PT hit and also Cache Hit - Data returned from Cache to CPU

Case 2 - TLB or PT hit and Cache Miss - Data returned from MM to CPU and Cache

Case 3 - Page Fault - Data from disk loaded into a segment / page frame in MM; MM returns
data to CPU and Cache

It is simple: in the case of a page hit, either the cache or MM provides the data to the CPU readily, and the protocol between the cache and MM remains intact. If it is a segment/page fault, then the routine is handled by the OS to load the required data into main memory. In this case, the data is not in the cache either; therefore, while returning the data to the CPU, the cache is updated, treating it as a case of cache miss.

Advantages of Virtual Memory:

 Generality - the ability to run programs that are larger than the size of physical memory.
 Storage management - allocation/deallocation either by segmentation or paging mechanisms.
 Protection - regions of the address space in MM can selectively be marked as Read Only or Execute.
 Flexibility - portions of a program can be placed anywhere in main memory without relocation.
 Storage efficiency - retain only the most important portions of the program in memory.


 Concurrent I/O - execute other processes while loading/dumping a page. This increases the overall performance.
 Expandability - programs/processes can grow in the virtual address space.
 Seamless and better performance for users.

Mapping and Management Techniques:

Mapping Functions:
The mapping functions are used to map a particular block of main memory to a particular block
of cache. This mapping function is used to transfer the block from main memory to cache
memory. Three different mapping functions are available:

Direct mapping:
A particular block of main memory can be brought to a particular block of cache memory. So, it
is not flexible.

Associative mapping:
In this mapping function, any block of main memory can potentially reside in any cache block position. This is a much more flexible mapping method.

Block-set-associative mapping:
In this method, blocks of cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. From the flexibility point of view, it is intermediate between the other two methods.

All three mapping methods are explained with the help of an example.
Consider a cache of 4096 (4K) words with a block size of 32 words. Therefore, the cache is
organized as 128 blocks. Addressing 4K words requires 12 address bits. To select one of the
128 blocks we need 7 address bits, and to select one word out of 32 we need 5 address bits.
So the 12-bit cache address is divided into two groups: the lower 5 bits select a word within
a block, and the higher 7 bits select a block of cache memory.

Let us consider a main memory system consisting of 64K words, so the size of the address bus is
16 bits. Since the block size of the cache is 32 words, the main memory is also organized with a
block size of 32 words. Therefore, the total number of blocks in main memory is 2048 (2K blocks
x 32 words = 64K words). To identify any one of the 2K blocks, we need 11 address bits. Out of
the 16 main memory address bits, the lower 5 bits select a word within a block and the higher
11 bits select a block out of 2048.

The number of blocks in cache memory is 128 and the number of blocks in main memory is 2048, so
at any instant of time only 128 of the 2048 blocks can reside in cache memory. Therefore, we
need a mapping function to place a particular block of main memory into an appropriate block of
cache memory.
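The block counts and field widths above are simple powers-of-two arithmetic. The short C program below reproduces them; the sizes are hard-coded from this example only.

#include <stdio.h>

/* Returns log2(n) for a power-of-two n. */
static int log2_int(unsigned n) {
    int bits = 0;
    while (n > 1) { n >>= 1; bits++; }
    return bits;
}

int main(void) {
    unsigned block_size  = 32;      /* words per block  */
    unsigned cache_words = 4096;    /* 4K-word cache    */
    unsigned mm_words    = 65536;   /* 64K-word memory  */

    printf("word field       : %d bits\n", log2_int(block_size));                /* 5  */
    printf("cache block field: %d bits\n", log2_int(cache_words / block_size));  /* 7  */
    printf("MM block field   : %d bits\n", log2_int(mm_words / block_size));     /* 11 */
    return 0;
}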

Direct Mapping Technique:


The simplest way of associating main memory blocks with cache blocks is the direct mapping
technique. In this technique, block k of main memory maps into block (k modulo m) of the cache,
where m is the total number of blocks in the cache; in this example, m is 128. In the direct
mapping technique, one particular block of main memory can be transferred only to the one cache
block given by the modulo function.

Since more than one main memory block is mapped onto a given cache block position,
contention may arise for that position. This situation may occur even when the cache is not full.
Contention is resolved by allowing the new block to overwrite the currently resident block. So
the replacement algorithm is trivial.

The detailed operation of the direct mapping technique is as follows:

The main memory address is divided into three fields. The field sizes depend on the memory
capacity and the block size of the cache. In this example, the lower 5 bits of the address
identify a word within a block. The next 7 bits select a block out of the 128 blocks (the
capacity of the cache). The remaining 4 bits are used as a TAG to identify the main memory
block that is currently mapped to that cache block.

When a new block is first brought into the cache, the high-order 4 bits of the main memory
address are stored in the four TAG bits associated with its location in the cache. When the CPU
generates a memory request, the 7-bit block field determines the corresponding cache block. The
TAG field of that cache block is compared with the TAG field of the address. If they match, the
desired word, specified by the low-order 5 bits of the address, is in that block of the cache.

If there is no match, the required word must be accessed from main memory; that is, the
contents of that cache block are replaced by the new block specified by the address generated
by the CPU, and correspondingly the TAG bits are updated with the high-order 4 bits of that
address. The whole arrangement of the direct mapping technique is shown in the figure below.

Figure: Direct-mapping cache
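To make the field manipulation concrete, here is a small C sketch of a direct-mapped lookup with the widths of this example (5-bit word, 7-bit block, 4-bit TAG). The CacheBlock structure and array are illustrative assumptions, not a description of real hardware.

#include <stdint.h>
#include <stdbool.h>

#define WORD_BITS  5                        /* 32 words per block */
#define BLOCK_BITS 7                        /* 128 cache blocks   */
#define NUM_BLOCKS (1u << BLOCK_BITS)

typedef struct {
    bool     valid;
    uint16_t tag;                           /* high-order 4 address bits */
    uint16_t data[1 << WORD_BITS];
} CacheBlock;

static CacheBlock cache[NUM_BLOCKS];

/* Returns true on a hit for a 16-bit main memory address. */
bool direct_mapped_lookup(uint16_t addr, uint16_t *word_out) {
    uint16_t word  =  addr        & 0x1F;   /* bits 0-4   */
    uint16_t block = (addr >> 5)  & 0x7F;   /* bits 5-11  */
    uint16_t tag   =  addr >> 12;           /* bits 12-15 */

    if (cache[block].valid && cache[block].tag == tag) {
        *word_out = cache[block].data[word];
        return true;                        /* TAG match: hit */
    }
    return false;  /* miss: fetch the block from MM and rewrite the TAG */
}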


Associative Mapping Technique:
In the associative mapping technique, a main memory block can potentially reside in any cache
block position. In this case, the main memory address is divided into two groups: the low-order
bits identify the location of a word within a block and the high-order bits identify the block.
In the example here, 11 bits are required to identify a main memory block when it is resident in
the cache, so the high-order 11 bits are used as TAG bits and the low-order 5 bits identify a
word within a block.
The TAG bits of an address received from the CPU must be compared to the TAG
bits of each block of the cache to see if the desired block is present.

In associative mapping, any block of main memory can go to any block of the cache, so it
provides complete flexibility, and we have to use a proper replacement policy to choose a block
to evict from the cache when the currently accessed block of main memory is not present in it.
It may not be practical to use this complete flexibility because of the searching overhead: the
TAG field of the main memory address has to be compared with the TAG field of every cache
block. In this example, there are 128 blocks in the cache and the size of the TAG is 11 bits.
The whole arrangement of the associative mapping technique is shown in the figure below.

Figure: Associative Mapping Cache
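A corresponding sketch of the associative lookup, reusing the CacheBlock declarations from the direct-mapping sketch above, is given below. Hardware compares all 128 TAGs in parallel; the sequential loop is only for illustration.

/* Fully associative lookup: the 11-bit TAG must be checked against the
   TAG of every cache block (done in parallel in real hardware). */
bool associative_lookup(uint16_t addr, uint16_t *word_out) {
    uint16_t word = addr & 0x1F;    /* bits 0-4           */
    uint16_t tag  = addr >> 5;      /* bits 5-15: 11 bits */

    for (unsigned i = 0; i < NUM_BLOCKS; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            *word_out = cache[i].data[word];
            return true;            /* hit */
        }
    }
    return false;  /* miss: the replacement policy selects a victim block */
}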


Block-Set-Associative Mapping Technique:
This mapping technique is intermediate between the previous two. Blocks of the cache are
grouped into sets, and the mapping allows a block of main memory to reside in any block of a
specific set. Therefore, the flexibility of associative mapping is reduced from full freedom
to a set of specific blocks. This also reduces the searching overhead, because the search is
restricted to the number of sets instead of the number of blocks. At the same time, the
contention problem of direct mapping is eased by having a few choices for block placement.

Consider the same cache memory and main memory organization as in the previous example, and
organize the cache with 4 blocks in each set. The TAG field of the associative mapping
technique is now divided into two groups: one is termed the SET field and the other the TAG
field. Since each set contains 4 blocks, the total number of sets is 32. The main memory
address is grouped into three parts: the low-order 5 bits identify a word within a block;
since there are 32 sets in total, the next 5 bits identify the set; and the high-order 6 bits
are used as TAG bits.

The 5-bit SET field of the address determines which set of the cache might contain the desired
block. This is similar to the direct mapping technique, except that direct mapping looks up a
block whereas block-set-associative mapping looks up a set. The TAG field of the address must
then be compared with the TAGs of the four blocks of that set. If a match occurs, the block is
present in the cache; otherwise, the block containing the addressed word must be brought into
the cache, and it can be placed only in the corresponding set.

Since there are four blocks in the set, we have to choose appropriately which block is to be
replaced if all the blocks are occupied. Since the search is restricted to four blocks only,
the searching complexity is reduced. The whole arrangement of the block-set-associative
mapping technique is shown in the figure below.

Figure: Block-Set-Associative Mapping Cache with 4 blocks per set
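The 4-way lookup can be sketched in the same style, again with the field widths of this example (5-bit word, 5-bit SET, 6-bit TAG) and the illustrative CacheBlock type introduced earlier.

#define SET_BITS 5                          /* 32 sets        */
#define WAYS     4                          /* blocks per set */
#define NUM_SETS (1u << SET_BITS)

static CacheBlock set_cache[NUM_SETS][WAYS];

/* 4-way block-set-associative lookup for a 16-bit address. */
bool set_associative_lookup(uint16_t addr, uint16_t *word_out) {
    uint16_t word = addr & 0x1F;            /* bits 0-4           */
    uint16_t set  = (addr >> 5) & 0x1F;     /* bits 5-9           */
    uint16_t tag  = addr >> 10;             /* bits 10-15: 6 bits */

    for (unsigned w = 0; w < WAYS; w++) {   /* only 4 TAG comparisons */
        if (set_cache[set][w].valid && set_cache[set][w].tag == tag) {
            *word_out = set_cache[set][w].data[word];
            return true;                    /* hit */
        }
    }
    return false;  /* miss: one of the 4 blocks of this set is replaced */
}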

It is clear that if we increase the number of blocks per set, the number of bits in the SET
field is reduced; at the same time, the complexity of the search within each set increases.
The extreme case of 128 blocks per set requires no SET bits and corresponds to the fully
associative mapping technique with 11 TAG bits. The other extreme of one block per set is the
direct mapping method.

Replacement Algorithms:
When a new block must be brought into the cache and all the positions that it may occupy are
full, a decision must be made as to which of the old blocks is to be overwritten. In general,
the policy should keep in the cache those blocks that are likely to be referenced in the near
future. However, it is not easy to determine directly which of the blocks in the cache are
about to be referenced. The property of locality of reference gives some clue for designing a
good replacement policy.

Least Recently Used (LRU) Replacement Policy:

Since programs usually stay in localized areas for reasonable periods of time, it can be
assumed that there is a high probability that blocks which have been referenced recently will
also be referenced in the near future. Therefore, when a block is to be overwritten, it is a
good decision to overwrite the one that has gone the longest time without being referenced.
This is defined as the least recently used (LRU) block. Keeping track of the LRU block must be
done as the computation proceeds.

Consider a specific example of a four-block set in which we need to track the LRU block. A
2-bit counter may be used for each block.

When a hit occurs, that is, when a read request is received for a word that is in the cache,
the counter of the referenced block is set to 0, counters whose values were originally lower
than that of the referenced block are incremented by 1, and all other counters remain
unchanged.

When a miss occurs, that is, when a read request is received for a word that is not present in
the cache, we have to bring the block into the cache.

There are two possibilities in case of a miss:

1. If the set is not full, the counter associated with the new block loaded from main memory
is set to 0, and the values of all other counters are incremented by 1.

2. If the set is full, the block with the counter value 3 is removed, the new block is put in
its place, and its counter is set to 0. The other three block counters are incremented by 1.


It is easy to verify that the counter values of the occupied blocks are always distinct, and
that the highest counter value indicates the least recently used block.
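The 2-bit-counter discipline can be written out as a short C sketch for one four-block set. The array names and the hit/miss entry points are assumptions made for illustration.

#include <stdint.h>

#define WAYS 4

static uint8_t lru[WAYS];   /* 2-bit counters, values 0..3; 3 marks the LRU block */
static int     used = 0;    /* number of occupied blocks in the set               */

/* On a hit to block b: b becomes most recently used; counters that were
   lower than b's old value are incremented, the rest stay unchanged.   */
void lru_hit(int b) {
    uint8_t old = lru[b];
    for (int i = 0; i < WAYS; i++)
        if (lru[i] < old)
            lru[i]++;
    lru[b] = 0;
}

/* On a miss: returns the index of the block to load the new block into. */
int lru_miss(void) {
    int victim;
    if (used < WAYS) {
        victim = used++;              /* set not full: use an empty block */
    } else {
        victim = 0;                   /* set full: evict counter value 3  */
        for (int i = 1; i < WAYS; i++)
            if (lru[i] > lru[victim])
                victim = i;
    }
    for (int i = 0; i < WAYS; i++)    /* all other counters age by one    */
        if (i != victim)
            lru[i]++;
    lru[victim] = 0;                  /* new block is most recently used  */
    return victim;
}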

First In First Out (FIFO) Replacement Policy:

A reasonable rule is to remove the oldest block from a full set when a new block must be
brought in. With this technique, no update is required when a hit occurs.

When a miss occurs and the set is not full, the new block is put into an empty block position
and the counter values of the occupied blocks are incremented by one.

When a miss occurs and the set is full, the block with the highest counter value is replaced
by the new block, whose counter is set to 0; the counter values of all other blocks of that
set are incremented by 1. The overhead of this policy is low, since no update is required
during a hit.
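A matching FIFO sketch, continuing the conventions of the LRU sketch above; the only behavioral difference is that a hit changes nothing.

/* FIFO counters for one four-block set: the highest counter marks the
   oldest block. Nothing is updated on a hit.                          */
static uint8_t fifo[WAYS];
static int     fifo_used = 0;

int fifo_miss(void) {
    int victim;
    if (fifo_used < WAYS) {
        victim = fifo_used++;          /* set not full: fill an empty block */
    } else {
        victim = 0;                    /* set full: evict the oldest block  */
        for (int i = 1; i < WAYS; i++)
            if (fifo[i] > fifo[victim])
                victim = i;
    }
    for (int i = 0; i < WAYS; i++)     /* remaining blocks age by one       */
        if (i != victim)
            fifo[i]++;
    fifo[victim] = 0;                  /* newly loaded block is the youngest */
    return victim;
}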

Random Replacement Policy:

The simplest algorithm is to choose the block to be overwritten at random. Interestingly
enough, this simple algorithm has been found to be very effective in practice.
