Lecture Notes: CTE 214 Computer Architecture
GENERAL OBJECTIVES:
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-
sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number
of bits in a word (the word size, word width, or word length) is an important characteristic of any specific
processor design or computer architecture.
The size of a word is reflected in many aspects of a computer's structure and operation; the majority of
the registers in a processor are usually word sized and the largest piece of data that can be transferred to and
from the working memory in a single operation is a word in many (not all) architectures. The largest
possible address size, used to designate a location in memory, is typically a hardware word (here, "hardware
word" means the full-sized natural word of the processor, as opposed to any other definition used).
The size of a word can sometimes differ from what is expected due to backward compatibility with earlier
computers. If multiple compatible variations or a family of processors share a common architecture and
instruction set but differ in their word sizes, their documentation and software may become notationally
complex to accommodate the difference.
Uses of words
Holders for fixed point, usually integer, numerical values may be available in one or in several different
sizes, but one of the sizes available will almost always be the word. The other sizes, if any, are likely to be
multiples or fractions of the word size. The smaller sizes are normally used only for efficient use of
memory; when loaded into the processor, their values usually go into a larger, word sized holder.
Addresses
Holders for memory addresses must be of a size capable of expressing the needed range of values but not be
excessively large, so often the size used is the word though it can also be a multiple or fraction of the word
size.
Registers
Processor registers are designed with a size appropriate for the type of data they hold, e.g. integers, floating
point numbers or addresses. Many computer architectures use "general purpose registers" that can hold any
of several types of data; these registers must be sized to hold the largest of the types. Historically, this has
been the word size of the architecture, though increasingly special-purpose, larger registers have been added
to deal with newer types.
Memory-processor transfer
When the processor reads from the memory subsystem into a register or writes a register's value to memory,
the amount of data transferred is often a word. In simple memory subsystems, the word is transferred over
the memory data bus, which typically has a width of a word or half-word. In memory subsystems that
use caches, the word-sized transfer is the one between the processor and the first level of cache; at lower
levels of the memory hierarchy larger transfers (which are a multiple of the word size) are normally used.
In a given architecture, successive address values designate successive units of memory; this unit is the unit
of address resolution. In most computers, the unit is either a character (e.g. a byte) or a word. (A few
computers have used bit resolution.) If the unit is a word, then a larger amount of memory can be accessed
using an address of a given size at the cost of added complexity to access individual characters. On the other
hand, if the unit is a byte, then individual characters can be addressed (i.e. selected during the memory
operation).
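A small sketch of this trade-off, assuming a hypothetical 16-bit address and 4-byte words (both illustrative values, not any particular machine):

```c
#include <stdio.h>

/* Sketch: how many bytes an n-bit address can reach, depending on
   whether the unit of address resolution is a byte or a 4-byte word.
   With word addressing, more memory is reachable, but individual
   bytes within a word cannot be selected directly. */
int main(void) {
    unsigned address_bits = 16;
    unsigned long long units = 1ULL << address_bits;   /* 2^16 addressable units */
    printf("Byte-addressable:  %llu bytes\n", units);      /* 65536 bytes  */
    printf("Word-addressable (4-byte words): %llu bytes\n",
           units * 4);                                     /* 262144 bytes */
    return 0;
}
```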
Instructions
Machine instructions are normally the size of the architecture's word, such as in RISC architectures, or a
multiple of the "char" size that is a fraction of it. This is a natural choice since instructions and data usually
share the same memory subsystem. In Harvard architectures the word sizes of instructions and data need not
be related, as instructions and data are stored in different memories; for example, the processor in the 1ESS
electronic telephone switch had 37-bit instructions and 23-bit data words.
As computer designs have grown more complex, the central importance of a single word size to an
architecture has decreased. Although more capable hardware can use a wider variety of sizes of data, market
forces exert pressure to maintain backward compatibility while extending processor capability. As a result,
what might have been the central word size in a fresh design has to coexist as an alternative size to the
original word size in a backward compatible design. The original word size remains available in future
designs, forming the basis of a size family.
In the mid-1970s, DEC designed the VAX to be a successor of the PDP-11. They used word for a 16-bit
quantity, while longword referred to a 32-bit quantity. This was in contrast to earlier machines, where the
natural unit of addressing memory would be called a word, while a quantity that is one half a word would be
called a halfword. In fitting with this scheme, a VAX quadword is 64 bits.
Another example is the x86 family, of which processors of three different word lengths (16-bit, later 32- and
64-bit) have been released. As software is routinely ported from one word-length to the next, some APIs and
documentation define or refer to an older (and thus shorter) word-length than the full word length on the
CPU that software may be compiled for. Also, similar to how bytes are used for small numbers in many
programs, a shorter word (16 or 32 bits) may be used in contexts where the range of a wider word is not
needed (especially where this can save considerable stack space or cache memory space). In general, new
processors must use the same data word lengths and virtual address widths as an older processor to
have binary compatibility with that older processor.
Often carefully written source code – written with source code compatibility and software portability in
mind – can be recompiled to run on a variety of processors, even ones with different data word lengths or
different address widths or both.
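A minimal sketch of such portability-minded source code, using the fixed-width types from C's <stdint.h>; the particular values are arbitrary:

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch: fixed-width types keep data sizes stable across processors
   with different word lengths, while size_t tracks the platform's
   natural sizes. The same source recompiles cleanly on 32- and
   64-bit targets. */
int main(void) {
    int32_t sample = 100000;    /* always 32 bits, on 16-, 32- or 64-bit CPUs */
    uint64_t big = 1ULL << 40;  /* always 64 bits */
    size_t n = sizeof(big);     /* matches the platform's size type */
    printf("%d %llu %zu\n", sample, (unsigned long long)big, n);
    return 0;
}
```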
Von-Neumann Model
Von Neumann proposed his computer architecture design in 1945; it later became known as the Von
Neumann architecture. It consists of a Control Unit, an Arithmetic and Logic Unit (ALU), a Memory Unit,
Registers and Inputs/Outputs.
Von Neumann architecture is based on the stored-program computer concept, where instruction data and
program data are stored in the same memory. This design is still used in most computers produced today.
The Arithmetic and Logic Unit (ALU) performs the required micro-operations for executing the instructions.
In simple words, ALU allows arithmetic (add, subtract, etc.) and logic (AND, OR, NOT, etc.) operations to
be carried out.
Control Unit
The Control Unit of a computer system controls the operations of components like ALU, memory and
input/output devices. The Control Unit consists of a program counter that contains the address of the
instructions to be fetched and an instruction register into which instructions are fetched from memory for
execution.
Registers
Registers refer to high-speed storage areas in the CPU. The data processed by the CPU are fetched from the
registers.
The following registers play a crucial role in data processing.
Register | Description
MAR (Memory Address Register) | Holds the memory location of the data that needs to be accessed.
MDR (Memory Data Register) | Holds the data that is being transferred to or from memory.
AC (Accumulator) | Holds intermediate arithmetic and logic results.
PC (Program Counter) | Contains the address of the next instruction to be executed.
CIR (Current Instruction Register) | Contains the current instruction during processing.
Buses
Buses are the means by which information is shared between the registers in a multiple-register
configuration system. A bus structure consists of a set of common lines, one for each bit of a register,
through which binary information is transferred one at a time. Control signals determine which register is
selected by the bus during each particular register transfer.
The Von Neumann architecture comprises three major bus systems for data transfer.

Bus | Description
Address Bus | Carries the address of data (but not the data) between the processor and the memory.
Data Bus | Carries data between the processor, the memory unit and the input/output devices.
Control Bus | Carries control and status signals between the processor, the memory and the input/output devices.
Memory Unit
A memory unit is a collection of storage cells together with associated circuits needed to transfer
information in and out of the storage. The memory stores binary information in groups of bits called words.
The internal structure of a memory unit is specified by the number of words it contains and the number of
bits in each word.
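As a small worked example, the sketch below assumes a hypothetical 4096-word by 16-bit memory unit and derives the number of address lines and the total capacity:

```c
#include <stdio.h>

/* Sketch: a memory unit described as "words x bits per word".
   The 4096 x 16 figures are illustrative assumptions. */
int main(void) {
    unsigned words = 4096, bits_per_word = 16;
    unsigned address_lines = 0;
    for (unsigned w = words; w > 1; w >>= 1)
        address_lines++;                       /* log2(4096) = 12 */
    printf("Address lines needed: %u\n", address_lines);
    printf("Total capacity: %u bits\n", words * bits_per_word);
    return 0;
}
```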
The general purpose computer system is a modified version of the Von Neumann architecture; in simple
words, it is the modern-day architectural representation of a computer system.
The CPU (Central Processing Unit) consists of the ALU (Arithmetic and Logic Unit), Control Unit and
various processor registers. The CPU, Memory Unit and I/O subsystems are interconnected by the system
bus which includes data, address, and control-status lines. The following image shows how CPU, Memory
Unit and I/O subsystems are connected through common single bus architecture.
CHAPTER TWO
MEMORY ORGANIZATION OF COMPUTER SYSTEM
Introduction
In computer architecture, a bus is a collection of wires, chips, and slots inside the computer through which
data is transmitted from one part of the computer to another, and to and from peripheral devices. It is often
compared to a highway (a pathway in the computer on which data travels). It is a set of parallel, distinct
wires, serving different purposes, which allow devices attached to it to communicate with the CPU.
The bus is made up of three parts:
Control bus
Address bus
Data bus
Control Bus
The control bus carries the control signal. The control signal is used for controlling and coordinating the
various activities across the computer. It is generated from the control unit within the CPU. Different
architectures result in a differing number of lines of wires within the control bus, as each line is used to
perform a specific task. For instance, different specific lines are used for each of the read, write, and reset
requests. These are not a group of lines like address bus and data bus, but individual lines that provide a
pulse to indicate a microprocessor operation. The control unit generates a specific control signal for every
operation, such as memory read or input/ output operation. This signal is also used to identify a device type,
with which the microprocessor intends to communicate.
Address Bus
The address bus carries memory addresses within the device. It allows the CPU to reference memory
locations within the device; it connects the CPU and the other peripherals and carries only memory
addresses. In a computer system, each peripheral or memory location is identified by a numerical value
called an address, and the address bus is used to carry this numerical value; it also contains a few control
lines to carry control commands. The address bus is unidirectional: bits flow in one direction, from the
processor to a peripheral or a memory location. The address bus contains the connections between the
processor and memory that carry the signals relating to the addresses the CPU is processing at the time,
such as the locations the CPU is reading from or writing to. The processor uses the address bus to identify
a peripheral or a memory location.
If the address bus could carry only 8 bits at a time, the CPU could address only 2^8 = 256 bytes of RAM.
Most of the early PCs had 20-bit address buses, so the CPU could address 2^20 bytes (1 MB) of data. Today,
with 32-bit address buses, a CPU can address 4 GB (over four billion bytes) of RAM. The wider the bus
path, the more information can be processed at a time, so the bus width also affects the processing speed of
a computer.
Data Bus
Data bus transfers data from one location to another across the computer. The meaningful data which is to
be sent or retrieved from a device is placed on these lines. The CPU uses a data bus to transfer data. It may
be a 16-bit or 32-bit data bus. It is an electrical path that connects the CPU, memory, and other hardware
devices on the motherboard. These lines are bidirectional, data flow in both directions between the processor
and memory and peripheral devices.
The number of wires in the bus affects the speed at which data can travel between hardware components,
just as the number of lanes on a highway affects the time it takes people to reach their destination. Each
wire can transfer 1 bit of data at a time, so an 8-wire bus can move 8 bits, i.e. 1 byte of data, at a time. A
16-bit bus can transfer 2 bytes, and a 32-bit bus can transfer 4 bytes. The Intel 80286 microprocessor used
a 16-bit data bus; the Intel 80386 used a 32-bit data bus. As the data bus width grows larger, more data
can be transferred.
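A rough sketch of this relationship; the 32-bit width and 100 MHz transfer rate below are illustrative assumptions, not figures from any particular machine:

```c
#include <stdio.h>

/* Sketch: the peak transfer rate of a parallel data bus is roughly
   (bus width in bits / 8) x (transfers per second). */
int main(void) {
    unsigned long long width_bits = 32;
    unsigned long long transfers_per_sec = 100000000ULL; /* 100 MHz, one transfer per cycle */
    unsigned long long bytes_per_sec = width_bits / 8 * transfers_per_sec;
    printf("Peak bandwidth: %llu bytes/sec\n", bytes_per_sec); /* 400,000,000 bytes/sec */
    return 0;
}
```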
The transmission of data on the bus lines takes place at between approximately 1 M baud for a
microcomputer and about 1000 M baud or more for large, more expensive computers (1 baud = 1 bit/sec).
Communication between the different units of the processing system is carried out along the address and
data buses and also along various control lines. All control operations are governed by the master timing
source and clock (quartz crystal oscillator).
The term memory can be defined as a collection of data in a specific format. It is used to store instructions
and process data. The memory comprises a large array or group of words or bytes, each with its own
location. The primary purpose of a computer system is to execute programs. These programs, along with the
information they access, should be in the main memory during execution. The CPU fetches instructions
from memory according to the value of the program counter.
To achieve a degree of multiprogramming and proper utilization of memory, memory management is
important. Many memory management methods exist, reflecting various approaches, and the effectiveness
of each algorithm depends on the situation.
Importance of Memory Management
To allocate and de-allocate memory before and after process execution.
To keep track of the memory space used by processes.
To minimize fragmentation issues.
To ensure proper utilization of main memory.
To maintain data integrity while executing a process.
Memory Management is the process of controlling and coordinating computer memory, assigning portions
known as blocks to various running programs to optimize the overall performance of the system.
It is among the most important functions of an operating system: it manages primary memory, helps
processes move back and forth between main memory and secondary storage, and keeps track of every
memory location, irrespective of whether it is allocated to some process or free.
Memory management:
Decides how much memory to allocate to each process and which process gets memory at what time.
Tracks whenever memory gets freed or unallocated and updates the status accordingly.
Allocates space to application routines.
Makes sure that applications do not interfere with each other.
Helps protect different processes from each other.
Places programs in memory so that memory is utilized to its full extent.
The memory management techniques can be classified into the following main categories: contiguous
memory allocation and non-contiguous memory allocation.
The main memory is a combination of two main portions: one for the operating system and the other for
the user program. We can implement contiguous memory allocation by dividing the memory into fixed-size
partitions.
Non-contiguous memory allocation is, in contrast to the contiguous method, a technique that allocates
memory space at different locations to a process as per its requirements. Because all the available memory
space is distributed, the free memory space is also scattered here and there. This technique of memory
allocation reduces the memory wastage that gives rise to internal and external fragmentation.
S.No. | Contiguous Memory Allocation | Non-Contiguous Memory Allocation
4. | Overhead is minimal, as few address translations are needed while executing a process. | More overhead, as there are more address translations.
9. | It is of two types: fixed (or static) partitioning and dynamic partitioning. | It is of five types: paging, multilevel paging, inverted paging, segmentation and segmented paging.
10. | It can be visualized and implemented using arrays. | It can be implemented using linked lists.

Single Contiguous Memory Allocation
In a single contiguous memory management scheme, once a process is loaded, it is given the full
processor's time, and no other process will interrupt it.
Advantages:
Simple to implement.
Easy to manage and design.
Disadvantages:
Wastage of memory space due to unused memory, as the process is unlikely to use all the available
memory space.
The CPU remains idle, waiting for the disk to load the binary image into the main memory.
A program cannot be executed if it is too large to fit in the available main memory space.
It does not support multiprogramming, i.e., it cannot handle multiple programs simultaneously.
Multiple Partitioning
The single contiguous memory management scheme is inefficient, as it limits the computer to executing only
one program at a time, resulting in wastage of memory space and CPU time. The problem of inefficient CPU
use can be overcome using multiprogramming, which allows more than one program to run concurrently. To
switch between two processes, the operating system needs to load both processes into the main memory. The
operating system needs to divide the available main memory into multiple parts to load multiple processes
into the main memory. Thus multiple processes can reside in the main memory simultaneously.
The multiple partitioning schemes can be of two types:
Fixed Partitioning
Dynamic Partitioning
Fixed Partitioning: The main memory is divided into several fixed-sized partitions in a fixed partition
memory management scheme or static partitioning. These partitions can be of the same size or different
sizes. Each partition can hold a single process. The number of partitions determines the degree of
multiprogramming, i.e., the maximum number of processes in memory. These partitions are made at the
time of system generation and remain fixed after that.
Simple to implement.
Easy to manage and design.
Dynamic Partitioning: Dynamic partitioning was designed to overcome the problems of the fixed
partitioning scheme. In a dynamic partitioning scheme, each process occupies only as much memory as it
requires when loaded for processing. Requesting processes are allocated memory until the entire physical
memory is exhausted or the remaining space is insufficient to hold the requesting process. In this scheme the
partitions used are of variable size, and the number of partitions is not defined at system generation time.
Simple to implement.
Easy to manage and design.
In a Non-Contiguous memory management scheme, the program is divided into different blocks and loaded
at different portions of the memory that need not necessarily be adjacent to one another. This scheme can be
classified depending upon the size of blocks and whether the blocks reside in the main memory or not.
Paging: Paging is a technique that eliminates the requirement of contiguous allocation of main memory. In
this technique, the main memory is divided into fixed-size blocks of physical memory called frames. The
size of a frame is kept the same as that of a page, so that main memory is used to maximum effect and
external fragmentation is avoided.
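The following minimal sketch shows the page-number/offset arithmetic behind paging; the 4 KB page size and the tiny page-table contents are made-up assumptions for illustration:

```c
#include <stdio.h>

/* Sketch of paged address translation: a logical address is split into
   a page number and an offset; the frame number comes from a
   hypothetical page table, and the physical address is rebuilt from
   frame number and offset. */
#define PAGE_SIZE 4096u

int main(void) {
    unsigned page_table[4] = {7, 2, 9, 5};   /* page -> frame (made-up values) */
    unsigned logical = 2 * PAGE_SIZE + 123;  /* an address on page 2 */
    unsigned page   = logical / PAGE_SIZE;
    unsigned offset = logical % PAGE_SIZE;
    unsigned physical = page_table[page] * PAGE_SIZE + offset;
    printf("logical %u -> page %u, offset %u -> physical %u\n",
           logical, page, offset, physical);
    return 0;
}
```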
Advantages of paging:
What is Swapping?
Benefits of Swapping
What is Fragmentation?
As processes are loaded into and removed from memory, the free memory space is broken into pieces that
are too small to be used by other processes.
When, after some time, processes cannot be allocated to these memory blocks because of their small size,
and the blocks therefore remain unused, the condition is called fragmentation. This type of problem happens
in a dynamic memory allocation system when the free blocks are so small that they cannot fulfill any
request.
CACHE MEMORY IN COMPUTER ORGANIZATION
Levels of Memory
Level 1 or Registers: Registers hold the data that the CPU is working on immediately. The most
commonly used registers are the Accumulator, the Program Counter, the Address Register, etc.
Level 2 or Cache memory: Cache is the fastest memory below the registers, with a short access time;
data is temporarily stored in it for faster access.
Level 3 or Main Memory: Main memory is the memory on which the computer works currently. It is
smaller in size than secondary memory, and once power is off, data no longer stays in this memory.
Level 4 or Secondary Memory: Secondary memory is external memory that is not as fast as the main
memory, but in which data stays permanently.
Cache Performance
When the processor needs to read or write a location in the main memory, it first checks for a corresponding
entry in the cache.
If the processor finds that the memory location is in the cache, a Cache Hit has occurred and data is
read from the cache.
If the processor does not find the memory location in the cache, a cache miss has occurred. For a
cache miss, the cache allocates a new entry and copies in data from the main memory, and then the
request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit ratio.
Hit Ratio (H) = hits / (hits + misses) = number of hits / total accesses
Miss Ratio = misses / (hits + misses) = number of misses / total accesses = 1 - H
We can improve cache performance by using a larger cache block size and higher associativity, and by
reducing the miss rate, the miss penalty, and the time to hit in the cache.
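A small sketch of the hit-ratio formulas above; the access counts are made-up values:

```c
#include <stdio.h>

/* Sketch: computing hit and miss ratios from access counts. */
int main(void) {
    double hits = 950, misses = 50;
    double hit_ratio  = hits / (hits + misses);  /* 0.95 */
    double miss_ratio = 1.0 - hit_ratio;         /* 0.05 */
    printf("Hit ratio: %.2f, Miss ratio: %.2f\n", hit_ratio, miss_ratio);
    return 0;
}
```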
Cache Mapping
There are three different types of mapping used for cache memory, which are as follows:
Direct Mapping
Associative Mapping
Set-Associative Mapping
1. Direct Mapping
The simplest technique, known as direct mapping, maps each block of main memory into only one possible
cache line. In other words, direct mapping assigns each memory block to a specific line in the cache. If a
line is already occupied by a memory block when a new block needs to be loaded, the old block is trashed.
An address is split into two parts, an index field and a tag field. The cache stores the tag field, while the
index field selects the cache line. Direct mapping's performance is directly proportional to the hit ratio.
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
For purposes of cache access, each main memory address can be viewed as consisting of three fields. The
least significant w bits identify a unique word or byte within a block of main memory; in most contemporary
machines, the address is at the byte level. The remaining s bits specify one of the 2^s blocks of main
memory. The cache logic interprets these s bits as a tag of s - r bits (the most significant portion) and a line
field of r bits. This latter field identifies one of the m = 2^r lines of the cache. The line field supplies the
index bits in direct mapping.
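The sketch below splits a byte address into tag, line and word fields for a hypothetical direct-mapped cache with w = 2 and r = 4 (i.e. 4-byte blocks and 16 cache lines); the field widths and the sample address are illustrative assumptions:

```c
#include <stdio.h>

/* Sketch: decomposing an address for a direct-mapped cache.
   The line field implements i = j mod m for m = 2^r lines. */
int main(void) {
    unsigned w = 2, r = 4;                 /* word bits, line bits */
    unsigned addr = 0x1ABC;                /* made-up address */
    unsigned word = addr & ((1u << w) - 1);
    unsigned line = (addr >> w) & ((1u << r) - 1);  /* cache line number */
    unsigned tag  = addr >> (w + r);
    printf("addr 0x%X -> tag 0x%X, line %u, word %u\n", addr, tag, line, word);
    return 0;
}
```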
2. Associative Mapping
In associative mapping, a block of main memory can be loaded into any line of the cache. The memory
address is interpreted as a tag and a word field, and to determine whether a block is in the cache, the tag
must be compared simultaneously with the tag of every cache line.
3. Set-Associative Mapping
This form of mapping is an enhanced form of direct mapping in which the drawbacks of direct mapping are
removed. Set-associative mapping addresses the problem of possible thrashing in the direct mapping
method: instead of having exactly one line that a block can map to in the cache, we group a few lines
together, creating a set, and a block in memory can then map to any one of the lines of a specific set.
Set-associative mapping thus allows two or more words in main memory that share the same index address
to be present in the cache at the same time. Set-associative cache mapping combines the best of the direct
and associative cache mapping techniques. In set-associative mapping the index bits are given by the set
offset bits; the cache consists of a number of sets, each of which consists of a number of lines.
Relationships in set-associative mapping can be defined as:
m = v * k
i = j mod v
where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set
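A minimal sketch of the i = j mod v relationship, assuming a hypothetical cache of m = 8 lines with k = 2 lines per set (so v = 4 sets):

```c
#include <stdio.h>

/* Sketch: mapping main memory block numbers to set numbers in a
   set-associative cache. Within each set, a block may occupy any of
   the k lines. */
int main(void) {
    unsigned m = 8, k = 2;
    unsigned v = m / k;                           /* number of sets: m = v * k */
    for (unsigned j = 0; j < 10; j++)             /* main memory block numbers */
        printf("block %u -> set %u\n", j, j % v); /* i = j mod v */
    return 0;
}
```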
Application of Cache Memory
Here are some of the applications of Cache Memory.
1. Primary Cache: A primary cache is always located on the processor chip. This cache is small and its
access time is comparable to that of processor registers.
2. Secondary Cache: Secondary cache is placed between the primary cache and the rest of the memory. It
is referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed on the processor chip.
3. Spatial Locality of Reference: Spatial locality of reference says that if a memory location is referenced,
locations in close proximity to that reference point are likely to be referenced next, so elements near the
point of reference are worth keeping in the cache.
4. Temporal Locality of Reference: Temporal locality of reference says that a recently used word is likely
to be used again soon, which is why replacement algorithms such as Least Recently Used (LRU) are
employed. On a miss, not only the requested word but the complete block containing it is loaded, because
locality of reference suggests that the neighbouring words will be referenced shortly afterwards.
Disadvantages of Cache Memory
Data is stored on a temporary basis in Cache Memory.
Whenever the system is turned off, data and instructions stored in cache memory get destroyed.
The high cost of cache memory increases the price of the Computer System.
MICROCONTROLLER
Feature | 8-bit Microcontroller | 16-bit Microcontroller
Data width | Has a data width equal to 8 bits; its CPU can process only 8 bits in parallel. | Has a data width equal to 16 bits; its CPU can process 16 bits of data in parallel.
Processing power | Low processing power; can handle only simple instructions. | High processing power; capable of handling complex instructions.
Memory capacity | Low storage capacity. | High storage capacity.
Clock speed | Low clock speeds; processes data slowly. | Higher clock speeds, allowing faster data processing.
Programming language | Generally programmed using low-level assembly language and the high-level C language. | Programmed using high-level languages such as C and C++.
Power consumption | Consumes less power. | Consumes more power than 8-bit microcontrollers.
Instruction set | Simple instruction set, suitable for executing simple operations. | Complex instruction set; suitable for executing complex operations.
Efficiency | Lower efficiency. | More efficient than 8-bit microcontrollers.
Suitability | Suitable for simple, small-size applications. | Suitable for complex, large-size applications.
Number of I/O peripherals | Supports fewer I/O peripherals. | Supports more I/O peripherals.
Price | Less expensive. | More expensive.
Applications | Mainly used in simple applications like home and office appliances, medical instruments, toys, etc. | Mainly used in complex applications like industrial automation, robotics, control systems, automobiles, telecommunication systems, etc.
In computing, there are two common types of processors: 32-bit and 64-bit. These types tell us how much
memory a processor can access from a CPU register. For instance:
A 32-bit system can access 2^32 different memory addresses, i.e. 4 GB of RAM or physical memory; with
special extensions it can sometimes access more than 4 GB of RAM.
A 64-bit system can access 2^64 different memory addresses, i.e. about 18 quintillion bytes of RAM. In
short, any amount of memory greater than 4 GB can be easily handled by it.
Most computers made in the 1990s and early 2000s were 32-bit machines. The CPU register stores memory
addresses, which is how the processor accesses data from RAM. A 32-bit register can hold 2^32 distinct
values, each of which can reference an individual byte in memory, so a 32-bit system can address a
maximum of 4 GB (4,294,967,296 bytes) of RAM. The usable limit is often closer to 3.5 GB, since part of
the address space is reserved for purposes other than RAM. Most computers released over the past two
decades were built on a 32-bit architecture, hence most operating systems were designed to run on a 32-bit
processor.
A 64-bit register can theoretically reference 18,446,744,073,709,551,616 bytes, or 17,179,869,184 GB (16
exabytes) of memory. This is several million times more than an average workstation would need to access.
What’s important is that a 64-bit computer (which means it has a 64-bit processor) can access more than 4
GB of RAM. If a computer has 8 GB of RAM, it should have a 64-bit processor; otherwise, at least 4 GB of
the memory will be inaccessible to the CPU.
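One way to observe this from a program is the compiler's pointer size, which reflects the build target (a 32-bit program on a 64-bit CPU still sees 4-byte pointers); a minimal sketch:

```c
#include <stdio.h>

/* Sketch: the pointer size indicates whether the program was compiled
   for a 32-bit or a 64-bit environment. */
int main(void) {
    if (sizeof(void *) == 8)
        printf("64-bit build: addresses are 8 bytes wide\n");
    else if (sizeof(void *) == 4)
        printf("32-bit build: addresses are 4 bytes wide\n");
    return 0;
}
```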
A major difference between 32-bit processors and 64-bit processors is the number of calculations per
second they can perform, which affects the speed at which they can complete tasks. 64-bit processors can
come in dual-core, quad-core, six-core, and eight-core versions for home computing. Multiple cores allow
for an increased number of calculations per second that can be performed, which can increase the processing
power and help make a computer run faster. Software programs that require many calculations to function
smoothly can operate faster and more efficiently on the multi-core 64-bit processors, for the most part.
Feature | 32-bit Processor | 64-bit Processor
Performance | Limited by the maximum amount of RAM it can access. | Can take advantage of more memory, enabling faster performance.
Compatibility | Can run 32-bit and 16-bit applications. | Can run 32-bit and 64-bit applications.
Address Space | Uses 32-bit address space. | Uses 64-bit address space.
Security | — | Offers hardware-level protection.
Multitasking | Can handle multiple tasks but with limited efficiency. | Can handle multiple tasks more efficiently.
On a 64-bit system, gamers can easily play graphics-heavy games like Modern Warfare or GTA V, and can
use high-end software like Photoshop or CAD, which take a lot of memory, since the larger address space
makes multitasking with big software easy and efficient. For gaming alone, however, upgrading the video
card may be more beneficial than getting a 64-bit processor.
Note:
A computer with a 64-bit processor can have a 64-bit or 32-bit version of an operating system
installed. However, with a 32-bit operating system, the 64-bit processor would not run at its full
capability.
On a computer with a 64-bit processor, we can’t run a 16-bit legacy program. Many 32-bit programs
will work with a 64-bit processor and operating system, but some older 32-bit programs may not
function properly, or at all, due to limited or no compatibility.
CISC: The CISC approach attempts to minimize the number of instructions per program but at the cost
of an increase in the number of cycles per instruction.
Earlier, when programming was done using assembly language, a need was felt to make instructions do
more tasks, because programming in assembly was tedious and error-prone; this is how the CISC
architecture evolved. With the rise of high-level languages, dependency on assembly reduced, and the RISC
architecture prevailed.
Characteristic of RISC –
1. Simpler instructions, hence simple instruction decoding.
2. Instructions fit within one word.
3. An instruction takes a single clock cycle to execute.
4. More general-purpose registers.
5. Simple addressing modes.
6. Fewer data types.
7. Pipelining can be achieved.
Characteristic of CISC –
1. Complex instructions, hence complex instruction decoding.
2. Instructions are larger than one word in size.
3. An instruction may take more than a single clock cycle to execute.
4. Fewer general-purpose registers, as operations are performed in memory itself.
5. Complex addressing modes.
6. More data types.
Example – Suppose we have to add two 8-bit numbers:
CISC approach: There will be a single command or instruction for this like ADD which will perform
the task.
RISC approach: Here the programmer first writes a load command to bring the data into registers, then
uses a suitable operator, and then stores the result in the desired location.
So the add operation is divided into parts (load, operate, store), due to which RISC programs are longer and
require more memory to be stored, but the processor requires fewer transistors because of the less complex
commands.
Difference:
RISC | CISC
Can perform only register-to-register arithmetic operations. | Can perform REG to REG, REG to MEM or MEM to MEM operations.
An instruction executes in a single clock cycle. | An instruction takes more than one clock cycle.
An instruction fits in one word. | Instructions are larger than the size of one word.
Simple and limited addressing modes; the addressing modes are fewer. | Complex and more numerous addressing modes.
The number of instructions is small compared to CISC. | The number of instructions is large compared to RISC.
Advantages of RISC:
Simpler instructions: RISC processors use a smaller set of simple instructions, which makes them easier to
decode and execute quickly. This results in faster processing times.
Faster execution: Because RISC processors have a simpler instruction set, they can execute instructions
faster than CISC processors.
Lower power consumption: RISC processors consume less power than CISC processors, making them
ideal for portable devices.
Disadvantages of RISC:
More instructions required: RISC processors require more instructions to perform complex tasks than
CISC processors.
Increased memory usage: RISC processors require more memory to store the additional instructions
needed to perform complex tasks.
Higher cost: Developing and manufacturing RISC processors can be more expensive than CISC processors.
Advantages of CISC:
Reduced code size: CISC processors use complex instructions that can perform multiple operations,
reducing the amount of code needed to perform a task.
More memory efficient: Because CISC instructions are more complex, they require fewer instructions to
perform complex tasks, which can result in more memory-efficient code.
Widely used: CISC processors have been in use for a longer time than RISC processors, so they have a
larger user base and more available software.
Disadvantages of CISC:
Slower execution: CISC processors take longer to execute instructions because they have more complex
instructions and need more time to decode them.
More complex design: CISC processors have more complex instruction sets, which makes them more
difficult to design and manufacture.
Higher power consumption: CISC processors consume more power than RISC processors because of their
more complex instruction sets.
An instruction is a set of codes that the computer processor can understand. The code is usually in 1s and
0s, or machine language. It contains instructions or tasks that control the movement of bits and bytes within
the processor.
Types of Instruction Set
Generally, there are two types of instruction set used in computers.
Reduced Instruction set Computer (RISC)
A number of computer designers recommended that computers use fewer instructions with simple
constructs so that they can be executed much faster within the CPU without having to use memory as often.
This type of computer is called a Reduced Instruction Set Computer.
The concept of RISC involves an attempt to reduce execution time by simplifying the instruction set of
computers.
Characteristics of RISC
The characteristics of RISC are as follows −
Relatively few instructions.
Relatively few addressing modes.
Memory access limited to load and store instructions.
All operations done within the register of the CPU.
Single-cycle instruction execution.
Fixed length, easily decoded instruction format.
Hardwired rather than micro programmed control.
A characteristic of RISC processors is the ability to execute one instruction per clock cycle. This is done by
overlapping the fetch, decode and execute phases of two or three instructions, using a procedure referred to
as pipelining.
Complex Instruction Set Computer (CISC)
CISC is a computer in which a single instruction can perform numerous low-level operations, such as a load
from memory and a store to memory. CISC attempts to minimize the number of instructions per program,
but at the cost of an increase in the number of cycles per instruction.
The design of an instruction set for a computer must take into consideration not only machine language
constructs but also the requirements imposed on the use of high level programming languages.
The goal of CISC is to attempt to provide a single machine instruction for each statement that is written in a
high level language.
Characteristics of CISC
The characteristics of CISC are as follows −
A large number of instructions, typically from 100 to 250.
Some instructions that perform specialized tasks and are used infrequently.
A large variety of addressing modes, typically from 5 to 20 different modes.
Variable-length instruction formats.
Instructions that manipulate operands in memory.
Instruction Codes
Computer instructions are the basic components of a machine language program. They are also known as
macro operations, since each one is composed of sequences of micro-operations. Each instruction initiates a
sequence of micro-operations that fetch operands from registers or memory, possibly perform arithmetic,
logic, or shift operations, and store results in registers or memory. Instructions are encoded as binary
instruction codes. Each instruction code contains an operation code, or opcode, which designates the
overall purpose of the instruction (e.g. add, subtract, move, input, etc.). The number of bits allocated for the
opcode determines how many different instructions the architecture supports. In addition to the opcode,
many instructions also contain one or more operands, which indicate where in registers or memory the data
required for the operation is located. For example, an add instruction requires two operands, and a not
instruction requires one.
[Instruction format diagram: a 16-bit instruction with the opcode in bits 15-12 and operand fields in bits 11-6 and 5-0]
The opcode and operands are most often encoded as unsigned binary numbers in order to minimize the
number of bits used to store them. For example, a 4-bit opcode encoded as a binary number could represent
up to 16 different operations. The control unit is responsible for decoding the opcode and operand bits in the
instruction register, and then generating the control signals necessary to drive all other hardware in the CPU
to perform the sequence of microoperations that comprise the instruction.
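The sketch below decodes a hypothetical 16-bit instruction with the field boundaries shown above (a 4-bit opcode and two 6-bit operand fields); the layout is illustrative, not any real machine's format:

```c
#include <stdio.h>
#include <stdint.h>

/* Sketch: extracting opcode and operand fields with shifts and masks,
   the same decoding the control unit performs in hardware. */
int main(void) {
    uint16_t instr = 0x1ABC;                 /* made-up instruction word */
    unsigned opcode = (instr >> 12) & 0xF;   /* 4 bits -> up to 16 operations */
    unsigned op1    = (instr >> 6) & 0x3F;   /* bits 11-6 */
    unsigned op2    = instr & 0x3F;          /* bits 5-0  */
    printf("opcode %u, operand1 %u, operand2 %u\n", opcode, op1, op2);
    return 0;
}
```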
The Indirect Cycle is always followed by the Execute Cycle. The Interrupt
Cycle is always followed by the Fetch Cycle. For both fetch and execute
cycles, the next cycle depends on the state of the system.
We assume a new 2-bit register called the Instruction Cycle Code (ICC). The ICC designates the state of the
processor in terms of which portion of the cycle it is in:
00 : Fetch Cycle
01 : Indirect Cycle
10 : Execute Cycle
11 : Interrupt Cycle
At the end of each cycle, the ICC is set appropriately. The flowchart of the instruction cycle describes the
complete sequence of micro-operations, depending only on the instruction sequence and the interrupt
pattern (this is a simplified example). The operation of the processor is described as the performance of a
sequence of micro-operations.
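A schematic sketch of the ICC as a state machine; the decision inputs (needs_indirect, pending_interrupt) are hypothetical stand-ins for the tests the hardware would make, and this is not a model of any real processor:

```c
#include <stdio.h>

/* Sketch: the ICC as a 2-bit state selector, following the cycle
   ordering described above. */
enum icc { FETCH = 0, INDIRECT = 1, EXECUTE = 2, INTERRUPT = 3 };

static enum icc next_state(enum icc state, int needs_indirect, int pending_interrupt) {
    switch (state) {
    case FETCH:     return needs_indirect ? INDIRECT : EXECUTE;
    case INDIRECT:  return EXECUTE;      /* indirect is always followed by execute */
    case EXECUTE:   return pending_interrupt ? INTERRUPT : FETCH;
    case INTERRUPT:                      /* interrupt is always followed by fetch */
    default:        return FETCH;
    }
}

int main(void) {
    enum icc state = FETCH;
    for (int step = 0; step < 6; step++) {
        printf("ICC = %d\n", state);
        state = next_state(state, step % 2, 0);  /* made-up decision inputs */
    }
    return 0;
}
```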
Different Instruction Cycles:
The Fetch Cycle:
Step 1: The address in the program counter is moved to the memory address register (MAR), as this is the
only register connected to the address lines of the system bus.
Step 2: The address in the MAR is placed on the address bus, the control unit issues a READ command on
the control bus, and the result appears on the data bus and is then copied into the memory buffer register
(MBR). The program counter is incremented by one, to get ready for the next instruction. (These two
actions can be performed simultaneously to save time.)
Step 3: The content of the MBR is moved to the instruction register (IR).
Thus, a simple fetch cycle consists of three steps and four micro-operations. Symbolically, we can write this
sequence of events as follows:
t1: MAR ← (PC)
t2: MBR ← Memory; PC ← (PC) + 1
t3: IR ← (MBR)
The Indirect Cycle:
Step 1: The address field of the instruction is transferred to the MAR. This is used to fetch the address of
the operand.
Step 2: The address field of the IR is updated from the MBR, so that it now contains a direct address rather
than an indirect one.
Step 3: The IR is now in the same state as if indirect addressing had not been used.
Note: The IR is now ready for the execute cycle, but we skip that cycle for a moment to consider
the Interrupt Cycle.
The Execute Cycle: consider an instruction that adds the contents of memory location X to register R. The
corresponding micro-operations will be:
t1: MAR ← (IR(Address))
t2: MBR ← Memory
t3: R ← (R) + (MBR)
Advantages:
1. Standardization: The instruction cycle provides a standard way for CPUs to execute instructions, which
allows software developers to write programs that can run on multiple CPU architectures. This
standardization also makes it easier for hardware designers to build CPUs that can execute a wide range
of instructions.
2. Efficiency: By breaking down the instruction execution into multiple steps, the CPU can execute
instructions more efficiently. For example, while the CPU is performing the execute cycle for one
instruction, it can simultaneously fetch the next instruction.
3. Pipelining: The instruction cycle can be pipelined, which means that multiple instructions can be in
different stages of execution at the same time. This improves the overall performance of the CPU, as it
can process multiple instructions simultaneously.
Disadvantages:
1. Overhead: The instruction cycle adds overhead to the execution of instructions, as each instruction must
go through multiple stages before it can be executed. This overhead can reduce the overall performance
of the CPU.
2. Complexity: The instruction cycle can be complex to implement, especially if the CPU architecture and
instruction set are complex. This complexity can make it difficult to design, implement, and debug the
CPU.
3. Limited parallelism: While pipelining can improve the performance of the CPU, it also has limitations.
For example, some instructions may depend on the results of previous instructions, which limits the
amount of parallelism that can be achieved. This can reduce the effectiveness of pipelining and limit the
overall performance of the CPU.
Issues of Different Instruction Cycles :
Here are some common issues associated with different instruction cycles:
1. Pipeline hazards: Pipelining is a technique used to overlap the execution of multiple instructions by
breaking them into smaller stages. However, pipeline hazards occur when one instruction depends on the
completion of a previous instruction, leading to delays and reduced performance.
2. Branch prediction errors: Branch prediction is a technique used to anticipate which direction a
program will take when encountering a conditional branch instruction. However, if the prediction is
incorrect, it can result in wasted cycles and decreased performance.
3. Instruction cache misses: Instruction cache is a fast memory used to store frequently used instructions.
Instruction cache misses occur when an instruction is not found in the cache and needs to be retrieved
from slower memory, resulting in delays and decreased performance.
4. Instruction-level parallelism limitations: Instruction-level parallelism is the ability of a processor to
execute multiple instructions simultaneously. However, this technique has limitations as not all
instructions can be executed in parallel, leading to reduced performance in some cases.
5. Resource contention: Resource contention occurs when multiple instructions require the use of the
same resource, such as a register or a memory location. This can lead to delays and reduced performance
if the processor is unable to resolve the contention efficiently.
Addressing Modes
The operation field of an instruction specifies the operation to be performed. This operation must be
executed on some data stored in computer registers or memory words. The way the operands are chosen
during program execution depends on the addressing mode of the instruction. The addressing mode
specifies a rule for interpreting or modifying the address field of the instruction before the operand is
actually referenced. Computers use addressing mode techniques to accommodate one or both of the
following provisions:
(1) To give programming versatility to the user by providing such facilities as pointers to memory, counters
for loop control, indexing of data, and program relocation.
(2) To reduce the number of bits in the addressing fields of the instruction.
Immediate
Direct
Indirect
Register
Register Indirect
Displacement
Stack
All computer architectures provide more than one of these addressing modes. The question arises as to how
the control unit can determine which addressing mode is being used in a particular instruction. Several
approaches are used. Often, different opcodes will use different addressing modes. Also, one or more bits in
the instruction format can be used as a mode field. The value of the mode field determines which addressing
mode is to be used.
What is the interpretation of the effective address? In a system without virtual memory, the effective address
will be either a main memory address or a register. In a virtual memory system, the effective address is a
virtual address or a register; the actual mapping to a physical address is a function of the paging mechanism
and is invisible to the programmer.
[Instruction format: Opcode | Mode | Address]
Immediate Addressing: The simplest form of addressing is immediate addressing, in which the operand is
actually present in the instruction: OPERAND = A
This mode can be used to define and use constants or set initial values of variables. The advantage of
immediate addressing is that no memory reference other than the instruction fetch is required to obtain the
operand. The disadvantage is that the size of the number is restricted to the size of the address field, which,
in most instruction sets, is small compared with the word length.
Direct Addressing: A very simple form of addressing is direct addressing, in which the address field
contains the effective address of the operand: EA = A .
Indirect Addressing: With direct addressing, the length of the address field is usually less than the word
length, thus limiting the address range. One solution is to have the address field refer to the address of a
word in memory, which in turn contains a full-length address of the operand. This is known as indirect
addressing: EA = (A)
Register Addressing:
Register addressing is similar to direct addressing. The only difference is that the address field refers to a
register rather than a main memory address: EA = R. The advantages of register addressing are that only a
small address field is needed in the instruction and no memory reference is required. The disadvantage of
register addressing is that the address space is very limited: the operand must be held in one of the few
processor registers, and the address field identifies the exact register location of the operand.
Register Indirect Addressing: Register indirect addressing is similar to indirect addressing, except that the
address field refers to a register instead of a memory location. It requires only one memory reference and no
special calculation: EA = (R). Register indirect addressing uses one less memory reference than indirect
addressing, because the first piece of information, which is a memory address, is already available in a
register; from that memory location we get the data or information. In general, register access is much
faster than memory access.
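The following sketch summarizes the effective-address rules covered so far over a toy memory and register file; all contents are made-up values for illustration:

```c
#include <stdio.h>

/* Sketch: computing the operand for the basic addressing modes.
   A is the instruction's address field; R is a register number. */
int main(void) {
    int mem[16] = {0};
    int reg[4]  = {0};
    mem[5] = 9;  mem[9] = 42;  reg[1] = 9;   /* made-up contents */

    int A = 5, R = 1;
    printf("Immediate:         operand = %d\n", A);           /* operand = A */
    printf("Direct:            operand = %d\n", mem[A]);      /* EA = A      */
    printf("Indirect:          operand = %d\n", mem[mem[A]]); /* EA = (A)    */
    printf("Register:          operand = %d\n", reg[R]);      /* EA = R      */
    printf("Register indirect: operand = %d\n", mem[reg[R]]); /* EA = (R)    */
    return 0;
}
```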
Displacement Addressing: A very powerful mode of addressing combines the capabilities of direct
addressing and register indirect addressing, which is broadly categorized as displacement addressing: EA =
A + (R) Displacement addressing requires that the instruction have two address fields, at least one of which
is explicit. The value contained in one address field (value = A) is used directly. The other address field, or
an implicit reference based on opcode, refers to a register whose contents are added to A to produce the
effective address. Three of the most common uses of displacement addressing are:
• Relative addressing
• Base-register addressing
• Indexing
Relative Addressing: For relative addressing, the implicitly referenced register is the program counter (PC).
That is, the current instruction address is added to the address field to produce the EA. Thus, the effective
address is a displacement relative to the address of the instruction.
Base-Register Addressing: The reference register contains a memory address, and the address field
contains a displacement from that address. The register reference may be explicit or implicit. In some
implementations, a single segment/base register is employed and is used implicitly. In others, the
programmer may choose a register to hold the base address of a segment, and the instruction must reference
it explicitly.
Indexing: The address field references a main memory address, and the reference register contains a
positive displacement from that address. In this case also the register reference is sometimes explicit and
sometimes implicit. Generally, index registers are used for iterative tasks, and it is typical that the index
register needs to be incremented or decremented after each reference to it. Because this is such a common
operation, some systems will automatically do this as part of the same instruction cycle. This is known as
auto-indexing. There are two types of auto-indexing: auto-incrementing and auto-decrementing. If certain
registers are devoted exclusively to indexing, then auto-indexing can be invoked implicitly and
automatically. If general-purpose registers are used, the auto-index operation may need to be signaled by a
bit in the instruction.
Auto-increment: EA = A + (R); R ← (R) + 1
Auto-decrement: EA = A + (R); R ← (R) - 1
In some machines, both indirect addressing and indexing are provided, and it is possible to employ both in
the same instruction. There are two possibilities: the indexing is performed either before or after the
indirection. If indexing is performed after the indirection, it is termed post-indexing:
EA = (A) + (R)
First, the contents of the address field are used to access a memory location containing an address. This
address is then indexed by the register value.
With pre-indexing, the indexing is performed before the indirection: EA = (A + (R)).
An address is calculated; the calculated address contains not the operand, but the address of the operand.
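A minimal sketch contrasting post-indexing and pre-indexing over a toy memory; all values are made-up:

```c
#include <stdio.h>

/* Sketch: post-indexing EA = (A) + (R) versus pre-indexing EA = (A + (R)). */
int main(void) {
    int mem[32] = {0};
    int A = 4, R = 2;
    mem[4] = 10;  mem[6] = 20;   /* addresses stored in memory */
    mem[12] = 111;               /* post-indexed target: (A) + R = 10 + 2 = 12 */
    mem[20] = 222;               /* pre-indexed target: (A + R) = (6) = 20     */

    int ea_post = mem[A] + R;    /* indirection first, then index */
    int ea_pre  = mem[A + R];    /* index first, then indirection */
    printf("post-indexing: EA=%d operand=%d\n", ea_post, mem[ea_post]);
    printf("pre-indexing:  EA=%d operand=%d\n", ea_pre,  mem[ea_pre]);
    return 0;
}
```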
Stack Addressing: A stack is a linear array or list of locations. It is sometimes referred to as a pushdown
list or last-in-first-out queue. A stack is a reserved block of locations. Items are appended to the top of the
stack so that, at any given time, the block is partially filled. Associated with the stack is a pointer whose
value is the address of the top of the stack. The stack pointer is maintained in a register. Thus, references to
stack locations in memory are in fact register indirect addresses. The stack mode of addressing is a form of
implied addressing. The machine instructions need not include a memory reference but implicitly operate on
the top of the stack.
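A minimal sketch of stack addressing: push, pop and an implied-operand add through a stack pointer register, assuming a toy downward-growing stack:

```c
#include <stdio.h>

/* Sketch: stack addressing as implied register-indirect addressing
   through a stack pointer (sp). The instructions name no addresses;
   they implicitly operate on the top of the stack. */
int main(void) {
    int mem[16];
    int sp = 16;                 /* stack pointer; stack starts empty */

    mem[--sp] = 3;               /* PUSH 3 */
    mem[--sp] = 4;               /* PUSH 4 */
    int b = mem[sp++];           /* POP -> 4 */
    int a = mem[sp++];           /* POP -> 3 */
    mem[--sp] = a + b;           /* PUSH result: an implied-operand ADD */
    printf("top of stack = %d\n", mem[sp]);   /* 7 */
    return 0;
}
```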
INTERRUPTS
An interrupt is a signal from a device attached to a computer or from a program within the computer that
requires the operating system to stop and figure out what to do next.
Interrupt systems work as follows: while the CPU is processing a program, if it needs an I/O operation, the
request is sent to a queue and the CPU carries on with its processing; later, the input/output (I/O) operation
becomes ready.
The I/O device then interrupts the CPU to signal that the data is available, and the remaining processing is
done; this is how interrupts are useful. If interrupts were not present, the CPU would need to stay idle until
the I/O operation completed. So, to avoid CPU waiting time, interrupts come into the picture.
How the processor handles interrupts
Whenever an interrupt occurs, it causes the CPU to stop executing the current program. Control then passes
to the interrupt handler, or interrupt service routine (ISR).
The ISR handles interrupts in the following steps −
Step 1 − When an interrupt occurs, assume the processor is executing the i-th instruction; the program
counter will point to the next instruction, the (i+1)-th.
Step 2 − When the interrupt occurs, the program counter value is stored on the process stack, and the
program counter is loaded with the address of the interrupt service routine.
Step 3 − Once the interrupt service routine is completed, the address on the process stack is popped and
placed back in the program counter.
Step 4 − Execution then resumes at the (i+1)-th instruction.
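These four steps can be sketched in a toy model as follows; the addresses are made-up values, not any real machine's layout:

```c
#include <stdio.h>

/* Sketch: saving the return address on a stack, jumping to the ISR,
   then restoring the program counter to resume the program. */
int main(void) {
    int pc = 101;                 /* the (i+1)-th instruction, already incremented */
    int isr_address = 500;        /* hypothetical ISR location */
    int stack[8]; int sp = 0;

    stack[sp++] = pc;             /* Step 2: push the return address */
    pc = isr_address;             /*         jump to the ISR         */
    printf("servicing interrupt at PC=%d\n", pc);
    pc = stack[--sp];             /* Step 3: pop the saved address   */
    printf("resuming at PC=%d\n", pc);  /* Step 4: resume at (i+1)   */
    return 0;
}
```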
Types of interrupts
There are two types of interrupts which are as follows −
Hardware interrupts
Hardware interrupts are interrupt signals generated by external devices and I/O devices and delivered to the
CPU when the devices are ready.
For example − In a keyboard if we press a key to do some action this pressing of the keyboard generates a
signal that is given to the processor to do action, such interrupts are called hardware interrupts.
Hardware interrupts are classified into two types which are as follows −
Maskable Interrupt − A hardware interrupt that can be delayed when a higher-priority interrupt has
occurred to the processor.
Non-Maskable Interrupt − A hardware interrupt that cannot be delayed and must be serviced
immediately by the processor.
Software interrupts
Software interrupts are interrupt signals generated internally, for example when a program needs to access a
system call.
Software interrupt is divided into two types. They are as follows −
Normal Interrupts − The interrupts that are caused by software instructions are called normal (software)
interrupts.
Exception − An exception is an unplanned interruption that occurs while executing a program. For
example, if a program executes a division by zero, an exception is raised.
Interrupts
An interrupt is a signal emitted by hardware or software when a process or an event needs immediate
attention. It alerts the processor to a high-priority process requiring interruption of the current working
process. For I/O devices, one of the bus control lines is dedicated to this purpose; the routine the processor
runs in response is called the Interrupt Service Routine (ISR).
When a device raises an interrupt at let’s say process i, the processor first completes the execution of
instruction i. Then it loads the Program Counter (PC) with the address of the first instruction of the ISR.
Before loading the Program Counter with the address, the address of the interrupted instruction is moved to
a temporary location. Therefore, after handling the interrupt the processor can continue with process i+1.
While the processor is handling the interrupts, it must inform the device that its request has been recognized
so that it stops sending the interrupt request signal. Also, saving the registers so that the interrupted process
can be restored in the future, increases the delay between the time an interrupt is received and the start of the
execution of the ISR. This is called Interrupt Latency.
Software Interrupts:
A software interrupt is one that is produced by software or the system, as opposed to hardware. Traps and
exceptions are other names for software interrupts. They serve as a signal for the operating system or a
system service to carry out a certain function or respond to an error condition.
A particular instruction known as an "interrupt instruction" is used to create software interrupts. When the
interrupt instruction is executed, the processor stops what it is doing and switches over to a particular
interrupt handler routine. The interrupt handler routine completes the required work or handles any errors
before handing control back to the interrupted application.
Hardware Interrupts:
In a hardware interrupt, all the devices are connected to the Interrupt Request Line. A single request line is
used for all the n devices. To request an interrupt, a device closes its associated switch. When a device
requests an interrupt, the value of INTR is the logical OR of the requests from individual devices.
The sequence of events involved in handling an IRQ:
1. Devices raise an IRQ.
2. The processor interrupts the program currently being executed.
3. The device is informed that its request has been recognized and the device deactivates the request signal.
4. The requested action is performed.
5. An interrupt is enabled and the interrupted program is resumed.
Handling Multiple Devices:
When more than one device raises an interrupt request signal, additional information is needed to decide
which device is to be serviced first. The following methods are used to decide which device to select:
Polling, Vectored Interrupts, and Interrupt Nesting. These are explained below.
1. Polling: In polling, the first device encountered with its IRQ bit set is the device to be serviced first, and
the appropriate ISR is called to service it (a sketch of this method follows the list). Polling is easy to
implement, but a lot of time is wasted interrogating the IRQ bits of all the devices.
2. Vectored Interrupts: In vectored interrupts, a device requesting an interrupt identifies itself directly by
sending a special code to the processor over the bus. This enables the processor to identify the device
that generated the interrupt. The special code can be the starting address of the ISR or where the ISR is
located in memory and is called the interrupt vector.
3. Interrupt Nesting: In this method, the I/O devices are organized in a priority structure, so an interrupt
request from a higher-priority device is recognized, whereas a request from a lower-priority device is not.
The processor accepts interrupts only from devices/processes having a higher priority than its own.
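The following is a minimal sketch of the polling method from item 1; the four-device IRQ array and its contents are made-up assumptions:

```c
#include <stdio.h>

/* Sketch: scan per-device IRQ bits and service the first device found
   with its bit set; the device then drops its request. */
int main(void) {
    int irq[4] = {0, 0, 1, 1};    /* devices 2 and 3 are requesting */
    for (int d = 0; d < 4; d++) {
        if (irq[d]) {             /* first device encountered wins */
            printf("servicing device %d\n", d);
            irq[d] = 0;           /* request signal deactivated */
            break;
        }
    }
    return 0;
}
```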
The processor's priority is encoded in a few bits of the PS (Processor Status) register. It can be changed by
program instructions that write into the PS. The processor is in supervisor mode only while executing OS
routines; it switches to user mode before executing application programs.