HandOut_Computer Architecture and Organization_OKAI

The document is a lecture note for the course CE 271 Computer Organization and Architecture at the University of Mines and Technology, compiled by George Essah Yaw Okai in May 2021. It outlines the course description, objectives, structure, and various chapters covering topics such as computer architecture, CPU functions, memory systems, and input/output processes. The document serves as a comprehensive guide for students to understand the fundamental principles of computer organization and architecture.


UNIVERSITY OF MINES AND TECHNOLOGY

TARKWA
FACULTY OF ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LECTURE NOTE

CE 271 COMPUTER ORGANIZATION AND


ARCHITECTURE

COMPILED BY

GEORGE ESSAH YAW OKAI

MAY 2021

CONTENTS

COURSE DESCRIPTION AND OUTLINE
COURSE OBJECTIVES
COURSE PRESENTATION
REFERENCES AND RECOMMENDED TEXTBOOKS
COURSE ASSESSMENT
ATTENDANCE

CHAPTER ONE: INTRODUCTION TO COMPUTER ORGANIZATION AND ARCHITECTURE
  1.1 Chapter Objectives and Expected Results
  1.2 Computer Architecture and Organization
    1.2.1 Computer System Structure and Functions
  1.3 The Subsystems of a Computer
    1.3.1 The Organizational Structure of the Computer System
  1.4 Information Representation in the Computer
    1.4.1 Basic Information Units
    1.4.2 Numeric System

CHAPTER TWO: THE COMPUTER SYSTEM AND FUNCTIONS
  2.1 Chapter Objectives and Expected Results
  2.2 Computer Components
  2.3 Computer Function
    2.3.1 Interrupts
  2.4 Interconnection Structure
  2.5 Computer Bus Interconnections
    2.5.1 Bus Structure

CHAPTER THREE: THE CENTRAL PROCESSING UNIT (CPU)
  3.1 Chapter Objectives and Expected Results
  3.2 Processor Structure and Function
    3.2.1 The Structure of the CPU
    3.2.2 CPU Registers
    3.2.3 Control Unit (CU): Control and Timing Section
    3.2.4 Instruction Pipelining
    3.2.5 RISC and CISC Instruction Set Architecture
  3.3 Computer Arithmetic
    3.3.1 Arithmetic Logic Unit (ALU)
  3.4 Computer Memory and Data Representation
    3.4.1 Integer Representation
    3.4.2 Floating-Point Number Representation
  3.5 Floating-Point Arithmetic
  3.6 Character Encoding
  3.7 Instruction Set: Characteristics and Functions
    3.7.1 Arithmetic Instruction Sets
    3.7.2 Logical Instructions
    3.7.3 Data Transfer Instructions
    3.7.4 String Instructions
    3.7.5 Branch Instructions
    3.7.6 Subroutine Call or Program Flow Control Instructions
    3.7.7 Return Instructions
    3.7.8 Miscellaneous Instructions
  3.8 Addressing Modes
    3.8.1 Immediate Addressing
    3.8.2 Direct Addressing
    3.8.3 Register Addressing
    3.8.4 Register Indirect Addressing
    3.8.5 Displacement/Inherent Addressing

CHAPTER FOUR: COMPUTER PERFORMANCE EVALUATION
  4.1 Chapter Objectives and Expected Results
  4.2 Computer Evolution
  4.3 Performance Assessment
    4.3.1 Clock Speed and Instructions per Second
    4.3.2 Performance Enhancement Calculations: Amdahl's Law

CHAPTER FIVE: THE COMPUTER MEMORY SYSTEM
  5.1 Chapter Objectives and Expected Results
  5.2 Memory System
    5.2.1 Characteristics of Memory System
    5.2.2 Memory Types Used
    5.2.3 Memory Organization of Computer System
  5.3 Memory Hierarchy
    5.3.1 Memory Performance
  5.4 Cache Memory
    5.4.1 Separate Data and Instruction Caches
    5.4.2 Cache Organisation
    5.4.3 Replacement Algorithms
    5.4.4 Memory Write Strategies
  5.5 Virtual Memory
    Virtual Memory Organization

CHAPTER SIX: THE I/O OF A COMPUTER SYSTEM
  6.1 Chapter Objectives and Expected Results
  6.2 Input/Output Subsystem of a Computer
  6.3 I/O Modules
  6.4 I/O Processes
    6.4.1 Programmed I/O
    6.4.2 Interrupt-Driven I/O
    6.4.3 Direct Memory Access (DMA)
  6.5 Operating System Support
    6.5.1 OS Scheduling
    6.5.2 OS Memory Management
COURSE DESCRIPTION AND OUTLINE
The Computer Organization and Architecture course is designed to help students appreciate the science and technology behind the proliferation of the computer over the last decade.

This course provides students with the fundamentals of how computers perform their functions.

This course is structured to cover the following areas:

 INTRODUCTION
   Organisation and Architecture
   Structure and Function
 COMPUTER EVOLUTION AND PERFORMANCE
   A brief history of computers
   Evolution of the Intel x86 Architecture
   Performance assessment
 COMPUTER SYSTEM
   Computer components
   Computer functions
   Interconnection structure
   Bus Interconnection
 THE CENTRAL PROCESSING UNIT (CPU)
 PARALLEL PROCESSING SYSTEM
 MEMORY SYSTEM
 INPUT/OUTPUT
 OPERATING SYSTEM SUPPORT
 COMPUTER ARITHMETIC
 INSTRUCTION SETS etc.

COURSE OBJECTIVES
This course is designed for undergraduate students. It is intended to provide students with fundamental knowledge of computer organization and architecture and the factors influencing the design of the hardware and software elements of computer systems.

The goals of the course are to provide students with basic knowledge of the following:

 How computers work: basic principles
 How to analyze computer performance
 How computers are designed and built
 Issues affecting modern processors (caches and pipelines)

COURSE PRESENTATION
The course is presented through lectures supported with handouts and tutorials. The tutorials will take the form of problem solving and discussions and will constitute an integral part of each lecture. The student can best understand and appreciate the subject by attending all lectures and laboratory work, by practising, by reading references and handouts, and by completing all assignments and coursework on schedule.

REFERENCES AND RECOMMENDED TEXTBOOKS

 Stallings, W. (2015), Computer Organization and Architecture (10th Edition), NJ: Pearson Education

 Patterson, D. A. and Hennessy, J. L. (2012), Computer Organization and Design: The Hardware/Software Interface (4th Edition), Morgan Kaufmann

 Harris, D. M. and Harris, S. L. (2007), Digital Design and Computer Architecture, Morgan Kaufmann

COURSE ASSESSMENT

Grading System

Factor          Weight   Location   Date                   Time
Exercises (3)   15 %     In class                          30 min each
Attendance      10 %     In class   Random
Quizzes         15 %                Date to be announced   2 Hrs
Final Exam      60 %                To be announced        3 Hrs

80-100% = A, 70-79.9% = B, 60-69.9% = C, 50-59.9% = D, 0-49.9% = Fail

ATTENDANCE
 UMaT rules and regulations state that attendance is MANDATORY for every student. A total of FIVE (5) attendance checks shall be taken at random toward the 10%. The only acceptable excuse for absence is one authorized by the Dean of Students on the prescribed form. However, a student may also ask my permission to be absent from a particular class for a tangible reason. A student who misses all five random attendance checks will not be allowed to take the final exam.

CHAPTER ONE

INTRODUCTION TO COMPUTER ORGANIZATION AND ARCHITECTURE

1.1 Chapter Objectives and Expected Results

The objectives of this chapter are to:

o Provide a basic understanding of computer architecture and organization
o Discuss the various functions of the subsystems of a computer
o Explain how the computer executes instructions

1.2 Computer Architecture and Organization


In describing a computer system, a distinction is often made between computer architecture and
computer organization. Although it is difficult to give precise definitions for these terms, a
consensus exists about the general areas covered by each.

 Computer architecture refers to those attributes of a system visible to a programmer, or


put another way, those attributes that have a direct impact on the logical execution of a
program.
 Computer organization refers to the operational units and their interconnection that
realize the architecture specification.

Examples of architectural attributes include the instruction set, the number of bits used to represent
various data types (e.g., numbers and characters), I/O mechanisms, and techniques for addressing
memory. Organizational attributes include those hardware details transparent to the programmer,
such as control signals, interfaces between the computer and peripherals, and the memory
technology used. As an example, it is an architectural design issue whether a computer will have a
multiply instruction. It is an organizational issue whether that instruction will be implemented by a
special multiply unit or by a mechanism that makes repeated use of the add unit of the system. The
organizational decision may be based on the anticipated frequency of use of the multiply
instruction, the relative speed of the two approaches, and the cost and physical size of a special
multiply unit.
Historically, and still today, the distinction between architecture and organization has been an
important one. Many computer manufacturers offer a family of computer models, all with the same
architecture but with differences in organization. Consequently, the different models in the family
have different price and performance characteristics. Furthermore, an architecture may survive
many years, but its organization changes with changing technology.

1.2.1 Computer system structure and functions:


A computer is a complex system. Contemporary computers contain millions of elementary
electronic components. How, then, can one clearly describe them? The key is to recognize the
hierarchic nature of most complex systems. A hierarchic system is a set of interrelated subsystems,
each of the latter, in turn, hierarchic in structure until we reach some lowest level of elementary
subsystems. The hierarchic nature of complex systems is essential to both their design and their
description. The designer need only deal with a particular level of the system at a time. At each
level, the system consists of a set of components and their interrelationships. The behaviour at each
level depends only on a simplified, abstracted characterization of the system at the next lower level.
At each level, the designer is concerned with structure and function:

 Structure: The way in which the components are interrelated.


 Function: The operation of each individual component as part of the structure.

In terms of description, we have two choices: starting at the bottom and building up to a complete
description, or starting with a top view and decomposing the system, describing the structure and
function of its subsystems, and proceeding to successively lower layers of the hierarchy.

Functions of a computer system:

In general terms, there are four main functions of a computer:

 Data processing
 Data storage
 Data movement
 Control

Figure 1.3: A functional view of the computer

The computer, of course, must be able to process data. The data may take a wide variety of forms,
and the range of processing requirements is broad. However, we shall see that there are only a few
fundamental methods or types of data processing. It is also essential that a computer store data.
Even if the computer is processing data on the fly (i.e., data come in and get processed, and the
results go right out), the computer must temporarily store at least those pieces of data that are being
worked on at any given moment. Thus, there is at least a short-term data storage function. Files of
data are stored on the computer for subsequent retrieval and update. The computer must be able to
move data between itself and the outside world. The computer’s operating environment consists of
devices that serve as either sources or destinations of data. When data are received from or
delivered to a device that is directly connected to the computer, the process is known as input-
output (I/O) and the device is referred to as a peripheral. When data are moved over longer
distances, to or from a remote device, the process is known as data communications. Finally, there
must be control of these three functions. Ultimately, this control is exercised by the individual who
provides the computer with instructions. Within the computer system, a control unit manages the
computer’s resources and orchestrates the performance of its functional parts in response to those
instructions. At this general level of discussion, the number of possible operations that can be
performed is few. Figure 1.3 depicts the four possible types of operations. The computer can
function as a data movement device (Figure 1.4(a)), simply transferring data from one peripheral
or communications line to another. It can also function as a data storage device (Figure 1.4(b)),
with data transferred from the external environment to computer storage (read) and vice versa
(write). The final two diagrams show operations involving data processing, on data either in storage
or en route between storage and the external environment.

Figure 1.4: Possible computer operations

1.3 The Subsystems of a Computer


Throughout their brief history, the physical appearance of computers has changed dramatically, but
their basic function, to store and execute a series of instructions, has remained the same.
Contemporary computer designs are based on concepts developed by John Von Neumann at the
Institute for Advanced Study, Princeton. Such a design is referred to as the Von Neumann
architecture and is based on three key concepts:

 Data and instructions are stored in a single read-write memory.


 The contents of this memory are addressable by location, without regard to the type of data
contained there.
 Execution occurs in a sequential fashion (unless explicitly modified) from one instruction
to the next.
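These three concepts can be sketched as a toy machine in Python (an entirely illustrative sketch of our own; the instruction names and memory layout are not from the text). A single memory list holds both instructions and data, cells are addressed purely by location, and the program counter advances sequentially until a HALT:

```python
# Toy Von Neumann machine: one memory holds both instructions and data.
memory = [
    ("LOAD", 6),     # 0: load the value at address 6 into the accumulator
    ("ADD", 7),      # 1: add the value at address 7 to the accumulator
    ("STORE", 8),    # 2: store the accumulator at address 8
    ("HALT", None),  # 3: stop execution
    None, None,      # 4-5: unused cells
    40, 2, 0,        # 6-8: data cells, addressed exactly like instructions
]

acc, pc = 0, 0                # accumulator and program counter
while True:
    op, addr = memory[pc]     # fetch the instruction at the program counter
    pc += 1                   # execution proceeds sequentially by default
    if op == "LOAD":
        acc = memory[addr]
    elif op == "ADD":
        acc += memory[addr]
    elif op == "STORE":
        memory[addr] = acc
    elif op == "HALT":
        break

print(memory[8])  # 42
```

The same memory holds an ADD instruction at address 1 and the number 40 at address 6; only their positions distinguish them, which is the essence of the stored-program idea.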

The Von Neumann model of computer architecture characterizes the basic computer system as
consisting of four functional units. Figure 1.5 is the simplest possible depiction of a computer. The
computer is an entity that interacts in some fashion with its external environment. In general, all of
its linkages to the external environment can be classified as peripheral devices or communication
lines.

 Central Processing Unit (CPU): Controls the operation of the computer and performs
its data processing functions. Often simply referred to as the processor.
 Main Memory: Stores data.
 I/O: Moves data between the computer and its external environment.
 System Interconnection: Some mechanism that provides for communication among
CPU, main memory, and I/O.

Figure 1.5: The computer top-level structure

There may be one or more of each of the above components. Traditionally, there has been just
a single CPU. In recent years, there has been increasing use of multiple processors in a single
system. Each of these components will be examined in some detail later. However, for our
purposes, the most interesting and in some ways the most complex component is the CPU; its
structure is depicted in Figure 1.6. Its major structural components are:

 Control Unit (CU): Controls the operation of the CPU and hence the computer.
 Arithmetic and Logic Unit (ALU): Performs the computer’s data processing functions.
 Registers: Provide storage internal to the CPU.
 CPU Interconnection: Some mechanism that provides for communication among the
control unit, ALU, and registers.

Each of these components will be examined in some detail later.

Figure 1.6: The CPU

1.3.1 The Organizational Structure of the Computer System


The general organisational structure of a general-purpose computer system consists of the
following:

 Input Device - The unit of the computer system responsible for getting data from the
user into the computer. Data going from the user to the computer is called “input”.
Examples of input devices are: mouse, keyboard, touch screen, and voice recognition
devices
 Output Device - The unit of the computer system used to transmit data from the computer
memory to the user. Examples of output devices are: monitor, printer, and speakers
 Storage Device (Memory) - Although a computer has several types of memory, the memory
referred to in the Von Neumann model is the main memory, also called Random Access
Memory (RAM). Main memory is used by the computer for storing a program and its data

while the program is running. What distinguishes a computer from a calculator is the ability
to run a stored program; main memory allows the computer to do that. The Random Access
Memory (RAM) can be thought of as a sequence of boxes, called cells, each of which can
hold a certain amount of data. RAM is constructed from circuitry that can hold data in the
form of an electronic charge that is either high or low. Conceptually, a high charge
represents the number 1 and a low charge the number 0. RAM must be coded in binary - in
terms of 0's and 1's. One of the high or low charges stored in memory (one 0 or one 1) is
called a bit and 8 bits is called a byte. For every computer, each memory cell can hold a
certain fixed number of bits, usually 8.
 The CPU - Inside the computer system, the remaining two components of the Von
Neumann Architecture are found on the CPU (Central Processing Unit) chip or MP
(Microprocessor) chip. The CPU consists of four main parts:
o Control Unit (CU)
o Arithmetic / Logic Unit (ALU)
o Register blocks (RB)
o Cache memory

Control Unit (CU) - The Control Unit is responsible for the order of execution of the instructions
in a program. It controls the sequencing and timing of all operations. It contains a "clock", which
is actually a quartz crystal that oscillates at a regular frequency when electrical power is applied.
The clock emits an electronic signal for each oscillation. Each separate operation is synchronized
to the clock signal.

Arithmetic /Logic Unit (ALU) - On the CPU chip is circuitry for performing arithmetic and logical
calculations. It can be thought of as being similar to a calculator, except that, in addition to normal
mathematical operations, it can also do logical (true/false) operations, such as comparing two
numbers to see which one is larger. Logical operations are important in computer programming.

Register blocks (RB) - The part of the central processing unit for storing numbers for manipulation
purposes or for storing the results of operations executed by the CPU.

Cache - The part of the central processing unit for increasing processor speed by keeping frequently
used data and instructions close to the processor, reducing the time the CPU spends waiting on
slower main memory.

Figure 1.7: General organization of a computer. RAM, ROM, and the CD drive connect through
the system bus to the input device, the output device, and the CPU components (ALU, register
blocks, cache, and CU).

1.4 Information representation in the computer

1,4.1 Basic Information Units


In order for the PC to process information, that information must be held in special cells called
registers. Registers are groups of 8 or 16 flip-flops. A flip-flop is a device capable of storing two
levels of voltage: a low one, typically around 0.5 volts, and a high one, commonly 5 volts. The low
level in the flip-flop is interpreted as off or 0, and the high level as on or 1. These states are usually
known as bits, which are the smallest unit of information in a computer.

 A single binary digit is called a bit.
 Four bits grouped together are called a nibble.
 Eight bits grouped together form a byte.

A very important characteristic of any microprocessor is the size of its accumulators or registers.
Microprocessors commonly use 8-bit accumulators or registers; the word size of such a
microprocessor is then 8 bits. Microprocessors have word lengths of 4, 8, 16, or even 32 bits.

1 BIT = 0 or 1

4 BITS = 0000 to 1111

1 NIBBLE = 4 BITS

The nibble is four bits or half a byte. Note that it has a maximum value of 15 (1111 = 15). This is
the basis for the hexadecimal (base 16) number system, which is used because it is far easier to
read than long strings of binary digits.

8 BITS = 1 BYTE

1 BYTE = 00000000 to 11111111

2 NIBBLES = 1 BYTE

A byte is 8 bits or 2 nibbles. A byte has a maximum value of FFh (255 decimal). Because a byte is
2 nibbles, its hexadecimal representation is two hex digits in a row, e.g. 3Dh. The byte is also the
size of the 8-bit registers which we will be covering later.

16 BITS = 0000000000000000 to 1111111111111111

16 BITS = 2 BYTES = 1 WORD

A word is two bytes that are stuck together. A word has a maximum value of FFFFh (65,535
decimal). Since a word is four nibbles, it is represented by four hex digits. This is the size of the
16-bit registers.
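The maximum values quoted above all follow from the rule that an n-bit unit can hold 2^n distinct patterns, the largest being 2^n - 1. A short Python check (the variable names are ours, purely for illustration):

```python
# An n-bit unit holds values 0 .. 2**n - 1.
for name, bits in [("bit", 1), ("nibble", 4), ("byte", 8), ("word", 16)]:
    max_value = 2**bits - 1
    print(f"{name:7s} {bits:2d} bits  max = {max_value}  hex = {max_value:X}h")
```

This reproduces 15 for the nibble, FFh (255) for the byte, and FFFFh (65,535) for the word.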

1.4.2 Numeric System


The numeric system we use daily is the decimal system, but this system is not convenient for
machines or computers, since information is codified in the form of on (1) or off (0) bits; this way
of codifying leads us to the need for positional calculation, which allows us to express a number in
any base we need.

Binary System:

Digital computers use binary numbers. The binary, or base 2, number system uses only the digits 0
and 1. These binary digits are called bits. In the computer’s electronic circuits a 0 bit is represented
by a LOW voltage, whereas a 1 bit corresponds to a HIGH voltage.

Human beings are trained to understand the decimal number system. The decimal, or base 10,
system has 10 digits (0-9). The decimal number system also has a place value characteristic. For
instance, Table 1.1 shows that the decimal 1327 equals one 1000 plus three 100s plus two 10s plus
seven 1s (1000 + 300 + 20 + 7 = 1327).

Table 1.1 Place value in a decimal number

Powers of 10:   10^3    10^2    10^1    10^0
Place value:    1000s   100s    10s     1s
Decimal digit:  1       3       2       7

Decimal: 1000 + 300 + 20 + 7 = 1327
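The expansion in Table 1.1 can be reproduced in a couple of lines of Python (an illustrative sketch; the variable names are ours):

```python
# Decompose 1327 into its decimal place values, as in Table 1.1.
n = 1327
digits = [int(d) for d in str(n)]                             # [1, 3, 2, 7]
places = [d * 10**i for i, d in enumerate(reversed(digits))]  # low place first
print(list(reversed(places)))  # [1000, 300, 20, 7]
print(sum(places))             # 1327
```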

The binary number system also has a place value characteristic. The decimal value for each of the
first eight binary places is shown in Table 1.2. The binary number 10110110 (say: one, zero, one,
one, zero, one, one, zero) is then converted to its decimal equivalent of 182.

Table 1.2 Place value in a binary number

Powers of 2 2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰

Place value 128s 64s 32s 16s 8s 4s 2s 1s

Binary 1 0 1 1 0 1 1 0

Decimal 128 + 32 + 16 + 4 + 2 = 182
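
The place-value sum in Table 1.2 can be reproduced in a few lines of Python (a sketch for illustration only):

```python
# Multiply each bit by its place value (a power of 2) and sum, as in Table 1.2.
bits = "10110110"
value = sum(int(b) * 2**i for i, b in enumerate(reversed(bits)))
print(value)                   # 182
assert value == int(bits, 2)   # Python's built-in base-2 conversion agrees
```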

Decimal-to-binary conversion: To convert the decimal number 43 to binary, use successive
division by two, keeping the remainder as a binary digit and the quotient as the next number to
divide.

43/2 = 21, remainder 1

21/2 = 10, remainder 1

10/2 = 5, remainder 0

5/2 = 2, remainder 1

2/2 = 1, remainder 0

1/2 = 0, remainder 1

Building the number from the bottom, we get that the binary result is 101011.
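
The successive-division method above can be sketched as a small Python function (the function name is my own, for illustration):

```python
def dec_to_bin(n):
    """Convert a non-negative decimal integer to a binary string by
    repeated division by 2, collecting the remainders (read bottom-up)."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2                    # the quotient is the next number to divide
    return "".join(reversed(digits)) or "0"

print(dec_to_bin(43))   # 101011
```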

Hexadecimal system:

A typical microcomputer location might hold the binary number 10011110. This long string of 0s
and 1s is hard to remember and difficult to enter on a keyboard. The number 10011110₂ could be
converted to a decimal number. Upon conversion it is found that 10011110₂ equals 158₁₀. This
conversion process takes too long. Most microcomputer systems use hexadecimal notation to
simplify remembering and entering numbers such as 10011110.

The hexadecimal, or base 16, number system uses the 16 symbols 0 through 9, A, B, C, D, E, and
F. Decimal, hexadecimal, and binary equivalents are shown in Table 1.3.

Table 1.3 Counting in decimal, hexadecimal, and binary number systems

Decimal Hexadecimal Binary (8s 4s 2s 1s)

0 0 0 0 0 0
1 1 0 0 0 1
2 2 0 0 1 0
3 3 0 0 1 1
4 4 0 1 0 0
5 5 0 1 0 1
6 6 0 1 1 0
7 7 0 1 1 1
8 8 1 0 0 0
9 9 1 0 0 1
10 A 1 0 1 0
11 B 1 0 1 1
12 C 1 1 0 0
13 D 1 1 0 1
14 E 1 1 1 0
15 F 1 1 1 1

Note from Table 1.3 that each hexadecimal symbol represents a unique combination of 4 bits. The
binary number 10011110 could then be represented as 9E in hexadecimal. That is, the 1001 part of
the binary number equals 9, according to Table 1.3, and the 1110 part of the binary number
equals E in hexadecimal. Therefore 10011110₂ equals 9E₁₆. Remember that the subscripts give the
base of the number.

o To convert a binary number to its hexadecimal equivalent, start at the least significant bit and
divide the binary number into groups of 4 bits each. Then replace each 4-bit group with its
equivalent hex digit.
o The reverse process converts a hex number to its binary equivalent: replace each hex digit
with its equivalent 4-bit binary number.

Example:

1. Converting the number 111010 to hexadecimal (hex)

Start at the LSB and divide the binary number into 4-bit groups, then replace each 4-bit group
with its equivalent hex digit.

1010₂ equals A in hex and 0011₂ equals 3 in hex; therefore 111010₂ = 3A₁₆

2. Convert hex number 7F to its binary equivalent


Each hex digit is replaced with its 4-bit binary equivalent: 7 = 0111₂ and F = 1111₂.
Therefore 7F₁₆ = 01111111₂
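
Both directions of the binary/hex conversion can be sketched in Python (the helper name `bin_to_hex` is my own, not from the text):

```python
def bin_to_hex(bits):
    """Group a binary string into nibbles from the LSB and map each
    4-bit group to one hex digit."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # pad to a multiple of 4 bits
    groups = [bits[i:i+4] for i in range(0, len(bits), 4)]
    return "".join(format(int(g, 2), "X") for g in groups)

print(bin_to_hex("111010"))   # 3A
print(format(0x7F, "08b"))    # 01111111 -- the reverse direction
```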

To convert the hexadecimal number 2C6E to its decimal equivalent we use the place values of each
hex digit, as shown in Table 1.4.

Table 1.4 Hexadecimal-to-decimal conversions

Powers of 16 16³ 16² 16¹ 16⁰

Place value 4096s 256s 16s 1s

Hexadecimal 2 C 6 E

Decimal 2*4096 + 12*256 + 6*16 + 14*1 = 11374₁₀
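
The same place-value computation, sketched in Python for illustration:

```python
# Each hex digit is weighted by a power of 16, as in Table 1.4
# (A-F stand for 10-15, so C is 12 and E is 14).
value = 2*16**3 + 12*16**2 + 6*16**1 + 14*16**0
print(value)            # 11374
assert value == 0x2C6E  # agrees with Python's hex literal
```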

Octal number system:

Octal notation, like hexadecimal, is used to represent binary numbers. Octal uses the symbols 0 to
7 and is therefore called the base 8 number system. Table 1.5 shows the decimal, octal, and binary
equivalents.
Table 1.5 Counting in decimal, octal, and binary number systems

Binary

Decimal Octal 4s 2s 1s

0 0 0 0 0

1 1 0 0 1

2 2 0 1 0

3 3 0 1 1

4 4 1 0 0

5 5 1 0 1

6 6 1 1 0

7 7 1 1 1

To convert the binary number 11111000100 to its octal equivalent, the procedure is illustrated
below.

Starting at the LSB of the binary number, divide the number into 3-bit groups. Next, convert each
3-bit group into its equivalent octal digit.

From the binary number 11111000100, the groups 011 = 3, 111 = 7, 000 = 0 and 100 = 4 in octal;
therefore 11111000100₂ = 3704₈
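
The 3-bit grouping can be sketched just like the hex case (the helper name `bin_to_oct` is an assumption of this note):

```python
def bin_to_oct(bits):
    """Group a binary string into 3-bit groups from the LSB and map
    each group to one octal digit."""
    bits = bits.zfill((len(bits) + 2) // 3 * 3)  # pad to a multiple of 3 bits
    groups = [bits[i:i+3] for i in range(0, len(bits), 3)]
    return "".join(str(int(g, 2)) for g in groups)

print(bin_to_oct("11111000100"))   # 3704
```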

CHAPTER TWO

THE COMPUTER SYSTEM AND FUNCTIONS

2.1 Chapter objectives and expected results


The objective of this chapter is to examine the various components of the computer system:

 Processor,
 Memory,
 I/O, and
 The interconnections.

Each of these components is looked at in detail.

At the end of this chapter students are expected to know:

o The detailed functions of the various components of the computer and its functional units,
o How these components are interconnected to perform as a unit system.

2.2 Computer Components

Figure 2.1 Computer Components

Figure 2.1 illustrates the top-level components of a computer and suggests the interactions among
them. The CPU exchanges data with memory. For this purpose, it typically makes use of two
internal (to the CPU) registers: a memory address register (MAR), which specifies the address in
memory for the next read or write, and a memory buffer register (MBR), which contains the data
to be written into memory or receives the data read from memory. An I/O address register
(I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange
of data between an I/O module and the CPU.
A memory module consists of a set of locations, defined by sequentially numbered addresses. Each
location contains a binary number that can be interpreted as either an instruction or data. An I/O
module transfers data from external devices to CPU and memory, and vice versa. It contains
internal buffers for temporarily holding these data until they can be sent on.

2.3 Computer function


The basic function performed by a computer is execution of a program, which consists of a set of
instructions stored in memory. Instruction processing consists of two steps: The processor reads
(fetches) instructions from memory one at a time and executes each instruction. Program execution
consists of repeating the process of instruction fetch and instruction execution. Instruction
execution may involve several operations and depends on the nature of the instruction.

The processing required to execute a single instruction is called an instruction cycle.
Using the simplified two-step description given previously, the instruction cycle is depicted in
Figure 2.2

Figure 2.2: Basic instruction cycle

The two steps are referred to as the fetch cycle and the execute cycle. Program execution halts only
if the machine is turned off, some sort of unrecoverable error occurs, or a program instruction that
halts the computer is encountered.

Fetch Cycle:

 Program Counter (PC) holds address of next instruction to fetch
 Processor fetches instruction from memory location pointed to by PC
 Increment PC (unless told otherwise)
 Instruction loaded into Instruction Register (IR)
 Processor interprets instruction and performs required actions

At the beginning of each instruction cycle, the processor fetches an instruction from memory. In a
typical processor, a register called the program counter (PC) holds the address of the instruction to
be fetched next. Unless told otherwise, the processor always increments the PC after each
instruction fetch so that it will fetch the next instruction in sequence. The fetched instruction is
loaded into a register in the processor known as the instruction register (IR). The instruction
contains bits that specify the action the processor is to take. The processor interprets the instruction
and performs the required action.
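
As a toy sketch (not any real processor; the memory contents and names are made up for illustration), the fetch cycle just described can be modelled in Python:

```python
# Toy fetch cycle: memory is a list of instruction words, the PC holds
# the address of the next instruction, and each fetch loads the
# addressed word into the IR and increments the PC.
memory = [0x1A, 0x2B, 0x3C]   # three made-up instruction words
pc = 0
ir = None

def fetch():
    global pc, ir
    ir = memory[pc]   # instruction loaded into the Instruction Register
    pc += 1           # PC now points to the next instruction in sequence
    return ir

print(hex(fetch()))   # 0x1a
print(hex(fetch()))   # 0x2b
print(pc)             # 2
```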

Execute Cycle:

In general, the required actions fall into four categories:

 Processor-memory: Data may be transferred from processor to memory or from memory to
processor.
 Processor-I/O: Data may be transferred to or from a peripheral device by transferring
between the processor and an I/O module.
 Data processing: The processor may perform some arithmetic or logic operation on data.
 Control: An instruction may specify that the sequence of execution be altered.

An instruction’s execution may involve a combination of these actions.

Figure 2.3 provides a more detailed look at the basic instruction cycle. The figure is in the form of
a state diagram.

Figure 2.3: Instruction cycle state diagram

For any given instruction cycle, some states may be null and others may be visited more than once.
The states can be described as follows:

 Instruction Address Calculation (IAC): Determine the address of the next instruction to
be executed. Usually, this involves adding a fixed number to the address of the previous
instruction. For example, if each instruction is 16 bits long and memory is organized into
16-bit words, then add 1 to the previous address. If, instead, memory is organized as
individually addressable 8-bit bytes, then add 2 to the previous address.
 Instruction Fetch (IF): Read instruction from its memory location into the processor.
 Instruction Operation Decoding (IOD): Analyze instruction to determine the type of operation
to be performed and the operand(s) to be used.
 Operand Address Calculation (OAC): If the operation involves reference to an operand in
memory or available via I/O, then determine the address of the operand.
 Operand Fetch (OF): Fetch the operand from memory or read it in from I/O.
 Data Operation (DO): Perform the operation indicated in the instruction.
 Operand Store (OS): Write the result into memory or out to I/O

2.3.1 Interrupts
An Interrupt is a mechanism, provided by virtually all computers, by which the I/O and memory
modules may cause an interruption to the normal processing sequence of the processor.
Interrupts are provided primarily as a way to improve processing efficiency, since most
external devices are much slower than the processor. Table 2.1 lists the most common classes of
interrupts.
Table 2.1 Classes of interrupts
1. Program Generated by some condition that occurs as a result of
an instruction execution, such as arithmetic overflow,
division by zero, attempt to execute an illegal machine
instruction, or reference outside a user’s allowed
memory space.
2. Timer Generated by a timer within the processor. This allows
the operating system to perform certain functions on a
regular basis.
3. I/O Generated by an I/O controller, to signal normal
completion of an operation.
4. Hardware failure Generated by a failure such as power failure or
memory parity error.

With interrupts, the processor can be engaged in executing other instructions while an I/O operation
is in progress. When the external device becomes ready to be serviced, that is, when it is ready to
accept more data from the processor, the I/O module for that external device sends an interrupt
request signal to the processor. The processor responds by suspending operation of the current
program, branching off to a program known as an interrupt handler or Interrupt Service Routine
(ISR) that services that particular I/O device, and resuming the original execution after the
device is serviced.
For interrupts to be accommodated, an interrupt cycle is added to the instruction cycle, as shown
in Figure 2.4.

Figure 2.4 Instruction cycle with Interrupts

In the interrupt cycle, the processor checks to see if any interrupts have occurred, indicated by the
presence of an interrupt signal. If no interrupts are pending, the processor proceeds to the fetch
cycle and fetches the next instruction of the current program. If an interrupt is pending, the
processor does the following:
 It suspends execution of the current program being executed and saves its context. This
means saving the address of the next instruction to be executed (current contents of the
program counter) and any other data relevant to the processor’s current activity.
 It sets the program counter to the starting address of an interrupt handler routine.
The processor now proceeds to the fetch cycle and fetches the first instruction in the interrupt
handler program, which will service the interrupt. The interrupt handler program is generally part
of the operating system. Typically, this program determines the nature of the interrupt and performs
whatever actions are needed. The handler determines which I/O module generated the interrupt and
may branch to a program that will write more data out to that I/O module. When the interrupt
handler routine is completed, the processor can resume execution of the user program at the point
of interruption.
Multiple interrupts can occur and as such there are two approaches that can be taken to dealing
with multiple interrupts:
1. Disable interrupts while an interrupt is being processed.

A disabled interrupt simply means that the processor can and will ignore that interrupt request
signal. If an interrupt occurs during this time, it generally remains pending and will be checked by
the processor after the processor has enabled interrupts.
2. Define priorities for interrupts
This allows an interrupt of higher priority to cause a lower-priority interrupt handler to be itself
interrupted.
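
A minimal sketch of the instruction cycle extended with an interrupt check (all names, and the handler address 0x80, are assumptions made for illustration, not taken from the text):

```python
# After each fetch/execute, the processor checks for a pending interrupt.
# If interrupts are enabled and one is pending, it saves the PC (context)
# and sets the PC to the start of the interrupt handler routine.
ISR_ADDRESS = 0x80           # assumed start address of the interrupt handler
pc, saved_pc = 0, None
interrupts_enabled = True
pending_interrupt = False

def instruction_cycle():
    global pc, saved_pc, pending_interrupt
    pc += 1                                  # fetch + execute stages (elided)
    if interrupts_enabled and pending_interrupt:
        saved_pc = pc                        # save context of current program
        pc = ISR_ADDRESS                     # branch to the handler
        pending_interrupt = False

instruction_cycle()          # no interrupt: normal sequencing
pending_interrupt = True
instruction_cycle()          # interrupt taken at the end of this cycle
print(pc, saved_pc)          # 128 2
```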

2.4 Interconnection Structure


A computer consists of a set of modules of three basic types (processor, memory, and I/O) that
communicate with each other. In effect, a computer is a network of basic modules. Thus, there
must be paths for connecting the modules. The collection of paths connecting the various modules
is called the interconnection structure. The design of this structure will depend on the exchanges
that must be made between modules. Figure 2.5 suggests the types of exchanges that are needed
by indicating the major forms of input and output for each module type:

 Memory
 Input/output
 CPU

Figure 2.5 Computer Modules

The interconnection structure must support the following types of transfers:

 Memory to processor: The processor reads an instruction or a unit of data from memory.
 Processor to memory: The processor writes a unit of data to memory.

 I/O to processor: The processor reads data from an I/O device via an I/O module.
 Processor to I/O: The processor sends data to the I/O device.
 I/O to or from memory: For these two cases, an I/O module is allowed to exchange data
directly with memory, without going through the processor, using direct memory access
(DMA).

2.5 Computer Bus Interconnections


A bus is a communication pathway connecting two or more devices. A key characteristic of a bus
is that it is a shared transmission medium. Multiple devices connect to the bus, and a signal
transmitted by any one device is available for reception by all other devices attached to the bus
(broadcast). If two devices transmit during the same time period, their signals will overlap and
become garbled. Thus, only one device at a time can successfully transmit.

A bus consists of multiple communication pathways, or lines. Each line is capable of transmitting
signals representing binary 1 and binary 0. Taken together, several lines of a bus can be used to
transmit binary digits simultaneously (in parallel). For example, an 8-bit unit of data can be
transmitted over eight bus lines.

Computer systems contain a number of different buses that provide pathways between components
at various levels of the computer system hierarchy. A bus that connects major computer
components (processor, memory, I/O) is called a System Bus. The most common computer
interconnection structures are based on the use of one or more system buses.

2.5.1 Bus Structure


A system bus typically consists of from about 50 to hundreds of separate lines. Each line is
assigned a particular meaning or function. Although there are many different bus designs, on any
bus the lines can be classified into three functional groups, shown in Figure 2.6:

o Data,
o Address, and
o Control lines.

Figure 2.6 Bus Interconnection scheme


In addition, there may be power distribution lines that supply power to the attached modules.

The data lines (Data bus)

 Provide a path for moving data between system modules. These lines, collectively, are
called the data bus.
 The width of the data bus: The data bus may consist of from 32 to hundreds of separate
lines, the number of lines being referred to as the width of the data bus. Because each line
can carry only 1 bit at a time, the number of lines determines how many bits can be
transferred at a time. The width of the data bus is a key factor in determining overall system
performance. For example, if the data bus is 8 bits wide and each instruction is 16 bits long,
then the processor must access the memory module twice during each instruction cycle.

The address lines (Address bus)

 Address lines are used to designate the source or destination of the data on the data bus.
For example, if the processor wishes to read a word (8, 16, or 32 bits) of data from memory,
it puts the address of the desired word on the address lines.
 The width of the address bus: determines the maximum possible memory capacity of the
system. Furthermore, the address lines are generally also used to address I/O ports.
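
The relation between address bus width and maximum memory capacity can be illustrated directly (a sketch; the chosen widths are examples, not from the text):

```python
# n address lines can form 2**n distinct addresses, so the width of the
# address bus fixes the maximum memory capacity of the system.
capacity = {width: 2**width for width in (16, 20, 32)}
for width, locations in sorted(capacity.items()):
    print(f"{width} address lines -> {locations} addressable locations")
# 2**16 = 65536 (64 KiB), 2**20 = 1048576 (1 MiB), 2**32 = 4294967296 (4 GiB)
```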

The control lines (Control bus)

 Control buses are used to control the access to and the use of the data and address lines.
Because the data and address lines are shared by all components, there must be a means
of controlling their use. Control signals transmit both command and timing information
between system modules. Timing signals indicate the validity of data and address
information.
 Command signals specify operations to be performed. Typical control lines include the
following:
 Memory write: Causes data on the bus to be written into the addressed location.
 Memory read: Causes data from the addressed location to be placed on the bus.
 I/O write: Causes data on the bus to be output to the addressed I/O port.
 I/O read: Causes data from the addressed I/O port to be placed on the bus.
 Transfer ACK: Indicates that data have been accepted from or placed on the bus.
 Bus request: Indicates that a module needs to gain control of the bus.
 Bus grant: Indicates that a requesting module has been granted control of the bus.
 Interrupt request: Indicates that an interrupt is pending.

 Interrupt ACK: Acknowledges that the pending interrupt has been recognized.
 Clock: Used to synchronize operations.
 Reset: Initializes all modules.

Multiple-Bus Hierarchies

If a great number of devices are connected to the bus, performance suffers. There are two main
causes:

 The more devices attached to the bus, the greater the bus length and hence the greater the
propagation delay. This delay determines the time it takes for devices to coordinate the use
of the bus. When control of the bus passes from one device to another frequently, these
propagation delays can noticeably affect performance.
 The bus may become a bottleneck as the aggregate data transfer demand approaches the
capacity of the bus. This problem can be countered to some extent by increasing the data
rate that the bus can carry and by using wider buses (e.g., increasing the data bus from 32
to 64 bits). However, because the data rates generated by attached devices (e.g., graphics
and video controllers, network interfaces) are growing rapidly, this is a race that a single
bus is ultimately destined to lose.

Accordingly, most computer systems use multiple buses, generally laid out in a hierarchy. A typical
traditional structure is shown in Figure 2.7. There is a local bus that connects the processor to a
cache memory and that may support one or more local devices. The cache memory controller
connects the cache not only to this local bus, but to a system bus to which are attached all of the
main memory modules. It is possible to connect I/O controllers directly onto the system bus. A
more efficient solution is to make use of one or more expansion buses for this purpose. An
expansion bus interface buffers data transfers between the system bus and the I/O controllers on
the expansion bus. This arrangement allows the system to support a wide variety of I/O devices
and at the same time insulate memory-to-processor traffic from I/O traffic.

Traditional (ISA) (with cache):

Figure 2.7: Traditional bus architecture

CHAPTER THREE

THE CENTRAL PROCESSING UNIT (CPU)

3.1 Chapter objectives and expected results


This chapter examines the internal structure and function of the processor. The processor consists
of registers, the arithmetic and logic unit, the instruction execution unit, a control unit, and the
interconnections among these components. Architectural issues, such as instruction set design and
data types, are covered. It also looks at organizational issues, such as pipelining.
At the end of this chapter students are expected to know:
 The functionality of the arithmetic and logic unit (ALU);
 Instruction Sets: Characteristics and Functions;
 Instruction Sets: Addressing Modes and Formats;
 Processor Structure and Function;
 Reduced Instruction Set Computers; and
 Instruction-Level Parallelism and Superscalar Processors

3.2 Processor Structure and Function


The primary functional unit of any microcomputer system is called the Central Processing Unit, or
the CPU. The microprocessor unit forms the CPU in the computer system. This is the unit of the
computer where all instructions input to the computer are processed and executed.

The primary functions of the Central Processing Unit (CPU) or the processor are:

 Fetch, decode, and execute program instructions in the proper order.
 Transfer data to and from memory, and to and from the input/output sections of the
computer.
 Respond to external interrupts.
 Provide overall timing and control signals for the entire system.

To do these things, it should be clear that the CPU needs to store some data temporarily. It must
remember the location of the last instruction so that it can know where to get the next instruction.
It needs to store instructions and data temporarily while an instruction is being executed. In other
words, the CPU needs a small internal memory.

3.2.1 The structure of the CPU
Generally, the Central Processing Unit or processor of a microcomputer will contain storage
elements called registers and computational circuitry called the ALU (Arithmetic and Logic Unit).
As noted above, the CPU needs small internal memories, the registers, to remember the location
of the last instruction and to hold instructions and data temporarily while an instruction is being
executed. It will also contain instruction-decoding circuitry, a control and timing section, and the
necessary input and output connections. The major components of the CPU are the arithmetic and
logic unit (ALU) and the control unit (CU). The ALU does the actual computation or processing
of data. The control unit controls the movement of data and instructions into and out of the CPU
and controls the operation of the ALU. In addition, the CPU has internal memory, consisting of a
set of storage locations, called registers.

Main units of the CPU or processor are listed below:

 Registers
 ALU (Arithmetic and Logic Unit)
 Instruction decoder
 Control Unit (Control and timing unit)
 Inputs/outputs

Figure 3.1 is a simplified view of a CPU, indicating its connection to the rest of the system via the
system bus.

Figure 3.1: The CPU with the System Bus

Figure 3.2 is a slightly more detailed view of the CPU. The data transfer and logic control paths
are indicated, including an element labelled internal CPU-bus. This element is needed to transfer
data between the various registers and the ALU because the ALU in fact operates only on data in
the internal CPU memory.

Figure 3.2: CPU Internal Structure

A typical architecture of the CPU is shown in the diagram below.

Figure 3.3: Simplified CPU architecture

3.2.2 CPU Registers


The CPU of a microcomputer has an internal storage area or memory called registers. These internal
storage areas are places in the CPU where a number can be stored and manipulated. Registers come
in different sizes depending on the processor being used: there are 8-bit and 16-bit registers, and,
on the 386 and later processors, 32-bit registers.

Register organization of the CPU:

Within the CPU, there is a set of registers that function as a level of memory above main memory
and cache in the hierarchy. The registers in the CPU perform two roles:

 User-visible registers: These enable the machine- or assembly-language programmer to
minimize main memory references by optimizing use of registers.
 Control and status registers: These are used by the control unit to control the operation of
the CPU and by privileged, operating system programs to control the execution of
programs.

There is not a clean separation of registers into these two categories. For example, on some
machines the program counter is user visible (e.g., Pentium), but on many it is not (e.g., PowerPC).
For purposes of the following discussion, however, we will use these categories.

User-Visible Registers:

A user-visible register is one that may be referenced by means of the machine language that the
CPU executes. We can characterize these in the following categories:

 General purpose
 Data
 Address
 Condition codes

General-purpose registers: can be assigned to a variety of functions by the programmer.
Sometimes their use within the instruction set is orthogonal to the operation. That is, any
general-purpose register can contain the operand for any Opcode. This provides true general-purpose
register use. Often, however, there are restrictions. For example, there may be dedicated registers
for floating-point and stack operations. In some cases, general-purpose registers can be used for
addressing functions (e.g. register indirect, displacement). In other cases, there is a partial or clean
separation between data registers and address registers.

Data registers: may be used only to hold data and cannot be employed in the calculation of an
operand address.

Address registers: may be somewhat general purpose, or they may be devoted to a particular
addressing mode. Examples include the following:

 Segment pointers: In a machine with segmented addressing, a segment register holds the
address of the base of the segment. There may be multiple segment registers: for example, one for
the operating system and one for the current process.
 Index registers: These are used for indexed addressing and may be auto-indexed.
 Stack pointer: A ‘stack’ is a small area of reserved memory used to store the data in the
CPU’s registers when:
(1) System calls are made by a process to operating system routines;

(2) When hardware interrupts are generated by input/output (I/O) transactions on peripheral
devices;
(3) When a process initiates an I/O transfer; and
(4) When a process rescheduling event occurs as a result of a hardware timer interrupt. This
transfer of register contents is called a ‘context switch’.

The stack pointer is the register which holds the address of the most recent ‘stack’ entry. Hence,
when a system call is made by a process (to say print a document) and its context is stored on the
stack, the called system routine uses the stack pointer to reload the register contents when it is
finished printing. Thus, the process can continue where it left off.

Condition codes register (also referred to as flags): Condition codes are bits set by the CPU
hardware as the result of operations. For example, an arithmetic operation may produce a positive,
negative, zero, or overflow result. In addition to the result itself being stored in a register or
memory, a condition code is also set. The code may subsequently be tested as part of a conditional
branch operation.

Control and Status Registers:

There are a variety of CPU registers that are employed to control the operation of the CPU. Most
of these, on most machines, are not visible to the user. Some of them may be visible to machine
instructions executed in a control or operating system mode. Of course, different machines will
have different register organizations and use different terminology. We list here a reasonably
complete list of register types, with a brief description.

Four registers are essential to instruction execution:

 Program or Instruction counter (PC): The Program Counter (PC) is the register that
stores the address in primary memory (RAM or ROM) of the next instruction to be executed.
 Instruction register (IR): Contains the instruction most recently fetched. When the Bus
Interface Unit receives an instruction, it transfers it to the Instruction Register for
temporary storage.
 Memory address registers (MAR): Contains the address of a location in memory.
 Memory buffer register (MBR): Contains a word of data to be written to memory or the
word most recently read.

Typically, the CPU updates the Program Counter (PC) after each instruction fetch so that the
Program Counter always points to the next instruction to be executed. A branch or skip instruction

will also modify the contents of the Program Counter. The fetched instruction is loaded into an
Instruction Register (IR), where the Opcode and operand specified are analysed. Data are
exchanged with memory using the MAR (memory address register) and MBR (memory buffer
register). In a bus organized system, the MAR connects directly to the address bus, and the MBR
connects directly to the data bus. User-visible registers, in turn, exchange data with the MBR. The
four registers just mentioned are used for the movement of data between the CPU and memory.
Within the CPU, data must be presented to the ALU for processing. The ALU may have direct
access to the MBR and user-visible registers. Alternatively, there may be additional buffering
registers at the boundary to the ALU: these registers serve as input and output registers for the ALU
and exchange data with the MBR and user-visible registers.

All CPU designs include a register or set of registers, often known as the program status word
(PSW), that contain status information. The PSW typically contains condition codes plus other
status information. Common fields or flags include the following:

 Sign: Contains the sign bit of the result of the last arithmetic operation.
 Zero: Set when the result is 0.
 Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of
a high-order bit. Used for multiword arithmetic operations.
 Equal: Set if a logical compare result is equality.
 Overflow: Used to indicate arithmetic overflow
 Interrupt enable/disable: Used to enable or disable interrupts.
 Supervisor: Indicates whether the CPU is executing in supervisor or user mode. Certain
privileged instructions can be executed only in supervisor mode, and certain areas of
memory can be accessed only in supervisor mode.
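
As a sketch (not modelled on any specific CPU; the function and flag names are my own), here is how sign, zero, and carry flags could be derived from an 8-bit addition, the way a PSW records them:

```python
# Compute an 8-bit sum and the resulting status flags.
def add8(a, b):
    total = a + b
    carry = total > 0xFF            # carry out of the high-order bit
    result = total & 0xFF           # keep only the low 8 bits
    sign = bool(result & 0x80)      # the most significant bit is the sign bit
    zero = result == 0
    return result, {"S": sign, "Z": zero, "C": carry}

res, flags = add8(0x80, 0x80)       # 128 + 128 overflows 8 bits
print(res, flags)                   # 0 {'S': False, 'Z': True, 'C': True}
```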

A number of other registers related to status and control might be found in a particular CPU design.
In addition to the PSW, there may be a pointer to a block of memory containing additional status
information (e.g., process control blocks).

Examples of register organization of the Central Processing Unit are shown in Figure 3.4.

Figure 3.4: Example of microprocessor registers organizations

3.2.3 Control Unit (CU): Control and timing section


This is the most complex section of the CPU. It controls and sequences all events within the CPU
and the entire microcomputer. As we already know, every program instruction can be divided into
fetch, decode, and execute stages. Each of these stages can be further subdivided into a series of
tiny steps that might be referred to as a micro-program. The micro-program for each instruction
resides in the instruction-decoding section and is carried out by the control and timing section of
the CPU.

Control of the Processor:

As a result of our analysis in the preceding section, we have decomposed the behaviour or
functioning of the processor into elementary operations, called micro-operations. By reducing the
operation of the processor to its most fundamental level, we are able to define exactly what it is
that the control unit must cause to happen. Thus, we can define the functional requirements for the
control unit: those functions that the control unit must perform. A definition of these functional
requirements is the basis for the design and implementation of the control unit.

With the information at hand, the following three-step process leads to a characterization of the
control unit:

 Define the basic elements of the processor.
 Describe the micro-operations that the processor performs.
 Determine the functions that the control unit must perform to cause the micro-operations
to be performed.

We have already performed steps 1 and 2. Let us summarize the results. First, the basic functional
elements of the processor are the following:

 ALU
 Registers
 Internal data paths
 External data paths
 Control unit

Some thought should convince you that this is a complete list. The ALU is the functional essence
of the computer. Registers are used to store data internal to the processor. Some registers contain
status information needed to manage instruction sequencing (e.g., a program status word). Others
contain data that go to or come from the ALU, memory, and I/O modules. Internal data paths are
used to move data between registers and between register and ALU. External data paths link
registers to memory and I/O modules, often by means of a system bus. The control unit causes
operations to happen within the processor.

The execution of a program consists of operations involving these processor elements. As we have
seen, these operations consist of a sequence of micro-operations. All micro-operations fall into one
of the following categories:

 Transfer data from one register to another.


 Transfer data from a register to an external interface (e.g., system bus).
 Transfer data from an external interface to a register.
 Perform an arithmetic or logic operation, using registers for input and output.

All of the micro-operations needed to perform one instruction cycle, including all of the micro-
operations to execute every instruction in the instruction set, fall into one of these categories.

We can now be somewhat more explicit about the way in which the control unit functions. The
control unit performs two basic tasks:

 Sequencing: The control unit causes the processor to step through a series of micro-
operations in the proper sequence, based on the program being executed.
 Execution: The control unit causes each micro-operation to be performed.

The preceding is a functional description of what the control unit does. The key to how the control
unit operates is the use of control signals.

Control Signals

We have defined the elements that make up the processor (ALU, registers, data paths) and the
micro-operations that are performed. For the control unit to perform its function, it must have inputs
that allow it to determine the state of the system and outputs that allow it to control the behaviour
of the system. These are the external specifications of the control unit. Internally, the control unit
must have the logic required to perform its sequencing and execution functions.

Figure 3.5 is a general model of the control unit, showing all of its inputs and outputs.

Figure 3.5: Model of Control Unit

The inputs are as follows:

 Clock: This is how the control unit "keeps time." The control unit causes one micro-
operation (or a set of simultaneous micro-operations) to be performed for each clock pulse.
This is sometimes referred to as the processor cycle time or the clock cycle time.
 Instruction register: The opcode of the current instruction is used to determine which
micro-operations to perform during the execute cycle.
 Flags: These are needed by the control unit to determine the status of the processor and the
outcome of previous ALU operations. For example, for the increment and skip-if-zero (ISZ)
instruction, the control unit will increment the PC if the zero flag is set.
 Control signals from control bus: The control bus portion of the system bus provides signals
to the control unit, such as interrupt signals and acknowledgments.

The outputs are as follows:

 Control signals within the processor: These are of two types: those that cause data to be moved
from one register to another, and those that activate specific ALU functions.
 Control signals to control bus: These are also of two types: control signals to memory, and
control signals to the I/O modules.

The new element that has been introduced in this figure is the control signal. Three types of control
signals are used: those that activate an ALU function, those that activate a data path, and those that
are signals on the external system bus or other external interface. All of these signals are ultimately
applied directly as binary inputs to individual logic gates.

Let us consider again the fetch cycle to see how the control unit maintains control. The control unit
keeps track of where it is in the instruction cycle. At a given point, it knows that the fetch cycle is
to be performed next. The first step is to transfer the contents of the PC to the MAR. The control
unit does this by activating the control signal that opens the gates between the bits of the PC and
the bits of the MAR. The next step is to read a word from memory into the MBR and increment
the PC. The control unit does this by sending the following control signals simultaneously:

 A control signal that opens gates, allowing the contents of the MAR onto the address bus
 A memory read control signal on the control bus
 A control signal that opens the gates, allowing the contents of the data bus to be stored in
the MBR
 Control signals to logic that adds 1 to the contents of the PC and stores the result back in
the PC

Following this, the control unit sends a control signal that opens gates between the MBR and the
IR.

This completes the fetch cycle except for one thing: The control unit must decide whether to
perform an indirect cycle or an execute cycle next. To decide this, it examines the IR to see if an
indirect memory reference is made.

Basic concept of microprogramming of the processor:

Micro-operations: We have already seen that the programs are executed as a sequence of
instructions, each instruction consists of a series of steps that make up the instruction cycle fetch,
decode, etc. Each of these steps is, in turn, made up of a smaller series of steps called micro-
operations.

Micro-operation execution: Each step of the instruction cycle can be decomposed into micro-
operation primitives that are performed in a precise time sequence. Each micro-operation is
initiated and controlled based on the use of control signals / lines coming from the control unit.

 Control signals that cause data to move from one register to another
 Control signals that activate specific ALU functions

Micro-instruction: Each instruction of the processor is translated into a sequence of lower-level
micro-instructions. The processes of translation and execution are referred to as microprogramming.

Microprogramming: The concept of microprogramming was developed by Maurice Wilkes in
1951, using diode matrices for the memory element. A micro-program consists of a sequence of
micro-instructions. A micro-programmed control unit is a relatively simple logic
circuit that is capable of sequencing through micro-instructions and generating the control signals to
execute each micro-instruction.

Control Unit: The control unit is an important portion of the processor. The control unit issues
control signals external to the processor to cause data exchange with memory and I/O units. The
control unit also issues control signals internal to the processor to move data between registers and to
perform the ALU and other internal operations in the processor. In a micro-programmed control unit, the
control signals generated by a micro-instruction are used to control register transfers and
ALU operations. Control unit design is then the collection and the implementation of all of the
needed control signals for the micro-instruction executions.

Micro-Operation
The execution of a program consists of the sequential execution of instructions. Each instruction is
executed during an instruction cycle made up of shorter sub-cycles (e.g., fetch, indirect, execute,
interrupt). The performance of each sub-cycle involves one or more shorter operations, that is,
micro-operations. Micro-operations are the functional, or atomic, operations of a processor.

Figure 3.6: Constituent Elements of Program Execution


The Fetch Cycle
We begin by looking at the fetch cycle, which occurs at the beginning of each instruction cycle and
causes an instruction to be fetched from memory. Four registers are involved:
 Memory address register (MAR): Is connected to the address lines of the system bus. It
specifies the address in memory for a read or write operation.
 Memory buffer register (MBR): Is connected to the data lines of the system bus. It contains
the value to be stored in memory or the last value read from memory.
 Program counter (PC): Holds the address of the next instruction to be fetched.
 Instruction register (IR): Holds the last instruction fetched.
Let us look at the sequence of events for the fetch cycle from the point of view of its effect on the
processor registers. An example appears in Figure 3.7.

Figure 3.7: Sequence of Events, Fetch Cycle

 At the beginning of the fetch cycle, the address of the next instruction to be executed is in
the program counter (PC); in this case, the address is 1100100.
 The first step is to move that address to the memory address register (MAR) because this is
the only register connected to the address lines of the system bus.
 The second step is to bring in the instruction. The desired address (in the MAR) is placed
on the address bus, the control unit issues a READ command on the control bus, and the
result appears on the data bus and is copied into the memory buffer register (MBR). We
also need to increment the PC by 1 to get ready for the next instruction. Because these two
actions (read word from memory, add 1 to PC) do not interfere with each other, we can do
them simultaneously to save time.
 The third step is to move the contents of the MBR to the instruction register (IR). This frees
up the MBR for use during a possible indirect cycle.

Thus, the simple fetch cycle actually consists of three steps and four micro-operations. Each micro-
operation involves the movement of data into or out of a register. So long as these movements do

not interfere with one another, several of them can take place during one step, saving time.
Symbolically, we can write this sequence of events as follows:

t1: MAR <= (PC)

t2: MBR <= Memory

PC <= (PC) + l

t3: IR <= (MBR),

where l is the instruction length. We need to make several comments about this sequence. We
assume that a clock is available for timing purposes and that it emits regularly spaced clock pulses.
Each clock pulse defines a time unit. Thus, all time units are of equal duration. Each micro-
operation can be performed within the time of a single time unit. The notation (t1, t2, t3) represents
successive time units. In words, we have

 First time unit: Move contents of PC to MAR.


 Second time unit:
 Move contents of memory location specified by MAR to MBR.
 Increment by l the contents of the PC.
 Third time unit: Move contents of MBR to IR.

Note that the second and third micro-operations both take place during the second time unit. The
third micro-operation could have been grouped with the fourth without affecting the fetch
operation:

t1: MAR <= (PC)

t2: MBR <= Memory

t3: PC <= (PC) + l

IR <= (MBR)

The groupings of micro-operations must follow two simple rules:

1. The proper sequence of events must be followed. Thus (MAR <= (PC)) must precede (MBR <=
Memory) because the memory read operation makes use of the address in the MAR.

2. Conflicts must be avoided. One should not attempt to read from and write to the same register
in one time unit, because the results would be unpredictable. For example, the micro-operations
(MBR <= Memory) and (IR <= MBR) should not occur during the same time unit.

A final point worth noting is that one of the micro-operations involves an addition. To avoid
duplication of circuitry, this addition could be performed by the ALU. The use of the ALU may
involve additional micro-operations, depending on the functionality of the ALU and the
organization of the processor.
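The register transfers above can be sketched in code. The following Python fragment is an illustrative simulation only; the memory contents and the one-word instruction length are assumptions, not part of any particular machine:

```python
# Illustrative simulation of the fetch-cycle micro-operations (assumed machine).
memory = {100: 0b1010_0001, 101: 0b0011_0010}   # hypothetical instruction words
regs = {"PC": 100, "MAR": 0, "MBR": 0, "IR": 0}
INSTR_LEN = 1  # l: instruction length, here one memory word

# t1: MAR <= (PC)
regs["MAR"] = regs["PC"]

# t2: MBR <= Memory ; PC <= (PC) + l   (both occur in the same time unit)
regs["MBR"] = memory[regs["MAR"]]
regs["PC"] = regs["PC"] + INSTR_LEN

# t3: IR <= (MBR)
regs["IR"] = regs["MBR"]

print(regs)  # IR holds the fetched instruction; PC points at the next one
```

Note that the two t2 transfers can be written in either order here because, as in the hardware, neither reads a register the other writes.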

The Indirect Cycle

Once an instruction is fetched, the next step is to fetch source operands. Continuing our simple
example, let us assume a one-address instruction format, with direct and indirect addressing
allowed. If the instruction specifies an indirect address, then an indirect cycle must precede the
execute cycle. The data flow includes the following micro-operations:

t1: MAR <= (IR (Address))

t2: MBR <= Memory

t3: IR (Address) <= (MBR (Address))

The address field of the instruction is transferred to the MAR. This is then used to fetch the address
of the operand. Finally, the address field of the IR is updated from the MBR, so that it now contains
a direct rather than an indirect address. The IR is now in the same state as if indirect addressing
had not been used, and it is ready for the execute cycle.

The Execute Cycle

The fetch, indirect, and interrupt cycles are simple and predictable. Each involves a small, fixed
sequence of micro-operations and, in each case, the same micro-operations are repeated each time
around. This is not true of the execute cycle. For a machine with N different opcodes, there are N
different sequences of micro-operations that can occur. Let us consider several hypothetical
examples.

First, consider an add instruction:

ADD R1, X; which adds the contents of the location X to register R1. The following sequence of
micro-operations might occur:

t1: MAR <= (IR (address))

t2: MBR <= Memory

t3: R1 <= (R1) + (MBR)

We begin with the IR containing the ADD instruction. In the first step, the address portion of the
IR is loaded into the MAR. Then the referenced memory location is read. Finally, the contents of
R1 and MBR are added by the ALU. Again, this is a simplified example. Additional micro-
operations may be required to extract the register reference from the IR and perhaps to stage the
ALU inputs or outputs in some intermediate registers. Let us look at two more complex examples.
A common instruction is increment and skip if zero:

ISZ X

The content of location X is incremented by 1. If the result is 0, the next instruction is skipped. A
possible sequence of micro-operations is

t1: MAR <= (IR (address))

t2: MBR <= Memory

t3: MBR <= (MBR) + 1

t4: Memory <= (MBR)

If ((MBR) = 0) then (PC <= (PC) + l)

The new feature introduced here is the conditional action. The PC is incremented if (MBR) = 0;
this test and action can be implemented as one micro-operation. Note also that this micro-operation
can be performed during the same time unit during which the updated value in MBR is stored back
to memory.

Finally, consider a subroutine call instruction. As an example, consider a branch-and-save-address


instruction:

BSA X

The address of the instruction that follows the BSA instruction is saved in location X, and execution
continues at location X + l. The saved address will later be used for return. This is a straightforward
technique for providing subroutine calls. The following micro-operations suffice:

t1: MAR <= (IR (address))

MBR <= (PC)

t2: PC <= (IR (address))

Memory <= (MBR)

t3: PC <= (PC) + l

The address in the PC at the start of the instruction is the address of the next instruction in sequence.
This is saved at the address designated in the IR. The latter address is also incremented to provide
the address of the instruction for the next instruction cycle.

3.2.4 Instruction Pipelining


An instruction has a number of stages. As on an assembly line, the various stages of different
instructions can be worked on simultaneously. This is a pipeline, and the process is also referred to
as instruction pipelining. Figure 3.8 shows the pipelining of two independent stages: instruction
fetch and instruction execution. The first stage fetches an instruction and buffers it. While the
second stage is executing the instruction, the first stage takes advantage of any unused memory
cycles to fetch and buffer the next instruction. This process speeds up instruction execution.

Figure 3.8: Two stages Instruction Pipeline

Pipeline basic principle:

The decomposition of instruction processing into 6 stages is the following:

 Fetch Instruction (FI): Read the next expected instruction into a buffer
 Decode Instruction (DI): Determine the Opcode and the operand specified
 Calculate Operands (CO): Calculate the effective address of each source operand. This
may involve displacement, register indirect, indirect, or other forms of address calculation.
 Fetch Operands (FO): Fetch each operand from memory. Operands in register need not be
fetched.
 Execute Instruction (EI): Perform the indicated operation and store the result, if any, in
the specified destination operand location.
 Write Operand (WO): Store result in memory.

Using the assumption of equal duration for the various stages, Figure 3.9 shows that a six-stage
pipeline can be used to reduce the execution time for 9 instructions from 54 time units to 14 time
units.

Figure 3.9: Timing diagram for instruction pipeline operation

Also, the diagram assumes that all of the stages can be performed in parallel; in particular, it is
assumed that there are no memory conflicts. The processor makes use of instruction pipelining to
speed up execution. Pipelining involves breaking up the instruction cycle into a number of separate
stages in a sequence. However, the occurrence of branches and dependencies between
instructions complicates the design and use of the pipeline.

Pipeline Performance and Limitations:

Pipelining is a form of parallelism. A "good" design goal of any system is to have all of its
components performing useful work all of the time; this yields high efficiency. The instruction
cycle state diagram clearly shows the sequence of operations that take place in order to execute a
single instruction.

This strategy can give the following:

 Perform all tasks concurrently, but on different sequential instructions


 The result is temporal parallelism.
 Result is the instruction pipeline.

Pipeline performance:

The cycle time of an instruction pipeline can be determined as:

τ = max[τi] + d = τm + d, with 1 ≤ i ≤ k
Where:

τm = maximum stage delay (delay through the slowest stage)

k = number of stages in instruction pipeline

d = time delay of a latch.

In general, the time delay d is equivalent to a clock pulse and τm >> d. Suppose that n instructions
are processed with no branches.

 The total time required Tk to execute all n instructions is:

Tk = [k + (n − 1)]τ

 The speedup factor for the instruction pipeline compared to execution without the pipeline
is defined as:

Sk = T1/Tk = nkτ / [k + (n − 1)]τ = nk / [k + (n − 1)]

 An ideal pipeline divides a task into k independent sequential subtasks


 Each subtask requires 1-time unit to complete
 The task itself then requires k time units to complete. For n iterations of the task, the
execution times will be:
 With no pipelining: nk time units
 With pipelining: k + (n-1) time units

Speedup of a k-stage pipeline is thus S = nk / [k + (n − 1)], which approaches k for large n.
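The formulas above are easy to check numerically. The following Python sketch is a direct transcription of the Tk and Sk expressions (not a model of any real pipeline) and reproduces the text's 54-to-14 example for 9 instructions and 6 stages:

```python
# Timing and speedup of an ideal k-stage pipeline, in time units (latch delay ignored).
def pipeline_time(k: int, n: int) -> int:
    """Time units to push n instructions through a k-stage pipeline: k + (n - 1)."""
    return k + (n - 1)

def speedup(k: int, n: int) -> float:
    """Sk = nk / [k + (n - 1)]: unpipelined time over pipelined time."""
    return (n * k) / pipeline_time(k, n)

print(9 * 6)                  # 54 time units without pipelining
print(pipeline_time(6, 9))    # 14 time units with the six-stage pipeline
print(speedup(6, 9))          # about 3.86 for only 9 instructions
print(speedup(6, 1_000_000))  # approaches k = 6 as n grows large
```

For small n the speedup is well below k because of pipeline fill time; the limit k is reached only asymptotically.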

Pipeline Limitations:

Several factors serve to limit pipeline performance. If the six stages are not of equal duration,
there will be some waiting involved at various pipeline stages. Another difficulty is the conditional
branch instruction; a related unpredictable event is an interrupt. A further problem is that memory
conflicts can occur, so the system must contain logic to account for this type of conflict.

 Pipeline depth
 Data dependencies also factor into the effective length of pipelines
 Logic to handle memory and register use and to control the overall pipeline increases
significantly with increasing pipeline depth
 If the speedup is based on the number of stages, why not build lots of stages?
 Each stage uses latches at its input (output) to buffer the next set of inputs
 Data dependencies
 Pipelining must ensure that computed results are the same as if computation was performed
in strict sequential order
 With multiple stages, two instructions “in execution” in the pipeline may have data
dependencies. So, we must design the pipeline to prevent this.
 Data dependency examples:

A = B + C

D = E + A

C = G x H

A = D / H

Data dependencies limit when an instruction can be input to the pipeline.
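The dependency in the example above (D = E + A reads the A written by A = B + C) can be detected mechanically. The sketch below is a simplified illustration; the helper name and the set-based representation are assumptions, not how real pipeline hardware is built:

```python
# Sketch: detect a read-after-write (RAW) dependency between two statements,
# each described by the registers/variables it writes and reads.
def raw_dependency(first_writes: set, second_reads: set) -> bool:
    """True if the second statement reads a value the first statement writes."""
    return bool(first_writes & second_reads)

# A = B + C ; D = E + A  -> the second statement must wait for A
print(raw_dependency({"A"}, {"E", "A"}))  # True: cannot overlap freely

# C = G x H ; A = D / H  -> no RAW conflict between these two
print(raw_dependency({"C"}, {"D", "H"}))  # False
```

A real pipeline controller performs checks of this kind (plus write-after-read and write-after-write checks) to decide when the next instruction may enter the pipeline.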

 Branching

One of the major problems in designing an instruction pipeline is assuring a steady flow of
instructions to initial stages of the pipeline. However, 15-20% of instructions in an assembly-level
stream are (conditional) branches. Of these, 60-70% take the branch to a target address. Until the
instruction is actually executed, it is impossible to determine whether the branch will be taken or
not.

 Prefetch branch target


 When the branch instruction is decoded, begin to fetch the branch target instruction and
place in a second Prefetch buffer
 If the branch is not taken, the sequential instructions are already in the pipe, so there is no
loss of performance
 If the branch is taken, the next instruction has been prefetched and results in minimal branch
penalty (don’t have to incur a memory read operation at the end of the branch to fetch the
instruction)
 Delayed branch
 Minimize the branch penalty by finding valid instructions to execute in the pipeline while
the branch address is being resolved.
 It is possible to improve performance by automatically rearranging instructions within a
program, so that branch instructions occur later than actually desired

 Compiler is tasked with reordering the instruction sequence to find enough independent
instructions (with respect to the conditional branch) to feed into the pipeline after the branch so
that the branch penalty is reduced to zero

3.2.5 RISC and CISC Instruction Set Architecture


There are two types of fundamental CPU architecture: Complex Instruction Set Computers (CISC)
and Reduced Instruction Set Computers (RISC). CISC is the most prevalent and established
microprocessor architecture, while RISC is a relative newcomer. Intel’s 80x86 and Pentium
microprocessor families are CISC-based, although RISC-type functionality has been incorporated
into Pentium CPUs. Motorola’s 68000 family of microprocessors is another example of this type
of architecture. Sun Microsystems’ SPARC microprocessors and MIPS R2000, R3000 and R4000
families dominate the RISC end of the market; however, Motorola’s PowerPC, G4, Intel’s i860,
and Analog Devices Inc.’s digital signal processors (DSP) are in wide use. In the PC/Workstation
market, Apple Computers and Sun employ RISC microprocessors as their choice of CPU.

The difference between the two architectures is the relative complexity of the instruction sets and
underlying electronic and logic circuits in CISC microprocessors. For example, the original RISC
I prototype had just 31 instructions, while the RISC II had 39. In the RISC II prototype, these
instructions are hard-wired into the microprocessor using 41,000 integrated transistors, so that
when a program instruction is presented for execution it can be processed immediately. This
typifies the pure RISC approach, which results in up-to-a fourfold increase in processing power
over comparable CISC processors.

In contrast, the Intel 386 has 280,000 transistors and uses microcode stored in on-board ROM to process the
instructions. Complex instructions have to be first decoded in order to identify which microcode
routine needs to be executed to implement the instructions. The Pentium II used 9.5 million
transistors and while older microcode is retained, the most frequently used and simpler instructions,
such as MMX, are hardwired. Thus, Pentium CPUs are essentially a hybrid; however, they are still
classified as CISC as their basic instructions are complex. Remember the internal transistor logic
gates in a CPU are opened and closed under the control of clock pulses (i.e. electrical voltage values
of 0 or 5 V (volts) being 0 or 1). These simply process the binary machine code or data by producing
predetermined outputs for given inputs. Machine code or instructions (the binary equivalent of
high-level programming code) control the operation of the CPU so that logical or mathematical
operations can be executed. In CISC processors, complex instructions are first decoded and the
corresponding microcode routine dispatched to the execution unit. The decode activity can take
several clock cycles depending on the complexity of the instruction. In the 1970s, an IBM engineer

discovered that 20% of the instructions were doing 80% of the work in a typical CPU. In addition,
he found that a collection of simple instructions could perform the same operation as a complex
instruction in less clock cycles. This led him to propose an architecture based on reduced instruction
set size, where small instructions could be executed without decoding and in parallel with others.
As indicated, this simplified CPU design and made for faster processing of instructions with
reduced overhead in terms of clock cycles.

CISC                                         RISC

Large instruction set                        Compact instruction set

Complex, powerful instructions               Simple hard-wired machine code and
                                             control unit

Instruction sub-commands micro-coded         Pipelining of instructions
in on-board ROM

Compact and versatile register set           Numerous registers

Numerous memory addressing options           Compiler and IC developed simultaneously
for operands

3.3 Computer Arithmetic


The two principal concerns for computer arithmetic are the way in which numbers are represented
(the binary format) and the algorithms used for the basic arithmetic operations (add, subtract,
multiply, divide).
Computer arithmetic is commonly performed on two very different types of numbers:
 Integer and
 Floating point.
In both cases, the representation chosen is a crucial design issue for the computer system. Floating-
point numbers are expressed as a number (significand) multiplied by a constant (base) raised to some
integer power (exponent). Floating point numbers can be used to represent very large and very
small numbers. Most processors implement the IEEE 754 standard for floating-point representation
and floating-point arithmetic. IEEE 754 defines both a 32-bit and a 64-bit format.
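The IEEE 754 32-bit format mentioned above can be inspected directly. The following Python sketch uses the standard library's struct module to expose the sign, exponent, and significand fields of a single-precision value (the helper name float32_bits is ours, purely illustrative):

```python
import struct

# Show the IEEE 754 single-precision (32-bit) bit pattern of a float.
def float32_bits(x: float) -> str:
    # Pack as big-endian binary32, then reinterpret the same bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{raw:032b}"

bits = float32_bits(-0.75)  # -0.75 = -1.5 x 2^-1
sign, exponent, significand = bits[0], bits[1:9], bits[9:]
print(sign, exponent, significand)  # 1 01111110 10000000000000000000000
```

Here the exponent field 01111110 is 126, i.e. the true exponent -1 plus the bias 127, and the significand field encodes the fraction of 1.5 with the leading 1 implied.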

3.3.1 Arithmetic Logic Unit (ALU)
The ALU is that part of the processor of a computer that actually performs arithmetic and logical
operations on data. All of the other elements of the computer system (control unit, registers,
memory, I/O) are there mainly to bring data into the ALU for it to process and then to take the
results back out. The CPU’s arithmetic and logic unit (ALU) performs operations such as add,
shift/rotate, compare, increment, decrement, negate, AND, OR, XOR, complement, clear, and
preset. If the ALU were directed to add using the ADD instruction, the procedure might appear
something like that diagrammed in Figure 3.10.

Figure 3.10: The ALU executing ADD instruction

Here the accumulator contents (0A16 = 0000 1010) are being added to the temporary register
contents (0516 = 0000 0101). The sum (0F16 = 00001111) is then placed back into the accumulator.

Figure 3.11 indicates some of the functional sections in a typical arithmetic and logic unit.

Figure 3.11: Organization of the ALU

The ALU contains an adder and shifter, with the result being fed back into the accumulator via the
internal data bus. The status register (condition code register) contains the bits most important to
the programmer; these are set or reset based on the conditions created by the last ALU operation.
These bits are also referred to as flags. The flags include indicators for zero, negative results, and
carry from the MSB. The flags are used for decision making by subsequent branching instructions.
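The behaviour described above can be sketched as follows. This Python fragment is an idealized 8-bit ALU add; the flag names and the dictionary representation are assumptions for illustration, not any particular processor's register layout:

```python
# Sketch of an 8-bit ALU ADD that updates zero, sign, and carry flags.
def alu_add8(a: int, b: int):
    total = a + b
    result = total & 0xFF             # keep only the low 8 bits
    flags = {
        "carry": total > 0xFF,        # carry out of the MSB
        "zero": result == 0,          # result is all zeros
        "sign": bool(result & 0x80),  # MSB of the result (negative in 2's complement)
    }
    return result, flags

# Figure 3.10's example: 0A16 + 0516 = 0F16, with no flags set
print(alu_add8(0x0A, 0x05))  # (15, {'carry': False, 'zero': False, 'sign': False})
print(alu_add8(0xFF, 0x01))  # wraps to 0, setting both carry and zero
```

The second call shows why carry and zero can be set by the same operation: 0xFF + 0x01 overflows the 8-bit result, leaving zero behind and carrying out of the MSB.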

3.4 Computer Memory and Data Representation


Computers use a fixed number of bits to represent a piece of data, which could be a number, a
character, or others. An n-bit storage location can represent up to 2^n distinct entities. For example,
a 3-bit memory location can hold one of these eight binary
patterns: 000, 001, 010, 011, 100, 101, 110, or 111. Hence, it can represent at most 8 distinct
entities. They can be used to represent numbers 0 to 7, numbers 8881 to 8888, characters 'A' to 'H',
up to 8 kinds of fruits like apple, orange, and banana, or up to 8 kinds of animals like lion, tiger, etc.

Integers, for example, can be represented in 8-bit, 16-bit, 32-bit or 64-bit. The programmer chooses
an appropriate bit-length for integers. The choice will impose constraint on the range of integers
that can be represented. Besides the bit-length, an integer can be represented in
various representation schemes, e.g., unsigned vs. signed integers. An 8-bit unsigned integer has a
range of 0 to 255, while an 8-bit signed integer has a range of -128 to 127 - both representing 256
distinct numbers.

It is important to note that a computer memory location merely stores a binary pattern. It is entirely
up to the programmer to decide on how these patterns are to be interpreted. For example, the 8-bit
binary pattern "0100 0001B" can be interpreted as an unsigned integer 65, or an ASCII
character 'A', or some secret information known only to you. In other words, you have to first
decide how to represent a piece of data in a binary pattern before the binary patterns make sense.
The interpretation of binary pattern is called data representation or encoding. Furthermore, it is
important that the data representation schemes are agreed-upon by all the parties, i.e., industrial
standards need to be formulated and strictly followed.

Once you have decided on the data representation scheme, certain constraints, in particular the
precision and range, will be imposed. Hence, it is important to understand data representation to
write correct and high-performance programs.

3.4.1 Integer Representation
Integers are whole numbers or fixed-point numbers with the radix point fixed after the least-
significant bit. They contrast with real numbers or floating-point numbers, where the position of
the radix point varies. It is important to take note that integers and floating-point numbers are
treated differently in computers. They have different representation and are processed differently
(e.g., floating-point numbers are processed in a so-called floating-point processor).
Computers use a fixed number of bits to represent an integer. The commonly-used bit-lengths for
integers are 8-bit, 16-bit, 32-bit or 64-bit. Besides bit-lengths, there are two representation schemes
for integers:
1. Unsigned Integers: can represent zero and positive integers.
2. Signed Integers: can represent zero, positive and negative integers. Three representation
schemes had been proposed for signed integers:

a. Sign-Magnitude representation

b. 1's Complement representation

c. 2's Complement representation

The programmer needs to decide on the bit-length and representation scheme for integers,
depending on the application's requirements. Suppose that a counter is needed for counting a small
quantity from 0 up to 200; an 8-bit unsigned integer scheme might be chosen as there are no
negative numbers involved.

N-bit Unsigned Integers:

Unsigned integers can represent zero and positive integers, but not negative integers. The value of
an unsigned integer is interpreted as "the magnitude of its underlying binary pattern".

Example 1: Suppose that n=8-bits and the binary pattern is 0100 0001B, the value of this
unsigned integer is 1×2^0 + 1×2^6 = 65D.

Example 2: Suppose that n=16 and the binary pattern is 0001 0000 0000 1000B, the value of this
unsigned integer is 1×2^3 + 1×2^12 = 4104D.

Example 3: Suppose that n=16 and the binary pattern is 0000 0000 0000 0000B, the value of this
unsigned integer is 0.

An n-bit pattern can represent 2^n distinct integers. An n-bit unsigned integer can represent
integers from 0 to (2^n)-1, as tabulated below:

n Minimum Maximum

8 0 (2^8)-1 (=255)
16 0 (2^16)-1 (=65,535)
32 0 (2^32)-1 (=4,294,967,295) (9+ digits)

64 0 (2^64)-1 (=18,446,744,073,709,551,615) (19+ digits)
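These ranges are easy to verify programmatically. A minimal Python sketch (the function names are my own, not from this handout):

```python
def unsigned_range(n):
    """Return the (minimum, maximum) values of an n-bit unsigned integer."""
    return 0, 2**n - 1

def unsigned_value(bits):
    """Interpret a binary pattern string such as '0100 0001' as an unsigned integer."""
    return int(bits.replace(" ", ""), 2)

print(unsigned_range(8))                      # (0, 255)
print(unsigned_range(16))                     # (0, 65535)
print(unsigned_value("0100 0001"))            # 65, as in Example 1
print(unsigned_value("0001 0000 0000 1000"))  # 4104, as in Example 2
```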

Signed Integers:

Signed integers can represent zero, positive integers, as well as negative integers. Three
representation schemes are available for signed integers:

1. Sign-Magnitude representation

2. 1's Complement representation

3. 2's Complement representation

In all the above three schemes, the most-significant bit (msb) is called the sign bit. The sign bit is
used to represent the sign of the integer - with 0 for positive integers and 1 for negative integers.
The magnitude of the integer, however, is interpreted differently in different schemes.

N-bit Signed Integers in Sign-Magnitude Representation:

In sign-magnitude representation:

The most-significant bit (msb) is the sign bit, with value of 0 representing positive integer and 1
representing negative integer.

The remaining n-1 bits represent the magnitude (absolute value) of the integer. The absolute value
of the integer is interpreted as "the magnitude of the (n-1)-bit binary pattern".

Example 1: Suppose that n=8-bits and the binary representation is 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D

Example 2: Suppose that n=8 and the binary representation is 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is 000 0001B = 1D
Hence, the integer is -1D

Example 3: Suppose that n=8 and the binary representation is 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D

Example 4: Suppose that n=8 and the binary representation is 1 000 0000B.
Sign bit is 1 ⇒ negative
Absolute value is 000 0000B = 0D
Hence, the integer is -0D

The drawbacks of sign-magnitude representation are:

1. There are two representations (0000 0000B and 1000 0000B) for the number zero, which
could lead to inefficiency and confusion.

2. Positive and negative integers need to be processed separately.

N-bit Signed Integers in 1's Complement Representation:

In 1's complement representation:

 Again, the most significant bit (msb) is the sign bit, with value of 0 representing positive
integers and 1 representing negative integers.
 The remaining n-1 bits represent the magnitude of the integer, as follows:
o For positive integers, the absolute value of the integer is equal to "the magnitude of the
(n-1)-bit binary pattern".
o For negative integers, the absolute value of the integer is equal to "the magnitude of
the complement (inverse) of the (n-1)-bit binary pattern" (hence called 1's complement).
Example 1: Suppose that n=8 and the binary representation 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D
Example 2: Suppose that n=8 and the binary representation 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 000 0001B, i.e., 111 1110B = 126D
Hence, the integer is -126D
Example 3: Suppose that n=8 and the binary representation 0 000 0000B.
Sign bit is 0 ⇒ positive

Absolute value is 000 0000B = 0D
Hence, the integer is +0D
Example 4: Suppose that n=8 and the binary representation 1 111 1111B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 111 1111B, i.e., 000 0000B = 0D
Hence, the integer is -0D

Again, the drawbacks are:

1. There are two representations (0000 0000B and 1111 1111B) for zero.

2. The positive integers and negative integers need to be processed separately.
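Both schemes above can be decoded mechanically. A Python sketch (assuming patterns are given as bit strings; the function names are my own):

```python
def decode_sign_magnitude(bits):
    """Sign-magnitude: msb is the sign, the remaining n-1 bits are the magnitude."""
    sign = -1 if bits[0] == "1" else 1
    return sign * int(bits[1:], 2)

def decode_ones_complement(bits):
    """1's complement: negative values invert the remaining n-1 magnitude bits."""
    if bits[0] == "0":
        return int(bits[1:], 2)
    inverted = "".join("1" if b == "0" else "0" for b in bits[1:])
    return -int(inverted, 2)

print(decode_sign_magnitude("10000001"))   # -1   (Example 2, sign-magnitude)
print(decode_ones_complement("10000001"))  # -126 (Example 2, 1's complement)
```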

N-bit Signed Integers in 2's Complement Representation:

In 2's complement representation:

 Again, the most significant bit (msb) is the sign bit, with value of 0 representing positive
integers and 1 representing negative integers.
 The remaining n-1 bits represent the magnitude of the integer, as follows:
o For positive integers, the absolute value of the integer is equal to "the magnitude of the
(n-1)-bit binary pattern".
o For negative integers, the absolute value of the integer is equal to "the magnitude of
the complement of the (n-1)-bit binary pattern plus one" (hence called 2's complement).
Example 1: Suppose that n=8-bits and the binary representation 0 100 0001B.
Sign bit is 0 ⇒ positive
Absolute value is 100 0001B = 65D
Hence, the integer is +65D
Example 2: Suppose that n=8-bits and the binary representation 1 000 0001B.
Sign bit is 1 ⇒ negative
Absolute value is the complement of 000 0001B plus 1, i.e., 111 1110B + 1B = 127D
Hence, the integer is -127D
Example 3: Suppose that n=8-bits and the binary representation 0 000 0000B.
Sign bit is 0 ⇒ positive
Absolute value is 000 0000B = 0D
Hence, the integer is +0D
Example 4: Suppose that n=8 and the binary representation 1 111 1111B.
Sign bit is 1 ⇒ negative

Absolute value is the complement of 111 1111B plus 1, i.e., 000 0000B + 1B = 1D
Hence, the integer is -1D

Computers use 2's Complement Representation for Signed Integers:

Of the three representations discussed for signed integers (sign-magnitude, 1's complement and
2's complement), computers use 2's complement to represent signed integers. This is
because:

1. There is only one representation for the number zero in 2's complement, instead of two
representations in sign-magnitude and 1's complement.

2. Positive and negative integers can be treated together in addition and subtraction.
Subtraction can be carried out using the "addition logic".

Example 1: Addition of Two Positive Integers: Suppose that n=8, 65D + 5D = 70D

65D → 0100 0001B

5D → 0000 0101B (+

0100 0110B → 70D (OK)

Example 2: Subtraction is treated as Addition of a Positive and a Negative

Integer: Suppose that n=8, 65D - 5D = 65D + (-5D) = 60D

65D → 0100 0001B

-5D → 1111 1011B (+

0011 1100B → 60D (discard carry - OK)

Example 3: Addition of Two Negative Integers: Suppose that n=8, -65D - 5D = (-65D) +
(-5D) = -70D

-65D → 1011 1111B

-5D → 1111 1011B (+

1011 1010B → -70D (discard carry - OK)

Because of the fixed precision (i.e., fixed number of bits), an n-bit 2's complement signed integer
has a certain range. For example, for n=8, the range of 2's complement signed integers is -
128 to +127. During addition (and subtraction), it is important to check whether the result exceeds
this range, in other words, whether overflow or underflow has occurred.

Example 4: Overflow: Suppose that n=8, 127D + 2D = 129D (overflow - beyond the range)

127D → 0111 1111B

2D → 0000 0010B (+

1000 0001B → -127D (wrong)

Example 5: Underflow: Suppose that n=8, -125D - 5D = -130D (underflow - below the range)

-125D → 1000 0011B

-5D → 1111 1011B (+

0111 1110B → +126D (wrong)
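The wrap-around in Examples 4 and 5 can be reproduced by masking every result to 8 bits, just as the hardware does. A Python sketch (assuming 8-bit operands; the function name is my own):

```python
def add8(x, y):
    """Add two 8-bit two's complement integers, discarding any carry out."""
    raw = (x + y) & 0xFF                      # keep only the low 8 bits
    return raw - 256 if raw >= 128 else raw   # reinterpret the pattern as signed

print(add8(65, 5))     # 70   (Example 1: OK)
print(add8(127, 2))    # -127 (Example 4: overflow, true result 129 is out of range)
print(add8(-125, -5))  # 126  (Example 5: underflow, true result -130 is out of range)
```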

Range of n-bit 2's Complement Signed Integers:

An n-bit 2's complement signed integer can represent integers from -2^(n-1) to +2^(n-1)-1, as
tabulated. Take note that the scheme can represent all the integers within the range, without any
gap. In other words, there are no missing integers within the supported range.

n minimum maximum
8 -(2^7) (=-128) +(2^7)-1 (=+127)
16 -(2^15) (=-32,768) +(2^15)-1 (=+32,767)

32 -(2^31) (=-2,147,483,648) +(2^31)-1 (=+2,147,483,647) (9+ digits)


64 -(2^63) (=-9,223,372,036,854,775,808) +(2^63)-1 (=+9,223,372,036,854,775,807) (18+ digits)

Decoding 2's Complement Numbers:

1. Check the sign bit (denoted as S).


2. If S=0, the number is positive and its absolute value is the binary value of the
remaining n-1 bits.
3. If S=1, the number is negative. You could "invert the n-1 bits and add 1" to get the
absolute value of the negative number.
Alternatively, you could scan the remaining n-1 bits from the right (least-significant bit).
Look for the first occurrence of 1. Flip all the bits to the left of that first occurrence of 1.
The flipped pattern gives the absolute value. For example,

n = 8, bit pattern = 1 100 0100B
S = 1 → negative
Scanning from the right and flip all the bits to the left of the first occurrence of 1
⇒ 011 1100B = 60D
Hence, the value is -60D
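The decoding procedure amounts to one subtraction: a pattern whose sign bit is 1 represents its unsigned value minus 2^n. A Python sketch (the function name is my own):

```python
def decode_twos_complement(bits, n=8):
    """Decode an n-bit two's complement pattern string into a Python int."""
    value = int(bits, 2)                              # unsigned interpretation
    return value - 2**n if bits[0] == "1" else value  # sign bit set: subtract 2^n

print(decode_twos_complement("11000100"))  # -60, as in the example above
print(decode_twos_complement("10000001"))  # -127
```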

Addition and Subtraction:

Addition proceeds as if the two numbers were unsigned integers. If the result of the operation is
positive, we get a positive number in twos complement form, which is the same as in unsigned-
integer form. If the result of the operation is negative, we get a negative number in twos
complement form. On any addition, the result may be larger than can be held in the word size being
used. This condition is called overflow. When overflow occurs, the ALU must signal this fact so
that no attempt is made to use the result. To detect overflow, the following rule is observed:
OVERFLOW RULE: If two numbers are added, and they are both positive or both negative, then
overflow occurs if and only if the result has the opposite sign.

SUBTRACTION RULE: To subtract one number (subtrahend) from another (minuend), take the
twos complement (negation) of the subtrahend and add it to the minuend. Therefore, subtraction is
achieved using addition, as illustrated below.
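The subtraction rule can be sketched directly: negate the subtrahend by inverting its bits and adding 1, then add as usual. An 8-bit Python illustration (the function names are my own):

```python
def negate8(x):
    """Two's complement negation of an 8-bit value: invert all bits, then add 1."""
    return (~x + 1) & 0xFF

def sub8(minuend, subtrahend):
    """Subtract by adding the two's complement of the subtrahend, modulo 2^8."""
    raw = (minuend + negate8(subtrahend)) & 0xFF
    return raw - 256 if raw >= 128 else raw   # reinterpret the pattern as signed

print(sub8(65, 5))  # 60, matching 65D - 5D = 60D above
```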

Figure 3.12 shows the data paths and hardware elements needed to accomplish addition and
subtraction. The central element is a binary adder, which is presented two numbers for addition and
produces a sum and an overflow indication. The binary adder treats the two numbers as unsigned
integers. For addition, the two numbers are presented to the adder from two registers, designated
in this case as A and B registers. The result may be stored in one of these registers or in a third.
The overflow indication is stored in a 1-bit overflow flag. For subtraction, the
subtrahend (B register) is passed through a twos complementor so that its twos complement is
presented to the adder.

Figure 3.12 Block diagram of Hardware for addition and subtraction

Multiplication:

Compared with addition and subtraction, multiplication is a complex operation, whether performed
in hardware or software. A wide variety of algorithms have been used in various computers.

Several important observations can be made:


1. Multiplication involves the generation of partial products, one for each digit in the
multiplier. These partial products are then summed to produce the final product.
2. The partial products are easily defined. When the multiplier bit is 0, the partial product is
0. When the multiplier bit is 1, the partial product is the multiplicand.
3. The total product is produced by summing the partial products. For this operation, each
successive partial product is shifted one position to the left relative to the preceding partial
product.
4. The multiplication of two n-bit binary integers results in a product of up to 2n bits in length
(e.g., 11B × 11B = 1001B, i.e., 3D × 3D = 9D).
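Observations 1-3 together describe the classic shift-and-add algorithm. A Python sketch for unsigned operands (signed multiplication needs more care, e.g. Booth's algorithm):

```python
def multiply(multiplicand, multiplier):
    """Unsigned binary multiplication by summing shifted partial products."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                    # multiplier bit is 1:
            product += multiplicand << shift  # add the shifted multiplicand
        # (when the bit is 0, the partial product is 0 and nothing is added)
        multiplier >>= 1
        shift += 1
    return product

print(bin(multiply(0b11, 0b11)))  # 0b1001, i.e. 3 × 3 = 9
```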

Big Endian vs. Little Endian:

Modern computers store one byte of data in each memory address or location, i.e., byte addressable
memory. A 32-bit integer is, therefore, stored in 4 memory addresses. The term "Endian" refers to
the order of storing bytes in computer memory. In the "Big Endian" scheme, the most significant byte
is stored first, i.e., in the lowest memory address ("big end first"), while "Little Endian" stores the least
significant byte in the lowest memory address.

For example, the 32-bit integer 12345678H (305419896D) is stored as 12H 34H 56H 78H in big
endian, and as 78H 56H 34H 12H in little endian. The 16-bit pattern 00H 01H is interpreted as 0001H
in big endian, and as 0100H in little endian.
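Python's struct module makes both byte orders visible: the format prefix ">" packs big endian and "<" packs little endian.

```python
import struct

value = 0x12345678
print(struct.pack(">I", value).hex())  # 12345678  (big endian: big end first)
print(struct.pack("<I", value).hex())  # 78563412  (little endian)

# The same two bytes decode differently under each convention:
print(hex(struct.unpack(">H", b"\x00\x01")[0]))  # 0x1
print(hex(struct.unpack("<H", b"\x00\x01")[0]))  # 0x100
```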

3.4.2 Floating-Point Number Representation


A floating-point number (or real number) can represent a very large value (1.23×10^88) or a very small
value (1.23×10^-88). It can also represent a very large negative number (-1.23×10^88) and a very
small negative number (-1.23×10^-88), as well as zero.

A floating-point number is typically expressed in the scientific notation, with a fraction (F), and
an exponent (E) of a certain radix (r), in the form of F×r^E. Decimal numbers use radix of 10
(F×10^E); while binary numbers use radix of 2 (F×2^E).
Representation of floating-point number is not unique. For example, the number 55.66 can be
represented as 5.566×10^1, 0.5566×10^2, 0.05566×10^3, and so on. The fractional part can
be normalized. In the normalized form, there is only a single non-zero digit before the radix point.
For example, decimal number 123.4567 can be normalized as 1.234567×10^2; binary
number 1010.1011B can be normalized as 1.0101011B×2^3.
It is important to note that floating-point numbers suffer from loss of precision when represented
with a fixed number of bits (e.g., 32-bit or 64-bit). This is because there are infinitely many real
numbers (even within a small range of, say, 0.0 to 0.1). On the other hand, an n-bit binary pattern
can represent only a finite number (2^n) of distinct values. Hence, not all real numbers can be
represented. The nearest approximation is used instead, resulting in loss of accuracy.
It is also important to note that floating-point arithmetic is much less efficient than integer
arithmetic. It can be sped up with a dedicated floating-point co-processor. Hence, use
integers if your application does not require floating-point numbers.
In computers, floating-point numbers are represented in scientific notation of fraction (F)
and exponent (E) with a radix of 2, in the form of F×2^E. Both E and F can be positive as well as
negative. Modern computers adopt IEEE 754 standard for representing floating-point numbers.
There are two representation schemes: 32-bit single-precision and 64-bit double-precision.

IEEE-754 32-bit Single-Precision Floating-Point Numbers:

In 32-bit single-precision floating-point representation:

 The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative
numbers.
 The following 8 bits represent the exponent (E).
 The remaining 23 bits represent the fraction (F).

Normalized Form:

Let's illustrate with an example, suppose that the 32-bit pattern is 1 1000 0001 011 0000 0000 0000
0000 0000, with:
 S=1
 E = 1000 0001
 F = 011 0000 0000 0000 0000 0000
In the normalized form, the actual fraction is normalized with an implicit leading 1 in the form
of 1.F. In this example, the actual fraction is 1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 +
1×2^-3 = 1.375D.
The sign bit represents the sign of the number, with S=0 for positive and S=1 for negative number.
In this example with S=1, this is a negative number, i.e., -1.375D.
In normalized form, the actual exponent is E-127 (the so-called excess-127 or bias-127 scheme). This is
because we need to represent both positive and negative exponents. With an 8-bit E, ranging from
0 to 255, the excess-127 scheme could provide actual exponents of -127 to 128; however, E=0 and
E=255 are reserved for special cases, so normalized exponents range from -126 to +127. In this
example, E-127 = 129-127 = 2D.
Hence, the number represented is -1.375×2^2=-5.5D.

De-Normalized Form:

Normalized form has a serious problem: with an implicit leading 1 for the fraction, it cannot
represent the number zero! Convince yourself of this!

De-normalized form was devised to represent zero and other numbers.

For E=0, the numbers are in the de-normalized form. An implicit leading 0 (instead of 1) is used
for the fraction, and the actual exponent is always -126. Hence, the number zero can be represented
with E=0 and F=0 (because 0.0×2^-126 = 0).
We can also represent very small positive and negative numbers in de-normalized form with E=0.
For example, if S=1, E=0, and F=011 0000 0000 0000 0000 0000, the actual fraction
is 0.011B = 1×2^-2 + 1×2^-3 = 0.375D. Since S=1, it is a negative number. With E=0, the actual
exponent is -126. Hence the number is -0.375×2^-126 ≈ -4.4×10^-39, which is an extremely small
negative number (close to zero).
Summary

In summary, the value (N) is calculated as follows:


 For 1 ≤ E ≤ 254, N = (-1)^S × 1.F × 2^(E-127). These numbers are in the so-
called normalized form. The sign bit represents the sign of the number. The fractional part (1.F)
is normalized with an implicit leading 1. The exponent is biased (or in excess) by 127, so as to
represent both positive and negative exponents. The range of the exponent is -126 to +127.
 For E = 0, N = (-1)^S × 0.F × 2^(-126). These numbers are in the so-
called denormalized form. The exponent of 2^-126 evaluates to a very small number.
The denormalized form is needed to represent zero (with F=0 and E=0). It can also represent very
small positive and negative numbers close to zero.
 For E = 255, it represents special values, such as ±INF (positive and negative infinity)
and NaN (not a number). This is beyond the scope of this note.
Example 1: Suppose that IEEE-754 32-bit floating-point representation pattern
is 0 10000000 110 0000 0000 0000 0000 0000.

Sign bit S = 0 ⇒ positive number

E = 1000 0000B = 128D (in normalized form)

Fraction is 1.11B (with an implicit leading 1) = 1 + 1×2^-1 + 1×2^-2 = 1.75D

The number is +1.75 × 2^ (128-127) = +3.5D

Example 2: Suppose that IEEE-754 32-bit floating-point representation pattern


is 1 01111110 100 0000 0000 0000 0000 0000.

Sign bit S = 1 ⇒ negative number

E = 0111 1110B = 126D (in normalized form)

Fraction is 1.1B (with an implicit leading 1) = 1 + 2^-1 = 1.5D

The number is -1.5 × 2^ (126-127) = -0.75D

Example 3: Suppose that IEEE-754 32-bit floating-point representation pattern


is 1 01111110 000 0000 0000 0000 0000 0001.

Sign bit S = 1 ⇒ negative number

E = 0111 1110B = 126D (in normalized form)

Fraction is 1.000 0000 0000 0000 0000 0001B (with an implicit leading 1) = 1 + 2^-23

The number is -(1 + 2^-23) × 2^(126-127) = -0.500000059604644775390625 (this decimal value
happens to be exact)

Example 4 (De-Normalized Form): Suppose that IEEE-754 32-bit floating-point


representation pattern is 1 00000000 000 0000 0000 0000 0000 0001.

Sign bit S = 1 ⇒ negative number

E = 0 (in de-normalized form)

Fraction is 0.000 0000 0000 0000 0000 0001B (with an implicit leading 0) = 1×2^-23

The number is -2^-23 × 2^(-126) = -2^(-149) ≈ -1.4×10^-45
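The examples above can be double-checked with Python's struct module, which reinterprets a 32-bit pattern as an IEEE-754 single-precision value (the helper name is my own):

```python
import struct

def bits_to_float(pattern):
    """Reinterpret a 32-bit pattern string (spaces allowed) as an IEEE-754 float."""
    word = int(pattern.replace(" ", ""), 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

print(bits_to_float("0 10000000 11000000000000000000000"))  # 3.5   (Example 1)
print(bits_to_float("1 01111110 10000000000000000000000"))  # -0.75 (Example 2)
print(bits_to_float("1 00000000 00000000000000000000001"))  # ≈ -1.4e-45 (Example 4, denormalized)
```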

IEEE-754 64-bit Double-Precision Floating-Point Numbers:

The representation scheme for 64-bit double-precision is similar to the 32-bit single-precision:

 The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative
numbers.
 The following 11 bits represent the exponent (E).
 The remaining 52 bits represent the fraction (F).

The value (N) is calculated as follows:


 Normalized form: For 1 ≤ E ≤ 2046, N = (-1)^S × 1.F × 2^(E-1023).
 Denormalized form: For E = 0, N = (-1)^S × 0.F × 2^(-1022).
 For E = 2047, N represents special values, such as ±INF (infinity) and NaN (not a number).
The rules for converting a decimal number into floating point are as follows:
A. Convert the absolute value of the number to binary, perhaps with a fractional part after the
binary point. This can be done by converting the integral and fractional parts separately.
The integral part is converted with the techniques examined previously. The fractional part
can be converted by multiplication. This is basically the inverse of the division method: we
repeatedly multiply by 2, and harvest each bit as it appears to the left of the radix point.

B. Append × 2^0 to the end of the binary number (which does not change its value).

C. Normalize the number. Move the binary point so that it is one bit from the left. Adjust the
exponent of two so that the value does not change.
D. Place the mantissa into the mantissa field of the number. Omit the leading one, and fill with
zeros on the right.
E. Add the bias to the exponent of two, and place it in the exponent field. The bias is 2^(k-1) − 1,
where k is the number of bits in the exponent field. For the eight-bit format, k = 3, so the
bias is 2^(3-1) − 1 = 3. For IEEE 32-bit, k = 8, so the bias is 2^(8-1) − 1 = 127.
F. Set the sign bit, 1 for negative, 0 for positive, according to the sign of the original number.
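The rules above translate directly into code. A Python sketch for the handout's 8-bit format (1 sign bit, 3 exponent bits with bias 3, 4 mantissa bits); it truncates rather than rounds, does not handle zero or out-of-range values, and the helper name is my own:

```python
def encode8(x):
    """Encode x in the toy 8-bit format: sign | 3-bit biased exponent | 4-bit mantissa."""
    sign = 1 if x < 0 else 0                  # rule F
    x = abs(x)
    whole, frac = int(x), x - int(x)
    frac_bits = ""
    for _ in range(12):                       # rule A: repeated multiplication by 2
        frac *= 2
        frac_bits += str(int(frac))
        frac -= int(frac)
    bits = bin(whole)[2:] + frac_bits         # binary digits; radix point sits after bin(whole)
    first_one = bits.index("1")               # rules C/D: normalize to 1.xxxx form
    exponent = len(bin(whole)[2:]) - 1 - first_one
    mantissa = bits[first_one + 1:first_one + 5]  # rule D: drop the leading 1, keep 4 bits
    return f"{sign}{exponent + 3:03b}{mantissa}"  # rule E: bias the exponent by 3

print(encode8(2.625))   # 01000101, i.e. 45H
print(encode8(-4.75))   # 11010011, i.e. D3H
```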

Examples:

 Convert 2.625 to our 8-bit floating point format.

A. The integral part is easy, 2D = 10B. For the fractional part:

0.625 × 2 = 1.25   1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5     0   Generate 0 and continue.
0.5 × 2 = 1.0      1   Generate 1 and nothing remains.

B. So, 0.625D = 0.101B, and 2.625D = 10.101B.

C. Add an exponent part: 10.101B = 10.101B × 2^0.

D. Normalize: 10.101B × 2^0 = 1.0101B × 2^1.

E. Mantissa: 0101

F. Exponent: 1 + 3 = 4 = 100B.
G. Sign bit is 0.

The result is 01000101. Represented as hex, that is 45H.

 Convert -4.75 to our 8-bit floating point format.

a. The integral part is 4D = 100B. The fractional:

0.75 × 2 = 1.5   1   Generate 1 and continue with the rest.
0.5 × 2 = 1.0    1   Generate 1 and nothing remains.

b. So, 4.75D = 100.11B.

c. Normalize: 100.11B = 1.0011B × 2^2.

d. Mantissa is 0011, exponent is 2 + 3 = 5 = 101B, sign bit is 1.

So, -4.75 is 11010011 = D3H

 Convert 0.40625 to our 8-bit floating point format.

e. Converting:

0.40625 × 2 = 0.8125   0   Generate 0 and continue.
0.8125 × 2 = 1.625     1   Generate 1 and continue with the rest.
0.625 × 2 = 1.25       1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5         0   Generate 0 and continue.
0.5 × 2 = 1.0          1   Generate 1 and nothing remains.

f. So, 0.40625D = 0.01101B.

g. Normalize: 0.01101B = 1.101B × 2^-2.
h. Mantissa is 1010, exponent is -2 + 3 = 1 = 001B, sign bit is 0.

So, 0.40625 is 00011010 = 1AH

 Convert -12.0 to our 8-bit floating point format.

o 12D = 1100B.

o Normalize: 1100.0B = 1.1B × 2^3.

o Mantissa is 1000, exponent is 3 + 3 = 6 = 110B, sign bit is 1.

So, -12.0 is 11101000 = E8H

 Convert decimal 1.7 to our 8-bit floating point format.

o The integral part is easy, 1D = 1B. For the fractional part:

0.7 × 2 = 1.4   1   Generate 1 and continue with the rest.
0.4 × 2 = 0.8   0   Generate 0 and continue.
0.8 × 2 = 1.6   1   Generate 1 and continue with the rest.
0.6 × 2 = 1.2   1   Generate 1 and continue with the rest.
0.2 × 2 = 0.4   0   Generate 0 and continue.
0.4 × 2 = 0.8   0   Generate 0 and continue.
0.8 × 2 = 1.6   1   Generate 1 and continue with the rest.
0.6 × 2 = 1.2   1   Generate 1 and continue with the rest.

o The reason why the process seems to continue endlessly is that it does. The number
7/10, which makes a perfectly reasonable decimal fraction, is a repeating fraction in
binary, just as the fraction 1/3 is a repeating fraction in decimal. (It repeats in binary
as well.) We cannot represent this exactly as a floating-point number. The closest we
can come in four bits is .1011. Since we already have a leading 1, the best eight-bit
number we can make is 1.1011.
o Already normalized: 1.1011B = 1.1011B × 2^0.
o Mantissa is 1011, exponent is 0 + 3 = 3 = 011B, sign bit is 0.

The result is 00111011 = 3BH. This is not exact, of course. If you convert it back to
decimal, you get 1.6875.

 Convert -1313.3125 to IEEE 32-bit floating point format.

o The integral part is 1313D = 10100100001B. The fractional:

0.3125 × 2 = 0.625   0   Generate 0 and continue.
0.625 × 2 = 1.25     1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5       0   Generate 0 and continue.
0.5 × 2 = 1.0        1   Generate 1 and nothing remains.

o So, 1313.3125D = 10100100001.0101B.

o Normalize: 10100100001.0101B = 1.01001000010101B × 2^10.
o Mantissa is 01001000010101000000000, exponent is 10 + 127 = 137 = 10001001B,
sign bit is 1.

So, -1313.3125 is 11000100101001000010101000000000 = C4A42A00H

 Convert 0.1015625 to IEEE 32-bit floating point format.

o Converting:

0.1015625 × 2 = 0.203125   0   Generate 0 and continue.
0.203125 × 2 = 0.40625     0   Generate 0 and continue.
0.40625 × 2 = 0.8125       0   Generate 0 and continue.
0.8125 × 2 = 1.625         1   Generate 1 and continue with the rest.
0.625 × 2 = 1.25           1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5             0   Generate 0 and continue.
0.5 × 2 = 1.0              1   Generate 1 and nothing remains.

o So, 0.1015625D = 0.0001101B.

o Normalize: 0.0001101B = 1.101B × 2^-4.

o Mantissa is 10100000000000000000000, exponent is -4 + 127 = 123 = 01111011B,
sign bit is 0.

So, 0.1015625 is 00111101110100000000000000000000 = 3DD00000H

 Convert 39887.5625 to IEEE 32-bit floating point format.

o The integral part is 39887D = 1001101111001111B. The fractional:

0.5625 × 2 = 1.125   1   Generate 1 and continue with the rest.
0.125 × 2 = 0.25     0   Generate 0 and continue.
0.25 × 2 = 0.5       0   Generate 0 and continue.
0.5 × 2 = 1.0        1   Generate 1 and nothing remains.

o So, 39887.5625D = 1001101111001111.1001B.

o Normalize: 1001101111001111.1001B = 1.0011011110011111001B × 2^15.

o Mantissa is 00110111100111110010000, exponent is 15 + 127 = 142 = 10001110B,
sign bit is 0.

So, 39887.5625 is 01000111000110111100111110010000 = 471BCF90H

3.5 Floating-point Arithmetic


For addition and subtraction, it is necessary to ensure that both operands have the same exponent
value. This may require shifting the radix point on one of the operands to achieve alignment.
Multiplication and division are more straightforward.
A floating-point operation may produce one of these conditions:
 Exponent overflow: A positive exponent exceeds the maximum possible exponent value.

 Exponent underflow: A negative exponent is less than the minimum possible exponent
value. This means that the number is too small to be represented, and it
may be reported as 0.
 Significand underflow: In the process of aligning significands, digits may flow off the
right end of the significand.
 Significand overflow: The addition of two significands of the same sign may result in a
carry out of the most significant bit.

In floating-point arithmetic, addition and subtraction are more complex than multiplication and
division. This is because of the need for alignment. There are four basic phases of the algorithm
for addition and subtraction:
1. Check for zeros.
2. Align the significands.
3. Add or subtract the significands.
4. Normalize the result.

Examples:
X = 0.3 × 10^2 = 30
Y = 0.2 × 10^3 = 200

X + Y = (0.3 × 10^(2-3) + 0.2) × 10^3 = 0.23 × 10^3 = 230
X - Y = (0.3 × 10^(2-3) - 0.2) × 10^3 = (-0.17) × 10^3 = -170
X × Y = (0.3 × 0.2) × 10^(2+3) = 0.06 × 10^5 = 6000
X / Y = (0.3 / 0.2) × 10^(2-3) = 1.5 × 10^(-1) = 0.15
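The addition phases (check, align, add, normalize) can be traced with radix-10 (fraction, exponent) pairs. A minimal Python sketch that performs only the alignment and addition steps (the function name is my own):

```python
def fp_add(x, y):
    """Add two radix-10 floating-point numbers given as (fraction, exponent) pairs."""
    (fx, ex), (fy, ey) = x, y
    if ex < ey:                          # align: shift the radix point of the
        fx, ex = fx / 10**(ey - ex), ey  # operand with the smaller exponent
    else:
        fy, ey = fy / 10**(ex - ey), ex
    return fx + fy, ex                   # add significands (normalization omitted)

f, e = fp_add((0.3, 2), (0.2, 3))        # X + Y from the example above
print(round(f, 10), e)                   # 0.23 and 3, i.e. 0.23 × 10^3 = 230
```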

3.6 Character Encoding

In computer memory, characters are "encoded" (or "represented") using a chosen "character
encoding scheme" (aka "character set", "charset", "character map", or "code page").

For example, in ASCII (as well as Latin1, Unicode, and many other character sets):

 Code numbers 65D (41H) to 90D (5AH) represent 'A' to 'Z', respectively.
 Code numbers 97D (61H) to 122D (7AH) represent 'a' to 'z', respectively.
 Code numbers 48D (30H) to 57D (39H) represent '0' to '9', respectively.
It is important to note that the representation scheme must be known before a binary pattern can be
interpreted. E.g., the 8-bit pattern "0100 0010B" could represent anything under the sun, known
only to the person who encoded it. The most commonly-used character encoding schemes are: 7-bit
ASCII (ISO/IEC 646) and 8-bit Latin-x (ISO/IEC 8859-x) for western European characters, and
Unicode (ISO/IEC 10646) for internationalization (i18n). A 7-bit encoding scheme (such as
ASCII) can represent 128 characters and symbols. An 8-bit character encoding scheme (such as
Latin-x) can represent 256 characters and symbols; whereas a 16-bit encoding scheme (such as
Unicode UCS-2) can represent 65,536 characters and symbols.

7-bit ASCII Code (aka US-ASCII, ISO/IEC 646, ITU-T T.50):

 ASCII (American Standard Code for Information Interchange) is one of the earlier character
coding schemes.

 ASCII is originally a 7-bit code. It has been extended to 8-bit to better utilize the 8-bit computer
memory organization. (The 8th-bit was originally used for parity check in the early
computers.)
 Code numbers 32D (20H) to 126D (7EH) are printable (displayable) characters as tabulated:

Hex 0 1 2 3 4 5 6 7 8 9 A B C D E F
2 SP ! " # $ % & ' ( ) * + , - . /
3 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4 @ A B C D E F G H I J K L M N O
5 P Q R S T U V W X Y Z [ \ ] ^ _
6 ` a b c d e f g h i j k l m n o
7 p q r s t u v w x y z { | } ~

o Code number 32D (20H) is the blank or space character.


o '0' to '9': 30H-39H (0011 0000B to 0011 1001B) or (0011 xxxxB, where xxxx is the
equivalent integer value)

o 'A' to 'Z': 41H-5AH (0100 0001B to 0101 1010B) or (010x xxxxB). 'A' to 'Z' are
continuous without gap.
o 'a' to 'z': 61H-7AH (0110 0001B to 0111 1010B) or (011x xxxxB). 'a' to 'z' are also
continuous without gap. However, there is a gap between uppercase and lowercase letters.
To convert between upper and lowercase, flip the value of bit-5.
 Code numbers 0D (00H) to 31D (1FH), and 127D (7FH), are special control characters, which
are non-printable (non-displayable). Many of these characters were used
in the early days for transmission control (e.g., STX, ETX) and printer control (e.g., Form-
Feed), which are now obsolete. The remaining meaningful codes today are:
o 09H for Tab ('\t').
o 0AH for Line-Feed or newline (LF, '\n') and 0DH for Carriage-Return (CR, '\r'), which are
used as line delimiters (aka line separator, end-of-line) for text files. There is unfortunately
no standard for line delimiter: Unixes and Mac use 0AH ("\n"), Windows uses 0D0AH
("\r\n"). Programming languages such as C/C++/Java (which were created on Unix) use
0AH ("\n").
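The fixed offsets in the ASCII layout make case conversion and digit parsing one-liners. A Python sketch (the function name is my own):

```python
def toggle_case(ch):
    """Flip bit 5 (value 20H): toggles an ASCII letter between upper and lower case."""
    return chr(ord(ch) ^ 0x20)

print(toggle_case("A"))     # a  (41H ^ 20H = 61H)
print(toggle_case("z"))     # Z  (7AH ^ 20H = 5AH)
print(ord("7") - ord("0"))  # 7  (digits 30H-39H: the low nibble is the value)
```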

3.7 Instruction Set: Characteristics and functions


What is an Instruction Set?

The operation of the processor is determined by the instructions it executes, referred to as machine
instructions or computer instructions. The collection of different instructions that the processor can
execute is referred to as the processor’s instruction set.

Elements of an Instruction:

Each instruction must have elements that contain the information required by the CPU for
execution. These elements are as follows:

 Operation code: Specifies the operation to be performed (e.g. ADD, I/O). The operation is
specified by a binary code, known as the operation code, or Opcode.
 Source operand reference: The operation may involve one or more source operands, that
is, operands that are inputs for the operation.

 Result operand reference: The operation may produce a result.
 Next instruction reference: This tells the CPU where to fetch the next instruction after the
execution of this instruction is complete.

The next instruction to be fetched is located in main memory or, in the case of a virtual memory
system, in either main memory or secondary memory (disk). In most cases, the next instruction to
be fetched immediately follows the current instruction. In those cases, there is no explicit reference
to the next instruction. Source and result operands can be in one of three areas:

 Main or virtual memory: As with next instruction references, the main or virtual memory
address must be supplied.
 CPU register: With rare exceptions, a CPU contains one or more registers that may be
referenced by machine instructions. If only one register exists, reference to it may be
implicit. If more than one register exists, then each register is assigned a unique number,
and the instruction must contain the number of the desired register.
 I/O device: The instruction must specify the I/O module and device for the operation. If
memory-mapped I/O is used, this is just another main or virtual memory address.

Instructions in an instruction set can be categorized in several ways. Based on an IEEE proposed
standard, the instructions in this chapter will be organised into the following categories:

 Arithmetic instruction
 Logical instructions
 Data transfer instructions
 String instructions
 Program flow control or branch instructions
 Subroutine call instructions
 Return instructions
 Miscellaneous instructions

3.7.1 Arithmetic Instruction Sets


These are sets of instructions that perform arithmetic calculations on data input to the
microprocessor of a microcomputer. A simple microprocessor instruction set would include the
following arithmetic instructions:

 Add

 Subtract
 Multiply
 Increment
 Decrement
 Compare
 Negate

Other arithmetic instructions used by some microprocessors might include add with carry, subtract
with carry/borrow, and divide operations.

3.7.2 Logical Instructions


Logical instructions perform Boolean operations on individual bits of data in the microprocessor.
A simplified microprocessor would have the following logical instructions:

 AND
 OR
 Exclusive OR
 Not
 Shift right
 Shift left

Other logical instructions used by some microprocessors might include shift right arithmetic,
rotate right, rotate left, rotate right through carry, rotate left through carry, and test operations.
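As an illustration, the basic logical instructions above map directly onto bitwise operators. The 8-bit operand width in this sketch is an assumption, not something the handout specifies.

```python
# Bitwise forms of the basic logical instructions, applied to 8-bit values.
def and_(a, b): return (a & b) & 0xFF
def or_(a, b):  return (a | b) & 0xFF
def xor(a, b):  return (a ^ b) & 0xFF
def not_(a):    return (~a) & 0xFF      # one's complement within 8 bits
def shl(a):     return (a << 1) & 0xFF  # shift left, MSB discarded
def shr(a):     return (a >> 1) & 0xFF  # logical shift right, 0 into MSB

print(bin(xor(0b10101010, 0b11001100)))  # 0b1100110
```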

3.7.3 Data Transfer Instructions


These are instructions given to the processor of a microcomputer to move data between memory
and registers, from register to register, and from register to memory. A basic microprocessor would
contain variations of the following data transfer instructions:

 Load
 Store
 Move
 Input
 Output

Other data transfer instructions used by some microprocessors might include exchange and various
clear and set operations.

3.7.4 String Instructions
These are instructions used to manipulate, scan, copy, move, and compare strings. A basic
microprocessor would contain variations of the following string instructions:

 Move String
 Scan String
 Copy String
 Compare characters in a string

3.7.5 Branch Instructions


These instructions alter the sequential flow of a program executed by the microprocessor. A
microprocessor would contain the following branch instructions in its instruction set:

 Unconditional branch
 Branch if zero
 Branch if not zero
 Branch if equal
 Branch if not equal
 Branch if positive
 Branch if negative

Other conditional branch instructions used by some microprocessors might depend on conditions
such as greater than or less than, no carry or carry, or no overflow or overflow.
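As a sketch of how a conditional branch is used in practice, the loop below accumulates a sum by decrementing a counter and "branching" back to the top of the loop while the counter is not zero. The function name and structure are illustrative only.

```python
def countdown_sum(n):
    """Sum n + (n-1) + ... + 1 using an explicit decrement-and-branch loop."""
    total = 0
    while True:
        total += n      # loop body
        n -= 1          # DECREMENT the loop counter
        if n != 0:      # BRANCH IF NOT ZERO: jump back to the loop top
            continue
        break           # counter reached zero: fall through
    return total

print(countdown_sum(5))  # 5+4+3+2+1 = 15
```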

3.7.6 Subroutine Call or Program Flow Control Instructions


These instructions also alter the sequential flow of a program. A simple
microprocessor would have a subroutine call instruction (referred to as CALL) to make the program
jump to a special group of instructions which perform a specific task. All microprocessors have the
unconditional call instruction, and some have conditional call instructions as well. Conditional call
instructions might include call if zero, call if not zero, call if positive, and call if not positive.

At the end of the subroutine, the program must return to where it originally left off in the main
program listing. This task is performed with a return instruction.

3.7.7 Return Instructions
Return instructions might include return from subroutine or return from interrupt operations. Return
is usually unconditional, but some microprocessors contain conditional return instructions.

3.7.8 Miscellaneous Instructions
A simplified microprocessor instruction set would include the following miscellaneous
instructions:

 No operation
 Push
 Pop
 Wait
 Halt

Other miscellaneous instructions might include enable interrupt, disable interrupt, break, and
decimal adjust operations.

3.8 Addressing Modes


The addressing mode of a processor refers to the methods used to retrieve data or bits of information
from program memory. The address field or fields in a typical instruction format are relatively
small. We would like to be able to reference a large range of locations in main memory or for some
systems, virtual memory. To achieve this objective, a variety of addressing techniques is employed
by the CPU. They all involve some trade-off between address range and/or addressing flexibility,
on the one hand, and the number of memory references and/or the complexity of address
calculation, on the other. The addressing techniques used by the CPU instruction set are:

 Immediate
 Direct
 Indirect
 Register
 Register indirect
 Displacement / Inherent

3.8.1 Immediate Addressing


The simplest form of addressing is immediate addressing, in which the operand is actually present
in the instruction:

 Operand is part of instruction


 Operand = address field

E.g. ADD 5; Add 5 to contents of accumulator; 5 is operand

The advantage of immediate addressing is that no memory reference other than the instruction fetch
is required to obtain the operand, thus saving one memory or cache cycle in the instruction cycle.

The disadvantage is that the size of the number is restricted to the size of the address field, which,
in most instruction sets, is small compared with the word length.

3.8.2 Direct Addressing


A very simple form of addressing is direct addressing, in which:

 Address field contains address of operand


 Effective address (EA) = address field (A)

E.g. ADD A; Add contents of cell A to accumulator

The technique was common in earlier generations of computers but is not common on
contemporary architectures. It requires only one memory reference and no special calculation. The
obvious limitation is that it provides only a limited address space.

3.8.3 Register Addressing


Register addressing is similar to direct addressing. The only difference is that the address field
refers to a register rather than a main memory address.

The advantages of register addressing are that:

 Only a small address field is needed in the instruction
 No memory references are required, faster instruction fetch

The disadvantage of register addressing is that the address space is very limited.

3.8.4 Register Indirect Addressing


Just as register addressing is analogous to direct addressing, register indirect addressing is
analogous to indirect addressing. In both cases, the only difference is whether the address field
refers to a memory location or a register. Thus, for register indirect address: Operand is in memory
cell pointed to by contents of register.

The advantages and limitations of register indirect addressing are basically the same as for indirect
addressing. In both cases, the address space limitation (limited range of addresses) of the address
field is overcome by having that field refer to a word-length location containing an address. In
addition, register indirect addressing uses one less memory reference than indirect addressing.

3.8.5 Displacement/ Inherent Addressing


A very powerful mode of addressing combines the capabilities of direct addressing and register
indirect addressing. It is known by a variety of names depending on the context of its use but the
basic mechanism is the same. We will refer to this as displacement addressing; the address field holds
two values:

 A = base value
 R = register that holds a displacement

The effective address is EA = A + (R); that is, the contents of register R are added to the base value A.
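A rough way to see how the addressing modes differ is to simulate operand fetch over a toy memory and register file. All addresses, register numbers, and stored values below are made up for illustration.

```python
# A toy machine state for illustration only.
memory = {0x10: 42, 0x20: 0x10, 0x30: 7}   # address -> contents
registers = {1: 0x10, 2: 99}               # register number -> contents

def operand(mode, field, disp=0):
    """Fetch an operand according to the addressing mode."""
    if mode == "immediate":          # operand is the address field itself
        return field
    if mode == "direct":             # EA = A
        return memory[field]
    if mode == "indirect":           # EA = contents of memory location A
        return memory[memory[field]]
    if mode == "register":           # operand is held in register 'field'
        return registers[field]
    if mode == "register indirect":  # EA = contents of register 'field'
        return memory[registers[field]]
    if mode == "displacement":       # EA = base value (disp) + contents of register
        return memory[disp + registers[field]]
    raise ValueError(mode)

print(operand("immediate", 5), operand("direct", 0x10))  # 5 42
```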

CHAPTER FOUR

COMPUTER PERFORMANCE EVALUATION

4.1 Chapter objectives and expected results


The objectives of this chapter are:

 A discussion of the history of computer technology.


 A look at the technology trends that have made performance the focus of computer system
design and
 A preview of the various techniques and strategies that are used to achieve balanced,
efficient performance.
At the end of this chapter students are expected to:

o Know the history of computer technology


o Know the trends of the Intel x86 architecture
o Know how to assess the performance of a computer

4.2 Computer Evolution

The First-Generation Computers: Vacuum Tubes

 ENIAC
The ENIAC (Electronic Numerical Integrator And Computer), the world’s first general-purpose
electronic digital computer, was designed and constructed under the supervision of John
Mauchly and John Presper Eckert at the University of Pennsylvania. The machine weighed 30 tons,
occupied 15,000 square feet of floor space, and contained more than 18,000 vacuum tubes. When
operating, it consumed 140 kilowatts of power. It was also substantially faster than any electro-
mechanical computer, being capable of 5000 additions per second.
The ENIAC was a decimal rather than a binary machine. Its memory consisted of 20 “accumulators”,
each capable of holding a 10-digit decimal number. Each digit was represented by a ring of 10
vacuum tubes. At any time, only one vacuum tube was in the ON state, representing one of the 10
digits. The major drawback of the ENIAC was that it had to be programmed manually by setting
switches and plugging and unplugging cables.
 The Von Neumann Machine
The task of entering and altering programs for the ENIAC was extremely tedious. The
programming process could be facilitated if the program could be represented in a form suitable
for storing in memory alongside the data. Then, a computer could get its instructions by reading

them from memory, and a program could be set or altered by setting the values of a portion of
memory.
This idea, known as the Stored-program concept, is usually attributed to the ENIAC designers,
most notably the mathematician John Von Neumann, who was a consultant on the ENIAC project.
The first publication of the idea was in a 1945 proposal by Von Neumann for a new computer, the
EDVAC (Electronic Discrete Variable Automatic Computer).
In 1946, Von Neumann and his colleagues began the design of a new stored-program computer,
referred to as the IAS computer, at the Princeton Institute for Advanced Studies. The IAS computer,
although not completed until 1952, is the prototype of all subsequent general-purpose computers.
Figure 4.2 shows the general structure of the IAS computer.
It consists of:
 A main memory, which stores both data and instructions.
 An arithmetic-logical unit (ALU) capable of operating on binary data.
 A control unit, which interprets the instructions in memory and causes them to be executed.
 Input and output (I/O) equipment operated by the control unit.

Figure 4.1: John Von Neumann architecture of computer

Figure 4.2: The structure of the IAS computer

 Commercial Computers
The 1950s saw the birth of the computer industry with two companies, Sperry and IBM, dominating
the marketplace. In 1947, Eckert and Mauchly formed the Eckert-Mauchly Computer Corporation
to manufacture computers commercially. Their first successful machine was the UNIVAC I
(Universal Automatic Computer), which was commissioned by the Bureau of the Census for the
1950 calculations. The Eckert-Mauchly Computer Corporation became part of the UNIVAC
division of Sperry-Rand Corporation, which went on to build a series of successor machines. The
UNIVAC II, which had greater memory capacity and higher performance than the UNIVAC I, was
delivered in the late 1950s and illustrates several trends that have remained characteristic of the
computer industry. IBM delivered its first electronic stored-program computer, the 701, in 1953.
The 701 was intended primarily for scientific applications. In 1955, IBM introduced the companion
702 product, which had a number of hardware features that suited it to business applications. These
were the first of a long series of 700/7000 computers that established IBM as the overwhelmingly
dominant computer manufacturer.

The Second-Generation Computers: Transistors


The use of the transistor defines the second generation of computers. The first major change in the
electronic computer came with the replacement of the vacuum tube by the transistor. A transistor
is simply an on/off switch controlled by electricity. The transistor is smaller, cheaper, and dissipates
less heat than a vacuum tube but can be used in the same way as a vacuum tube to construct
computers. Fully transistorized computers were commercially available in the late 1950s. NCR and,
more successfully, RCA were the front-runners with some small transistor machines. IBM followed
shortly with the 7000 series.
The second generation saw the introduction of more complex arithmetic and logic units and control
units, the use of high-level programming languages, and the provision of system software with the
computer. The second generation is noteworthy also for the appearance of the Digital Equipment
Corporation (DEC). DEC was founded in 1957 and, in that year, delivered its first computer, the
PDP-1. This computer and this company began the minicomputer phenomenon that would become
so prominent in the third generation.

The Third-Generation Computers: Integrated Circuits


In 1958 the invention of the integrated circuit revolutionized electronics and started the era of
microelectronics. It is the integrated circuit that defines the third generation of computers.

MICROELECTRONICS:
The basic elements of a digital computer must perform storage, movement, processing, and control
functions. Only two fundamental types of components are required: gates and memory cells.

A gate is a device that implements a simple Boolean or logical function, such as AND, OR, Ex-OR
etc.
The memory cell is a device that can store one bit of data; that is, the device can be in one of two
stable states at any time.
By interconnecting large numbers of these fundamental devices, we can construct a computer.
The integrated circuit exploits the fact that such components as transistors, resistors, and conductors
can be fabricated from a semiconductor such as silicon. It is merely an extension of the solid-state
art to fabricate an entire circuit in a tiny piece of silicon rather than assemble discrete components
made from separate pieces of silicon into the same circuit. Many transistors can be produced at the
same time on a single wafer of silicon. Equally important, these transistors can be connected with
a process of metallization to form circuits.
Moore’s law
The increase in transistor count for an integrated circuit is popularly known as Moore’s law, which
states that transistor capacity doubles every 18–24 months.
The two most important members of the third generation are the IBM System/360 and the DEC
PDP-8.

Later Generations: Microchip


Beyond the third generation there is less general agreement on defining generations of computers.
The later generations of computers are based on advances in integrated circuit technology. With
the introduction of large-scale integration (LSI), more than 1000 components can be placed on a
single integrated circuit chip. Very-large-scale integration (VLSI) achieved more than 10,000
components per chip, while current ultra-large-scale integration (ULSI) chips can contain more
than one million components.
It has become widely accepted to classify computers into generations based on the fundamental
hardware technology employed. Each new generation is characterized by greater processing
performance, larger memory capacity, and smaller size than the previous one. Table 4.1 illustrates
the various technological changes through the computer generation.

Table 4.1 Computer generations

SEMICONDUCTOR MEMORY
The first application of integrated circuit technology to computers was construction of the processor
(CU and ALU) out of integrated circuit chips. This same technology was used to construct
memories. Since 1970, semiconductor memory has been through 13 generations: 1K, 4K, 16K,
64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G, 4G, and 16G on a single chip. Each generation has
provided four times the storage density of the previous generation, accompanied by declining cost
per bit and declining access time.

MICROPROCESSORS
In 1971, the microprocessor was born when Intel developed its 4004 processor. The 4004 was the
first chip to contain all of the components of a CPU on a single chip. The 4004 can add two 4-bit
numbers and can multiply only by repeated addition. The microprocessor evolution can be seen
most easily in the number of bits of data that can be brought into or sent out of the processor at a
time. Another measure is the number of bits in the accumulator or in the set of general-purpose
registers. The next major step in the evolution of the microprocessor was the introduction in 1972
of the Intel 8008. This was the first 8-bit microprocessor and was almost twice as complex as the
4004. Neither of these steps was to have the impact of the next major event: the introduction in
1974 of the Intel 8080. This was the first general-purpose microprocessor.
Whereas the 4004 and the 8008 had been designed for specific applications, the 8080 was designed
to be the CPU of a general-purpose microcomputer. Like the 8008, the 8080 is an 8-bit
microprocessor. The 8080, however, is faster, has a richer instruction set, and has a large addressing
capability. About the same time, 16-bit microprocessors began to be developed. However, it was
not until the end of the 1970s that powerful, general-purpose 16-bit microprocessors appeared. One
of these was the 8086. The next step in this trend occurred in 1981, when both Bell Labs and
Hewlett-Packard developed 32-bit, single-chip microprocessors. Intel introduced its own 32-bit

microprocessor, the 80386, in 1985. Table 4.2 goes through the various evolution of Intel
microprocessors.
Table 4.2 Intel processors

4.3 Performance Assessment
When trying to choose among different computers, performance is an important parameter to
consider, along with cost, size, security, reliability, and power consumption.
If a program is run on two different desktop computers, the faster one is the desktop computer that
gets the job done first. Every computer user is interested in reducing response time, the time
between the start and completion of a task also referred to as execution time or processor time.
In discussing the performance of computers, we are primarily concerned with response time or
execution time. To maximize performance, the response time or execution time for some task must
be minimised. Thus, we can relate performance and execution time for a computer X as follows:
Performance(X) = 1 / Execution time(X)
This means that for two computers X and Y, if the performance of X is greater than the
performance of Y, we have
Performance(X) > Performance(Y)
1 / Execution time(X) > 1 / Execution time(Y)
Execution time(Y) > Execution time(X)
That is, the execution time on Y is longer than that on X, if X is faster than Y.

In discussing a computer design, we often want to relate the performance of two different
computers quantitatively. We will use the phrase “X is n times faster than Y” or equivalently “X is
n times as fast as Y” to mean:
Performance(X) / Performance(Y) = n
If X is n times faster than Y, then the execution time on Y is n times longer than it is on X:
Execution time(Y) / Execution time(X) = n
4.3.1 Clock Speed and Instructions per Second

THE SYSTEM CLOCK


Operations performed by a processor are governed by a system clock. All operations begin with
the pulse of the clock. The speed of a processor is dictated by the pulse frequency produced by the
clock, measured in cycles per second, or Hertz (Hz).
The clock signals are generated by a quartz crystal, which generates a constant signal wave while
power is applied. This wave is converted into a digital voltage pulse stream that is provided in a
constant flow to the processor circuitry. For example, a 1-GHz processor receives 1 billion pulses
per second. The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse,
of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle
time.

INSTRUCTION EXECUTION RATE


A processor is driven by a clock with a constant frequency f or, equivalently, a constant cycle time
τ, where τ = 1/f.

Instruction count, Ic
Instruction count of a program is the number of machine instructions executed for that program
until it runs to completion or for some defined time interval.

Average Cycles per Instruction CPI for a program.


Let CPIi be the number of cycles required for instruction type i, and Ii be the number of executed
instructions of type i for a given program.
Overall CPI can be calculated as follows:
CPI = [ Σ (CPIi × Ii) ] / Ic, where the sum runs over all instruction types i
Processor Time, T for a program

The processor time T needed to execute a given program can be expressed as:
T = Ic × CPI × τ
During the execution of an instruction, part of the work is done by the processor, and part of the
time a word is being transferred to or from memory. The time to transfer depends on the memory
cycle time, which may be greater than the processor cycle time. Therefore, T can be written as
follows:
T = Ic × [p + (m × k)] × τ
Where:
p = number of processor cycles needed to decode and execute the instruction,
m = number of memory references needed, and
k = ratio between memory cycle time and processor cycle time.

The five performance factors in the preceding equation (Ic, p, m, k, τ) are influenced by four system
attributes: the design of the instruction set (instruction set architecture), compiler technology (how
effective the compiler is in producing an efficient machine language program from a high-level
language program), processor implementation, and cache and memory hierarchy.
A common measure of performance for a processor is the rate at which instructions are executed,
expressed as millions of instructions per second (MIPS), referred to as the MIPS rate. We can
express the MIPS rate in terms of the clock rate and CPI as follows:
MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6)
Example:
Consider the execution of a program which results in the execution of 2 million instructions on a
400-MHz processor. The program consists of four major types of instructions. The instruction
mixes and the CPI for each instruction type are given below.

The average CPI when the program is executed on a uniprocessor with the above results is:

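Since the example's instruction-mix table is not reproduced in this text, the sketch below uses an assumed four-way mix (the fractions and cycle counts are illustrative assumptions, not the handout's table) to show how the average CPI, the MIPS rate, and the processor time T are computed for a 400-MHz processor.

```python
# Average CPI, MIPS rate and execution time for a 400-MHz processor running
# a HYPOTHETICAL instruction mix (fractions and cycle counts are assumed).
f = 400e6                      # clock rate in Hz
mix = {                        # instruction type -> (fraction of Ic, cycles)
    "alu":        (0.60, 1),
    "load/store": (0.18, 2),
    "branch":     (0.12, 4),
    "cache miss": (0.10, 8),
}
cpi = sum(frac * cycles for frac, cycles in mix.values())  # weighted average
mips = f / (cpi * 1e6)         # MIPS rate = f / (CPI x 10^6)
Ic = 2_000_000                 # 2 million executed instructions
T = Ic * cpi / f               # processor time T = Ic x CPI x (1/f)

print(round(cpi, 2), round(mips), round(T * 1000, 2))  # CPI, MIPS, T in ms
```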
Floating point performance is expressed as millions of floating-point operations per second


(MFLOPS), defined as follows:
MFLOPS rate = (number of executed floating-point operations) / (execution time × 10^6)
4.3.2 Performance Enhancement Calculations: Amdahl's Law


The performance enhancement possible due to a given design improvement is limited by the
amount that the improved feature is used.

Amdahl’s Law:

Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl,
and is used to find the maximum expected improvement to an overall system when only part of the
system is improved. It is often used in parallel computing to predict the theoretical maximum speed
up using multiple processors.

Performance improvement or speedup due to enhancement E:

Speedup (E) = Execution Time without E / Execution Time with E = Performance with E /
Performance without E

Suppose that enhancement E accelerates a fraction F of the original execution time by a factor S
and the remainder of the time is unaffected then:

Execution Time with E = ((1-F) + F/S) X Execution Time without E

Hence speedup is given by:

Speedup (E) = Execution Time without E / [((1 - F) + F/S) × Execution Time without E]
= 1 / ((1 - F) + F/S)

Performance Enhancement Example:

For the RISC machine with the following instruction mix given earlier:

Op       Freq.   Cycles   CPI(i)   %Time

ALU      50%     1        0.5      23%

Load     20%     5        1.0      45%

Store    10%     3        0.3      14%

Branch   20%     2        0.4      18%

Overall CPI = 2.2

If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the
resulting performance improvement from this enhancement?

Fraction enhanced = F = 45% or .45

Unaffected fraction = 100% - 45% = 55% or .55

Factor of enhancement = 5/2 = 2.5

Using Amdahl’s Law:

Speedup (E) = 1 / ((1 - F) + F/S) = 1 / (0.55 + 0.45/2.5) = 1.37

An Alternative Solution Using CPU Equation:

Old CPI = 2.2

New CPI = .5 x 1 + .2 x 2 + .1 x 3 + .2 x 2 = 1.6

Speedup (E) = Original Execution time /New Execution time = Old CPI /New CPI

Speedup (E) =2.2/1.6=1.37

This value is the same speedup obtained from Amdahl’s Law in the first solution above.
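Both solutions can be checked with a few lines of code; the function below is a direct transcription of Amdahl's formula, applied to the load-instruction example (F = 0.45, S = 2.5) and to Example 2 (F = 0.8, S = 16).

```python
# Amdahl's-law speedup: fraction F of execution time is accelerated by factor S.
def speedup(F, S):
    return 1.0 / ((1.0 - F) + F / S)

# Load-instruction example: F = 0.45, S = 5/2 = 2.5
print(round(speedup(0.45, 2.5), 2))   # 1.37

# Example 2: multiply is 80% of a 100 s run; making it 16x faster gives 4x overall
print(round(speedup(0.80, 16), 2))    # 4.0
```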

Example 2

A program runs in 100 seconds on a machine with multiply operations responsible for 80 seconds
of this time. By how much must the speed of multiplication be improved to make the program four
times faster?

Desired speedup = 4 = 100/ Execution Time with enhancement

Execution time with enhancement = 25 seconds


25 seconds = (100 - 80 seconds) + 80 seconds / n

25 seconds = 20 seconds + 80 seconds / n

5 = 80 seconds / n

n = 80/5 = 16

Hence multiplication should be 16 times faster to get a speedup of 4.

CHAPTER FIVE

THE COMPUTER MEMORY SYSTEM

5.1 Chapter objectives and expected results


This chapter focuses on the internal and external memory elements of a computer system. It
examines key characteristics of computer memories. The remainder of the chapter examines an
essential element of all modern computer systems: cache memory.
At the end of this chapter students are expected to know:
o The various components of the Memory System
o The Memory Hierarchy
o Cache Memories
o Cache Organization
o Replacement Algorithms
o Memory write strategies
o Virtual Memory

5.2 Memory System


The physical devices used to store data or programs (sequences of instructions) on a temporary or
permanent basis for use in a computer system are called memory. Computer memory is usually
meant to refer to the semiconductor technology that is used to store bits of information in an
electronic device. The flip-flop, or latch, is the basic memory cell used in many semi-conductor
memories.

The basic element of a semiconductor memory is the memory cell. All semiconductor memory
cells share certain properties:

 They exhibit two stable states, which can be used to represent binary 1 and 0;
 They are capable of being written into (at least once), to set the state;
 They are capable of being read to sense the state.

Figure 5.1 Memory cell operations

Figure 5.1 depicts the operation of a memory cell. The cell has three functional terminals capable
of carrying an electrical signal. The select terminal, as the name suggests, selects a memory cell
for a read or write operation. The control terminal indicates read or write. For writing, the other
terminal provides an electrical signal that sets the state of the cell to 1 or 0. For reading, that terminal
is used for output of the cell’s state.
The basic concept of computer memory is the following:

 Bits - The basic unit of memory is the bit. A bit may contain a 0 or 1. It is the simplest
possible unit.
 Memory addresses - Memories consist of a number of cells or locations each of which can
store a piece of information. Each location has a number called its address, by which
program can refer to it.
 If an address has m bits, the maximum number of cells addressable is 2^m.
 Byte: 8-bits
 Bytes are grouped into words. The significance of a word is that most instructions operate on
entire words. A computer with a 32-bit word has 4 bytes/word.

 Byte ordering
 The bytes in a word can be numbered from left-to-right or right-to-left.
 The system where the numbering begins at the “big” (i.e. high-order) end is called a Big-
Endian computer, whereas the system where the numbering begins at the “little” (i.e. low-
order) end is called a Little-Endian computer.
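Python's `struct` module makes the two byte orders easy to see; the 32-bit value below is arbitrary.

```python
import struct

# The 32-bit word 0x01020304 stored under the two byte orders.
word = 0x01020304
big    = struct.pack(">I", word)   # big-endian: high-order byte first
little = struct.pack("<I", word)   # little-endian: low-order byte first

print(big.hex(), little.hex())     # 01020304 04030201
```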

5.2.1 Characteristics of Memory System


Computer memory is made more manageable by classifying memory systems according to their
key characteristics. The most important of these are listed in Table 5.1 and explained below:
 Location: This refers to whether memory is internal or external to the computer. Internal
memory is often equated with main memory. The processor requires its own local memory,
in the form of registers and Cache. External memory consists of peripheral storage devices,
such as disk and tape.
 Capacity: It is the amount of information that can be contained in a memory unit. For
internal memory, this is typically expressed in terms of bytes or words. Common word
lengths are 8, 16, and 32 bits.
 Memory word: The unit of organization of memory. The size of the word is typically equal
to the number of bits used to represent an integer and to the instruction length.

 Addressable unit: The fundamental data element size that can be addressed in the memory
typically either the word size or individual bytes.
 Unit of transfer: The number of data (bits) elements transferred at a time usually bits in
main memory and blocks in secondary memory.
 Access time: The time to address the unit and perform the transfer.
 Memory cycle time: Access time plus any additional time required before a second access can
be started.
 Transfer rate: This is the rate at which data can be transferred into or out of a memory
unit.
o For random-access memory, it is equal to 1/ (cycle time).
o For non-random-access memory, the following relationship holds:
TN = TA + n/R
Where:
TN = Average time to read or write N bits
TA = Average access time
n = Number of bits
R = Transfer rate, in bits per second (bps)
 Access technique: how are memory contents accessed
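Applying the non-random-access relationship TN = TA + n/R defined above, with some assumed device parameters (a 10 ms average access time and a 1 Mbit/s transfer rate, chosen only for illustration):

```python
# Average time to read n bits from a non-random-access device: TN = TA + n/R.
def transfer_time(ta, n, r):
    """ta: average access time (s); n: number of bits; r: transfer rate (bits/s)."""
    return ta + n / r

# e.g. a 4096-bit block, 10 ms average access, 1 Mbit/s transfer rate
print(transfer_time(0.010, 4096, 1e6))   # about 0.014 s
```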
Table 5.1 Key Characteristics of Computer Memory Systems

5.2.2 Memory Types Used


Computer memory system consists of various types depending on the technology used. The
technologies affect not only the operating characteristics but also the manufacturing cost.

Semiconductor memories used in microcomputers are usually divided into two groups according
to how data or bits of information are stored. The two types of semiconductor memories are:

 Volatile memory
 Non-volatile memory

Volatile memory:

The volatile memory type, also known as read/write memory, requires power to maintain the bits of
information stored in it. It can be easily programmed, erased, and reprogrammed by the user; the
programming is called writing into memory. Read/write memory is what we often call RAM
(Random Access Memory), which is the system memory of any computer.

These types of semiconductor memories come in different types:

 Static Random Access Memory (SRAM) – It is basically an array of flip-flop storage cells.
This type of memory is made from flip-flop-like circuits and does not require periodic
refreshing.
 Dynamic Random Access Memory (DRAM) – Storage cell is essentially a transistor acting
as a capacitor. This type of RAM stores each bit of data or information in separate capacitors
within an integrated circuit. Information on this memory fades unless the capacitor is
refreshed periodically.
 Cache Memory - The cache memories are high-speed buffers for holding recently accessed
data and neighboring data in main memory. The organization and operations of cache
provide an apparently fast memory system.

Non-volatile memory:

The non-volatile memory is referred to as Read-Only Memory (ROM). This type of memory stores
bits of data or information permanently, even when not powered, retaining the stored information
or programs when power to the system is lost.

Read-only-Memory (ROM) comes in four different versions:

 Standard Read-Only-Memory (SROM) – This version of ROM is programmed by the
manufacturer and cannot be reprogrammed.
 Programmable Read-Only-Memory (PROM) – can be programmed once by the user or
distributor using special equipment.

 Erasable Programmable Read-Only-Memory (EPROM) – Programming is similar to a
PROM. Can be erased by exposure to UV light.
 Electrically Erasable PROMs (EEPROM) – Can be written to many times while remaining
in a system, and do not have to be erased before individual bytes are reprogrammed. Used in
systems for development, personalization, and other tasks requiring unique information to
be stored.

The ROM, PROM, EPROM, and EEPROM are all considered permanent non-volatile memories
that do not lose their data when power to the system is turned off. Also on the market are
various forms of external non-volatile storage:

 Magnetic disks
 RAID technology disks
 Optical disks
 Magnetic tape

Table 5.2 lists the major types of semiconductor memory.


Table 5.2 Semiconductor memory types

5.2.3 Memory Organization of Computer System


Writing into or reading from a storage location is called accessing memory. Generally, data storage
can be classified as either:

 Sequential –access memory


 Random-Access memory

Data in a sequential access memory is located by searching in serial fashion through all the storage
locations. For example, when data are stored on magnetic tape, the tape must be searched from end
to end to find the appropriate data.

In a Random-access memory, any storage location can be written into or read from in a given time
called the access time. The semiconductor RAM and ROM storage devices used in microcomputers
are both of the faster random-access type.

5.3 Memory Hierarchy


Although memory is technically any form of electronic storage, the term is most often used to refer
to fast, temporary forms of storage. If your computer's CPU had to constantly access the hard drive
to retrieve every piece of data it needs, it would operate very slowly. When the information is kept
in memory, the CPU can access it much more quickly. Most forms of memory are intended to store
data temporarily.

No matter how big main memory is, the memory system can be organized to store more
information than main memory alone can hold. The traditional solution to storing a great deal of data is a
memory hierarchy.

 Major design objective of any memory system:
 To provide adequate storage capacity at
 An acceptable level of performance
 At a reasonable cost
 Four interrelated ways to meet this goal:
 Use a hierarchy of storage devices
 Develop automatic space-allocation methods for efficient use of the memory
 Through the use of virtual memory techniques, free the user from memory management tasks
 Design the memory and its related interconnection structure so that the processor can operate at or near its maximum speed

The CPU accesses memory according to a distinct hierarchy shown in figure 5.2, whether it comes
from permanent storage (the hard drive) or input (the keyboard), most data goes in random access
memory (RAM) first. The CPU then stores pieces of data it will need to access, often in a cache,
and maintains certain special instructions in the register. All of the components in your computer,
such as the CPU, the hard drive and the operating system, work together as a team, and memory is
one of the most essential parts of this team. From the moment you turn your computer on until the
time you shut it down, your CPU is constantly using memory.

Figure 5.2: Memory hierarchy of a computer system

A look at a typical scenario when the computer is turned on:

 The computer loads data from read-only memory (ROM) and performs a power-on self-test
(POST) to make sure all the major components are functioning properly. As part of this
test, the memory controller checks all of the memory addresses with a quick read/write
operation to ensure that there are no errors in the memory chips. Read/write means that
data is written to a bit and then read from that bit.

 The computer loads the basic input/output system (BIOS) from ROM. The BIOS provides
the most basic information about storage devices, boot sequence, security, Plug and Play
(auto device recognition) capability and a few other items.
 The computer loads the operating system (OS) from the hard drive into the system's RAM.
Generally, the critical parts of the operating system are maintained in RAM as long as the
computer is on. This allows the CPU to have immediate access to the operating system,
which enhances the performance and functionality of the overall system.
 When you open an application, it is loaded into RAM. To conserve RAM usage, many
applications load only the essential parts of the program initially and then load other pieces
as needed.
 After an application is loaded, any files that are opened for use in that application are
loaded into RAM.

 When you save a file and close the application, the file is written to the specified storage
device, and then it and the application are purged from RAM.

In the list above, every time something is loaded or opened, it is placed into RAM. This simply
means that it has been put in the computer's temporary storage area so that the CPU can access that
information more easily. The CPU requests the data it needs from RAM, processes it and writes
new data back to RAM in a continuous cycle. In most computers, this shuffling of data between
the CPU and RAM happens millions of times every second. When an application is closed, it and
any accompanying files are usually purged (deleted) from RAM to make room for new data. If the
changed files are not saved to a permanent storage device before being purged, they are lost.

System RAM speed is controlled by the system bus width and system bus speed. The system bus
width refers to the number of bits of information that can be sent to the CPU simultaneously, and
the system bus speed refers to the number of times a group of bits of information can be sent each
second.
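As a rough illustration of how the two parameters combine, the peak transfer rate of the system bus is its width multiplied by its speed. The sketch below uses assumed figures (a 64-bit bus clocked at 100 MHz); the function name and numbers are illustrative, not taken from the text.

```python
# Peak transfer rate of the system bus: width (bits moved per transfer)
# times speed (transfers per second). Figures below are assumptions.

def peak_bandwidth_bytes_per_s(width_bits: int, speed_hz: int) -> float:
    """Upper bound on the number of bytes the bus can move per second."""
    return width_bits / 8 * speed_hz

# A 64-bit bus at 100 MHz:
print(peak_bandwidth_bytes_per_s(64, 100_000_000))  # 800000000.0, i.e. 800 MB/s
```

Real sustained rates are lower, since arbitration and wait states consume some bus cycles.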

5.3.1 Memory Performance


The goal of the memory hierarchy is to match the processor speed with the rate of information
transfer from the lowest element in the hierarchy.

 The memory hierarchy speeds up the memory performance

The memory hierarchy works because of locality of reference

 Memory references made by the processor, for both instructions and data, tend to cluster
together

+ Instruction loops, subroutines

+ Data arrays, tables

 Keep these clusters in high speed memory to reduce the average delay in accessing data
 Over time, the clusters being referenced will change; memory management must deal with this

 Performance of a two-level memory

Example: Suppose that the processor has access to two levels of memory:

 Two-level memory system
 Level 1 access time of 1 µs
 Level 2 access time of 10 µs
 Average access time = H (1) + (1 - H) (10) µs

Where H is the fraction of all memory accesses that are found in the faster memory (e.g. the cache)

Figure 5.3: Performance of a two-level memory
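The formula above can be sketched in a few lines; it shows how the average access time approaches the level-1 time as the hit ratio H approaches 1. Times are in microseconds, matching the example.

```python
# Average access time of a two-level memory: hits are served at the
# level-1 time t1, misses fall through to level 2 and cost t2.

def average_access_time(h: float, t1: float = 1.0, t2: float = 10.0) -> float:
    """h is the hit ratio: the fraction of accesses satisfied by level 1."""
    return h * t1 + (1 - h) * t2

for h in (0.5, 0.9, 0.99):
    print(h, average_access_time(h))  # at h = 0.99 the average is 1.09 us
```

Only a hit ratio very close to 1 brings the average near the fast memory's speed, which is why locality of reference is essential.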

5.4 Cache Memory


A cache memory is a small, very fast memory that retains copies of recently used information from
main memory. It operates transparently to the programmer, automatically deciding which values
to keep and which to overwrite. When the CPU needs a word, it first looks in the cache. Only if the
word is not there does it go to main memory. If a substantial fraction of the words is in the cache,
the average access time can be greatly reduced. The success or failure thus depends on what
fraction of the words is in the cache. The concept is illustrated in Figure 5.4. There is a relatively
large and slow main memory together with a smaller, faster cache memory. The cache contains a
copy of portions of main memory.

Figure 5.4 Single Cache


Figure 5.5 depicts the use of multiple levels of cache. The L2 cache is slower and typically larger
than the L1 cache, and the L3 cache is slower and typically larger than the L2 cache.

Figure 5.5 Three level Cache Organisation
The processor operates at its high clock rate only when the memory items it requires are held in the
cache. The overall system performance depends strongly on the proportion of the memory accesses
which can be satisfied by the cache.
o An access to an item which is in the cache is referred to as a hit.
o An access to an item which is not in the cache is referred to as a miss.
o The proportion of all memory accesses that are satisfied by the cache is referred to as its hit
rate.
o The proportion of all memory accesses that are not satisfied by the cache is referred to as
its miss rate.

Cache space (~KBytes) is much smaller than main memory (~MBytes). Items have to be placed in
the cache so that they are available there when they are needed.
During execution of a program, memory references by the processor, for both instructions and data,
tend to cluster: once an area of the program is entered, there are repeated references to a small set
of instructions (loop, subroutine) and data (components of a data structure, local variables or
parameters on the stack).
Temporal locality (locality in time): If an item is referenced, it will tend to be referenced again
soon.
Spatial locality (locality in space): If an item is referenced, items whose addresses are close by will
tend to be referenced soon.

5.4.1 Separate Data and Instruction Caches


It is common to split the cache into one dedicated to instructions and one dedicated to data.
Figure 5.6 shows an architecture with separate instruction and data caches.

Figure 5.6 Separate data and instruction caches
Advantages of unified caches:
 They are able to better balance the load between instruction and data fetches depending on
the dynamics of the program execution;
 Design and implementation are cheaper.
Advantages of split caches (Harvard Architectures)
 Competition for the cache between instruction processing and execution units is eliminated.
 Instruction fetch can proceed in parallel with memory access from the execution unit.

Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address.
For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of
K words each. That is, there are M = 2^n / K blocks in main memory. The cache consists of m blocks,
called lines. Each line contains K words, plus a tag of a few bits. Figure 5.7 depicts the structure of
a cache/main-memory system.

Figure 5.7 Structure of a Cache/Main-memory system
Each line includes control bits, such as a bit to indicate whether the line has been modified since
being loaded into the cache. The length of a line, not including tag and control bits, is the line size.
The line size may be as small as 32 bits, with each “word” being a single byte; in this case the line
size is 4 bytes. The number of lines is considerably less than the number of main memory blocks
(m < M). At any time, some subset of the blocks of memory resides in lines in the cache. If a word
in a block of memory is read, that block is transferred to one of the lines of the cache. Because
there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated
to a particular block. Each line includes a tag that identifies which particular block is currently
being stored. The tag is usually a portion of the main memory address.

5.4.2 Cache Organisation


Since there are fewer cache lines than main memory blocks, an algorithm is needed for mapping
main memory blocks into cache lines and also determining which main memory block currently
occupies a cache line. The choice of the mapping function dictates how the cache is organized.
Three techniques can be used:
 direct,
 associative, and
 set associative

Direct Mapping

Direct mapping technique maps each block of main memory into only one possible cache line.
That is memory block is mapped into a unique cache line, depending on the memory address of the
respective block. This is illustrated in figure 5.8.

Figure 5.8 Direct mapping
Each block of main memory maps into one unique line of the cache. The next m blocks of main
memory map into the cache in the same fashion; that is, block Bm of main memory maps into line
L0 of the cache, block Bm+1 maps into line L1, and so on. The mapping function is easily implemented
using the main memory address.
A memory address is considered to be composed of three fields:
1. The least significant bits W, identify the byte within the block;
2. The rest of the address S, identify the block in main memory; for the cache logic, this part
is interpreted as two fields:
 The least significant bits specify the cache line;
 The most significant bits represent the tag, which is stored in the cache together
with the line.
Tags are stored in the cache in order to distinguish among blocks which fit into the same cache
line.
To summarize,
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
 Number of lines in cache = m = 2^r
 Size of cache = 2^(r+w) words or bytes
 Size of tag = (s - r) bits

Advantages:
1. Simple and cheap;

2. The tag field is short; only those bits have to be stored which are not used to address the
cache;
3. Access is very fast.
Disadvantage:
1. A given block fits into a fixed cache location, so a given cache line will be replaced
whenever there is a reference to another memory block which maps to the same line,
regardless of the status of the other cache lines. This can produce a low hit ratio, even
if only a very small part of the cache is effectively used.
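The tag/line/word decomposition described above can be sketched with a few bit operations. The field widths used below (w = 2 bits for the word, r = 3 bits for the line) are assumptions chosen for a tiny example cache, not values from the text.

```python
# Split an (s + w)-bit memory address into the three direct-mapping fields:
# the w low bits select the word within the block, the next r bits select
# the cache line, and the remaining s - r bits form the tag stored with it.

def split_address(addr: int, w: int, r: int) -> tuple[int, int, int]:
    word = addr & ((1 << w) - 1)          # byte/word within the block
    line = (addr >> w) & ((1 << r) - 1)   # which cache line the block maps to
    tag = addr >> (w + r)                 # identifies the block among those
    return tag, line, word                # that share this line

# 0b101_110_11 with w = 2, r = 3: tag 0b101, line 0b110, word 0b11
print(split_address(0b10111011, w=2, r=3))  # (5, 6, 3)
```

On a lookup, the hardware indexes the cache with the line field and compares the stored tag against the address's tag field; a mismatch is a miss.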

Associative mapping

Associative mapping overcomes the disadvantage of direct mapping by permitting each main
memory block to be loaded into any line of the cache. A memory block can be mapped to any cache
line. If a block has to be placed in the cache the particular line will be determined according to a
replacement algorithm. This is illustrated in figure 5.9.

Figure 5.9 Associative mapping


In this case, the cache control logic interprets a memory address simply as a Tag and a Word field.
The Tag field uniquely identifies a block of main memory. To determine whether a block is in the
cache, the cache control logic must simultaneously examine every line’s tag for a match.
All tags, corresponding to every line in the cache memory, have to be checked in order to determine
if we have a hit or miss. If we have a hit, the cache logic finally points to the actual line in the
cache. The cache line is retrieved based on a portion of its content (the tag field) rather than its
address. Such a memory structure is called associative memory.
Advantages:
1. Associative mapping provides the highest flexibility concerning the line to be replaced when
a new block is read into the cache.
Disadvantages:
1. Complex

2. The tag field is long
3. Fast access can be achieved only by using high-performance associative memories for the
cache, which is difficult and expensive.

Set-Associative mapping

A memory block is mapped into any of the lines of a set. The set is determined by the memory
address, but the line inside the set can be any one. If a block has to be placed in the cache the
particular line of the set will be determined according to a replacement algorithm. The memory
address is interpreted as three fields by the cache logic, similar to direct mapping. However, a
smaller number of bits (13 in our example) are used to identify the set of lines in the cache;
correspondingly, the tag field will be larger (9 bits in our example). Several tags (corresponding to
all lines in the set) have to be checked in order to determine if we have a hit or miss. If we have a
hit, the cache logic finally points to the actual line in the cache. The number of lines in a set is
determined by the designer;
 2 lines/set: two-way set associative mapping;
 4 lines/set: four-way set associative mapping.
Set associative mapping keeps most of the advantages of direct mapping:
 short tag field
 fast access
 relatively simple
 Set associative mapping tries to eliminate the main shortcoming of direct mapping;
certain flexibility is given concerning the line to be replaced when a new block is
read into the cache.
 Cache hardware is more complex for set associative mapping than for direct mapping.
In practice 2 and 4-way set associative mapping are used with very good results. Larger sets do not
produce further significant performance improvement.
If a set consists of a single line then direct mapping;
If there is one single set consisting of all lines then associative mapping.
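The relationship between the three mapping schemes can be captured in a few lines: a block may be placed in any line of one set, and the set size decides which scheme we have. The function below is an illustrative sketch; its name and the small cache sizes are assumptions, not from the text.

```python
# Candidate cache lines for a memory block under k-way set-associative
# mapping: the set index is fixed by the address, the way within it is free.

def candidate_lines(block: int, num_sets: int, ways: int) -> list[int]:
    s = block % num_sets                        # set chosen by the address
    return [s * ways + i for i in range(ways)]  # any line inside that set

print(candidate_lines(6, num_sets=4, ways=2))  # [4, 5]: 2-way set associative
print(candidate_lines(6, num_sets=8, ways=1))  # [6]: one line per set = direct
print(candidate_lines(6, num_sets=1, ways=8))  # all 8 lines = associative
```

The last two calls show the limiting cases stated above: one line per set degenerates to direct mapping, and a single set containing all lines is fully associative mapping.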

5.4.3 Replacement Algorithms


When a new block is to be placed into the cache, the block stored in one of the cache lines has to
be replaced. With direct mapping there is no choice. But with associative or set-associative
mapping a replacement algorithm is needed in order to determine which block to replace (and,
implicitly, in which cache line to place the block); with set-associative mapping, the candidate lines
are those in the selected set; with associative mapping, all lines of the cache are potential
candidates;
Random replacement:
One of the candidate lines is selected randomly. All the other policies are based on information
concerning the usage history of the blocks in the cache.
 Least recently used (LRU): - The candidate line is selected which holds the block that has
been in the cache the longest without being referenced.
 First-in-first-out (FIFO): The candidate line is selected which holds the block that has been
in the cache the longest.
 Least frequently used (LFU): The candidate line is selected which holds the block that has
got the fewest references.
Replacement algorithms for cache management have to be implemented in hardware in order to be
effective. LRU is the most efficient: relatively simple to implement and good results. FIFO is
simple to implement. Random replacement is the simplest to implement and results are surprisingly
good.
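A minimal software simulation of the LRU policy for a fully associative cache is sketched below, using an ordered dictionary as the usage history. The reference string in the example is invented for illustration.

```python
from collections import OrderedDict

def simulate_lru(capacity: int, refs) -> tuple[int, int]:
    """Return (hits, misses) for an LRU-managed fully associative cache."""
    cache, hits = OrderedDict(), 0
    for block in refs:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # block is now most recently used
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[block] = True
    return hits, len(refs) - hits

print(simulate_lru(2, [1, 2, 1, 3, 1]))  # (2, 3): block 2 is evicted, not 1
```

Note that hardware cannot afford a full ordering per set; real LRU implementations keep a few status bits per line that approximate this behaviour, which is why LRU is practical mainly for small associativities.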

5.4.4 Memory Write Strategies


Problems arise when a write is issued to a memory address and the content of the respective address
is potentially changed. Therefore, different write strategies are used to keep the cache content and
the content of main memory consistent without losing too much performance. The techniques used
are as follows:
 Write-through
All write operations are passed to main memory; if the addressed location is currently held in the
cache, the cache is updated so that it is coherent with the main memory. For writes, the processor
always slows down to main memory speed.
 Write-through with buffered write
The same as write-through, but instead of slowing the processor down by writing directly to main
memory, the write address and data are stored in a high-speed write buffer; the write buffer transfers
data to main memory while the processor continues its task. This technique offers higher speed at
the cost of more complex hardware.
 Copy-back
Write operations update only the cache memory, which is not kept coherent with main memory.
Cache lines have to remember if they have been updated. If such a line is replaced from the cache,
its content has to be copied back to memory. This technique provides good performance (usually
several writes are performed on a cache line before it is replaced and has to be copied into main
memory), at the cost of complex hardware.
Cache coherence problems are very complex and difficult to solve in multiprocessor systems.
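The copy-back policy described above can be sketched with a dirty bit per line. Real hardware does this with control bits in the cache directory; the dictionaries below are only stand-ins for illustration.

```python
# Copy-back (write-back) sketch: a write updates only the cache and sets the
# line's dirty bit; main memory is updated only when a dirty line is evicted.

memory = {0: 10}
cache = {}                      # address -> [value, dirty]

def write(addr: int, value: int) -> None:
    cache[addr] = [value, True]        # line modified; memory is now stale

def evict(addr: int) -> None:
    value, dirty = cache.pop(addr)
    if dirty:
        memory[addr] = value           # copy the line back on eviction

write(0, 99)
print(memory[0])   # still 10: main memory has not been updated yet
evict(0)
print(memory[0])   # 99: written back when the line was replaced
```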
Examples of some Cache Architectures:
Intel 80486
 Single on-chip cache of 8 Kbytes
 Line size: 16 bytes
 4-way set associative organization
Pentium
 Two on-chip caches, for data and instructions
 Each cache: 8 Kbytes
 Line size: 32 bytes
 2-way set associative organization
PowerPC 601
 Single on-chip cache of 32 Kbytes
 Line size: 32 bytes
 8-way set associative organization
PowerPC 603
 Two on-chip caches, for data and instructions
 Each cache: 8 Kbytes
 Line size: 32 bytes
 2-way set associative organization
(Simpler cache organization than the 601 but a stronger processor)
PowerPC 604
 Two on-chip caches, for data and instructions
 Each cache: 16 Kbytes
 Line size: 32 bytes
 4-way set associative organization
PowerPC 620
 Two on-chip caches, for data and instructions
 Each cache: 32 Kbytes
 Line size: 64 bytes
 8-way set associative organization

5.5 Virtual Memory
The address space needed and seen by programs is usually much larger than the available main
memory. Only one part of the program fits into main memory; the rest is stored on secondary
memory (hard disk). In order to be executed or data to be accessed, a certain segment of the program
has to be first loaded into main memory; in this case it has to replace another segment already in
memory. Movement of programs and data, between main memory and secondary storage, is
performed automatically by the operating system. These techniques are called virtual-memory
techniques.
The binary address issued by the processor is a virtual (logical) address; it considers a virtual
address space, much larger than the physical one available in main memory.
Virtual memory is a facility that allows programs to address memory from a logical point of view,
without regard to the amount of main memory physically available.
When virtual memory is used, the address fields of machine instructions contain virtual addresses.
For reads to and writes from main memory, a hardware memory management unit (MMU)
translates each virtual address into a physical address in main memory. When virtual addresses are
used, the system designer may choose to place the cache between the processor and the MMU or
between the MMU and main memory. This is illustrated in figure 5.10.

Figure 5.10 Logical and Physical Caches


A logical cache, also known as a virtual cache, stores data using virtual addresses. The processor
accesses the cache directly, without going through the MMU. A physical cache stores data using
main memory physical addresses.

Advantage of the logical cache is that cache access speed is faster than for a physical cache,
because the cache can respond before the MMU performs an address translation.

The disadvantage has to do with the fact that most virtual memory systems supply each application
with the same virtual memory address space. That is, each application sees a virtual memory that
starts at address 0. Thus, the same virtual address in two different applications refers to two
different physical addresses. The cache memory must therefore be completely flushed with each
application context switch, or extra bits must be added to each line of the cache to identify which
virtual address space this address refers to.

Virtual Memory Organization


The virtual program space (instructions + data) is divided into equal, fixed-size chunks called
pages. Physical main memory is organized as a sequence of frames; a page can be assigned to an
available frame in order to be stored (page size = frame size). The page is the basic unit of
information which is moved between main memory and disk by the virtual memory system.
Common page sizes are 2 to 16 Kbytes.

Demand Paging:
The program consists of a large number of pages which are stored on disk; at any one time, only a
few pages have to be stored in main memory. The operating system is responsible for loading/
replacing pages so that the number of page faults is minimized. We have a page fault when the
CPU refers to a location in a page which is not in main memory; this page has then to be loaded
and, if there is no available frame, it has to replace a page which previously was in memory. This
is illustrated in figure 5.11.

Figure 5.11 Demand paging


Address Translation:
Accessing a word in memory involves the translation of a virtual address into a physical one.

Virtual address: page number + offset
Physical address: frame number + offset
Address translation is performed by the MMU using a page table.
Example:
Virtual memory space: 2 Gbytes (31 address bits; 2^31 = 2 G)
Physical memory space: 16 Mbytes (2^24 = 16 M)
Page length: 2 Kbytes (2^11 = 2 K)
Total number of pages: 2^20 = 1 M
Total number of frames: 2^13 = 8 K
The address translation is illustrated in figure 5.12

Figure 5.12 Address translation
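Using the numbers of the example above (2 Kbyte pages, hence an 11-bit offset), the MMU's translation step can be sketched as a table lookup. The page-table contents below are invented for illustration.

```python
PAGE_BITS = 11                         # 2 Kbyte pages, as in the example

def translate(vaddr: int, page_table: dict) -> int:
    page = vaddr >> PAGE_BITS              # 20-bit virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    frame = page_table[page]               # a missing entry = page fault
    return (frame << PAGE_BITS) | offset   # 13-bit frame number + offset

page_table = {5: 3}                    # virtual page 5 resides in frame 3
print(translate((5 << PAGE_BITS) | 123, page_table))  # frame 3, offset 123
```

The offset passes through unchanged; only the page number is replaced by a frame number, which is why page size must equal frame size.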

The Page Table:


The page table has one entry for each page of the virtual memory space. Each entry of the page
table holds the address of the memory frame which stores the respective page, if that page is in
main memory.
Each entry of the page table also includes some control bits which describe the status of the page:
 Whether the page is actually loaded into main memory or not;
 if since the last loading the page has been modified;
 Information concerning the frequency of access, etc.
The page table is very large (number of pages in virtual memory space is very large). Access to the
page table has to be very fast. The page table has to be stored in very fast memory, on chip.
A special cache is used for page table entries, called the translation lookaside buffer (TLB); it works
in the same way as an ordinary memory cache and contains those page table entries which have
been most recently used. The page table is often too large to be stored in main memory. Virtual
memory techniques are used to store the page table itself. Only part of the page table is stored in
main memory at a given moment.
The page table itself is distributed along the memory hierarchy:
 TLB (cache)
 main memory
 disk

Page Replacement:
When a new page is loaded into main memory and there is no free memory frame, an existing page
has to be replaced. The decision on which page to replace is based on the same speculations like
those for replacement of blocks in cache memory. LRU strategy is often used to decide on which
page to replace.
When the content of a page, which is loaded into main memory, has been modified as result of a
write, it has to be written back on the disk after its replacement. One of the control bits in the page
table is used in order to signal that the page has been modified.

CHAPTER SIX

THE I/O OF A COMPUTER SYSTEM

6.1 Chapter objectives and expected results


This chapter explains the Input/output of the computer system looking at external devices, followed
by an overview of the structure and function of an I/O module. Then a look at the various ways in
which the I/O functions can be performed in cooperation with the processor and memory: the
internal I/O interface.
At the end of this chapter students are expected to know:
o The I/O of the computer system;
o Structure and function of an I/O module;
o How the I/O module performs in cooperation with the processor and memory.

6.2 Input/output Subsystem of a Computer


The computer system’s I/O architecture is its interface to the outside world. In addition to the
processor and a set of memory modules, the third key element of a computer system is a set of I/O
modules. Each module interfaces to the system bus and controls one or more peripheral devices.
I/O operations are accomplished through a wide assortment of external devices that provide a
means of exchanging data between the external environment and the computer. An external device
attaches to the computer by a link to an I/O module (Figure 6.1). The link is used to exchange
control, status, and data between the I/O module and the external device. An external device
connected to an I/O module is often referred to as a peripheral device.

Figure 6.1 Generic Model of an I/O Module


The I/O module contains logic for performing a communication function between the peripheral
and the bus. Peripherals do not connect directly to the system bus thus an I/O module is required.

This is due to the following reasons:
 Peripherals have various methods of operation, and it would be impractical to
incorporate the necessary logic within the processor to control a range of devices;
 The data transfer rate of peripherals is often much slower than that of the memory or
processor and therefore it is impractical to use the high-speed system bus to communicate
directly with a peripheral;
 Some peripherals have a faster data transfer rate than the memory or processor and
therefore the mismatch would lead to inefficiencies if not managed properly;
 Peripherals often use different data formats and word lengths than the computer to which
they are attached.

The I/O module has two major functions and these are as follows:
1. Interface to the processor and memory via the system bus;
2. Interface to one or more peripheral devices.

The interface to the I/O module is in the form of:


 Control,
 Data, and
 Status signals.
Control signals - Determine the function that the device will perform, such as send data to the I/O
module (INPUT or READ), accept data from the I/O module (OUTPUT or WRITE), report status,
or perform some control function particular to the device.

Data signals - are in the form of a set of bits to be sent to or received from the I/O module.

Status signals - indicate the state of the device. Examples are READY/NOT-READY to show
whether the device is ready for data transfer.

6.3 I/O Modules


During any period of time, the processor may communicate with one or more external devices in
unpredictable patterns, depending on the program’s need for I/O. The internal resources, such as
main memory and the system bus, must be shared among a number of activities, including data I/O.
The major function for an I/O module includes:
 Control and timing
 Processor communication

 Device communication
 Data buffering
 Error detection
Control and timing - Coordinate the flow of traffic between internal resources and external devices.
For example, the control of the transfer of data from an external device to the processor might
involve the following sequence of steps:
1. The processor interrogates the I/O module to check the status of the attached device.
2. The I/O module returns the device status.
3. If the device is operational and ready to transmit, the processor requests the transfer of data,
by means of a command to the I/O module.
4. The I/O module obtains a unit of data (e.g., 8 or 16 bits) from the external device.
5. The data are transferred from the I/O module to the processor.
If the system employs a bus, then each of the interactions between the processor and the I/O module
involves one or more bus arbitrations.

Processor communication:
The I/O modules communicate with the processor and with the external device. This involves the
following:
 Command decoding: The I/O module accepts commands from the processor, typically
sent as signals on the control bus.
 Data: Data are exchanged between the processor and the I/O module over the data bus.
 Status reporting: Because peripherals are so slow, it is important to know the status of the
I/O module.
 Address recognition: Just as each word of memory has an address, so does each I/O device.
Thus, an I/O module must recognize one unique address for each peripheral it controls.

Device communication:
The I/O module must be able to perform device communication which involves commands, status
information, and data.
Data buffering:
Whereas the transfer rate into and out of main memory or the processor is quite high, the rate is
orders of magnitude lower for many peripheral devices. Data coming from main memory are sent
to an I/O module in a rapid burst. The data are buffered in the I/O module and then sent to the
peripheral device at its data rate. In the opposite direction, data are buffered so as not to tie up the
memory in a slow transfer operation. Thus, the I/O module must be able to operate at both device

and memory speeds. Similarly, if the I/O device operates at a rate higher than the memory access
rate, then the I/O module performs the needed buffering operation.

Error detection:
An I/O module is often responsible for error detection and for subsequently reporting errors to the
processor. One class of errors includes mechanical and electrical malfunctions reported by the
device (e.g., paper jam, bad disk track) and unintentional changes to the bit pattern as it is
transmitted from device to I/O module.

An I/O module that takes on most of the detailed processing burden, presenting a high-level
interface to the processor, is usually referred to as an I/O channel or I/O processor.
An I/O module that requires detailed control is usually referred to as an I/O controller or device
controller.
I/O controllers are commonly seen on microcomputers, whereas I/O channels are used on
mainframes.

6.4 I/O processes


There are 3 principal I/O processes:

 Programmed I/O
 Interrupt-driven I/O
 Direct Memory Access (DMA)

Programmed I/O:

With programmed I/O, I/O occurs under the direct and continuous control of the program
requesting the I/O operation and data are exchanged between the processor and the I/O module.
The processor executes a program that gives it direct control of the I/O operation, including sensing
device status, sending a read or write command, and transferring the data. When the processor
issues a command to the I/O module, it must wait until the I/O operation is complete. If the
processor is faster than the I/O module, this is wasteful of processor time.

Interrupt-driven I/O:

With interrupt-driven I/O, the processor issues an I/O command, continues to execute other
instructions, and is interrupted by the I/O module when the latter has completed its work. A
program issues an I/O command and then continues to execute, until it is interrupted by the I/O
hardware to signal the end of the I/O operation. With both programmed and interrupt I/O, the
processor is responsible for extracting data from main memory for output and storing data in main
memory for input.

Direct Memory Access (DMA):

With direct memory access the I/O module and main memory exchange data directly, without
processor involvement. Here a specialized I/O processor takes over control of an I/O operation to
move a large block of data.

6.4.1 Programmed I/O


When the processor is executing a program and encounters an instruction relating to I/O, it executes
that instruction by issuing a command to the appropriate I/O module. With programmed I/O, the
I/O module will perform the requested action and then set the appropriate bits in the I/O status
register. The I/O module takes no further action to alert the processor. In particular, it does not
interrupt the processor. Thus, it is the responsibility of the processor periodically to check the status
of the I/O module until it finds that the operation is complete.
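The busy-wait behaviour described above can be sketched in a short simulation. The device model here (its status values and the number of polls before completion) is entirely hypothetical; it exists only to show where processor time is wasted.

```python
# A minimal sketch of programmed I/O polling. The DeviceModel is a
# hypothetical I/O module: BUSY for a few polls, then READY with data.

class DeviceModel:
    BUSY, READY = 0, 1

    def __init__(self, data, busy_polls=3):
        self.status = self.BUSY
        self._data = data
        self._polls_left = busy_polls

    def read_status(self):
        # Each status check brings the simulated device closer to done.
        if self._polls_left > 0:
            self._polls_left -= 1
            if self._polls_left == 0:
                self.status = self.READY
        return self.status

    def read_data(self):
        return self._data

def programmed_read(device):
    """Busy-wait on the status register, then fetch the data item."""
    wasted_polls = 0
    while device.read_status() != DeviceModel.READY:
        wasted_polls += 1      # processor time spent only checking status
    return device.read_data(), wasted_polls
```

The `wasted_polls` counter makes the drawback concrete: every iteration is a processor cycle spent doing nothing but checking the status register.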
The programmed I/O technique involves two elements:
 I/O commands issued by the processor to the I/O module, and
 I/O instructions executed by the processor.
I/O Commands:
To execute an I/O-related instruction, the processor issues an address, specifying the particular I/O
module and external device, and an I/O command. There are four types of I/O commands that an
I/O module may receive when it is addressed by a processor:
 Control: Used to activate a peripheral and tell it what to do. These commands are tailored
to the particular type of peripheral device.
 Test: Used to test various status conditions associated with an I/O module and its
peripherals. The processor will want to know that the peripheral of interest is powered on
and available for use. It will also want to know if the most recent I/O operation is completed
and if any errors occurred.
 Read: Causes the I/O module to obtain an item of data from the peripheral and place it in
an internal buffer. The processor can then obtain the data item by requesting that the I/O
module place it on the data bus.
 Write: Causes the I/O module to take an item of data (byte or word) from the data bus and
subsequently transmit that data item to the peripheral.

I/O Instructions:

With programmed I/O, there is a close correspondence between the I/O-related instructions that the
processor fetches from memory and the I/O commands that the processor issues to an I/O module
to execute the instructions. That is, the instructions are easily mapped into I/O commands, and there
is often a simple one-to-one relationship. The form of the instruction depends on the way in which
external devices are addressed.
Each I/O device connected to the system through an I/O module is given a unique identifier or
address. When the processor issues an I/O command, the command contains the address of the
desired device. Therefore, each I/O module must interpret the address lines to determine if the
command is for itself.
When the processor, main memory, and I/O share a common bus, two modes of addressing are
possible:
 Memory-mapped, and
 Isolated.

Memory mapped I/O


Memory-mapped I/O uses a single address space for memory locations and I/O devices: the
processor treats the status and data registers of I/O modules as memory locations and uses the same
machine instructions to access both memory and I/O devices. So, for example, with 10 address
lines, a combined total of 2^10 = 1024 memory locations and I/O addresses can be supported, in any
combination. With memory-mapped I/O, only a single read line and a single write line are needed on
the bus.

Isolated I/O:
With isolated I/O the address space for I/O is separate from that for memory, and the bus is
therefore equipped with memory read and write lines plus input and output command lines. The command
line specifies whether the address refers to a memory location or an I/O device. The full range of
addresses may be available for both. Again, with 10 address lines, the system may now support
both 1024 memory locations and 1024 I/O addresses.
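The two addressing modes can be contrasted with a toy simulation. The sizes, the 16-slot I/O region, and the register addresses below are illustrative assumptions, not values fixed by the text.

```python
# Toy model of memory-mapped vs isolated I/O with 10 address lines.
# The 16-address I/O region is an illustrative assumption.

ADDR_SPACE = 2 ** 10          # 10 address lines -> 1024 addresses
RAM_TOP = ADDR_SPACE - 16     # memory-mapped: top 16 addresses are I/O

mm_ram = [0] * RAM_TOP
mm_io = {addr: 0 for addr in range(RAM_TOP, ADDR_SPACE)}

def mm_store(addr, value):
    """Memory-mapped: one write operation; the address decoder alone
    decides whether RAM or a device register is selected."""
    if addr < RAM_TOP:
        mm_ram[addr] = value
    else:
        mm_io[addr] = value

def mm_load(addr):
    return mm_ram[addr] if addr < RAM_TOP else mm_io[addr]

# Isolated I/O: two full 1024-entry spaces. A separate command line
# (modelled as a flag) selects between them, so addresses may overlap.
iso_mem = [0] * ADDR_SPACE
iso_io = [0] * ADDR_SPACE

def iso_store(addr, value, io_command=False):
    (iso_io if io_command else iso_mem)[addr] = value

def iso_load(addr, io_command=False):
    return (iso_io if io_command else iso_mem)[addr]
```

Note how in the isolated model the same numeric address can name both a memory location and an I/O register; the command-line flag, not the address, disambiguates them.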

6.4.2 Interrupt-driven I/O


In interrupt-driven I/O the processor issues an I/O command to a module and then goes on to do some
other useful work. The I/O module will then interrupt the processor to request service when it is
ready to exchange data with the processor. The processor then executes the data transfer, as before,
and then resumes its former processing.
For data input the I/O module receives a READ command from the processor. The I/O module
then proceeds to read data in from an associated peripheral. Once the data are in the module’s data
register, the module signals an interrupt to the processor over a control line. The module then waits
until its data are requested by the processor. When the request is made, the module places its data
on the data bus and is then ready for another I/O operation.
The processor issues a READ command. It then goes off and does something else (e.g., the
processor may be working on several different programs at the same time). At the end of each
instruction cycle, the processor checks for interrupts. When the interrupt from the I/O module
occurs, the processor saves the context (e.g., program counter and processor registers) of the current
program and processes the interrupt. In this case, the processor reads the word of data from the I/O
module and stores it in memory. It then restores the context of the program it was working on (or
some other program) and resumes execution.
An interrupt is basically the mechanism by which an external device can request the attention of
the CPU while the CPU is engaged in other work, such as processing normal instructions from memory.

These interrupts to the CPU can be classified into two types:

 Maskable Interrupts
 Non-maskable Interrupts

With a maskable interrupt, the request raised by the external device can be delayed or rejected by
the CPU of the computer system. With a non-maskable interrupt, the signal generated by the external
device cannot be delayed or rejected by the CPU; it must be responded to immediately.

Interrupts to the CPU whether maskable or non-maskable can also be classified as:

 Vectored Interrupt
 Non-vectored Interrupt

A vectored interrupt, whether maskable or non-maskable, tells the part of the computer that handles
I/O interrupts at the hardware level not only that a request for attention from an I/O device has
been received, but also the identity of the device that sent the request. With a non-vectored
interrupt, maskable or non-maskable, the CPU does not know the address of the device sending the
request; it simply processes the interrupt signal without knowing where the signal came from.

When the CPU is interrupted, it receives an interrupt signal, suspends its currently executing
program, and jumps to an Interrupt Service Routine (ISR) to respond to the incoming interrupt.
Each interrupt type typically has its own ISR.
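The input sequence described above can be sketched as a simulation. The device below is hypothetical (it "completes" after a fixed number of instruction cycles); the end-of-cycle interrupt check mirrors the description in the text.

```python
# A minimal sketch of interrupt-driven input with a hypothetical device
# that raises an interrupt after a few cycles of parallel work.

class InterruptDrivenDevice:
    def __init__(self, data, cycles_until_ready=3):
        self.data = data
        self._cycles_left = cycles_until_ready
        self.interrupt_pending = False

    def tick(self):
        # The device works in parallel with the CPU; when its READ
        # finishes, it asserts the interrupt line.
        if self._cycles_left > 0:
            self._cycles_left -= 1
            if self._cycles_left == 0:
                self.interrupt_pending = True

def run_cpu(device, max_cycles=10):
    """Execute 'other work' each cycle; at the end of each instruction
    cycle, check for an interrupt and run the 'ISR' if one is pending."""
    received = None
    other_work_done = 0
    for _ in range(max_cycles):
        other_work_done += 1          # useful, unrelated instructions
        device.tick()
        if device.interrupt_pending:  # end-of-cycle interrupt check
            # "ISR": read the data item and clear the interrupt
            received = device.data
            device.interrupt_pending = False
            break
    return received, other_work_done
```

Unlike the programmed-I/O sketch, every cycle before the interrupt is useful work rather than a wasted status poll.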

When implementing interrupt-driven I/O with multiple I/O modules, the processor must determine
which device issued the interrupt and, if several interrupts are pending, decide which one to process first.
Four general categories of techniques are used in identifying devices:
 Multiple interrupt lines
 Software poll
 Daisy chain (hardware poll, vectored)
 Bus arbitration (vectored)
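The software-poll technique in the list above can be sketched in a few lines: on an interrupt, the processor asks each I/O module in a fixed priority order whether it raised the request. The module names and their ordering are illustrative assumptions.

```python
# Software poll: scan the modules in priority order and return the
# first one whose interrupt flag is set. Module names are hypothetical.

def software_poll(modules):
    """modules: list of (name, interrupt_pending) in priority order.
    Returns the highest-priority module with a pending interrupt."""
    for name, pending in modules:
        if pending:
            return name
    return None          # spurious interrupt: no module claims it
```

The fixed scan order is also what determines priority: a module earlier in the list is always serviced first, which is the software analogue of its position in a daisy chain.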

6.4.3 Direct Memory Access (DMA)


DMA involves an additional module on the system bus. The DMA module is capable of mimicking
the processor and, indeed, of taking over control of the system from the processor. It needs to do
this to transfer data to and from memory over the system bus. For this purpose, the DMA module
must use the bus only when the processor does not need it, or it must force the processor to suspend
operation temporarily. When the processor wishes to read or write a block of data, it issues a
command to the DMA module, by sending to the DMA module the following information:
 Whether a read or write is requested, using the read or write control line between the
processor and the DMA module;
 The starting location in memory to read from or write to, communicated on the data lines
and stored by the DMA module in its address register;
 The number of words to be read or written, again communicated via the data lines and
stored in the data count register
The processor then continues with other work. It has delegated this I/O operation to the DMA
module. The DMA module transfers the entire block of data, one word at a time, directly to or from
memory, without going through the processor. When the transfer is complete, the DMA module
sends an interrupt signal to the processor.
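The block transfer can be sketched as follows, using the register roles named above (address register, data count register). In this toy model the completion interrupt is just a returned flag; no real bus arbitration is modelled.

```python
# A toy DMA write: the processor hands over a start address and a data
# count, and the "DMA module" moves the whole block without further
# CPU involvement, then signals completion.

def dma_write(memory, start_addr, data_block):
    """Copy data_block into memory starting at start_addr.
    Returns True once the whole block has been transferred
    (standing in for the completion interrupt)."""
    count = len(data_block)                        # data count register
    memory[start_addr:start_addr + count] = data_block
    return True                                    # interrupt: done
```

The key contrast with the earlier techniques is granularity: the processor is involved once per block here, rather than once per word.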

6.5 Operating System Support


The operating system is the software that controls the execution of programs on a processor and
manages the computer’s resources.

 Operating System

 The Operating System (OS) is the program that manages the system’s resources, provides
services to the programmer, and schedules the execution of other programs
 The OS masks the details of the hardware from the programmer
 The OS provides the programmer a convenient interface for using the system; the new
instructions it adds are called system calls
 The OS mediates programmers’ and application programs’ requests for facilities and services
 Services provided by the OS
 Program creation: general utilities
 Program execution: loading, device initialising, etc.
 Standardized access to I/O devices
 Controlled access to files
 Overall system access control
 Types of Operating System
 Interactive OS

User directly interacts with the OS through a keyboard / terminal

 Batch OS

User programs are collected together (“off line”) and submitted to the OS in a batch by an operator.
Printed results of the program execution are returned to the user.

This type of OS is not typical of current machines. It was used in the mainframes of the 1960s–70s,
when the cost of hardware was such that you wanted to keep it busy all the time. The OS was really a
monitor program that focused on job scheduling.

 Multiprogramming / time sharing

An OS is said to be multiprogramming if it supports the simultaneous execution of more than one
job. In this case a queue of pending jobs is maintained. The current job is swapped out when it
idles waiting for I/O and other devices, and the next pending job is started. Time sharing is the
term for multiprogramming when applied to an interactive system. Either requires much more
sophistication than a typical batch OS: memory management and scheduling.

6.5.1 OS Scheduling
In a multiprogramming OS, multiple jobs are held in memory and alternate between using the CPU,
using I/O, and waiting (idle). The key to high efficiency with multiprogramming is effective
scheduling, which falls into the following categories:

 High-level
 Short-term
 I/O

High-level scheduling:

 Determines which jobs are admitted into the system for processing
 Controls the degree of multiprogramming
 Admitted jobs are added to the queue of pending jobs that is managed by the short-term
scheduler
 Works in batch or interactive modes

Short-term scheduling:

 This OS segment runs frequently and determines which pending job will receive the CPU’s
attention next
 Based on the normal changes of state that a job/process goes through
 A process runs in the CPU until:
o It issues a service call to the OS (e.g., for I/O service) and is suspended until the
request is satisfied
o It causes an interrupt and is suspended
o An external event causes an interrupt
 Short-term scheduler is invoked to determine which process is serviced next.
 Queues are simply listings of jobs that are waiting for some type of service (e.g., CPU cycles,
I/O, etc.). Once a job has been admitted into the system by the high-level scheduler, it is
allocated its portion of memory for its program and data requirements
 The objective of the short-term scheduler is to keep the CPU actively working on one of
the pending jobs -- maximizing the use of CPU cycles

Problem:
o The high-level scheduler will admit as many jobs as possible (constrained by system
resources such as available memory)
o Despite many jobs in the system, the speed of the CPU (compared to peripherals)
might be such that all jobs are waiting for (slow) I/O and thus the CPU is idled (no
jobs in the ready queue)
o Use memory management to solve this
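As one concrete illustration of short-term scheduling, a round-robin policy (an assumed policy; the text does not fix one) cycles each ready job through the CPU for a fixed time quantum until the job's CPU requirement is used up.

```python
# A minimal round-robin short-term scheduler over a ready queue.
# Job names and CPU-time units are illustrative.

from collections import deque

def round_robin(jobs, quantum):
    """jobs: list of (name, cpu_time_needed).
    Each job runs for one quantum, then returns to the back of the
    ready queue if unfinished. Returns the order of completion."""
    ready = deque(jobs)
    finished = []
    while ready:
        name, remaining = ready.popleft()
        remaining -= quantum                 # job runs on the CPU
        if remaining <= 0:
            finished.append(name)            # job done, leaves the system
        else:
            ready.append((name, remaining))  # back of the ready queue
    return finished
```

In a real short-term scheduler a job would also leave the ready queue when it blocks on I/O; this sketch models only the CPU-burst rotation.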

Figure 6.2: Scheduling queues

6.5.2 OS Memory Management


One important task of the OS is to manage the memory system. To avoid having the CPU idle
because all jobs are waiting, one could increase the memory size to hold more jobs. However:

 Memory is expensive, making this a costly solution to the problem


 Programs’ sizes (process sizes) tend to expand to take up available memory

A better approach is to use a mechanism to remove a waiting job from memory and replace it with
one that (hopefully) will run on the CPU in the short term.

 This is the idea of swapping


 Idle processes are removed from memory and placed in an intermediate queue
 The idled job is replaced with a “ready” one from the front of the intermediate queue, or with a
new job
 Swapping uses I/O operations to read and write to disk.
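The swapping steps above can be sketched as queue operations. The job names and the FIFO discipline of the intermediate queue are illustrative assumptions; real swappers weigh priorities and memory sizes.

```python
# A small sketch of swapping: an idled (I/O-blocked) job is moved out
# to the intermediate queue and a ready job takes its place in memory.

from collections import deque

def swap_out(memory_jobs, intermediate, blocked_job):
    """Remove blocked_job from memory, put it at the back of the
    intermediate queue, and swap in a ready job from the front of
    that queue (if any). Returns the swapped-in job or None."""
    memory_jobs.remove(blocked_job)
    ready = intermediate.popleft() if intermediate else None
    intermediate.append(blocked_job)   # swapped out, waits on disk
    if ready is not None:
        memory_jobs.append(ready)      # swapped in, can use the CPU
    return ready
```

Each call stands for two disk I/O operations in a real system: writing the blocked job's image out and reading the incoming job's image in.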
