Computer Organization Notes
[R18A0505]
LECTURE NOTES
II Year B. Tech CSE - I Sem    L: 3    T/P/D: -/-/-    C: 3
(R18A0505) COMPUTER ORGANIZATION
Objectives of the course:
To expose the students to the following:
1. How Computer Systems work & the basic principles
2. Instruction Level Architecture and Instruction Execution
3. The current state of art in memory system design
4. How I/O devices are accessed and their underlying principles.
5. To provide the knowledge on Instruction Level Parallelism
6. To impart the knowledge on micro programming
7. Concepts of advanced pipelining techniques.
UNIT I
Basic Functional units of Computers: functional units, basic Operational concepts, Bus structures.
Software, Performance, Multiprocessors, Multicomputer.
Data Representation: Signed number representation, fixed and floating point Representations.
Computer Arithmetic: Addition and subtraction, multiplication Algorithms, Division Algorithms.
Error detection and correction codes
UNIT II
Register Transfer Language and Micro Operations: RTL- Registers, Register transfers, Bus and
memory transfers. Micro operations: Arithmetic, Logic, and Shift micro operations, Arithmetic logic
shift unit.
Basic Computer Organization and Design: Computer Registers, Computer instructions, Instruction
cycle. Instruction codes, Timing and Control, Types of Instructions: Memory Reference Instructions,
Input – Output and Interrupt, Complete Computer Description.
UNIT III
Central Processing Unit organization: General Register Organization, Stack organization, Instruction
formats, Addressing modes, Data Transfer and Manipulation, Program Control, CISC and RISC
processors
Control unit design: Design approaches, Control memory, Address sequencing, micro program
example, design of CU. Micro Programmed Control.
UNIT IV
Memory Organization: Semiconductor memory technologies, hierarchy, Interleaving, Main Memory-
RAM and ROM chips, Address map, Associative memory-Hardware organization. Match logic.
Cache memory-size vs. block size, Mapping functions-Associate, Direct, Set Associative mapping.
Replacement algorithms, write policies. Auxiliary memory-Magnetic tapes etc
UNIT V
Input –Output Organization: Peripheral devices, Input-output subsystems, I/O device interface, I/O
Processor, I/O transfers–Program controlled, Interrupt driven, and DMA, interrupts and exceptions.
I/O device interfaces – SCSI, USB
Pipelining and Vector Processing: Basic concepts, Instruction level Parallelism, Throughput and
Speedup, Pipeline hazards. Case Study - Introduction to x86 architecture.
Text Books:
1. “Computer Organization and Design: The Hardware/Software Interface”, 5th Edition by
David A. Patterson and John L. Hennessy, Elsevier.
2. “Computer Organization and Embedded Systems”, 6th Edition by Carl Hamacher, McGraw
Hill Higher Education.
Reference Books:
1. “Computer Architecture and Organization”, 3rd Edition by John P. Hayes, WCB/McGraw-
Hill
2. “Computer Organization and Architecture: Designing for Performance”, 10th Edition by
William Stallings, Pearson Education.
3. “Computer System Design and Architecture”, 2nd Edition by Vincent P. Heuring and Harry
F. Jordan, Pearson Education.
Course Outcomes:
Upon completion of this course, students should be able to:
1. Students will learn the concepts of computer organization for several engineering applications.
2. Students will develop the ability and confidence to use the fundamentals of computer
organization as a tool in the engineering of digital systems.
3. Students will be able to identify, formulate, and solve hardware and software computer engineering
problems using sound computer engineering principles.
UNIT I
Syllabus:
Basic Functional units of Computers: functional units, basic Operational concepts, Bus structures.
Software, Performance, Multiprocessors, Multicomputer.
Data Representation: Signed number representation, fixed and floating point Representations.
Computer Arithmetic: Addition and subtraction, multiplication Algorithms, Division Algorithms.
Error detection and correction codes.
Computer Architecture in general covers three aspects of computer design namely: Computer
Hardware, Instruction set Architecture and Computer Organization.
Computer hardware consists of electronic circuits, displays, magnetic and optical storage
media and communication facilities.
Instruction set architecture is the programmer-visible machine interface, such as the instruction set,
registers, memory organization and exception handling. The two main approaches are
CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer).
Computer Organization includes the high level aspects of a design, such as memory
system, the bus structure and the design of the internal CPU.
Computer Types
A computer is a fast electronic calculating machine which accepts digital input, processes it
according to internally stored instructions (programs) and produces the result on an
output device. The internal operation of the computer is depicted in the figure below:
The computers can be classified into various categories as given below:
Micro Computer
Laptop Computer
Work Station
Super Computer
Main Frame
Hand Held
Multi core
Laptop Computer: A portable, compact computer that can run on power supply or a battery
unit. All components are integrated as one compact unit. It is generally more expensive than a
comparable desktop. It is also called a Notebook.
Work Station: Powerful desktop computer designed for specialized tasks. Generally used for
tasks that require a lot of processing speed. Can also be an ordinary personal computer
attached to a LAN (local area network).
Super Computer: A computer that is considered to be the fastest in the world. Used to execute
tasks that would take a lot of time on other computers. For example: modeling weather systems,
genome sequencing, etc. (Refer site: http://www.top500.org/)
Main Frame: Large expensive computer capable of simultaneously processing data for
hundreds or thousands of users. Used to store, manage, and process large amounts of data that
need to be reliable, secure, and centralized.
Hand Held: It is also called a PDA (Personal Digital Assistant). A computer that fits into a
pocket, runs on batteries, and is used while holding the unit in your hand. Typically used as
an appointment book, address book, calculator and notepad.
Multi Core: Parallel computing platforms that have multiple cores or computing
elements on a single chip. Typical examples: Sony PlayStation, Core 2 Duo, i3, i7, etc.
GENERATION OF COMPUTERS
Development of technologies used to fabricate the processors, memories and I/O units of
the computers has been divided into various generations as given below:
First generation
Second generation
Third generation
Fourth generation
Beyond the fourth generation
First generation:
1946 to 1955: Computers of this generation used vacuum tubes. The computers were built using the
stored-program concept. Ex: ENIAC, EDSAC, IBM 701.
Computers of this age typically used about ten thousand vacuum tubes. They were bulky in
size and had slow operating speed, short lifetime and limited programming facilities.
Second generation:
1955 to 1965: Computers of this generation used germanium transistors as the active
switching electronic device. Ex: IBM 7000, B5000, IBM 1401. They were comparatively smaller in
size and had about ten times the operating speed of first-generation vacuum-tube-based
computers. They consumed less power and had fairly good reliability. Availability of larger
memory was an added advantage.
Third generation:
1965 to 1975: The computers of this generation used integrated circuits as the active
electronic components. Ex: IBM System 360, PDP minicomputers, etc. They were still smaller
in size. They had powerful CPUs capable of executing about 1 million instructions per
second (1 MIPS). They consumed much less power.
Fourth generation:
1976 to 1990: The computers of this generation used LSI chips such as the microprocessor as
their active electronic element. Ex: HCL Horizon III, WIPRO's Uniplus+, HCL's Busybee
PC, etc.
They used high-speed microprocessors as CPUs. They were more user friendly and highly reliable
systems. They had large-capacity disk memories.
Functional Unit
A computer in its simplest form comprises five functional units, namely the input unit, output unit,
memory unit, arithmetic & logic unit and control unit. Figure 2 depicts the functional units of
a computer system.
1. Input Unit: Computer accepts encoded information through input unit. The
standard input device is a keyboard. Whenever a key is pressed, keyboard
controller sends the code to CPU/Memory.
Examples include Mouse, Joystick, Tracker ball, Light pen, Digitizer, Scanner etc.
2. Memory Unit: The memory unit stores the program instructions (code), data
and results of computations. It is classified as primary (main) memory and secondary (auxiliary) memory.
Primary memory is a semiconductor memory that provides access at high speed.
Run time program instructions and operands are stored in the main memory. Main
memory is classified again as ROM and RAM. ROM holds system programs and
firmware routines such as BIOS, POST, I/O Drivers that are essential to manage the
hardware of a computer. RAM is termed as Read/Write memory or user memory that
holds run time program instruction and data. While primary storage is essential, it is
volatile in nature and expensive. Additional requirement of memory could be supplied
as auxiliary memory at cheaper cost. Secondary memories are non volatile in nature.
3. Arithmetic and logic unit: The ALU consists of necessary logic circuits like adder,
comparator etc., to perform operations of addition, multiplication, comparison of two
numbers etc.
4. Output Unit: Computer after computation returns the computed results, error
messages, etc. via output unit. The standard output device is a video monitor,
LCD/TFT monitor. Other output devices are printers, plotters etc.
5. Control Unit: Control unit co-ordinates activities of all units by issuing control
signals. Control signals issued by control unit govern the data transfers and then
appropriate operations take place. Control unit interprets or decides the
operation/action to be performed.
The CPU fetches instructions sequentially, one by one, from the main memory,
decodes them and performs the specified operation on the associated data operands in the
ALU.
All activities pertaining to processing and data movement inside the computer
are governed by the control unit.
Basic Operational Concepts
An instruction consists of two parts, an operation code and operand/s, as shown below:
OPCODE OPERAND/s
Consider, for example, the instruction Add LOCA, R0, which adds the operand at memory location
LOCA to the contents of register R0 and places the sum in R0. The following are the steps to execute
the instruction:
Step 1: Fetch the instruction from main memory into the processor.
Step 2: Fetch the operand at location LOCA from main memory into the processor.
Step 3: Add the memory operand (i.e. the fetched contents of LOCA) to the contents of register R0.
Step 4: Store the result (sum) in R0.
The same operation can be realized using two instructions:
Load LOCA, R1
Add R1, R0
The steps to execute these instructions can be enumerated as below:
Step 1: Fetch the instruction from main memory into the processor.
Step 2: Fetch the operand at location LOCA from main memory into processor register R1.
Step 3: Add the contents of register R1 to the contents of register R0.
Step 4: Store the result (sum) in R0.
Figure 3 below shows how the memory and the processor are connected. As shown in the
diagram, in addition to the ALU and the control circuitry, the processor contains a number of
registers used for several different purposes. The instruction register holds the instruction that
is currently being executed. The program counter keeps track of the execution of the program.
It contains the memory address of the next instruction to be fetched and executed. There are n
general purpose registers R0 to Rn-1 which can be used by the programmers during writing
programs.
Figure 3: Connections between the processor and the memory
The interaction between the processor and the memory and the direction of flow of
information is as shown in the diagram below:
BUS STRUCTURES
A group of lines that serves as a connecting path for several devices is called a bus (one bit per
line). Individual parts must communicate over a communication line or path for exchanging
data, address and control information, as shown in the diagram below. For example, in a
processor-to-printer transfer, a common approach is to use buffer registers to hold the
content during the transfer.
SOFTWARE
If a user wants to enter and run an application program, he/she needs system software.
System software is a collection of programs that are executed as needed to perform functions
such as receiving and interpreting user commands, running application programs and managing files.
Let us assume a computer with one processor, one disk and one printer, and an application program in
machine code on the disk. The various tasks are performed in a coordinated fashion, which is
called multitasking. t0, t1 … t5 are instants of time, and the interaction during the various
instants is as given below:
Figure 6: User program and OS routines sharing the processor
PERFORMANCE
The most important measure of the performance of a computer is how quickly it
can execute programs. The speed with which a computer executes programs is affected
by the design of its hardware. For best performance, it is necessary to design the
compiler, the machine instruction set, and the hardware in a coordinated way.
The total time required to execute a program, called the elapsed time, is a measure of
the performance of the entire computer system. It is affected by the speed of the
processor, the disk and the printer. The time the processor spends executing the
instructions of a program is called the processor time.
Just as the elapsed time for the execution of a program depends on all units in a
computer system, the processor time depends on the hardware involved in the execution
of individual machine instructions. This hardware comprises the processor and the
memory which are usually connected by the bus.
The pertinent parts of fig. c are repeated in fig. d, which includes the cache
memory as part of the processor unit.
Let us examine the flow of program instructions and data between the memory
and the processor. At the start of execution, all program instructions and the required
data are stored in the main memory. As the execution proceeds, instructions are fetched
one by one over the bus into the processor, and a copy is placed in the cache. Later, if the
same instruction or data item is needed a second time, it is read directly from the cache.
The processor and relatively small cache memory can be fabricated on a single IC chip.
The internal speed of performing the basic steps of instruction processing on chip is
very high and is considerably faster than the speed at which the instruction and data can
be fetched from the main memory. A program will be executed faster if the movement
of instructions and data between the main memory and the processor is minimized,
which is achieved by using the cache.
For example:- Suppose a number of instructions are executed repeatedly over a short
period of time as happens in a program loop. If these instructions are available in the
cache, they can be fetched quickly during the period of repeated use. The same applies
to the data that are used repeatedly.
Processor clock:
Processor circuits are controlled by a timing signal called the clock. The clock defines
regular time intervals called clock cycles. To execute a machine instruction, the
processor divides the action to be performed into a sequence of basic steps such that each step
can be completed in one clock cycle. The length P of one clock cycle is an important
parameter that affects processor performance.
Processors used in today's personal computers and workstations have clock rates that
range from a few hundred million to over a billion cycles per second.
We now focus our attention on the processor time component of the total elapsed time.
Let ‘T’ be the processor time required to execute a program that has been prepared
in some high-level language. The compiler generates a machine language object
program that corresponds to the source program. Assume that complete execution of the
program requires the execution of N machine language instructions. The number
N is the actual number of instruction executions and is not necessarily equal to the
number of machine language instructions in the object program. Some instructions may be
executed more than once, as is the case for instructions inside a program loop; others
may not be executed at all, depending on the input data used.
Suppose that the average number of basic steps needed to execute one machine
instruction is S, where each basic step is completed in one clock cycle. If the clock
rate is ‘R’ cycles per second, the program execution time is given by
T = (N × S) / R
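As a quick numeric illustration of this basic performance equation, the short Python sketch below plugs in assumed sample values for N, S and R (the figures are illustrative only and are not taken from the text):

    N = 50_000_000      # instruction executions (assumed value)
    S = 4               # average basic steps (clock cycles) per instruction (assumed)
    R = 2_000_000_000   # clock rate in cycles per second, i.e. 2 GHz (assumed)
    T = (N * S) / R     # basic performance equation T = N x S / R
    print(T)            # 0.1 second of processor time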
Performance measurements:
The performance of a computer is commonly measured by running standard programs known
as benchmark programs. Synthetic benchmark programs, however, do not properly predict the
performance obtained when real application programs are run.
A non-profit organization called SPEC (System Performance Evaluation Corporation)
selects and publishes benchmarks.
The programs selected range from game playing, compilers, and database applications to
numerically intensive programs in astrophysics and quantum chemistry. In each case,
the program is compiled for the computer under test, and the running time on a real computer is
measured. The same program is also compiled and run on a computer selected as a
reference.
The ‘SPEC’ rating is computed as follows.
Running time on the reference computer
SPEC rating = ---------------------------------------------------
Running time on the computer under test
A SPEC rating of 50 means that the computer under test is 50 times as fast as the reference
computer for that particular benchmark program.
MULTIPROCESSORS AND MULTICOMPUTERS
Large computers that contain a number of processor units are called multiprocessor
systems. These systems either execute a number of different application tasks in parallel
or execute subtasks of a single large task in parallel. All processors usually have access
to all memory locations in such systems and hence they are called shared-memory
multiprocessor systems. The high performance of these systems comes with much
increased complexity and cost. In contrast to multiprocessor systems, it is also possible
to use an interconnected group of complete computers to achieve high total
computational power. These computers normally have access only to their own memory units;
when the tasks they are executing need to communicate data, they do so by exchanging
messages over a communication network. This property distinguishes them from shared-memory
multiprocessors, leading to the name message-passing multicomputers.
Data Representation:
Numeric Data Representation
NB: In all signed-number representation systems, the leftmost bit is 0 for a positive number and 1 for a
negative number.
Floating-point representation
Floating-point numbers are so called because the decimal (or binary) point floats over the base
depending on the exponent value.
A floating-point number consists of two components:
• Exponent
• Mantissa
Example: Avogadro's number can be written as 6.02 × 10^23 in base 10; the mantissa and
exponent are 6.02 and 23 respectively. Computer floating-point numbers are usually based on
base two, so 6.02 × 10^23 is approximately (1 + 63/64) × 2^78, or
1.111111 (base two) × 2^1001110 (base two).
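The same decomposition can be checked with a short Python sketch: math.frexp splits a float into a mantissa in [0.5, 1) and a power-of-two exponent, which is then rescaled to the 1.xxxx form used above (the value 6.02e23 is the example from the text):

    import math

    x = 6.02e23
    m, e = math.frexp(x)        # x == m * 2**e, with 0.5 <= m < 1
    # Rescale to the normalised 1.xxxx form: x == (2*m) * 2**(e-1)
    print(2 * m, e - 1)         # ~1.99 (i.e. about 1 + 63/64) and 78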
Error Detection Codes
Parity System
Hamming Distance
CRC
Check sum
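Of these schemes, the parity system is the simplest: a parity bit is appended so that the total number of 1's in the code word is even (or odd), and any single-bit error is detected because it changes the parity. A minimal Python sketch of even parity, with an assumed 7-bit data word:

    def even_parity_bit(word: int) -> int:
        # parity bit that makes the total number of 1's even
        return bin(word).count("1") % 2

    data = 0b1011001                                  # 7-bit data word (assumed example)
    codeword = (data << 1) | even_parity_bit(data)    # append the parity bit
    print(bin(codeword))                              # 0b10110010 -> four 1's, parity check passes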
UNIT 2
Syllabus:
Register Transfer Language and Micro Operations: RTL- Registers, Register transfers,
Bus and memory transfers. Micro operations: Arithmetic, Logic, and Shift micro
operations, Arithmetic logic shift unit.
Basic Computer Organization and Design: Computer Registers, Computer instructions,
Instruction cycle. Instruction codes, Timing and Control, Types of Instructions: Memory
Reference Instructions, Input – Output and Interrupt, Complete Computer Description.
Register Transfer
Designate information transfer from one register to
another by
R2 ← R1
This statement implies that the hardware is available
o The outputs of the source must have a path to the inputs of the
destination
o The destination register has a parallel load capability
If the transfer is to occur only under a predetermined control condition,
designate it by
If (P = 1) then (R2 ← R1)
or
P: R2 ← R1
It is assumed that all transfers occur during a clock edge transition.
All microoperations written on a single line are to be executed at the same time, for example
T: R2 ← R1, R1 ← R2
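The statement above exchanges R1 and R2 in a single clock edge, so both transfers use the old register values. A tiny Python sketch of that behaviour (the register contents are assumed sample values); tuple assignment mimics the simultaneous update:

    R1, R2 = 0b1010, 0b0101     # assumed register contents
    T = 1                       # timing/control condition (assumed active)
    if T:
        R1, R2 = R2, R1         # both transfers see the *old* values, like T: R2 <- R1, R1 <- R2
    print(bin(R1), bin(R2))     # 0b101 0b1010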
Rather than connecting wires between all registers, a common bus is used
A bus structure consists of a set of common lines, one for each bit of a register
Control signals determine which register is selected by the bus during each
transfer
Multiplexers can be used to construct a common bus
Multiplexers select the source register whose binary information is then placed on the
bus
The select lines are connected to the selection inputs of the multiplexers and
choose the bits of one register
Instead of using multiplexers, three-state gates can be used to construct the bus
system
A three-state gate is a digital circuit that exhibits three states
Two of the states are signals equivalent to logic 1 and 0
The third state is a high-impedance state – this behaves like an open circuit, which
means the output is disconnected and does not have a logic significance
The three-state buffer gate has a normal input and a control input which
determines the output state
With control 1, the output equals the normal input
With control 0, the gate goes to a high-impedance state
This enables a large number of three-state gate outputs to be connected with wires to
form a common bus line without endangering loading effects
Decoders are used to ensure that no more than one control input is active at any
given time
This circuit can replace the multiplexer in Figure 4.3
To construct a common bus for four registers of n bits each using three-state
buffers, we need n circuits with four buffers in each
Only one decoder is necessary to select between the four registers
Designate a memory word by the letter M
It is necessary to specify the address of M when writing memory transfer
operations
Designate the address register by AR and the data register by DR
The read operation can be stated as:
Read: DR ← M[AR]
The write operation can be stated as:
Write: M[AR] ← R1
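A minimal sketch of these two memory transfers, modelling the memory M as a Python list and AR/DR as plain variables (the memory size and values are assumed):

    M = [0] * 16        # 16-word memory (assumed size)
    AR, DR, R1 = 5, 0, 99

    M[AR] = R1          # Write: M[AR] <- R1
    DR = M[AR]          # Read:  DR <- M[AR]
    print(DR)           # 99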
Arithmetic Microoperations
To implement the add microoperation with hardware, we need the registers that
hold the data and the digital component that performs the addition
A full-adder adds two bits and a previous carry
A binary adder is a digital circuit that generates the arithmetic sum of two binary
numbers of any length
A binary adder is constructed with full-adder circuits connected in cascade
An n-bit binary adder requires n full-adders
Each of the arithmetic microoperations can be implemented in one composite
arithmetic circuit
The basic component is the parallel adder
Multiplexers are used to choose between the different operations
The output of the binary adder is calculated from the following sum:
D = A + Y + Cin
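A small Python sketch of this idea: a ripple-carry adder built from full-adder stages, each stage adding one bit of A and Y together with the incoming carry (the 4-bit width and operand values are assumed):

    def full_adder(a, b, cin):
        s = a ^ b ^ cin                       # sum bit
        cout = (a & b) | (cin & (a ^ b))      # carry out
        return s, cout

    def binary_adder(A, Y, cin=0, n=4):       # n full adders connected in cascade
        D, carry = 0, cin
        for i in range(n):
            s, carry = full_adder((A >> i) & 1, (Y >> i) & 1, carry)
            D |= s << i
        return D, carry                       # D = A + Y + Cin (n-bit), plus final carry out

    print(binary_adder(0b0110, 0b0101))       # (11, 0) i.e. 0b1011 with no carry out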
Logic Microoperations
Logic operations specify binary operations for strings of bits stored in registers and
treat each bit separately
Example: the XOR of R1 and R2 is symbolized by
P: R1 ← R1 ⊕ R2
Example: R1 = 1010 and R2 = 1100
1010 Content of R1
1100 Content of R2
0110 Content of R1 after P = 1
The hardware implementation of logic microoperations requires that logic gates be
inserted for each bit or pair of bits in the registers
All 16 microoperations can be derived from using four logic gates
Logic microoperations can be used to change bit values, delete a group of bits, or
insert new bit values into a register
The selective-set operation sets to 1 the bits in A where there are corresponding 1’s
in B
1010 A before
1100 B (logic operand)
1110 A after
A ← A ∨ B
The selective-complement operation complements the bits in A where there are corresponding 1's in B: A ← A ⊕ B
The selective-clear operation clears to 0 the bits in A only where there are
corresponding 1’s in B
1010 A before
1100 B (logic operand)
0010 A after
A ← A ∧ B′
The mask operation is similar to the selective-clear operation, except that the bits of
A are cleared only where there are corresponding 0’s in B
1010 A before
1100 B (logic operand)
1000 A after
A ← A ∧ B
The insert operation inserts a new value into a group of bits
This is done by first masking the bits to be replaced and then Oring them with the bits
to be inserted
0110 1010 A before
0000 1111 B (mask)
0000 1010 A after masking
The clear operation compares the bits in A and B and produces an all 0’s result if the
two number are equal
1010 A
1010 B
0000 A ← A ⊕ B
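These manipulations map directly onto ordinary bitwise operators. The Python sketch below reproduces the 4-bit examples above (A = 1010, B = 1100) plus the insert operation, where the new high-nibble value 1001 is an assumed illustration:

    A, B = 0b1010, 0b1100
    print(bin(A | B))            # selective-set:   A <- A OR B        -> 0b1110
    print(bin(A & ~B & 0xF))     # selective-clear: A <- A AND (NOT B) -> 0b10 (0010)
    print(bin(A & B))            # mask:            A <- A AND B       -> 0b1000
    print(bin(A ^ B))            # clear test:      A XOR B is 0 only when A == B

    A8 = 0b01101010              # insert: mask off the high nibble, then OR in new bits
    print(bin((A8 & 0b00001111) | 0b10010000))   # -> 0b10011010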
Shift Microoperations
The circular shift (aka rotate) circulates the bits of the register around the two
ends without loss of information
The symbols cil and cir are for circular shift left and right
The arithmetic shift shifts a signed binary number to the left or right
To the left is multiplying by 2, to the right is dividing by 2
Arithmetic shifts must leave the sign bit unchanged
A sign reversal occurs if the bit in Rn-1 changes in value after the shift
This happens if the multiplication causes an overflow
An overflow flip-flop Vs can be used to detect the overflow:
Vs = Rn-1 ⊕ Rn-2
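A brief Python sketch of these shifts for an assumed 4-bit register width, including the overflow flag Vs for an arithmetic shift left:

    N = 4
    MASK = (1 << N) - 1

    def cil(r):                   # circular shift left
        return ((r << 1) | (r >> (N - 1))) & MASK

    def cir(r):                   # circular shift right
        return ((r >> 1) | ((r & 1) << (N - 1))) & MASK

    def ashl(r):                  # arithmetic shift left with overflow detection
        Vs = ((r >> (N - 1)) & 1) ^ ((r >> (N - 2)) & 1)   # Vs = Rn-1 XOR Rn-2
        return (r << 1) & MASK, Vs

    print(bin(cil(0b1001)), bin(cir(0b1001)))   # 0b11 (0011) and 0b1100
    print(ashl(0b0110))                         # (12, 1): result 0b1100, sign changed -> overflow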
A bi-directional shift unit with parallel load could be used to implement this
Two clock pulses are necessary with this configuration: one to load the value and
another to shift
In a processor unit with many registers it is more efficient to implement the shift
operation with a combinational circuit
The content of a register to be shifted is first placed onto a common bus and the output
is connected to the combinational shifter, the shifted number is then loaded back into
the register
This can be constructed with multiplexers
Basic Computer Organization and Design
Instruction codes. Computer Registers Computer instructions, Timing and Control, Instruction
cycle. Memory Reference Instructions, Input – Output and Interrupt, Complete Computer
Description.
Micro Programmed Control: Control memory, Address sequencing, micro program example,
design of control unit, micro Programmed control
------------------------------------------------------------------------------------------------------------
Instruction Formats:
A computer will usually have a variety of instruction code formats. It is the function
of the control unit within the CPU to interpret each instruction code and provide the
necessary control functions needed to process the instruction.
The format of an instruction is usually depicted in a rectangular box symbolizing the
bits of the instruction as they appear in memory words or in a control register. The bits of
the instruction are divided into groups called fields. The most common fields found in
instruction formats are:
1 An operation code field that specifies the operation to be performed.
2. An address field that designates a memory address or a processor register.
3. A mode field that specifies the way the operand or the effective address is
determined.
Other special fields are sometimes employed under certain circumstances, as for
example a field that gives the number of shifts in a shift-type instruction.
The operation code field of an instruction is a group of bits that define various
processor operations, such as add, subtract, complement, and shift. The bits that define the
mode field of an instruction code specify a variety of alternatives for choosing the operands
from the given address.
Operations specified by computer instructions are executed on some data stored in
memory or processor registers. Operands residing in processor registers are specified with a
register address. A register address is a binary number of k bits that defines one of 2^k
registers in the CPU. Thus a CPU with 16 processor registers R0 through R15 will have a
register address field of four bits. The binary number 0101, for example, will designate
register R5.
Computers with a single processor register designate it as the accumulator (AC); an instruction in such a
computer has one address field, for example ADD X, where X is the address of the operand. The ADD
instruction in this case results in the
operation AC ← AC + M[X]. AC is the accumulator register and M[X] symbolizes the
memory word located at address X.
An example of a general register type of organization was presented in Fig. 7.1. The
instruction format in this type of computer needs three register address fields. Thus the
instruction for an arithmetic addition may be written in an assembly language as
ADD R1, R2, R3
to denote the operation R1 ← R2 + R3. Transfer-type instructions use the mnemonic MOV; for example,
the instruction
MOV R1, R2
Denotes the transfer R1 ← R2 (or R2 ← R1, depending on the particular computer).
Thus transfer-type instructions need two address fields to specify the source and the
destination.
General register-type computers employ two or three address fields in their
instruction format. Each address field may specify a processor register or a memory word.
An instruction symbolized by
ADD R1, X
Would specify the operation R1 ← R1 + M [X]. It has two address fields, one for
register R1 and the other for the memory address X.
The stack-organized CPU was presented in Fig. 8-4. Computers with stack
organization would have PUSH and POP instructions which require an address field. Thus
the instruction
PUSH X
Will push the word at address X to the top of the stack. The stack pointer is updated
automatically. Operation-type instructions do not need an address field in stack-organized
computers. This is because the operation is performed on the two items that are on top of
the stack. The instruction ADD in a stack computer consists of an operation code only with
no address field. This operation has the effect of popping the two top numbers from the
stack, adding the numbers, and pushing the sum into the stack. There is no need to specify
operands with an address field since all operands are implied to be in the stack.
To illustrate the influence of the number of addresses on computer programs, we will
evaluate the arithmetic statement X = (A + B) ∗ (C + D).
Using zero, one, two, or three address instruction. We will use the symbols ADD,
SUB, MUL, and DIV for the four arithmetic operations; MOV for the transfer-type
operation; and LOAD and STORE for transfers to and from memory and AC register. We
will assume that the operands are in memory addresses A, B, C, and D, and the result must
be stored in memory at address X.
Three-Address Instructions
Computers with three-address instruction formats can use each address field to
specify either a processor register or a memory operand. The program in assembly
language that evaluates X = (A + B) ∗ (C + D) is shown below, together with comments
that explain the register transfer operation of each instruction.
ADD R1, A, B R1 ← M [A] + M [B]
ADD R2, C, D R2 ← M [C] + M [D]
MUL X, R1, R2 M [X] ← R1 ∗ R2
It is assumed that the computer has two processor registers, R1 and R2. The symbol M [A]
denotes the operand at memory address symbolized by A.
The advantage of the three-address format is that it results in short programs when
evaluating arithmetic expressions. The disadvantage is that the binary-coded instructions
require too many bits to specify three addresses. An example of a commercial computer
that uses three-address instructions is the Cyber 170. The instruction formats in the Cyber
computer are restricted to either three register address fields or two register address fields
and one memory address field.
Two-Address Instructions
Two address instructions are the most common in commercial computers. Here again each
address field can specify either a processor register or a memory word. The program to
evaluate X = (A + B) ∗ (C + D) is as follows:
MOV R1, A R1 ← M [A]
ADD R1, B R1 ← R1 + M [B]
MOV R2, C R2 ← M [C]
ADD R2, D R2 ← R2 + M [D]
MUL R1, R2 R1 ← R1∗R2
MOV X, R1 M [X] ← R1
The MOV instruction moves or transfers the operands to and from memory and
processor registers. The first symbol listed in an instruction is assumed to be both a source
and the destination where the result of the operation is transferred.
One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data
manipulation. For multiplication and division there is a need for a second register.
However, here we will neglect the second register and assume that the AC contains the result of all
operations. The program to evaluate X = (A + B) ∗ (C + D) is
LOAD A AC ← M [A]
ADD B AC ← AC + M [B]
STORE T M [T] ← AC
LOAD C AC ← M [C]
ADD D AC ← AC + M [D]
MUL T AC ← AC ∗ M [T]
STORE X M [X] ← AC
All operations are done between the AC register and a memory operand. T is the
address of a temporary memory location required for storing the intermediate result.
Zero-Address Instructions
A stack-organized computer does not use an address field for the instructions ADD
and MUL. The PUSH and POP instructions, however, need an address field to specify the
operand that communicates with the stack. The following program shows how X = (A + B)
∗ (C + D) will be written for a stack organized computer. (TOS stands for top of stack)
PUSH A TOS ← A
PUSH B TOS ← B
ADD TOS ← (A + B)
PUSH C TOS ← C
PUSH D TOS ← D
ADD TOS ← (C + D)
MUL TOS ← (C + D) ∗ (A + B)
POP X M [X] ← TOS
To evaluate arithmetic expressions in a stack computer, it is necessary to convert the
expression into reverse Polish notation. The name “zero-address” is given to this type of
computer because of the absence of an address field in the computational instructions.
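The reason no addresses are needed becomes obvious if we actually run the stack evaluation. The Python sketch below evaluates X = (A + B) ∗ (C + D) from its reverse Polish form "A B + C D + *", using assumed sample values for A, B, C and D:

    vals = {"A": 2, "B": 3, "C": 4, "D": 5}      # assumed operand values
    stack = []
    for token in "A B + C D + *".split():        # reverse Polish notation
        if token in vals:
            stack.append(vals[token])            # PUSH operand
        else:
            b, a = stack.pop(), stack.pop()      # ADD / MUL take the two top items
            stack.append(a + b if token == "+" else a * b)
    X = stack.pop()                              # POP X
    print(X)                                     # (2 + 3) * (4 + 5) = 45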
Instruction Codes
A set of instructions that specify the operations, operands, and the sequence by which
processing has to occur. An instruction code is a group of bits that tells the computer to perform
a specific operation.
Format of Instruction
The format of an instruction is depicted in a rectangular box symbolizing the bits of an
instruction. Basic fields of an instruction format are given below:
1. An operation code field that specifies the operation to be performed.
2. An address field that designates the memory address or register.
3. A mode field that specifies the way the operand or effective address is determined.
Computers may have instructions of different lengths containing varying numbers of addresses.
The number of address fields in the instruction format depends upon the internal organization of
its registers.
Addressing Modes
To understand the various addressing modes to be presented in this section, it is imperative
that we understand the basic operation cycle of the computer. The control unit of a computer
is designed to go through an instruction cycle that is divided into three major phases: fetch the
instruction from memory, decode the instruction, and execute the instruction.
1 Implied Mode: In this mode the operands are specified implicitly in the definition of the
instruction. For example, a 'complement accumulator' instruction is an implied-mode instruction
because the operand in the accumulator register is implied in the definition of the instruction.
Zero-address instructions in a stack-organized computer are also implied-mode
instructions since the operands are implied to be on top of the stack.
2 Immediate Mode: In this mode the operand is specified in the instruction itself.
In other words, an immediate-mode instruction has an operand field rather than an address
field. The operand field contains the actual operand to be used in conjunction with the
operation specified in the instruction. Immediate-mode instructions are useful for
initializing registers to a constant value.
It was mentioned previously that the address field of an instruction may specify either
a memory word or a processor register. When the address field specifies a processor
register, the instruction is said to be in the register mode.
3 Register Mode: In this mode the operands are in registers that reside within the
CPU. The particular register is selected from a register field in the instruction. A k-bit field can
specify any one of 2^k registers.
The address field of an instruction is used by the control unit in the CPU to obtain the
operand from memory. Sometimes the value given in the address field is the address of the
operand, but sometimes it is just an address from which the address of the operand is
calculated. To differentiate among the various addressing modes it is necessary to
distinguish between the address part of the instruction and the effective address used by the
control when executing the instruction. The effective address is defined to be the memory
address obtained from the computation dictated by the given addressing mode. The
effective address is the address of the operand in a computational-type instruction. It is the
address where control branches in response to a branch-type instruction. We have already
defined two addressing modes in previous chapter.
6 Direct Address Mode: In this mode the effective address is equal to the address part
of the instruction. The operand resides in memory and its address is given directly by the
address field of the instruction. In a branch-type instruction the address field specifies the
actual branch address.
7 Indirect Address Mode: In this mode the address field of the instruction gives the address
where the effective address is stored in memory. Control fetches the instruction from
memory and uses its address part to access memory again to read the effective address.
8 Relative Address Mode: In this mode the content of the program counter is added to
the address part of the instruction in order to obtain the effective address. The address part
of the instruction is usually a signed number (in 2’s complement representation) which
can be either positive or negative. When this number is added to the content of the
program counter, the result produces an effective address whose position in memory is
relative to the address of the next instruction. To clarify with an example, assume that the
program counter contains the number 825 and the address part of the instruction contains
the number 24. The instruction at location 825 is read from memory during the fetch
phase and the program counter is then incremented by one to 826. The effective address is then 826 + 24 = 850. This is 24
memory locations forward from the address of the next instruction. Relative addressing is
often used with branch-type instructions when the branch address is in the area
surrounding the instruction word itself. It results in a shorter address field in the
instruction format since the relative address can be specified with a smaller number of bits
compared to the number of bits required to designate the entire memory address.
9 Indexed Addressing Mode: In this mode the content of an index register is added to
the address part of the instruction to obtain the effective address. The index register is a
special CPU register that contains an index value. The address field of the instruction
defines the beginning address of a data array in memory. Each operand in the array is
stored in memory relative to the beginning address. The distance between the beginning
address and the address of the operand is the index value stored in the index register.
Any operand in the array can be accessed with the same instruction provided that the
index register contains the correct index value. The index register can be incremented to
facilitate access to consecutive operands. Note that if an index-type instruction does not
include an address field in its format, the instruction converts to the register indirect
mode of operation. Some computers dedicate one CPU register to function solely as an
index register. This register is involved implicitly when the index-mode instruction is
used. In computers with many processor registers, any one of the CPU registers can
contain the index number. In such a case the register must be specified explicitly in a
register field within the instruction format.
10 Base Register Addressing Mode: In this mode the content of a base register is
added to the address part of the instruction to obtain the effective address. This is similar
to the indexed addressing mode except that the register is now called a base register
instead of an index register. The difference between the two modes is in the way they
are used rather than in the way that they are computed. An index register is assumed to
hold an index number that is relative to the address part of the instruction. A base
register is assumed to hold a base address and the address field of the instruction gives a
displacement relative to this base address. The base register addressing mode is used in
computers to facilitate the relocation of programs in memory. When programs and data
are moved from one segment of memory to another, as required in multiprogramming
systems, only the content of the base register requires updating to reflect the
beginning of a new memory segment.
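A compact Python sketch that computes the effective address (EA) for several of these modes; the memory contents, register values and the offset are assumed sample figures (the relative-mode numbers match the example above):

    M = {500: 800}                # memory: location 500 holds 800 (assumed)
    addr_field = 500              # address part of the instruction (assumed)
    PC = 826                      # PC already incremented past the instruction
    offset = 24                   # signed address part for relative mode (from the notes' example)
    XR, BR = 100, 400             # index register and base register (assumed)

    EA_direct   = addr_field           # operand is M[500]
    EA_indirect = M[addr_field]        # EA = M[500] = 800
    EA_relative = PC + offset          # 826 + 24 = 850, as in the notes
    EA_indexed  = addr_field + XR      # 500 + 100 = 600
    EA_base     = BR + addr_field      # 400 + 500 = 900 (address field acts as a displacement)
    print(EA_direct, EA_indirect, EA_relative, EA_indexed, EA_base)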
Numerical Example
Computer Registers
(Table: for each register of the basic computer, its symbol, number of bits, register name and function.)
Computer Instructions:
The basic computer has a 16-bit instruction register (IR) which can denote either a memory
reference, register reference or input-output instruction.
1. Memory Reference – These instructions refer to a memory address as an operand. The
other operand is always the accumulator. The format specifies a 12-bit address, a 3-bit opcode (other than
111) and a 1-bit addressing mode for direct and indirect addressing.
Example –
IR register contains = 0001XXXXXXXXXXXX, i.e. ADD. After fetching and decoding the
instruction we find out that it is a memory reference instruction for the ADD operation.
Hence, DR <- M[AR]
AC <- AC + DR, SC <- 0
2. Register Reference – These instructions perform operations on registers rather than
memory addresses. The IR(14-12) is 111 (differentiates it from memory reference) and
IR(15) is 0 (differentiates it from input/output instructions). The remaining 12 bits specify the
register operation.
Example –
IR register contains = 0111001000000000, i.e. CMA. After the fetch and decode cycle we find
out that it is a register reference instruction for complement accumulator.
Hence, AC <- ~AC
3. Input/Output – These instructions are for communication between the computer and the
outside environment. The IR(14-12) is 111 (differentiates it from memory reference) and
IR(15) is 1 (differentiates it from register reference instructions). The remaining 12 bits specify the
I/O operation.
Example –
IR register contains = 1111100000000000, i.e. INP. After the fetch and decode cycle we find
out that it is an input/output instruction for inputting a character. Hence, a character is INPUT
from the peripheral device.
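A short Python sketch that decodes a 16-bit word along the lines described above (bit 15 as the mode/I-O bit, bits 14-12 as the opcode, bits 11-0 as the address part); the two sample words are the ADD and CMA examples from the text:

    def decode(ir):
        i_bit   = (ir >> 15) & 0x1      # addressing-mode / I-O bit
        opcode  = (ir >> 12) & 0x7      # bits 14-12
        address = ir & 0xFFF            # bits 11-0
        if opcode != 0b111:
            kind = "memory-reference (indirect)" if i_bit else "memory-reference (direct)"
        else:
            kind = "input-output" if i_bit else "register-reference"
        return kind, opcode, address

    print(decode(0b0001000000000100))   # ('memory-reference (direct)', 1, 4)   -> ADD 004
    print(decode(0b0111001000000000))   # ('register-reference', 7, 512)        -> CMA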
Timing and Control
All sequential circuits in the Basic Computer CPU are driven by a master clock, with the
exception of the INPR register. At each clock pulse, the control unit sends control signals to
control inputs of the bus, the registers, and the ALU.
Control unit design and implementation can be done by two general methods:
A hardwired control unit is designed from scratch using traditional digital logic design
techniques to produce a minimal, optimized circuit. In other words, the control unit is
like an ASIC (application-specific integrated circuit).
A microprogrammed control unit is built from some sort of ROM. The desired control
signals are simply stored in the ROM, and retrieved in sequence to drive the
microoperations needed by a particular instruction.
Instruction Cycle
The CPU performs a sequence of microoperations for each instruction. The sequence for each
instruction of the Basic Computer can be refined into 4 abstract phases:
1. Fetch instruction
2. Decode
3. Fetch operand
4. Execute
1. Program execution
a. Instruction 1
i. Fetch instruction
ii. Decode
iii. Fetch operand
iv. Execute
b. Instruction 2
i. Fetch instruction
ii. Decode
iii. Fetch operand
iv. Execute
c. Instruction 3 ...
After this, the SC is incremented at each clock cycle until an instruction is completed, and then
it is cleared to begin the next instruction. This process repeats until a HLT instruction is
executed, or until the power is shut off.
Everything that happens in this phase is driven entirely by timing variables T0, T1 and T2.
Hence, all control inputs in the CPU during fetch and decode are functions of these three
variables alone.
T0: AR ← PC
T1: IR ← M[AR], PC ← PC + 1
Control Memory
In a microprogrammed control unit the microinstructions are read from a control memory. While the
microoperations of the current microinstruction are being executed, the next address is computed in a next-
address generator circuit (sequencer) and then transferred into the control address register (CAR) to read the
next microinstruction.
Typical functions of a sequencer are:
o incrementing the CAR by one
o loading into the CAR an address from control memory
o transferring an external address
o loading an initial address to start the control operations
A clock is applied to the CAR and the control word and next-address information are
taken directly from the control memory
The address value is the input for the ROM and the control word is the output
No read signal is required for the ROM, unlike a RAM
The main advantage of the microprogrammed control is that once the hardware
configuration is established, there should be no need for h/w or wiring changes
To establish a different control sequence, specify a different set of
microinstructions for control memory
Address Sequencing
Microinstructions are stored in control memory in groups, with each group specifying a
routine
Each computer instruction has its own microprogram routine to generate the
microoperations
The hardware that controls the address sequencing of the control memory must be capable
of sequencing the microinstructions within a routine and be able to branch from one
routine to another
Steps the control must undergo during the execution of a single computer
instruction:
o Load an initial address into the CAR when power is turned on in the computer. This
address is usually the address of the first microinstruction that activates the
instruction fetch routine – IR holds instruction
o The control memory then goes through the routine to determine the effective
address of the operand – AR holds operand address
o The next step is to generate the microoperations that execute the instruction
by considering the opcode and applying a mapping
o After execution, control must return to the fetch routine by executing an
unconditional branch
The microinstruction in control memory contains a set of bits to initiate
microoperations in computer registers and other bits to specify the method by
which the next address is obtained
Conditional branching is obtained by using part of the microinstruction to select a specific
status bit in order to determine its condition
The status conditions are special bits in the system that provide parameter information
such as the carry-out of an adder, the sign bit of a number, the mode bits of an
instruction, and i/o status conditions
The status bits, together with the field in the microinstruction that specifies a branch
address, control the branch logic
The branch logic tests the condition; if it is met, it branches to the specified address; otherwise, it
increments the CAR
If there are 8 status bit conditions, then 3 bits in the microinstruction are used to specify
the condition and provide the selection variables for the multiplexer
For unconditional branching, fix the value of one status bit to be one and load the branch
address from control memory into the CAR
A special type of branch exists when a microinstruction specifies a branch to the first
word in control memory where a microprogram routine is located
The status bits for this type of branch are the bits in the opcode
Assume an opcode of four bits and a control memory of 128 locations
The mapping process converts the 4-bit opcode to a 7-bit address for control
memory
This provides for each computer instruction a microprogram routine with a
capacity of four microinstructions
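One common way to realise such a mapping (assumed here, since the notes do not spell it out) is to place a 0 in the most significant bit, the four opcode bits next, and 00 in the two least significant bits, which gives every instruction a block of four consecutive control-memory words. A tiny Python sketch:

    def map_opcode(opcode):            # 4-bit opcode -> 7-bit control memory address
        return (opcode & 0xF) << 2     # assumed pattern 0 xxxx 00

    print(bin(map_opcode(0b1011)))     # 0b101100 -> routine for opcode 1011 starts at 0101100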
Subroutines are programs that are used by other routines to accomplish a particular
task and can be called from any point within the main body of the microprogram
Frequently many microprograms contain identical section of code
Microinstructions can be saved by employing subroutines that use common
sections of microcode
Microprograms that use subroutines must have a provision for storing the return address
during a subroutine call and restoring the address during a subroutine return
A subroutine register is used as the source and destination for the addresses
UNIT 3
Syllabus:
Central Processing Unit organization: General Register Organization, Stack organization,
Instruction formats, Addressing modes, Data Transfer and Manipulation, Program Control,
CISC and RISC processors
Control unit design: Design approaches, Control memory, Address sequencing, micro program
example, design of CU. Micro Programmed Control.
Introduction to CPU
To do these tasks, it should be clear that the CPU needs to store some data temporarily. It must
remember the location of the last instruction so that it can know where to get the next
instruction. It needs to store instructions and data temporarily while an instruction is being
executed. In other words, the CPU needs a small internal memory. These storage locations are
generally referred to as registers.
The major components of the CPU are an arithmetic and logic unit (ALU) and a control unit
(CU). The ALU does the actual computation or processing of data. The CU controls the
movement of data and instruction into and out of the CPU and controls the operation of the
ALU.
The CPU is connected to the rest of the system through the system bus. Through the system bus, data or
information gets transferred between the CPU and the other components of the system. The
system bus may have three components:
Data Bus: Data bus is used to transfer the data between main memory and CPU.
Address Bus: Address bus is used to access a particular memory location by putting the address
of the memory location.
Control Bus: Control bus is used to provide the different control signal generated by CPU to
different part of the system.
For example, memory read is a signal generated by the CPU to indicate that a memory read
operation has to be performed. Through the control bus this signal is transferred to the memory module
to indicate the required operation.
Figure 1: CPU with the system bus.
There are three basic components of CPU: register bank, ALU and Control Unit. There are
several data movements between these units and for that an internal CPU bus is used. Internal
CPU bus is needed to transfer data between the various registers and the ALU.
Stack Organization:
A useful feature that is included in the CPU of most computers is a stack or last in, first out
(LIFO) list. A stack is a storage device that stores information in such a manner that the item
stored last is the first item retrieved. The operation of a stack can be compared to a stack of
trays. The last tray placed on top of the stack is the first to be taken off.
The stack in digital computers is essentially a memory unit with an address register that can
only count (after an initial value is loaded into it). The register that holds the address for the stack is
called a stack pointer (SP) because its value always points at the top item in the stack. Contrary to a
stack of trays where the tray itself may be taken out or inserted, the physical registers of a stack
are always available for reading or writing.
The two operation of stack are the insertion and deletion of items. The operation of insertion is
called PUSH because it can be thought of as the result of pushing a new item on top. The
operation of deletion is called POP because it can be thought of as the result of removing one
item so that the stack pops up. However, nothing is pushed or popped in a computer stack.
These operations are simulated by incrementing or decrementing the stack pointer register.
Register stack:
A stack can be placed in a portion of a large memory or it can be organized as a collection of
a finite number of memory words or registers. Figure X shows the organization of a 64-word
register stack. The stack pointer register SP contains a binary number whose value is equal to
the address of the word that is currently on top of the stack. Three items are placed in the stack:
A, B, and C, in that order. Item C is on top of the stack, so the content of SP is now 3. To
remove the top item, the stack is popped by reading the memory word at address 3 and
decrementing the content of SP. Item B is now on top of the stack since SP holds address 2. To
insert a new item, the stack is pushed by incrementing SP and writing a word in the next higher
location in the stack. Note that item C has been read out but not physically removed. This does not
matter because when the stack is pushed, a new item is written in its place.
In a 64-word stack, the stack pointer contains 6 bits because 2^6 = 64. Since SP has only six bits,
it cannot hold a number greater than 63 (111111 in binary). When 63 is incremented by 1, the
result is 0 since 111111 + 1 = 1000000 in binary, but SP can accommodate only the six least
significant bits. Similarly, when 000000 is decremented by 1, the result is 111111. The one-bit
register FULL is set to 1 when the stack is full, and the one-bit register EMTY is set to 1 when the
stack is empty of items. DR is the data register that holds the binary data to be written into or
read out of the stack.
Initially, SP is cleared to 0, EMTY is set to 1, and FULL is cleared to 0, so that SP points to the
word at address 0 and the stack is marked empty and not full. If the stack is not full, a new item
is inserted with a push operation. The push operation is implemented with the following
sequence of micro-operations:
SP ← SP + 1
M[SP] ← DR
The stack pointer is incremented so that it points to the address of the next-higher word. A
memory write operation inserts the word from DR into the top of the stack. Note that SP holds
the address of the top of the stack and that M[SP] denotes the memory word specified by the
address presently available in SP. The first item stored in the stack is at address 1. The last item
is stored at address 0. If SP reaches 0, the stack is full of items, so FULL is set to 1. This
condition is reached if the top item prior to the last push was in location 63 and, after incrementing
SP, the last item is stored in location 0. Once an item is stored in location 0, there are no more
empty registers in the stack. If an item is written in the stack, obviously the stack cannot be
empty, so EMTY is cleared to 0.
An item is deleted with a pop operation, implemented with the following sequence of micro-operations:
DR ← M[SP]
SP ← SP - 1
The top item is read from the stack into DR. The stack pointer is then decremented. If its value
reaches zero, the stack is empty, so EMTY is set to 1. This condition is reached if the item read
was in location 1; once this item is read out, SP is decremented and reaches the value 0, which
is the initial value of SP. Note that if a pop operation reads the item from location 0 and then SP
is decremented, SP changes to 111111, which is equal to decimal 63. In this configuration, the
word in address 0 receives the last item in the stack. Note also that an erroneous operation will
result if the stack is pushed when FULL = 1 or popped when EMTY = 1.
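The whole register stack, including the FULL and EMTY flags and the wrap-around behaviour of the 6-bit SP, can be sketched in a few lines of Python (the pushed values are assumed examples):

    class RegisterStack:
        def __init__(self):
            self.mem = [0] * 64
            self.SP, self.FULL, self.EMTY = 0, 0, 1

        def push(self, DR):
            if self.FULL:
                raise OverflowError("stack full")
            self.SP = (self.SP + 1) % 64        # SP <- SP + 1 (6-bit wrap: 63 -> 0)
            self.mem[self.SP] = DR              # M[SP] <- DR
            if self.SP == 0:
                self.FULL = 1
            self.EMTY = 0

        def pop(self):
            if self.EMTY:
                raise IndexError("stack empty")
            DR = self.mem[self.SP]              # DR <- M[SP]
            self.SP = (self.SP - 1) % 64        # SP <- SP - 1 (6-bit wrap: 0 -> 63)
            if self.SP == 0:
                self.EMTY = 1
            self.FULL = 0
            return DR

    s = RegisterStack()
    s.push(10); s.push(20); s.push(30)          # items A, B, C -> SP ends at 3
    print(s.pop(), s.SP)                        # 30 2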
Memory Stack :
A stack can exist as a stand-alone unit, as in figure 4, or can be implemented in a random-access
memory attached to the CPU. The implementation of a stack in the CPU is done by assigning a
portion of memory to a stack operation and using a processor register as a stack pointer. The figure
shows a portion of computer memory partitioned into three segments: program, data and stack.
The program counter PC points at the address of the next instruction in the program. The
address register AR points at an array of data. The stack pointer SP points at the top of the stack.
The three registers are connected to a common address bus, and any one of them can provide an address
for memory. PC is used during the fetch phase to read an instruction. AR is used during the
execute phase to read an operand. SP is used to push or pop items into or from the stack.
As shown in figure 4, the initial value of SP is 4001 and the stack grows with decreasing
addresses. Thus the first item stored in the stack is at address 4000, the second item is stored at
address 3999, and the last address that can be used for the stack is 3000. No provisions are
available for stack limit checks. We assume that the items in the stack communicate with a data
register DR. A new item is inserted with the push operation as follows.
SP← SP-1
M[SP] ← DR
The stack pointer is decremented so that it points at the address of the next word. A memory
write operation inserts the word from DR into the top of the stack. A new item is deleted with
a pop operation as follows:
DR← M[SP]
SP←SP + 1
The top item is read from the stack into DR. The stack pointer is then incremented to point at
the next item in the stack.
Most computers do not provide hardware to check for stack overflow (FULL) or underflow
(EMTY). The stack limits can be checked by using two processor registers:
one to hold the upper limit and the other to hold the lower limit. After each push or pop operation, SP is
compared with the upper or lower limit register.
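A minimal Python sketch of such a memory stack with software limit checks, using the address range assumed in the notes (stack area 3000 to 4000, SP initially 4001):

    UPPER, LOWER = 4000, 3000      # limit registers: stack occupies addresses 3000..4000
    M = {}                         # memory modelled as a dictionary
    SP = 4001                      # initial SP; the first push stores at 4000

    def push(DR):
        global SP
        if SP - 1 < LOWER:
            raise OverflowError("stack overflow")
        SP -= 1                    # SP <- SP - 1
        M[SP] = DR                 # M[SP] <- DR

    def pop():
        global SP
        if SP > UPPER:
            raise IndexError("stack underflow")
        DR = M[SP]                 # DR <- M[SP]
        SP += 1                    # SP <- SP + 1
        return DR

    push(111); push(222)
    print(pop(), SP)               # 222 4000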
INSTRUCTION FORMATS:
We know that a machine instruction has an opcode and zero or more operands. Encoding an
instruction set can be done in a variety of ways. Architectures are differentiated from one
another by the number of bits allowed per instruction (16, 32, and 64 are the most common), by
the number of operands allowed per instruction, and by the types of instructions and data each
can process. More specifically, instruction sets are differentiated by the following features:
1. Operand storage in the CPU (data can be stored in a stack structure or in registers)
2. Number of explicit operands per instruction (zero, one, two, and three being the most
common)
3. Operand location (instructions can be classified as register-to-register, register-to-memory or
memory-to-memory, which simply refer to the combinations of operands allowed per
instruction)
4. Operations (including not only types of operations but also which instructions can access
memory and which cannot)
5. Type and size of operands (operands can be addresses, numbers, or even characters)
47
Number of Addresses:
One of the characteristics of the ISA (Instruction Set Architecture) that shapes the
architecture is the number of addresses used in an instruction. Most operations can be divided
into binary or unary operations. Binary operations such as addition and multiplication require
two input operands whereas the unary operations such as the logical NOT need only a single
operand. Most operations produce a single result. There are exceptions, however. For example,
the division operation produces two outputs: a quotient and a remainder. Since most operations
are binary, we need a total of three addresses: two addresses to specify the two input operands
and one to specify where the result should go.
Three-Address Machines:
In three-address machines, instructions carry all three addresses explicitly. The RISC
processors use three addresses. Table X1 gives some sample instructions of a three-address
machine.
Instruction              Semantics
add dest,src1,src2       Adds the values at src1 and src2 and stores the result in dest:
                         M(dest) = [src1] + [src2]
sub dest,src1,src2       Subtracts the value at src2 (second source) from the value at src1 (first source) and stores the result in dest:
                         M(dest) = [src1] - [src2]
mult dest,src1,src2      Multiplies the values at src1 and src2 and stores the result in dest:
                         M(dest) = [src1] * [src2]
We use the notation that each variable represents a memory address that stores the value
associated with that variable. This translation from symbol name to the memory address is done
by using a symbol table.
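For the statement A = B + C*D - E + F + A used throughout this section, a typical three-address translation (using the Table X1 instructions and a temporary location T; the exact sequence is an illustrative reconstruction) is:

mult T,C,D   ; T = C*D
add  T,T,B   ; T = B + C*D
sub  T,T,E   ; T = B + C*D - E
add  T,T,F   ; T = B + C*D - E + F
add  A,T,A   ; A = B + C*D - E + F + A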
As you can see from this code, there is one instruction for each arithmetic operation. Also
notice that all instructions, barring the first one, use an address twice. In the middle three
instructions, it is the temporary T and in the last one, it is A. This is the motivation for using two
addresses, as we show next.
Two-Address Machines :
In two-address machines, one address doubles as a source and destination. Usually, we use dest
to indicate that the address is used for destination. But you should note that this address also
supplies one of the source operands. The Pentium is an example processor that uses two
addresses. Sample instructions of a two-address machine are shown below.
Instruction Semantics
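add dest,src        Adds the values at dest and src and stores the result in dest:
                    M(dest) = [dest] + [src]
sub dest,src        Subtracts the value at src from the value at dest and stores the result in dest:
                    M(dest) = [dest] - [src]
mult dest,src       Multiplies the values at dest and src and stores the result in dest:
                    M(dest) = [dest] * [src]
load dest,src       Copies the value at src into dest:
                    M(dest) = [src]
The entries above are representative two-address forms of the Table X1 instructions; the exact mnemonics are illustrative. Using them, the statement A = B + C*D - E + F + A can be translated as:
load T,C     ; T = C
mult T,D     ; T = C*D
add  T,B     ; T = B + C*D
sub  T,E     ; T = B + C*D - E
add  T,F     ; T = B + C*D - E + F
add  A,T     ; A = A + B + C*D - E + F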
Since we use only two addresses, we use a load instruction to first copy the C value into a
temporary represented by T. If you look at these six instructions, you will notice that the
operand T is common. If we make this our default, then we don’t need even two addresses: we
can get away with just one address.
One-Address Machines :
In the early machines, when memory was expensive and slow, a special set of registers was
used to provide an input operand as well as to receive the result from the ALU. Because of this,
these registers are called the accumulators. In most machines, there is just a single accumulator
register. This kind of design, called accumulator machines, makes sense if memory is expensive.
In accumulator machines, most operations are performed on the contents of the accumulator
and the operand supplied by the instruction. Thus, instructions for these machines need to
specify only the address of a single operand. There is no need to store the result in memory: this
reduces the need for larger memory as well as speeds up the computation by reducing the
number of memory accesses. A few sample accumulator machine instructions are shown in Table X3.
In these machines, the C statement
A=B+C*D-E+F+A
is converted to the following code:
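A typical sequence, assuming the accumulator instructions load, store, add, sub, and mult of Table X3 (the exact mnemonics are illustrative):

load  C      ; accumulator = C
mult  D      ; accumulator = C*D
add   B      ; accumulator = B + C*D
sub   E      ; accumulator = B + C*D - E
add   F      ; accumulator = B + C*D - E + F
add   A      ; accumulator = A + B + C*D - E + F
store A      ; A = accumulator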
Zero-Address Machines:
In zero-address machines, the locations of both operands are assumed to be at a default location: the top of a stack. Such processors are therefore known as stack machines. Sample stack-machine instructions are shown below.
Instruction         Semantics
push addr           Places the value at address addr on top of the stack:
                    push([addr])
pop addr            Stores the top value of the stack at memory address addr:
                    M(addr) = pop
add                 Adds the top two values on the stack and pushes the result onto the stack:
                    push(pop + pop)
sub                 Subtracts the second top value from the top value of the stack and pushes the result onto the stack:
                    push(pop - pop)
mult                Multiplies the top two values on the stack and pushes the result onto the stack:
                    push(pop * pop)
Notice that the first two instructions are not zero-address instructions. These two are
special instructions that use a single address and are used to move data between memory and
stack.
All other instructions use the zero-address format. Let’s see how the stack machine
translates the arithmetic expression we have seen in the previous subsections. In these machines,
the C statement
A=B+C*D-E+F+A
is converted to the following code:
push E ; <E>
push C ; <C, E>
push D ; <D, C, E>
mult ; <C*D, E>
push B ; <B, C*D, E>
add ; <B+C*D, E>
sub ; <B+C*D-E>
push F ; <F, B+D*C-E>
add ; <F+B+D*C-E>
push A ; <A, F+B+D*C-E>
add ; <A+F+B+D*C-E>
pop A ;<>
On the right, we show the state of the stack after executing each instruction. The top
element of the stack is shown on the left. Notice that we pushed E early because we need to
subtract it from (B+C*D).
Stack machines are implemented by making the top portion of the stack internal to the
processor. This is referred to as the stack depth. The rest of the stack is placed in memory. Thus,
to access the top values that are within the stack depth, we do not have to access the memory.
Obviously, we get better performance by increasing the stack depth.
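A small Python sketch that executes the zero-address code above and shows the stack after each instruction (the memory values chosen for A through F are illustrative):

memory = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6, "F": 7}   # illustrative values
stack = []

def execute(instruction, operand=None):
    if instruction == "push":
        stack.append(memory[operand])
    elif instruction == "pop":
        memory[operand] = stack.pop()
    elif instruction == "add":
        stack.append(stack.pop() + stack.pop())
    elif instruction == "sub":
        top, second = stack.pop(), stack.pop()
        stack.append(top - second)          # top minus second-from-top
    elif instruction == "mult":
        stack.append(stack.pop() * stack.pop())

program = [("push", "E"), ("push", "C"), ("push", "D"), ("mult", None),
           ("push", "B"), ("add", None), ("sub", None), ("push", "F"),
           ("add", None), ("push", "A"), ("add", None), ("pop", "A")]
for op, arg in program:
    execute(op, arg)
    print(op, arg or "", stack)             # stack state after each instruction
print(memory["A"])                          # 3 + 4*5 - 6 + 7 + 2 = 26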
INSTRUCTION TYPES
Most computer instructions operate on data; however, there are some that do not.
Computer manufacturers regularly group instructions into the following categories:
• Data movement
• Arithmetic
• Boolean
• Bit manipulation (shift and rotate)
• I/O
• Transfer of control
• Special purpose
Data movement instructions are the most frequently used instructions. Data is moved
from memory into registers, from registers to registers, and from registers to memory, and many
machines provide different instructions depending on the source and destination. For example,
there may be a MOVER instruction that always requires two register operands, whereas a
MOVE instruction allows one register and one memory operand.
Some architectures, such as RISC, limit the instructions that can move data to and from memory in an attempt to speed up execution. Many machines have variations of load, store, and move instructions to handle data of different sizes. For example, there may be a LOADB instruction for dealing with bytes and a LOADW instruction for handling words.
Arithmetic operations include those instructions that use integers and floating point
numbers. Many instruction sets provide different arithmetic instructions for various data sizes.
As with the data movement instructions, there are sometimes different instructions for providing
various combinations of register and memory accesses in different addressing modes.
Boolean logic instructions perform Boolean operations, much in the same way that
arithmetic operations work. There are typically instructions for performing AND, NOT, and
often OR and XOR operations.
Bit manipulation instructions are used for setting and resetting individual bits (or
sometimes groups of bits) within a given data word. These include both arithmetic and logical
shift instructions and rotate instructions, both to the left and to the right. Logical shift
instructions simply shift bits to either the left or the right by a specified amount, shifting in zeros
from the opposite end. Arithmetic shift instructions, commonly used to multiply or divide by 2,
do not shift the leftmost bit, because this represents the sign of the number. On a right arithmetic
shift, the sign bit is replicated into the bit position to its right. On a left arithmetic shift, values
are shifted left, zeros are shifted in, but the sign bit is never moved. Rotate instructions are
simply shift instructions that shift in the bits that are shifted out. For example, on a rotate left 1
bit, the leftmost bit is shifted out and rotated around to become the rightmost bit.
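A short Python sketch of the three kinds of shift described above, for an 8-bit word (the width and the sample value are illustrative):

def logical_shift_right(x, n, width=8):
    """Shift right, filling with zeros from the left (the sign bit is not preserved)."""
    return (x & ((1 << width) - 1)) >> n

def arithmetic_shift_right(x, n, width=8):
    """Shift right, replicating the sign bit into the vacated positions."""
    x &= (1 << width) - 1
    sign = x >> (width - 1)
    for _ in range(n):
        x = (x >> 1) | (sign << (width - 1))
    return x

def rotate_left(x, n, width=8):
    """Bits shifted out on the left re-enter on the right."""
    x &= (1 << width) - 1
    n %= width
    return ((x << n) | (x >> (width - n))) & ((1 << width) - 1)

x = 0b10110100                                # -76 as an 8-bit two's complement value
print(bin(logical_shift_right(x, 1)))         # 0b1011010   (zero shifted in, sign lost)
print(bin(arithmetic_shift_right(x, 1)))      # 0b11011010  (sign bit replicated)
print(bin(rotate_left(x, 1)))                 # 0b1101001   (MSB wrapped around to LSB)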
I/O instructions vary greatly from architecture to architecture. The basic schemes for handling I/O are programmed I/O, interrupt-driven I/O, and DMA. These are covered in more detail in Unit 5.
Control instructions include branches, skips, and procedure calls. Branching can be
unconditional or conditional. Skip instructions are basically branch instructions with implied
addresses. Because no operand is required, skip instructions often use bits of the address field to
specify different situations (recall the Skipcond instruction used by MARIE). Procedure calls are
special branch instructions that automatically save the return address. Different machines use
different methods to save this address. Some store the address at a specific location in memory,
others store it in a register, while still others push the return address on a stack. We have already
seen that stacks can be used for other purposes.
Special purpose instructions include those used for string processing, high level
language support, protection, flag control, and cache management. Most architectures provide
instructions for string processing, including string manipulation and searching.
Addressing Modes
We have examined the types of operands and operations that may be specified by machine instructions. Now we see how the address of an operand is specified, and how the bits of an instruction are organized to define the operand addresses and the operation of that instruction.
Addressing Modes: The most common addressing techniques are
• Immediate
• Direct
• Indirect
• Register
• Register Indirect
• Displacement
• Stack
All computer architectures provide more than one of these addressing modes. The
question arises as to how the control unit can determine which addressing mode is being used in
a particular instruction. Several approaches are used. Often, different opcodes will use different
addressing modes. Also, one or more bits in the instruction format can be used as a mode field.
The value of the mode field determines which addressing mode is to be used.
What is the interpretation of the effective address? In a system without virtual memory, the
effective address will be either a main memory address or a register. In a virtual memory
system, the effective address is a virtual address or a register. The actual mapping to a physical
address is a function of the paging mechanism and is invisible to the programmer.
Immediate Addressing:
The simplest form of addressing is immediate addressing, in which the operand is
actually present in the instruction:
OPERAND = A
This mode can be used to define and use constants or set initial values of variables. The
advantage of immediate addressing is that no memory reference other than the instruction fetch
is required to obtain the operand. The disadvantage is that the size of the number is restricted to
the size of the address field, which, in most instruction sets, is small compared with the word length.
Figure 4.1: Immediate Addressing Mode
The instruction format for Immediate Addressing Mode is shown in Figure 4.1.
Direct Addressing:
A very simple form of addressing is direct addressing, in which the address field contains
the effective address of the operand:
EA = A
It requires only one memory reference and no special calculation.
Indirect Addressing:
With direct addressing, the length of the address field is usually less than the word length, thus limiting the address range. One solution is to have the address field refer to the address of a word in memory, which in turn contains a full-length address of the operand. This is known as indirect addressing:
EA = (A)
Register Addressing:
With register addressing, the address field refers to a register that contains the operand:
EA = R
The advantages of register addressing are that only a small address field is needed in the instruction and no memory reference is required. The disadvantage of register addressing is that the address space is very limited.
Figure 4.4: Register Addressing Mode
The exact register location of the operand in the case of Register Addressing Mode is shown in Figure 4.4. Here, 'R' indicates a register where the operand is present.
Register Indirect Addressing:
Register indirect addressing is analogous to indirect addressing; the difference is that the address field refers to a register that holds the address of the operand:
EA = (R)
Displacement Addressing:
A very powerful mode of addressing combines the capabilities of direct addressing and register indirect addressing, which is broadly categorized as displacement addressing:
EA = A + (R)
Displacement addressing requires that the instruction have two address fields, at least one
of which is explicit. The value contained in one address field (value = A) is used directly. The
other address field, or an implicit reference based on opcode, refers to a register whose contents
are added to A to produce the effective address. The general format of Displacement Addressing
is shown in the Figure 4.6.
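A compact Python sketch of how an operand is obtained under the modes discussed so far (register names and memory contents are illustrative):

memory = {100: 250, 250: 77, 300: 5}        # address -> contents (illustrative)
registers = {"R1": 250, "R2": 200}

def operand(mode, A=None, R=None):
    if mode == "immediate":                 # operand = A
        return A
    if mode == "direct":                    # EA = A
        return memory[A]
    if mode == "indirect":                  # EA = (A)
        return memory[memory[A]]
    if mode == "register":                  # operand is in the register: EA = R
        return registers[R]
    if mode == "register_indirect":         # EA = (R)
        return memory[registers[R]]
    if mode == "displacement":              # EA = A + (R)
        return memory[A + registers[R]]

print(operand("immediate", A=100))              # 100
print(operand("direct", A=100))                 # 250
print(operand("indirect", A=100))               # 77 = memory[250]
print(operand("register", R="R1"))              # 250
print(operand("register_indirect", R="R1"))     # 77
print(operand("displacement", A=100, R="R2"))   # memory[300] = 5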
Three of the most common uses of displacement addressing are:
• Relative addressing
• Base-register addressing
• Indexing
Relative Addressing:
For relative addressing, the implicitly referenced register is the program counter (PC).
That is, the current instruction address is added to the address field to produce the EA. Thus, the
effective address is a displacement relative to the address of the instruction.
Base-Register Addressing:
The reference register contains a memory address, and the address field contains a
displacement from that address. The register reference may be explicit or implicit. In some
implementations, a single segment/base register is employed and is used implicitly. In others, the
programmer may choose a register to hold the base address of a segment, and the instruction
must reference it explicitly.
Indexing:
The address field references a main memory address, and the reference register contains
a positive displacement from that address. In this case also the register reference is sometimes
explicit and sometimes implicit.
Index registers are generally used for iterative tasks, so it is typical that there is a need to increment or decrement the index register after each reference to it. Because this is such a common operation, some systems will automatically do this as part of the same instruction cycle. This is known as auto-indexing. There are two types of auto-indexing: auto-incrementing and auto-decrementing.
If certain registers are devoted exclusively to indexing, then auto-indexing can be invoked implicitly and automatically. If general-purpose registers are used, the auto-index operation may need to be signaled by a bit in the instruction.
Auto-increment: EA = A + (R); R ← (R) + 1
Auto-decrement: EA = A + (R); R ← (R) - 1
In some machines, both indirect addressing and indexing are provided, and it is possible
to employ both in the same instruction. There are two possibilities: The indexing is performed
either before or after the indirection.
If indexing is performed after the indirection, it is termed postindexing:
EA = (A) + (R)
First, the contents of the address field are used to access a memory location containing an
address. This address is then indexed by the register value.
With preindexing, the indexing is performed before the indirection:
EA = (A + (R))
As before, an address is calculated; however, the calculated address contains not the operand, but the address of the operand.
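A short Python sketch contrasting postindexing and preindexing (the memory contents and register value are illustrative):

memory = {100: 500, 510: 42, 520: 600, 600: 11}
R = 10

def postindex(A):          # EA = (A) + (R): the indirection first, then indexing
    return memory[memory[A] + R]

def preindex(A):           # EA = (A + (R)): indexing first, then the indirection
    return memory[memory[A + R]]

print(postindex(100))      # memory[memory[100] + 10] = memory[510] = 42
print(preindex(510))       # memory[memory[520]] = memory[600] = 11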
Stack Addressing:
A stack is a linear array or list of locations. It is sometimes referred to as a pushdown list or a last-in-first-out queue. A stack is a reserved block of locations. Items are appended to the top
of the stack so that, at any given time, the block is partially filled. Associated with the stack is a
pointer whose value is the address of the top of the stack. The stack pointer is maintained in a
register. Thus, references to stack locations in memory are in fact register indirect addresses.
The stack mode of addressing is a form of implied addressing. The machine instructions
need not include a memory reference but implicitly operate on the top of the stack.
UNIT 4
Syllabus:
Memory Organization: Semiconductor memory technologies, hierarchy, Interleaving, Main
Memory-RAM and ROM chips, Address map, Associative memory-Hardware organization.
Match logic. Cache memory-size vs. block size, Mapping functions-Associate, Direct, Set
Associative mapping. Replacement algorithms, write policies. Auxiliary memory-Magnetic
tapes etc
Memory Hierarchy
The memory hierarchy arranges the storage of a computer system by speed, cost, and capacity: processor registers and cache memory at the top, main memory in the middle, and auxiliary (secondary) memory such as magnetic disks and tapes at the bottom. Moving down the hierarchy, capacity increases while speed and cost per bit decrease.
Memory Access Methods
Each memory type is a collection of numerous memory locations. To access data from any memory, it must first be located and then read from that memory location. The following are the methods used to access information from memory locations:
1. Random Access: Main memories are random-access memories, in which each memory location has a unique address. Using this unique address, any memory location can be reached in the same amount of time, in any order.
2. Sequential Access: This method allows memory to be accessed in a sequence, i.e., in order.
3. Direct Access: In this mode, information is stored in tracks, with each track having a separate read/write head.
Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory, and cache memory is called main memory. It is the central storage unit of the computer system: a large and fast memory used to store data during computer operations. Main memory is made up of RAM and ROM, with RAM integrated circuit chips holding the major share.
RAM: Random Access Memory is volatile read/write memory that holds the programs and data currently in use. It may be static (SRAM) or dynamic (DRAM).
ROM: Read Only Memory is non-volatile and serves as more or less permanent storage for information. It also stores the bootstrap loader program, used to load and start the operating system when the computer is turned on. PROM (Programmable ROM), EPROM (Erasable PROM), and EEPROM (Electrically Erasable PROM) are some commonly used ROMs.
The addressing of memory can be established by means of a table that specifies the memory address assigned to each chip. The table, called a memory address map, is a pictorial representation of the assigned address space for each chip in the system.
To demonstrate with a particular example, assume that a computer system needs 512 bytes of RAM and 512 bytes of ROM. The RAM and ROM chips to be used are specified in the figures.
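A minimal Python sketch of decoding such an address map, assuming one common arrangement for this example: four 128 x 8 RAM chips at addresses 000-1FF (hex) and one 512 x 8 ROM chip at 200-3FF, distinguished by bit 9 of a 10-bit address (the exact partitioning is an illustrative choice):

def decode(address):
    """Return (chip, offset) for a 10-bit CPU address."""
    if address >> 9 == 0:                # bit 9 = 0 -> RAM region
        chip = (address >> 7) & 0b11     # bits 8-7 select one of the 4 RAM chips
        offset = address & 0x7F          # bits 6-0 address a byte inside the chip
        return (f"RAM{chip}", offset)
    else:                                # bit 9 = 1 -> ROM region
        return ("ROM", address & 0x1FF)  # bits 8-0 address a byte in the ROM chip

print(decode(0x005))   # ('RAM0', 5)
print(decode(0x185))   # ('RAM3', 5)
print(decode(0x2A0))   # ('ROM', 160)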
Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For example, magnetic disks and tapes are commonly used auxiliary devices. Other devices used as auxiliary memory are magnetic drums, magnetic bubble memory, and optical disks.
Auxiliary memory is not directly accessible to the CPU; it is accessed through the input/output channels.
Cache Memory
The data or contents of main memory that are used repeatedly by the CPU are stored in the cache memory so that they can be accessed in a shorter time. Whenever the CPU needs to access memory, it first checks the cache. If the data is not found in the cache, the CPU goes to main memory. It also transfers a block of recent data into the cache, deleting old data in the cache to accommodate the new data.
Hit Ratio
The performance of cache memory is measured in terms of a quantity called the hit ratio. When the CPU refers to memory and finds the word in the cache, it is said to produce a hit. If the word is not found in the cache, it must be read from main memory, and the reference counts as a miss. The ratio of the number of hits to the total number of CPU references to memory is called the hit ratio.
Hit Ratio = Hits / (Hits + Misses)
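A small worked example with assumed numbers: 970 hits and 30 misses out of 1000 references, a 20 ns cache, a 200 ns main memory, and a miss costed as the cache check plus the main-memory access:

hits, misses = 970, 30
hit_ratio = hits / (hits + misses)                         # 0.97
t_cache, t_main = 20e-9, 200e-9                            # assumed access times
average_access = hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_main)
print(hit_ratio, average_access)                           # 0.97  2.6e-08 (26 ns)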
Associative Memory
It is also known as content addressable memory (CAM). It is a memory chip in which
each bit position can be compared. In this the content is compared in each bit cell which allows
very fast table lookup. Since the entire chip can be compared, contents are randomly stored
without considering addressing scheme. These chips have less storage capacity than regular
memory chips.
The transformation of data from main memory to cache memory is referred to as a mapping process. Three types of mapping are used:
1. Associative Mapping
2. Direct Mapping
3. Set-Associative Mapping
Associative Mapping
The associative memory stores both the address and the data. The 15-bit address value is shown as a five-digit octal number and its corresponding 12-bit word as a four-digit octal number. A CPU address of 15 bits is placed in the argument register and the associative memory is searched for a matching address.
Direct Mapping
The 15-bit CPU address is divided into two fields: the 9 least significant bits constitute the index field and the remaining 6 bits constitute the tag field. The number of bits in the index field is equal to the number of address bits required to access the cache memory.
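A minimal Python sketch of the direct-mapping lookup described above, assuming the 15-bit address, 9-bit index, and 6-bit tag of this example (the dictionary stands in for the 512-word cache):

INDEX_BITS = 9

def split_address(addr15):
    index = addr15 & ((1 << INDEX_BITS) - 1)   # 9 least significant bits
    tag = addr15 >> INDEX_BITS                 # remaining 6 bits
    return tag, index

cache = {}                                     # index -> (tag, data)

def read(addr15, main_memory):
    tag, index = split_address(addr15)
    if index in cache and cache[index][0] == tag:
        return cache[index][1], "hit"
    data = main_memory[addr15]                 # miss: fetch the word from main memory
    cache[index] = (tag, data)                 # replace whatever word shared this index
    return data, "miss"

main_memory = {addr: addr % 100 for addr in range(2**15)}   # illustrative contents
print(read(5, main_memory))    # (5, 'miss') on the first access
print(read(5, main_memory))    # (5, 'hit') on the repeated access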
Replacement Algorithms
Data is continuously replaced with new data in the cache memory using replacement algorithms. Two commonly used replacement algorithms are:
FIFO - First In First Out: the oldest item is replaced with the latest item.
LRU - Least Recently Used: the item least recently used by the CPU is removed.
Write Policies
When the CPU writes to a word that is held in the cache, two policies are common: write-through, in which every write updates both the cache and main memory, and write-back, in which a write updates only the cache and the modified line is copied back to main memory when it is replaced.
The benefit of write-through to main memory is that it simplifies the design of the computer system. With write-through, the main memory always has an up-to-date copy of the line, so when a read is done, main memory can always reply with the requested data.
If write-back is used, sometimes the up-to-date data is in a processor cache, and sometimes it is
in main memory. If the data is in a processor cache, then that processor must stop main memory
from replying to the read request, because the main memory might have a stale copy of the data.
This is more complicated than write-through.
Also, write-through can simplify the cache coherency protocol because it does not need the Modify state. The Modify state records that the cache must write back the cache line before it invalidates or evicts the line. With write-through, a cache line can always be invalidated without writing it back, since memory already has an up-to-date copy of the line.
Cache Coherence:
In a shared memory multiprocessor with a separate cache memory for each processor, it is
possible to have many copies of any one instruction operand : one copy in the main memory and
one in each cache memory. When one copy of an operand is changed, the other copies of the
operand must be changed also. Cache coherence is the discipline that ensures that changes in the
values of shared operands are propagated throughout the system in a timely fashion.
Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This separation
provides large virtual memory for programmers when only small physical memory is available.
Virtual memory is used to give programmers the illusion that they have a very large memory
even though the computer has a small main memory. It makes the task of programming easier
because the programmer no longer needs to worry about the amount of physical memory
available.
The table implementation of the address mapping is simplified if the information in the address space and the memory space are each divided into groups of fixed size.
The physical memory is broken down into groups of equal size called blocks, which may range from 64 to 4096 words each. The term page refers to groups of address space of the same size.
Consider a computer with an address space of 8K and a memory space of 4K. If we split each into groups of 1K words, we obtain eight pages and four blocks, as shown in the figure. At any given time, up to four pages of address space may reside in main memory, in any one of the four blocks.
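A minimal Python sketch of the address mapping for this example: an 8K address space, a 4K memory space, and 1K pages give eight pages and four blocks (the particular page-to-block assignments below are illustrative):

PAGE_SIZE = 1024
page_table = {0: None, 1: 3, 2: None, 3: 1, 4: 0, 5: None, 6: 2, 7: None}
#             page -> block (None means the page is not currently in main memory)

def translate(virtual_address):
    page, offset = divmod(virtual_address, PAGE_SIZE)
    block = page_table[page]
    if block is None:
        raise LookupError(f"page fault on page {page}")
    return block * PAGE_SIZE + offset

print(translate(3 * 1024 + 17))   # page 3 maps to block 1 -> physical address 1041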
Associative memory page table:
The implementation of the page table is vital to the efficiency of the virtual
memory technique, for each memory reference must also include a reference to the page table.
The fastest solution is a set of dedicated registers to hold the page table but this method is
impractical for large page tables because of the expense. But keeping the page table in main
memory could cause intolerable delays because even only one memory access for the page table
involves a slowdown of 100 percent and large page tables can require more than one memory
access. The solution is to augment the page table with a special high-speed memory made up of associative registers or translation lookaside buffers (TLBs), which is called an associative memory.
Page replacement
The advantage of virtual memory is that processes can use more memory than exists in the machine; when memory is accessed that is not present (a page fault), it must be paged in (sometimes referred to as being "swapped in", although some people reserve "swapped in" to refer to bringing in an entire address space).
Swapping in pages is very expensive (it requires using the disk), so we'd like to avoid page
faults as much as possible. The algorithm that we use to choose which pages to evict to make
space for the new page can have a large impact on the number of page faults that occur.
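A compact Python sketch comparing FIFO and LRU page replacement on a reference string (the reference string and the three-frame memory are illustrative):

from collections import OrderedDict

def count_faults(refs, frames, lru):
    resident = OrderedDict()               # page -> None, kept in eviction order
    faults = 0
    for page in refs:
        if page in resident:
            if lru:                        # LRU: a hit moves the page to the back
                resident.move_to_end(page)
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)   # evict oldest (FIFO) / least recent (LRU)
        resident[page] = None
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2]
print("FIFO faults:", count_faults(refs, 3, lru=False))
print("LRU  faults:", count_faults(refs, 3, lru=True))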
UNIT 5
Syllabus:
Input –Output Organization: Peripheral devices, Input-output subsystems, I/O device interface,
I/O Processor, I/O transfers–Program controlled, Interrupt driven, and DMA, interrupts and
exceptions. I/O device interfaces – SCSI, USB. Pipelining and Vector Processing: Basic concepts, Instruction level Parallelism, Throughput and Speedup, Pipeline hazards. Case Study - Introduction to x86 architecture.
Introduction:
The I/O subsystem of a computer provides an efficient mode of communication between the
central system and the outside environment. It handles all the input-output operations of the
computer system.
Peripheral Devices
Input or output devices that are connected to the computer are called peripheral devices. These devices are designed to read information into or out of the memory unit upon command from the CPU and are considered to be part of the computer system. These devices are also called peripherals.
For example, keyboards, display units, and printers are common peripheral devices.
There are three types of peripherals:
1. Input peripherals: Allow user input from the outside world to the computer. Example: keyboard, mouse, etc.
2. Output peripherals: Allow information output from the computer to the outside world. Example: printer, monitor, etc.
3. Input-output peripherals: Allow both input (from the outside world to the computer) and output (from the computer to the outside world). Example: touch screen, etc.
Interfaces
An interface is a shared boundary between two separate components of the computer system which can be used to attach two or more components to the system for communication purposes.
There are two types of interface:
1. CPU Interface
2. I/O Interface
Input-Output Interface
Peripherals connected to a computer need special communication links for interfacing with the CPU. In a computer system, there are special hardware components between the CPU and the peripherals to control or manage the input-output transfers. These components are called input-output interface units because they provide communication links between the processor bus and the peripherals. They provide a method for transferring information between the internal system and input-output devices.
If the registers in the I/O interface share a common clock with the CPU registers, then transfers between the two units are said to be synchronous. In most cases, however, the internal timing in each unit is independent of the other: each uses its own private clock for its internal registers. In that case, the two units are said to be asynchronous to each other, and any data transfer between them is said to be an asynchronous data transfer.
Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units so that the time at which data is sent can be indicated.
The strobe pulse and handshaking methods of asynchronous data transfer are not restricted to I/O transfers; in fact, they are used extensively whenever data must be transferred between two independent units. Here we consider the transmitting unit as the source and the receiving unit as the destination. For example, the CPU is the source during an output or write transfer and the destination during an input or read transfer.
The sequence of control during an asynchronous transfer depends on whether the transfer is initiated by the source or by the destination. So, for each method of asynchronous data transfer, we consider the sequence of control both when the transfer is initiated by the source and when it is initiated by the destination; in this way, each method can be further divided into two parts, source-initiated and destination-initiated.
We can also specify an asynchronous transfer between two independent units by means of a timing diagram that shows the timing relationship between the control signals and the data buses. We now discuss each method of asynchronous data transfer in detail, one by one.
1. Strobe Control:
The strobe control method of asynchronous data transfer employs a single control line to time each transfer. This control line is also known as the strobe, and it may be activated either by the source or by the destination, depending on which unit initiates the transfer.
The block diagram and timing diagram of a strobe initiated by the source unit are shown in the figure below:
In the block diagram we see that the strobe is initiated by the source, and as shown in the timing diagram, the source unit first places the data on the data bus. After a brief delay to ensure that the data settle to a steady value, the source activates a strobe pulse. The information on the data bus and the strobe control signal remain in the active state for a sufficient period of time to allow the destination unit to receive the data. The destination unit uses the falling edge of the strobe control to transfer the contents of the data bus into one of its internal registers. The source removes the data from the data bus after it disables its strobe pulse. New valid data will be available only after the strobe is enabled again.
The block diagram and timing diagram of a strobe initiated by the destination are shown in the figure below:
In the block diagram we see that the strobe is initiated by the destination, and as shown in the timing diagram, the destination unit first activates the strobe pulse, informing the source to provide the data. The source unit responds by placing the requested binary information on the data bus. The data must be valid and remain on the bus long enough for the destination unit to accept it. The falling edge of the strobe pulse can again be used to trigger a destination register. The destination unit then disables the strobe, and the source removes the data from the data bus after a predetermined time interval.
In an actual computer, in the first case (strobe initiated by the source) the strobe may be a memory-write control signal from the CPU to a memory unit. The source, the CPU, places the word on the data bus and informs the memory unit, which is the destination, that this is a write operation.
In the second case (strobe initiated by the destination) the strobe may be a memory-read control signal from the CPU to a memory unit. The destination, the CPU, initiates the read operation to inform the memory, which is the source unit, to place the selected word on the data bus.
2. Handshaking:
The disadvantage of the strobe method is that a source unit that initiates the transfer has no way of knowing whether the destination has actually received the data that was placed on the bus. Similarly, a destination unit that initiates the transfer has no way of knowing whether the source unit has actually placed data on the bus.
The handshaking method introduces a second control signal line that provides a reply to the unit that initiates the transfer. One control line is in the same direction as the data flow on the bus, from the source to the destination; it is used by the source unit to inform the destination unit whether there is valid data on the bus. The other control line is in the opposite direction, from the destination to the source; it is used by the destination unit to inform the source whether it can accept data. Here too the sequence of control depends on the unit that initiates the transfer, that is, on whether the transfer is initiated by the source or by the destination. The sequence of control for both cases is described below.
Source-initiated Handshaking:
The source-initiated transfer using handshaking lines is shown in the figure below:
In its block diagram, we see that the two handshaking lines are "data valid", which is generated by the source unit, and "data accepted", generated by the destination unit.
The timing diagram shows the timing relationship of the exchange of signals between the two units. As shown in the timing diagram, the source initiates a transfer by placing data on the bus and enabling its data valid signal. The data accepted signal is then activated by the destination unit after it accepts the data from the bus. The source unit then disables its data valid signal, which invalidates the data on the bus. After this, the destination unit disables its data accepted signal and the system returns to its initial state. The source unit does not send the next data item until the destination unit shows its readiness to accept new data by disabling the data accepted signal.
This sequence of events is described in the sequence diagram, which shows the state in which the system is present at any given time.
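A toy Python simulation of source-initiated handshaking, using two event flags for "data valid" and "data accepted" (the threading model and timing are illustrative; real handshaking is done in hardware):

import threading, time

bus = {"data": None}
data_valid = threading.Event()
data_accepted = threading.Event()

def source(items):
    for item in items:
        bus["data"] = item              # place data on the bus
        data_valid.set()                # enable data valid
        data_accepted.wait()            # wait for the destination's reply
        data_valid.clear()              # invalidate the data on the bus
        while data_accepted.is_set():   # wait until the destination is ready again
            time.sleep(0.001)

def destination(n, received):
    for _ in range(n):
        data_valid.wait()               # wait for valid data
        received.append(bus["data"])    # latch the data into an internal register
        data_accepted.set()             # signal data accepted
        while data_valid.is_set():      # wait for data valid to be disabled
            time.sleep(0.001)
        data_accepted.clear()           # return to the initial state

received = []
items = ["word1", "word2", "word3"]
t1 = threading.Thread(target=source, args=(items,))
t2 = threading.Thread(target=destination, args=(len(items), received))
t1.start(); t2.start(); t1.join(); t2.join()
print(received)                         # ['word1', 'word2', 'word3']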
Destination-initiated Handshaking:
The destination-initiated transfer using handshaking lines is shown in the figure below:
In its block diagram, we see that the two handshaking lines are "data valid", generated by the source unit, and "ready for data", generated by the destination unit. Note that the name of the signal generated by the destination unit has been changed from data accepted to ready for data to reflect its new meaning.
Here the transfer is initiated by the destination, so the source unit does not place data on the data bus until it receives the ready for data signal from the destination unit. After that, the handshaking process is the same as in the source-initiated case.
The sequence of events is shown in its sequence diagram, and the timing relationship between the signals is shown in its timing diagram. Thus the sequence of events in both cases would be identical if we consider the ready for data signal as the complement of data accepted; the only difference between source-initiated and destination-initiated transfer is in their choice of initial state.
Programmed I/O
Programmed I/O transfers are the result of I/O instructions written in the computer program. Each data item transfer is initiated by an instruction in the program. Usually the program controls the data transfer to and from the CPU and the peripheral. Transferring data under programmed I/O requires constant monitoring of the peripheral by the CPU.
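A toy Python sketch of program-controlled transfer: the CPU repeatedly tests a device status flag and moves one data item when the device is ready (the device model and its method names are illustrative):

import random

class Device:
    def __init__(self, data):
        self._data = list(data)
    def status_ready(self):
        # the device becomes ready at unpredictable times
        return bool(self._data) and random.random() < 0.3
    def read_register(self):
        return self._data.pop(0)
    def has_data(self):
        return bool(self._data)

def programmed_io(device):
    received = []
    while device.has_data():
        while not device.status_ready():   # busy-wait: the CPU does no useful work here
            pass
        received.append(device.read_register())
    return received

print(programmed_io(Device("HELLO")))      # ['H', 'E', 'L', 'L', 'O']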
Priority Interrupt
A priority interrupt is a system that decides the priority at which various devices, which generate interrupt signals at the same time, will be serviced by the CPU. The system also decides which conditions are allowed to interrupt the CPU while some other interrupt is being serviced. Generally, devices with high-speed transfer, such as magnetic disks, are given high priority, and slow devices, such as keyboards, are given low priority. When two or more devices interrupt the computer simultaneously, the computer services the device with the higher priority first.
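A tiny sketch of priority resolution among simultaneous requests (the device names and priority numbers are illustrative; a lower number means higher priority):

devices = {"magnetic_disk": 0, "printer": 2, "keyboard": 3}   # priority levels

def select_interrupt(pending):
    """Return the pending device with the highest priority, or None."""
    return min(pending, key=lambda d: devices[d]) if pending else None

print(select_interrupt({"keyboard", "magnetic_disk"}))   # magnetic_disk is serviced first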
DIRECT MEMORY ACCESS
DMA is used for block transfers of data between memory and high-speed devices such as magnetic drums, disks, and tapes. The CPU and the DMA controller exchange two bus-control signals: Bus Request (BR), asserted by the DMA controller to ask the CPU to release the address bus (ABUS), data bus (DBUS), and read/write (RD, WR) lines, which the CPU then places in a high-impedance (disabled) state; and Bus Grant (BG), asserted by the CPU to indicate that the buses have been relinquished and are enabled for the DMA controller.
Figure: Block diagram of the DMA controller, showing the DMA select (DS) and register select (RS) inputs, the RD and WR lines, the internal bus connecting the address register, word count register, and control register, the control logic, and the BR, BG, interrupt, and DMA request/acknowledge lines to the I/O device.
DMA TRANSFER
Figure: DMA transfer in a computer system, showing the CPU and the random-access memory unit (RAM) on the common address and data buses with read/write control and address-select logic, and the DMA controller (DS, RS, BR, BG, interrupt) connected to the peripheral device through the DMA request and DMA acknowledge lines.
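A toy Python sketch of a DMA block transfer: the CPU is assumed to have loaded the address and word-count registers, after which the controller moves words directly into memory and finally raises an interrupt (all names and values are illustrative):

def dma_transfer(memory, device_data, start_address, word_count):
    address_register, count_register = start_address, word_count
    while count_register > 0:
        memory[address_register] = device_data.pop(0)  # one DMA cycle: device -> memory
        address_register += 1                          # increment the address register
        count_register -= 1                            # decrement the word-count register
    return "interrupt: transfer complete"

memory = [0] * 16
print(dma_transfer(memory, [5, 6, 7, 8], start_address=4, word_count=4))
print(memory)   # words 5, 6, 7, 8 now occupy addresses 4-7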
Input/Output Processor
An input-output processor (IOP) is a processor with direct memory access capability. In this arrangement, the computer system is divided into a memory unit and a number of processors. Each IOP controls and manages the input-output tasks. The IOP is similar to a CPU except that it handles only the details of I/O processing. The IOP can fetch and execute its own instructions; these IOP instructions are designed specifically to manage I/O transfers.
Block Diagram Of I/O Processor:
Below is a block diagram of a computer along with various I/O Processors. The memory unit
occupies the central position and can communicate with each processor.
The CPU processes the data required for solving the computational tasks. The IOP provides a
path for transfer of data between peripherals and memory. The CPU assigns the task of initiating
the I/O program.
The IOP operates independently of the CPU and transfers data between peripherals and memory.
The communication between the IOP and the devices is similar to the program control method
of transfer. And the communication with the memory is similar to the direct memory access
method.
In large scale computers, each processor is independent of other processors and any processor
can initiate the operation.
The CPU acts as master and the IOP acts as a slave processor. The CPU assigns the task of initiating operations, but it is the IOP, not the CPU, that executes the I/O instructions. CPU instructions provide operations to start an I/O transfer, and the IOP requests the CPU's attention through an interrupt.
Instructions that are read from memory by an IOP are also called commands, to distinguish them from instructions that are read by the CPU. Commands are prepared by programmers and stored in memory; the command words constitute the program for the IOP. The CPU informs the IOP where to find the commands in memory.
Pipelining and Vector Processing
Parallel Processing
Parallel processing is the execution of concurrent events in the computing process to achieve faster computational speed.
Levels of parallel processing:
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
PARALLEL COMPUTERS
Architectural Classification: Flynn's classification is based on two streams:
- Instruction stream: the sequence of instructions read from memory.
- Data stream: the operations performed on the data in the processor.
What is Pipelining?
Pipelining is a technique in which multiple instructions are overlapped during execution: instructions are fed to the processor through a pipeline, which allows them to be stored and executed in an orderly fashion. It is also known as pipeline processing. The pipeline is divided into stages, and these stages are connected with one another to form a pipe-like structure. Instructions enter from one end and exit from the other end. Pipelining increases the overall instruction throughput.
In a pipelined system, each segment consists of an input register followed by a combinational circuit. The register is used to hold data, and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment.
A pipelined system is like a modern-day assembly line in a factory. For example, in a car manufacturing plant, huge assembly lines are set up with robotic arms performing a certain task at each point, after which the car moves on to the next arm.
Types of Pipeline
Pipelines are divided into two categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are usually found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, and so on. For example, the inputs to a floating-point adder pipeline are:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating-point numbers), while a and b are exponents.
The floating-point addition and subtraction is done in 4 parts:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
Registers are used for storing the intermediate results between the above operations.
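A small Python sketch of the four steps, using the X = A*2^a, Y = B*2^b form (the decimal mantissas and the normalization range [0.5, 1) are illustrative simplifications of the binary hardware):

def fp_add(A, a, B, b):
    # 1. Compare the exponents (keep the larger one)
    if a < b:
        (A, a), (B, b) = (B, b), (A, a)
    # 2. Align the mantissas: shift the smaller number right by the exponent difference
    B = B / (2 ** (a - b))
    # 3. Add the mantissas
    mantissa, exponent = A + B, a
    # 4. Normalize the result so that the mantissa lies in [0.5, 1)
    while mantissa >= 1.0:
        mantissa /= 2.0
        exponent += 1
    while 0 < mantissa < 0.5:
        mantissa *= 2.0
        exponent -= 1
    return mantissa, exponent

print(fp_add(0.9504, 3, 0.8200, 2))   # approximately (0.6802, 4), i.e. 0.6802 * 2^4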
Instruction Pipeline
In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode, and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline; thus we can execute multiple instructions simultaneously. The pipeline will be more efficient if the instruction cycle is divided into segments of equal duration.
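A short worked example of pipeline throughput and speedup, with assumed numbers: a k = 4 segment pipeline with a 20 ns clock, a non-pipelined unit assumed to take k times the clock period per task, and n = 100 tasks (the pipeline needs k + n - 1 cycles in total):

k, tp = 4, 20e-9                        # 4 segments, 20 ns clock period (assumed)
tn = k * tp                             # non-pipelined time per task, assumed = k*tp
n = 100                                 # number of tasks
t_pipeline = (k + n - 1) * tp           # 103 cycles
t_nonpipeline = n * tn
print(t_nonpipeline / t_pipeline)       # speedup of about 3.88, approaching k = 4 as n grows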
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It makes the system reliable.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The instruction latency is higher.
Vector (Array) Processing
There is a class of computational problems that are beyond the capabilities of a conventional computer. These problems require a vast number of computations on multiple data items, which would take a conventional computer (with a scalar processor) days or even weeks to complete. Such complex operations, which work on multiple data items at the same time, require a better way of instruction execution, which is achieved by vector processors.
Scalar CPUs can manipulate only one or two data items at a time, which is not very efficient. Also, simple instructions like ADD A to B, and store into C are not practically efficient. Addresses are used to point to the memory location where the data to be operated on will be found, which leads to the added overhead of data lookup. Until the data is found, the CPU would be sitting idle, which is a big performance issue.
Hence the concept of the instruction pipeline comes into the picture, in which the instruction passes through several sub-units in turn. These sub-units perform various independent functions: for example, the first one decodes the instruction, the second sub-unit fetches the data, and the third sub-unit performs the arithmetic itself. Therefore, while the data is being fetched for one instruction, the CPU does not sit idle; it works on decoding the next instruction, ending up working like an assembly line.
A vector processor not only uses an instruction pipeline but also pipelines the data, working on multiple data items at the same time.
A normal scalar processor instruction would be ADD A, B, which leads to the addition of two operands; but what if we could instruct the processor to add a group of numbers (from memory location 0 to n) to another group of numbers (say, from memory location n to k)? This can be achieved by vector processors. In a vector processor a single instruction can ask for multiple data operations, which saves time, as the instruction is decoded once and then keeps operating on different data items.
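A small illustration of the difference, using NumPy's array addition to stand in for a single vector ADD (NumPy here is only an analogy for SIMD-style hardware, not an assumption about any particular vector processor):

import numpy as np   # np.array addition stands in for one vector ADD instruction

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Scalar style: one ADD is issued, decoded, and executed per element
c_scalar = [a[i] + b[i] for i in range(len(a))]

# Vector style: a single vector ADD operates on every element pair
c_vector = np.array(a) + np.array(b)

print(c_scalar)            # [11, 22, 33, 44]
print(c_vector.tolist())   # [11, 22, 33, 44]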
Applications of Vector Processors
Computers with vector processing capabilities are in demand in specialized applications. The following are some areas where vector processing is used:
1. Petroleum exploration.
2. Medical diagnosis.
3. Data analysis.
4. Weather forecasting.
5. Aerodynamics and space flight simulations.
6. Image processing.
7. Artificial intelligence.