KTU - CST202 - Computer Organization and Architecture Module: 1

BASIC STRUCTURE OF COMPUTERS - Functional units – Basic operational concepts – Bus structures. Memory locations and addresses – memory operations – instructions and instruction sequencing – addressing modes. Basic processing unit – fundamental concepts – instruction cycle – execution of a complete instruction – single bus and multiple bus organization.

WHY COMPUTER ORGANIZATION AND ARCHITECTURE?


Computer architecture is a key component of computer engineering and it is concerned
with all aspects of the design and organization of the central processing unit and the integration
of the CPU into the computer system itself.
Architecture extends upward into computer software because a processor’s architecture
must cooperate with the operating system and system software. It is difficult to design an
operating system well without knowledge of the underlying architecture.
Moreover, the computer designer must have an understanding of software in order to
implement the optimum architecture.

INTRODUCTION

Computer: A device that accepts input, processes data, stores data, and produces output, all
according to a series of stored instructions.
Software: A computer program that tells the computer how to perform particular tasks.
Hardware: Includes the electronic and mechanical devices that process the data; refers to the
computer as well as peripheral devices.
Peripheral devices: Used to expand the computer’s input, output and storage capabilities.
Network: Two or more computers and other devices that are connected, for the purpose of
sharing data and programs.
Computer Types: Computers are classified based on parameters such as speed of operation, cost, computational power, and type of application.
Difference between computer organization and computer architecture
Architecture describes what the computer does and organization describes how it does it.
Computer organization:
Computer organization is concerned with the way the hardware components operate and the way they are connected together to form a computer system. It includes hardware details transparent to the programmer, such as control signals and peripherals. It describes how the computer performs. Examples: circuit design, control signals, and memory types all fall under computer organization.
Computer Architecture:
Computer architecture is concerned with the structure and behavior of the computer system as seen by the user. It includes information formats, the instruction set, and techniques for addressing memory. It describes what the computer does.

FUNCTIONAL UNITS:

The computer system is divided into five separate units for its operation.
 Input Unit.
 ALU.
 Control Unit.
 Memory Unit.
 Output Unit.
Input & Output unit
The method of feeding data and programs to a computer is accomplished by an input device. Computer input devices read data from a source, such as magnetic disks, and translate that data into electronic impulses [ADC] for transfer into the CPU. Some typical input devices are a keyboard, a mouse, a scanner, etc.

Figure 1

Computer output devices convert the electronic impulses [DAC] into human-readable form. The output unit sends processed results to the outside world. Examples: display screens, printers, plotters, microfilms, synthesizers, high-tech blackboards, film recorders, etc.
Memory Unit (MU)
A Memory Unit is a collection of storage cells together with associated circuits needed to transfer information in and out of storage. Data storage is a common term for archiving data or information in a storage medium for use by a computer. It is one of the basic yet fundamental functions performed by a computer, organized as a hierarchy of storage solutions that provide fast access to computer resources.
A computer stores data or information using several methods, which leads to different levels of data storage. Primary storage is the most common form of data storage and typically refers to random access memory (RAM). It is called the main storage of the computer because it holds the data and applications that are currently in use. Then there is secondary storage, which refers to external storage devices and other external media such as hard drives and optical media.
Arithmetic Logical Unit (ALU)
After you enter data through the input device, it is stored in the primary storage unit. The Arithmetic Logical Unit performs the actual processing of data and instructions. The major operations performed by the ALU are addition, subtraction, multiplication, division, logic operations, and comparison.
Data are transferred to the ALU from the storage unit when required. After processing, the output is returned to the storage unit for further processing or storage.
Control Unit
The next component of the computer is the control unit, which acts like a supervisor, seeing that things are done in proper fashion. The control unit controls and coordinates the entire operation of the computer system.
The control unit determines the sequence in which computer programs and instructions are executed: processing of programs stored in the main memory, interpretation of the instructions, and issuing of signals for the other units of the computer to execute them.
It also acts as a switchboard operator when several users access the computer simultaneously, coordinating the activities of the computer's peripheral equipment as they perform input and output. It is therefore the manager of all operations.
Central Processing Unit (CPU)
The Arithmetic Logical Unit (ALU), Control Unit (CU), and Memory Unit (MU) of a computer system are jointly known as the central processing unit. We may call the CPU the brain of any computer system: like a human brain, it takes all major decisions, makes all sorts of calculations, and directs the different parts of the computer by activating and controlling their operations.

BASIC OPERATIONAL CONCEPTS

To perform a given task, an appropriate program consisting of a list of instructions is


stored in the memory. Individual instructions are brought from the memory into the processor,
which executes the specified operations. [Load – Transfers data to register. Store – Transfers
data to memory.] Typical instructions are:

Load LOC, R2 — The operand at LOC is fetched from the memory into the processor and stored in register R2.
Add R1, R2, R3 — Adds the contents of registers R1 and R2, then places their sum into register R3.
Store R4, LOC — Copies the operand in register R4 to memory location LOC.
Add R1, R0 — Adds the contents of R1 and R0 and places the sum in R0.


Figure 2 shows how the memory and the processor can be connected. In addition to the ALU and the control circuitry, the processor contains a number of registers used for several different purposes.
The instruction register (IR) holds the
instruction that is currently being executed. The
program counter (PC) contains the memory address of
the next instruction to be fetched and executed. In addition to the IR and PC, the processor contains general-purpose registers R0 through Rn−1, often called processor registers. They
serve a variety of functions, including holding
operands that have been loaded from the memory for
processing.
Operating Steps
 Programs are placed in the memory through input devices.
 PC is set to point to the first instruction.
 The contents of PC are transferred to MAR. A read signal is sent to the memory.
 The first instruction is read out and loaded into MDR.
 The contents of MDR are transferred to IR.
 Decode and execute the instruction. Get operands for ALU (Address to MAR – Read
– MDR to ALU).
 Perform operation in ALU and Store the result back to general-purpose register.
 Transfer the result to memory (address to MAR, result to MDR – Write).
 During the execution, PC is incremented to the next instruction.
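The operating steps above can be sketched as a toy simulation. This is an illustrative model only: the instruction encoding, the register-file size, and the memory contents are invented for the example, not taken from the text.

```python
# Toy model of the operating steps: fetch via MAR/MDR, decode, execute.
# Instruction format (invented for illustration): ("Load", addr, reg),
# ("Add", r1, r2, r3), ("Store", reg, addr).

memory = {0: ("Load", 100, 2),      # R2 <- [100]
          4: ("Add", 1, 2, 3),      # R3 <- [R1] + [R2]
          8: ("Store", 3, 104),     # [104] <- R3
          100: 25, 104: 0}
R = [0, 10, 0, 0]                   # general-purpose registers R0..R3
PC, MAR, MDR, IR = 0, 0, 0, None

for _ in range(3):                  # execute the three instructions
    MAR = PC                        # contents of PC transferred to MAR
    MDR = memory[MAR]               # read signal: instruction into MDR
    IR = MDR                        # contents of MDR transferred to IR
    PC += 4                         # PC incremented to the next instruction
    op = IR[0]                      # decode and execute
    if op == "Load":
        MAR = IR[1]; MDR = memory[MAR]; R[IR[2]] = MDR
    elif op == "Add":
        R[IR[3]] = R[IR[1]] + R[IR[2]]
    elif op == "Store":
        MAR = IR[2]; MDR = R[IR[1]]; memory[MAR] = MDR

print(memory[104])   # 10 + 25 = 35
```

Note how every memory access, whether for an instruction or an operand, goes through the MAR/MDR pair, exactly as in the step list above.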
In addition to transferring data between the memory and the processor, the computer
accepts data from input devices and sends data to output devices.
In order to respond immediately to a request from an I/O device, execution of the current program must be suspended. To cause this, the device raises an interrupt signal, which is a request for service by the processor. The processor provides the requested service by executing a program called an interrupt-service routine.

BUS STRUCTURES

The bus shown in Figure 3 is a simple structure that implements the interconnection
network. Only one source/destination pair of units can use this bus to transfer data at any one
time.
The bus consists of three sets of lines used to carry address, data, and control signals. I/O
device interfaces are connected to these lines, as shown in Figure 4 for an input device. Each I/O
device is assigned a unique set of addresses for the registers in its interface. When the processor places a particular address on the address lines, it is examined by the address decoders of all devices on the bus. The device that recognizes this address responds to the commands issued on the control lines.

Figure 2

The processor uses the control lines to request either a Read or a Write operation, and the requested data are transferred over the data lines.
When I/O devices and the memory share the same
address space, the arrangement is called memory-
mapped I/O. Any machine instruction that can
access memory can be used to transfer data to or
from an I/O device.
Figure 3

For example, suppose the input device is a keyboard with data register DATAIN, and DATAOUT is the data register of a display device interface.
Load R2, DATAIN — Reads the data from DATAIN and stores them into processor register R2.
Store R2, DATAOUT — Sends the contents of register R2 to location DATAOUT.
The address decoder, the data and status registers, and the control circuitry required to
coordinate I/O transfers constitute the device’s interface circuit.
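Memory-mapped I/O can be sketched as follows: device registers share the address space with memory, so the same Load/Store operations reach either. The specific addresses and the keyboard/display behavior below are invented for illustration.

```python
# Memory-mapped I/O sketch: the address decoder routes an access either
# to ordinary memory or to a device register. Addresses are assumed.
DATAIN, DATAOUT = 0x4000, 0x4004    # device register addresses (assumed)
memory = {}
keyboard_buffer = [ord("A")]        # pending keystroke
display_output = []

def load(address):
    if address == DATAIN:           # decoder selects the keyboard interface
        return keyboard_buffer.pop(0)
    return memory.get(address, 0)   # otherwise an ordinary memory read

def store(address, value):
    if address == DATAOUT:          # decoder selects the display interface
        display_output.append(value)
    else:
        memory[address] = value     # ordinary memory write

R2 = load(DATAIN)                   # Load R2, DATAIN
store(DATAOUT, R2)                  # Store R2, DATAOUT
```

Because the device registers answer to ordinary addresses, no special I/O instructions are needed; any instruction that can access memory can transfer data to or from a device.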

MEMORY LOCATIONS AND ADDRESSES

The memory consists of many millions of storage cells, each of which can store a bit of
information having the value 0 or 1. The memory is organized so that a group of n bits can be
stored or retrieved in a single, basic operation. Each group of n bits is referred to as a word of
information, and n is called the word length.
The memory of a computer can be schematically represented as a collection of words, as shown in Figure 5. Modern computers have word lengths that typically range from 16 to 64 bits.
A unit of 8 bits is called a byte. Machine instructions may require one or more words for their representation. Accessing the memory to store or retrieve a single item of information, either a word or a byte, requires distinct names or addresses for each location.
The memory can have up to 2^k addressable locations, where k is the number of address bits. The 2^k addresses constitute the address space of the computer.


Figure 4 Figure 5

Byte Addressability
We now have three basic information quantities to deal with: bit, byte, and word. A byte
is always 8 bits, but the word length typically ranges from 16 to 64 bits. It is impractical to
assign distinct addresses to individual bit locations in the memory. The most practical
assignment is to have successive addresses refer to successive byte locations in the memory.
The term byte-addressable memory is used for this assignment. Byte locations have
addresses 0, 1, 2 . . . Thus, if the word length of the machine is 32 bits, successive words are
located at addresses 0, 4, 8… with each word consisting of four bytes.
Big-Endian and Little-Endian Assignments
There are two ways that byte addresses can be assigned across words. The name big-
endian is used when lower byte addresses are used for the more significant bytes (the leftmost
bytes) of the word. The name little-endian is used for the opposite ordering, where the lower
byte addresses are used for the less significant bytes (the rightmost bytes) of the word.
In both cases, byte addresses 0, 4, and 8… are taken as the addresses of successive words
in the memory of a computer with a 32-bit word length. These are the addresses used when
accessing the memory to store or retrieve a word.
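The two byte orderings can be demonstrated with Python's struct module, which packs a 32-bit value into its four constituent bytes in either order. The value 0x12345678 is chosen arbitrarily for the example.

```python
import struct

# The same 32-bit word stored under the two byte-address assignments.
word = 0x12345678
big    = struct.pack(">I", word)    # big-endian: most significant byte first
little = struct.pack("<I", word)    # little-endian: least significant first

print([hex(b) for b in big])     # ['0x12', '0x34', '0x56', '0x78']
print([hex(b) for b in little])  # ['0x78', '0x56', '0x34', '0x12']
```

In big-endian order the lower byte address (index 0) holds the more significant byte 0x12; in little-endian order it holds the less significant byte 0x78.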

Figure 6


Word Alignment
In the case of a 32-bit word length, natural word boundaries occur at addresses 0, 4, 8…
We say that the word locations have aligned addresses if they begin at a byte address that is a
multiple of the number of bytes in a word. For practical reasons associated with manipulating
binary-coded addresses, the number of bytes in a word is a power of 2. Hence, if the word length is 16 bits (2 bytes), aligned words begin at byte addresses 0, 2, 4, …, and for a word length of 64 bits (2^3 = 8 bytes), aligned words begin at byte addresses 0, 8, 16, …
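The alignment rule reduces to a single modulus test, as this small sketch shows:

```python
WORD_BYTES = 4                      # 32-bit word length

def is_aligned(byte_address, word_bytes=WORD_BYTES):
    # An address is aligned if it is a multiple of the word size in bytes.
    return byte_address % word_bytes == 0

print([a for a in range(13) if is_aligned(a)])        # [0, 4, 8, 12]
print([a for a in range(17) if is_aligned(a, 8)])     # [0, 8, 16]
```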

Accessing Numbers and Characters


A number usually occupies one word, and can be accessed in the memory by specifying
its word address. Similarly, individual characters can be accessed by their byte address. For
programming convenience it is useful to have different ways of specifying addresses in program
instructions.

MEMORY OPERATIONS

Both program instructions and data operands are stored in the memory. Two basic operations involving the memory are needed, namely, Read and Write. The Read operation transfers a copy of the contents of a specific memory location to the processor. The memory contents remain unchanged.
To start a Read operation, the processor sends the address of the desired location to the memory and requests that its contents be read. The memory reads the data stored at that address and sends them to the processor.
The Write operation transfers an item of information from the processor to a specific memory location, overwriting the former contents of that location. To initiate a Write operation, the processor sends the address of the desired location to the memory, together with the data to be written into that location. The memory then uses the address and data to perform the write.
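The two operations can be modelled as a minimal class; the dictionary-backed storage is an implementation convenience for the sketch, not a claim about real memory hardware.

```python
# A minimal model of the two memory operations described above.
class Memory:
    def __init__(self):
        self.cells = {}

    def read(self, address):
        # Read: returns a copy of the contents; the location is unchanged.
        return self.cells.get(address, 0)

    def write(self, address, data):
        # Write: overwrites the former contents of the location.
        self.cells[address] = data

m = Memory()
m.write(0x100, 42)
print(m.read(0x100))   # 42
print(m.read(0x100))   # 42 again: reading leaves the contents unchanged
```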

INSTRUCTIONS AND INSTRUCTION SEQUENCING

A computer must have instructions capable of performing four types of operations:


• Data transfers between the memory and the processor registers
• Arithmetic and logic operations on data
• Program sequencing and control
• I/O transfers
We begin by discussing instructions for the first two types of operations. To facilitate the discussion, we first need some notation.


Register Transfer Notation


To describe the transfer of information, the contents of any location are denoted by
placing square brackets around its name.
R1 ← [LOC]
Thus, this expression means that the contents of memory location LOC are transferred
into processor register R1.

As another example, consider the operation that adds the contents of registers R1 and R2,
and places their sum into register R3. This action is indicated as
R3 ← [R1] + [R2]
This type of notation is known as Register Transfer Notation (RTN). Note that the right-hand side of an RTN expression always denotes a value, and the left-hand side is the name of a location where the value is to be placed, overwriting the old contents of that location.

Assembly-Language Notation
We need another type of notation to represent machine instructions and programs. For
this, we use assembly language. For example, a generic instruction that causes the transfer
described above, from memory location LOC to processor register R1, is specified by the
statement
Move LOC, R1
The contents of LOC are unchanged by the execution of this instruction, but the old
contents of register R1 are overwritten.
The second example of adding two numbers contained in processor registers R1 and R2
and placing their sum in R3 can be specified by the assembly-language statement
Add R1, R2, R3
In this case, registers R1 and R2 hold the source operands, while R3 is the destination.

Basic Instruction Types


(1) Three-Address Instruction:
Operation Source1, Source 2, Destination
Add A, B, C
Operand A and B are source operands, C is the destination operand. Add is the operation
to be performed on the operands.


(2) Two-Address Instruction:
Operation Source, Destination
Add A, B
Performs the operation B ← [A] + [B]. When the sum is calculated, the result is sent to memory and stored in location B, replacing the original contents of this location. This means operand B is both a source and the destination.
To add the contents of locations A and B without destroying either of them, and to place the sum in location C, the Move instruction is used. [Move works the same as Copy.] Move Source, Destination.
Move B, C
Add A, C
(3) One-Address Instruction:
A processor register called Accumulator is used.
Add A – Adds the contents of memory location A to the contents of the accumulator register and places the sum back in the accumulator.
Load A – Copies the contents of memory location A into the accumulator.
Store A – Copies the contents of the accumulator into memory location A.
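A one-address machine can be sketched directly from these three instructions. The memory contents below are invented for the example; only the accumulator behavior comes from the text.

```python
# One-address instructions operate implicitly on the accumulator.
memory = {"A": 7, "B": 5, "C": 0}
ACC = 0

def load(loc):                      # Load A : ACC <- [A]
    global ACC; ACC = memory[loc]

def add(loc):                       # Add A  : ACC <- ACC + [A]
    global ACC; ACC += memory[loc]

def store(loc):                     # Store A: [A] <- ACC
    memory[loc] = ACC

# C = A + B using one-address instructions:
load("A"); add("B"); store("C")
print(memory["C"])   # 12
```

Compare this three-instruction sequence with the two-instruction two-address version (Move B, C; Add A, C) and the single three-address instruction (Add A, B, C): fewer explicit addresses per instruction means more instructions per task.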

Instruction Execution and Straight-Line Sequencing


Let’s consider task C = A + B, implemented as C←[A] + [B]. Figure 8 shows a possible
program segment for this task as it appears in the memory of a computer. We assume that the
word length is 32 bits and the memory is byte-addressable. The three instructions of the program
are in successive word locations, starting at location i.
Since each instruction is 4 bytes long, the second and third instructions are at addresses
i + 4 and i + 8. Let us consider how this program is executed.
 To begin executing a program, the address of its first instruction (i in our example)
must be placed into the PC.
 Then, the processor control circuits use the information in the PC to fetch and execute
instructions, one at a time, in the order of increasing addresses. This is called
straight-line sequencing.
 During the execution of each instruction, the PC is incremented by 4 to point to the
next instruction. Thus, after the Move instruction at location i + 8 is executed, the PC
contains the value i + 12, which is the address of the first instruction of the next
program segment.


Executing a given instruction is a two-phase procedure. In the first phase, called
instruction fetch, the instruction is fetched
from the memory location whose address is in
the PC. This instruction is placed in the
instruction register (IR) in the processor.
At the start of the second phase, called
instruction execute, the instruction in IR is
examined to determine which operation is to be
performed. The specified operation is then
performed by the processor. This involves a
small number of steps such as fetching
operands from the memory or from processor
registers, performing an arithmetic or logic
operation, and storing the result in the
destination location.
At some point during this two-phase procedure, the contents of the PC are advanced to
point to the next instruction. When the execute phase of an instruction is completed, the PC
contains the address of the next instruction, and a new instruction fetch phase can begin.
Branching
Consider the task of adding a list of n numbers. LOOP is a straight line sequence of
instructions executed as many times as needed. Assume that the number of entries in the list, n, is
stored in memory location N. Register R1 is used as a counter to determine the number of times
the loop is executed. Hence, the contents of location N are loaded into register R1 at the
beginning of the program. Then, within the body of the loop, the instruction Decrement R1
reduces the contents of R1 by 1 each time through the loop. Execution of the loop is repeated as
long as the content of R1 is greater than zero.
We now introduce branch instructions. This type of instruction loads a new address into
the program counter. As a result, the processor fetches and executes the instruction at this new
address, called the branch target, instead of the instruction at the location that follows the
branch instruction in sequential address order.
A conditional branch instruction causes a branch only if a specified condition is
satisfied. If the condition is not satisfied, the PC is incremented in the normal way, and the next
instruction in sequential address order is fetched and executed.
In the program in Figure 2.10, the instruction


Branch>0 LOOP (equivalently, Branch_if_[R1]>0 LOOP)
is a conditional branch instruction that
causes a branch to location LOOP if the contents of
register R1 are greater than zero. This means that
the loop is executed as long as there are entries in
the list that are yet to be added to R0. At the end of
the nth pass through the loop, the Decrement
instruction produces a zero and hence branching
does not occur.
Then the Move instruction is fetched and executed; it moves the final result from R0 into memory location SUM.
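The add-a-list-of-n-numbers loop can be traced with a short simulation. The list contents are invented for the example; the counter-and-branch structure follows the text.

```python
# R1 counts down from N; the conditional branch repeats the loop body
# while [R1] > 0, then the final Move stores the sum.
NUM = [3, 1, 4, 1, 5]               # list entries (invented for the example)
N = len(NUM)

R0 = 0                              # running sum
R1 = N                              # Move N, R1   (loop counter)
i = 0
while True:                         # LOOP:
    R0 += NUM[i]                    #   Add NUM_i, R0
    i += 1
    R1 -= 1                         #   Decrement R1
    if not R1 > 0:                  # Branch>0 LOOP: branch while [R1] > 0
        break                       # nth pass: Decrement produced zero
SUM = R0                            # Move R0, SUM
print(SUM)   # 3 + 1 + 4 + 1 + 5 = 14
```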
Condition Codes
The processor keeps track of information about the results of various operations for use by subsequent conditional branch instructions. This is accomplished by recording the required information in individual bits, often called condition code flags. These flags are usually grouped together in a special processor register called the condition code register or status register. Individual condition code flags are set to 1 or cleared to 0, depending on the outcome of the operation performed.


ADDRESSING MODES

The different ways for specifying the locations of instruction operands are known as
addressing modes.

1. Implementation of Variables and Constants


Variables are found in almost every computer program. In assembly language, a variable
is represented by allocating a register or a memory location to hold its value. This value can be
changed as needed using appropriate instructions.
Register mode: The operand is the contents of a processor register; the name (address) of
the register is given in the instruction.
Example: The instruction Add R1, R2, R3 uses the Register mode for all three operands.
Registers R1 and R2 hold the two source operands, while R3 is the destination.
Absolute/Direct mode: The operand is in a memory location; the address of this location
is given explicitly in the instruction.
Example: The Absolute mode is used in the instruction Move LOC, R1 which copies the
value in the memory location LOC into register R1.
Immediate mode: The operand is given explicitly in the instruction.
Example: The instruction Add #200, R1, R2 adds the value 200 to the contents of register
R1, and places the result into register R2. A common convention is to use the number sign (#) in
front of the value to indicate that this value is to be used as an immediate operand.
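The three modes can be sketched as a tiny operand decoder. The decoding rules mirror the conventions above ("#" for immediate, a register name for Register mode, anything else treated as a memory label for Absolute mode); the register and memory contents are invented for the example.

```python
# Toy decoder for Register, Absolute, and Immediate modes.
registers = {"R1": 30, "R2": 0, "R3": 0}
memory = {"LOC": 99}

def operand_value(spec):
    if spec.startswith("#"):        # Immediate: value is in the instruction
        return int(spec[1:])
    if spec in registers:           # Register: operand is register contents
        return registers[spec]
    return memory[spec]             # Absolute: operand is in memory

print(operand_value("#200"))        # 200
print(operand_value("R1"))          # 30
print(operand_value("LOC"))         # 99

# Add #200, R1, R2  ->  R2 <- 200 + [R1]
registers["R2"] = operand_value("#200") + operand_value("R1")
print(registers["R2"])              # 230
```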

2. Indirection and Pointers

In the addressing modes that follow, the instruction does not give the operand or its address explicitly. Instead, it provides information from which the memory address of the operand can be determined. This address is known as the Effective Address (EA) of the operand.
Indirect mode: The effective address of the operand is the contents of a register or
memory location whose address appears in the instruction. We denote indirection by placing the
name of the register given in the instruction in parentheses ().
To execute the Add instruction in Figure 2.11(a), the processor uses the value B, which is
in register R1, as the effective address of the operand. It requests a read operation from the
memory to read the contents of location B. The value read is the desired operand, which the
processor adds to the contents of register R0. Indirect addressing through a memory location is
also possible, as shown in Figure 2.11(b). In this case the processor first reads the contents of memory location A, then requests a second read operation using the value B as an address to obtain the operand.


The register or memory location that contains the address of an operand is called a
pointer.
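Both forms of indirection from Figure 2.11 can be traced in a small model. The labels A and B and the stored values are invented for the example; the two-read sequence for memory indirection follows the text.

```python
# Indirect addressing: the instruction names a location that holds the
# ADDRESS of the operand, not the operand itself.
memory = {
    "A": "B",      # location A holds the address B (A is a pointer)
    "B": 5,        # location B holds the operand
}
registers = {"R0": 10, "R1": "B"}

# Add (R1), R0 : EA is the contents of register R1, i.e. B -- one read.
ea = registers["R1"]
registers["R0"] += memory[ea]          # R0 <- R0 + [B]

# Add (A), R0 : indirection through memory -- two reads.
ea = memory["A"]                       # first read: the pointer stored in A
registers["R0"] += memory[ea]          # second read: the operand at B

print(registers["R0"])   # 10 + 5 + 5 = 20
```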

3. Indexing and Arrays


The next addressing mode we discuss provides a different kind of flexibility for accessing
operands. It is useful in dealing with lists and arrays.
Index mode: The effective address of the operand is generated by adding a constant value to the contents of a register. The register used in this mode is referred to as the index register.
We indicate the Index mode symbolically as X(Ri) where X denotes a constant signed
integer value contained in the instruction and Ri is the name of the register involved. The
effective address of the operand is given by EA = X + [Ri]. The contents of index register are
not changed in the process of generating the effective address.


Figure 2.13 illustrates two ways of using the Index mode. In Figure 2.13(a), the index register R1 contains the address of a memory location, and the value X defines an offset (displacement) from this address to the location where the operand is found. In Figure 2.13(b), the constant X corresponds to a memory address and the contents of the index register define the offset of the operand. In either case, the effective address is the sum of two values; one is given explicitly in the instruction and the other is stored in a register.
Base with index: A second register may be used to contain the offset X, in which case the Index mode is written as (Ri,Rj). The effective address is the sum of the contents of registers Ri and Rj. The second register is called the base register. This form of addressing provides more flexibility in accessing operands, because both components of the effective address can be changed.
Base with index and offset: Uses two registers plus a constant, denoted as X(Ri,Rj). The effective address is the sum of the constant X and the contents of registers Ri and Rj. This added flexibility is useful in accessing multiple components inside each item in a record, where the beginning of an item is specified by the (Ri,Rj) part of the addressing mode.
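The three indexed effective-address formulas reduce to simple sums. In the sketch below each function takes the register *contents* as plain integers; the array base address and element size in the usage example are invented for illustration.

```python
# Effective-address computation for the indexed modes.
def ea_index(X, Ri):                   # X(Ri):    EA = X + [Ri]
    return X + Ri

def ea_base_index(Ri, Rj):             # (Ri,Rj):  EA = [Ri] + [Rj]
    return Ri + Rj

def ea_base_index_offset(X, Ri, Rj):   # X(Ri,Rj): EA = X + [Ri] + [Rj]
    return X + Ri + Rj

# Example: an array of 4-byte elements starting at address 1000, with
# R1 holding the base address; element i is accessed as 4*i(R1).
R1 = 1000
print(ea_index(4 * 3, R1))             # address of element 3 -> 1012
```

Note that none of these computations change the contents of the index or base registers; only the effective address is formed from them.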

4. Relative Addressing
In Index addressing, if the program counter, PC, is used instead of a general-purpose register, then X(PC) can be used to address a memory location that is X bytes away from the location presently pointed to by the program counter. Since the addressed location is identified "relative" to the program counter, which always identifies the current execution point in a program, the name Relative mode is associated with this type of addressing.
Relative mode: The effective address is determined by the Index mode using the
program counter in place of the general-purpose register Ri.
This mode can be used to access data operands, but its most common use is to specify the target address in branch instructions. An instruction such as Branch>0 LOOP causes program execution to go to the branch target location identified by the name LOOP if the branch condition is satisfied.

5. Additional Modes
Many computers provide additional modes intended to aid certain programming tasks.
The two modes described next are useful for accessing data items in successive locations in the
memory.
Auto-increment mode: The effective address of the operand is the contents of a register
specified in the instruction. After accessing the operand, the contents of this register are
automatically incremented to point to the next item in a list.
We denote the Auto-increment mode by putting the specified register in parentheses, to
show that the contents of the register are used as the effective address, followed by a plus sign to

indicate that these contents are to be incremented after the operand is accessed. Thus, the Auto-
increment mode is written as (Ri)+
Auto-decrement mode: The contents of a register specified in the instruction are first automatically decremented and are then used as the effective address of the operand.
The Auto-decrement mode is written as −(Ri).
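The two modes differ only in when the register is updated relative to its use as the effective address. The sketch below assumes a byte-addressable memory with 4-byte list items; the starting register value is invented for the example.

```python
# Auto-increment (Ri)+ and auto-decrement -(Ri), with 4-byte items.
ITEM = 4

def auto_increment(reg):
    # Use the register contents as the EA, THEN increment the register.
    ea = reg["value"]
    reg["value"] += ITEM
    return ea

def auto_decrement(reg):
    # Decrement the register FIRST, then use it as the EA.
    reg["value"] -= ITEM
    return reg["value"]

R2 = {"value": 1000}
print(auto_increment(R2), R2["value"])   # 1000 1004
print(auto_decrement(R2), R2["value"])   # 1000 1000
```

This before/after asymmetry is what makes the pair convenient for stepping forward through one list and backward through another (or for push/pop on a stack).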

BASIC PROCESSING UNIT - SOME FUNDAMENTAL CONCEPTS

To execute a program, the processor fetches one instruction at a time and performs the
operations specified. Instructions are fetched from successive memory locations until a branch
or a jump instruction is encountered.
The processor keeps track of the address of the memory location containing the next
instruction to be fetched using the program counter, PC. After fetching an instruction, the
contents of the PC are updated to point to the next instruction in the sequence. A branch
instruction may load a different value into the PC. Another key register in the processor is the
instruction register, IR.
Suppose that each instruction comprises 4 bytes, and that it is stored in one memory word. To execute an instruction, the processor has to perform the following three steps:
1. Fetch the contents of the memory location pointed to by the PC. The contents of this
location are the instruction to be executed; hence they are loaded into the IR. In
register transfer notation, the required action is
IR←[[PC]]


2. Increment the PC to point to the next instruction. Assuming that the memory is byte
addressable, the PC is incremented by 4; that is
PC←[PC] + 4
3. Carry out the operation specified by the instruction in the IR.
Fetching an instruction and loading it into the IR is usually referred to as the instruction
fetch phase. Performing the operation specified in the instruction constitutes the instruction
execution phase.
Single Bus organization of Processor
Figure shows the organization in which the arithmetic and logic unit (ALU) and all the
registers are interconnected via a single common bus. This bus is internal to the processor and
should not be confused with the external bus that connects the processor to the memory and I/O
devices.

Figure 7: Single Bus Organization

The data and address lines of the external memory bus are connected to the internal
processor bus via the memory data register, MDR, and the memory address register, MAR,
respectively. Register MDR has two inputs and two outputs. Data may be loaded into MDR
either from the memory bus or from the internal processor bus.
The data stored in MDR may be placed on either bus. The input of MAR is connected to
the internal bus, and its output is connected to the external bus. The control lines of the memory
bus are connected to the instruction decoder and control logic block.
Three registers, Y, Z, and TEMP, are used by the processor for temporary
storage during execution of some instructions. The multiplexer MUX selects either the output of
register Y or a constant value 4 to be provided as input A of the ALU. The constant 4 is used to
increment the contents of the program counter.
With few exceptions, an instruction can be executed by performing one or more of the
following operations in some specified sequence:
 Transfer a word of data from one processor register to another or to the ALU
 Perform an arithmetic or a logic operation and store the result in a processor register
 Fetch the contents of a given memory location and load them into a processor
register
 Store a word of data from a processor register into a given memory location

Register Transfers
Instruction execution involves a sequence of steps in which data are transferred from one
register to another. For each register, two control signals are used to place the contents of that
register on the bus or to load the data on the bus into
the register.
The input and output of register Ri are
connected to the bus via switches controlled by the
signals Riin and Riout, respectively. When Riin is set to
1, the data on the bus are loaded into Ri. Similarly,
when Riout is set to 1, the contents of register Ri are
placed on the bus. While Riout is equal to 0, the bus
can be used for transferring data from other registers.
Suppose that we wish to transfer the contents
of register R1 to register R4. This can be
accomplished as follows:
Figure 8

 Enable the output of register R1 by setting R1out to 1. This places the contents of R1
on the processor bus.
 Enable the input of register R4 by setting R4in, to 1. This loads data from the
processor bus into register R4.
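The gating just described can be modeled in a toy simulation (register names and contents are made up): the register whose out signal is 1 drives the bus, and every register whose in signal is 1 loads whatever is on the bus in that clock cycle.

```python
# Hypothetical register contents for illustration.
registers = {"R1": 0x25, "R2": 0, "R3": 0, "R4": 0}

def clock_cycle(out_reg, in_regs):
    """Simulate one cycle: <out_reg>out = 1 places its contents on the
    bus; every register listed in in_regs has its "in" signal set and
    loads the bus value."""
    bus = registers[out_reg]          # e.g. R1out = 1: contents gated onto bus
    for r in in_regs:                 # e.g. R4in = 1: bus loaded into R4
        registers[r] = bus

clock_cycle("R1", ["R4"])             # R1out, R4in  =>  R4 <- [R1]
```

Note that only one out signal may be 1 per cycle, exactly as the text requires: the model takes a single `out_reg` argument for that reason.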
All operations and data transfers within the processor take place within time periods
defined by the processor clock.

Performing an Arithmetic or Logic Operation


The ALU is a combinational circuit that has no internal storage. It performs arithmetic
and logic operations on the two operands applied to its A and B inputs. One of the operands is
the output of the multiplexer MUX, and the other operand is obtained directly from the bus. The
result produced by the ALU is stored temporarily in register Z.
Therefore, a sequence of operations to add the contents of register R1 to those of register
R2 and store the result in register R3 is
1. R1out, Yin
2. R2out, Select Y, Add, Zin
3. Zout, R3in
Step 1: The output of register R1 and the input
of register Y are enabled, causing the contents of R1 to
be transferred over the bus to Y.
Step 2: The multiplexer's Select signal is set to
SelectY, causing the multiplexer to gate the contents of
register Y to input A of the ALU. At the same time, the
contents of register R2 are gated onto the bus and,
hence, to input B. The function performed by the ALU
depends on the signals applied to its control lines. In
this case, the Add line is set to 1, causing the output of
the ALU to be the sum of the two numbers at inputs A
and B. This sum is loaded into register Z because its
input control signal is activated.
Step 3: The contents of register Z are transferred
to the destination register, R3. This last transfer cannot
be carried out during step 2, because only one register
output can be connected to the bus during any clock
cycle.
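The three control steps can be traced with assumed register values; this is a sketch of the data movement only, not of the gate-level timing.

```python
# Assumed initial register contents (illustrative).
R1, R2, R3 = 30, 12, 0
Y = Z = 0

# Step 1: R1out, Yin -- contents of R1 travel over the bus into Y
bus = R1; Y = bus
# Step 2: R2out, SelectY, Add, Zin -- ALU adds Y (input A) and the bus (input B)
bus = R2; Z = Y + bus
# Step 3: Zout, R3in -- result moves from Z to the destination register
bus = Z; R3 = bus

print(R3)  # prints: 42
```

The single shared `bus` variable makes the constraint visible: Z cannot be emptied into R3 during step 2 because the bus is already carrying R2.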
Figure 9
Fetching a Word from Memory
To fetch a word of information from memory, the processor has to specify the address of
the memory location where this information is stored and request a Read operation. The
connections for register MDR are illustrated in Figure 10.
It has four control signals: MDRin and MDRout, control the connection to the internal
bus, and MDRinE and MDRoutE control the connection to the external bus.
Figure 10

As an example of a read operation, consider the instruction Move (R1),R2. The actions
needed to execute this instruction are:
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC(Memory Function Completed) response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
These actions may be carried out as separate steps, but some can be combined into a
single step. Each action can be completed in one clock cycle, except action 3 which requires one
or more clock cycles, depending on the speed of the addressed device.
The memory read operation requires three steps, which can be described by the signals
being activated as follows:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
where WMFC is the control signal that causes the processor's control circuitry to wait for
the arrival of the MFC signal.
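A minimal sketch of the three-step read sequence, with a Python dict standing in for memory and the WMFC wait collapsed into a single assignment (the address 0x100 and the operand value are invented for illustration):

```python
# Hypothetical memory contents and register values.
memory = {0x100: 77}
R1, R2 = 0x100, 0
MAR = MDR = 0

# Step 1: R1out, MARin, Read -- address to MAR, read request issued
MAR = R1
# Step 2: MDRinE, WMFC -- wait for MFC, then load MDR from the memory bus
MDR = memory[MAR]
# Step 3: MDRout, R2in -- data move over the internal bus into R2
R2 = MDR
```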

Storing a word in Memory


Writing a word into a memory location follows a similar procedure. The desired address
is loaded into MAR. Then, the data to be written are loaded into MDR, and a Write command is
issued. Hence, executing the instruction Move R2,(R1) requires the following sequence:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
As in the case of the read operation, the Write control signal causes the memory bus
interface hardware to issue a Write command on the memory bus. The processor remains in step
3 until the memory operation is completed and an MFC response is received.
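The write sequence can be sketched the same way (the address and data values are again invented, and the MFC wait is collapsed):

```python
# Hypothetical register values; an empty dict stands in for memory.
memory = {}
R1, R2 = 0x200, 55
MAR = MDR = 0

MAR = R1              # Step 1: R1out, MARin
MDR = R2              # Step 2: R2out, MDRin, Write
memory[MAR] = MDR     # Step 3: MDRoutE, WMFC -- word written when MFC arrives
```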
EXECUTION OF A COMPLETE INSTRUCTION

Consider the instruction Add (R3),R1 which adds the contents of a memory location
pointed to by R3 to register R1.
Executing this instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.
Instruction execution proceeds as follows.
Step 1: The instruction fetch operation is initiated
by loading the contents of the PC into the MAR and
sending a Read request to the memory. The Select signal
is set to Select4, which causes the multiplexer MUX to
select the constant 4. This value is added to the operand at
input B, which is the contents of the PC, and the result is
stored in register Z.
Step 2: The updated value is moved from register
Z back into the PC, while waiting for the memory to
respond.
Figure 11

Step 3: The word fetched from the memory is loaded into the IR.
(Steps 1 through 3 constitute the instruction fetch phase, which is the same for all
instructions.)
Step 4: The instruction decoding circuit interprets the contents of the IR. This enables the
control circuitry to activate the control signals for steps 4 through 7, which constitute the
execution phase. The contents of register R3 are transferred to the MAR in step 4, and a memory
read operation is initiated.
Step 5: The contents of R1 are transferred to register Y, to prepare for the addition
operation.
Step 6: When the Read operation is completed, the memory operand is available in
register MDR, and the addition operation is performed. The contents of MDR are gated to the
bus, and thus also to the B input of the ALU, and register Y is selected as the second input to the
ALU by choosing SelectY.
Step 7: The sum is stored in register Z, and then transferred to R1. The End signal causes
a new instruction fetch cycle to begin by returning to step 1.
This discussion accounts for all control signals in Figure 11 except Yin in step 2. There is
no need to copy the updated contents of PC into register Y when executing the Add instruction.
But, in Branch instructions the updated value of the PC is needed to compute the Branch target
address.
To speed up the execution of Branch instructions, this value is copied into register Y in
step 2. Since step 2 is part of the fetch phase, the same action will be performed for all
instructions. This does not cause any harm because register Y is not used for any other purpose at
that time.
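The seven steps can be traced end to end with assumed memory contents and register values; each line below is annotated with the control signals it stands for, and the one-or-more-cycle WMFC waits are collapsed into single assignments.

```python
# Hypothetical program and data (all addresses and values invented).
memory = {0x1000: "Add (R3),R1", 0x2000: 8}
PC, IR, R1, R3 = 0x1000, None, 34, 0x2000
MAR = MDR = Y = Z = 0

MAR = PC; Z = PC + 4            # Step 1: PCout, MARin, Read, Select4, Add, Zin
PC = Z; Y = Z                   # Step 2: Zout, PCin, Yin, WMFC
IR = memory[MAR]                # Step 3: MDRout, IRin (MDR load during WMFC collapsed)
MAR = R3                        # Step 4: R3out, MARin, Read
Y = R1                          # Step 5: R1out, Yin, WMFC
MDR = memory[MAR]; Z = Y + MDR  # Step 6: MDRout, SelectY, Add, Zin
R1 = Z                          # Step 7: Zout, R1in, End
```

Note how step 5 simply overwrites Y: the updated PC copied there in step 2 is only needed by branch instructions, exactly as the text explains.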
Branch Instruction
A branch instruction replaces the contents of the PC with the branch target address. This
address is usually obtained by adding an offset X, which is given in the branch instruction, to the
updated value of the PC. Figure 12 gives a control sequence that implements an unconditional
branch instruction. Processing starts, as usual, with the fetch phase. This phase ends when the
instruction is loaded into the IR in step 3.
The offset value is extracted from the IR by the
instruction decoding circuit, which will also perform sign
extension if required. Since the value of the updated PC is
already available in register Y, the offset X is gated onto
the bus in step 4, and an addition operation is performed.
The result, which is the branch target address, is loaded
into the PC in step 5.
Figure 12
The offset X used in a branch instruction is usually
the difference between the branch target address and the
address immediately following the branch instruction.
For example: if the branch instruction is at location 2000 and if the branch target address
is 2050, the value of X must be 46. The PC is incremented during the fetch phase, before
the type of the instruction being executed is known. Thus, when the branch target address is
computed in step 4, the PC already holds the updated value, which points to the instruction
following the branch instruction in memory.
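The offset arithmetic in this example can be checked directly (assuming 4-byte instructions, as earlier in the section):

```python
# Worked check of the branch-offset computation from the example above.
branch_address = 2000
target_address = 2050
updated_pc = branch_address + 4       # PC after the fetch phase
X = target_address - updated_pc       # offset stored in the instruction
assert updated_pc + X == target_address
print(X)  # prints: 46
```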

MULTIPLE BUS ORGANIZATION

We used the simple single-bus structure to illustrate the basic ideas. The resulting control
sequences are quite long because only one data item can be transferred over the bus in a clock
cycle.
To reduce the number of steps needed, most commercial processors provide multiple
internal paths that enable several transfers to take place in parallel. Figure depicts a three-bus
structure used to connect the registers and the ALU of a processor.
The register file is said to have three ports. There are two outputs, allowing the contents
of two different registers to be accessed simultaneously and have their contents placed on buses
A and B. The third port allows the data on bus C to be loaded into a third register during the
same clock cycle.
Buses A and B are used to transfer the source operands to the A and B inputs of the ALU,
where an arithmetic or logic operation may be performed. The result is transferred to the
destination over bus C. If needed, the ALU may simply pass one of its two input operands
unmodified to bus C. We will call the ALU control signals for such an operation R=A or R=B.
The three-bus arrangement obviates the need for registers Y and Z.
A second feature is the introduction of the
Incrementer unit, which is used to increment the PC by 4.
Using the Incrementer eliminates the need to add 4 to the PC
using the main ALU. The source for the constant 4 at the
ALU input multiplexer is still useful. It can be used to
increment other addresses, such as the memory addresses in
LoadMultiple and StoreMultiple instructions.
Consider the three-operand instruction
Add R4,R5,R6
The control sequence for executing this instruction is
given below.

Step 1: The contents of the PC are passed through the
ALU, using the R=B control signal, and loaded into the MAR
to start a memory read operation. At the same time the PC is
incremented by 4. Note that the value loaded into MAR is the
original contents of the PC. The incremented value is loaded
into the PC at the end of the clock cycle and will not affect the contents of MAR.
Step 2: The processor waits for MFC and loads the data received into MDR.
Step 3: The data received in MDR are transferred to the IR.
Step 4: The execution phase of the instruction requires only one control step to complete.
By providing more paths for data transfer a significant reduction in the number of clock
cycles needed to execute an instruction is achieved.
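The single-step execute phase of Add R4,R5,R6 can be sketched as follows (register values are assumed, and the fetch phase is omitted). The point is that both source reads, the ALU operation, and the destination write all happen in one simulated cycle.

```python
# Hypothetical register-file contents.
regs = {"R4": 15, "R5": 27, "R6": 0}

def execute_add(src1, src2, dst):
    """One cycle on the three-bus datapath: buses A and B carry the two
    source operands, the ALU result goes straight out on bus C, and the
    register file's write port loads the destination in the same cycle."""
    bus_a = regs[src1]            # first read port drives bus A
    bus_b = regs[src2]            # second read port drives bus B
    bus_c = bus_a + bus_b         # ALU result placed on bus C
    regs[dst] = bus_c             # third (write) port loads dst

execute_add("R4", "R5", "R6")     # single execute step: R6 <- [R4] + [R5]
```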
KTU - CST202 - Computer Organization and Architecture Module: 2

REGISTER TRANSFER LOGIC: Inter Register Transfer – Arithmetic, Logic
and Shift Micro Operations.
Module: 2 PROCESSOR LOGIC DESIGN: Processor Organisation – Arithmetic Logic
Unit – Design of Arithmetic Unit, Design of Logic Circuit, Design of
Arithmetic Logic Unit – Status Register – Design of Shifter – Processor Unit
– Design of Accumulator.

REGISTER TRANSFER LOGIC:

A digital system is a collection of digital hardware modules. Each module is a
sequential logic system constructed with flip-flops and gates, and a sequential circuit can be
specified by means of a state table. Specifying a large digital system with a state table would be
very difficult, however, since the number of states would be very large.
To overcome this difficulty, digital systems are designed using a modular approach,
where each modular subsystem performs some functional task. The modules are constructed
from such digital functions as registers, counters, decoders, multiplexers, arithmetic elements
and control logic. Various modules are interconnected with data and control path. The
interconnection of digital functions cannot be described by means of combinational or sequential
logic techniques.
The information flow and the processing task among the data stored in the registers can
be described by means of register transfer logic. The registers are selected as primitive
components of the system. Register transfer logic uses a set of expressions and statements which
compare the statements used in programming language. It provides the necessary tool for
specifying the interconnection between various digital functions.
Components of Register Transfer Logic
1. The set of registers in the system and their functions: The term register encompasses all
types of registers, including shift registers, counters and memory units.
2. The binary-coded information stored in the registers: The binary information stored in
registers may be binary numbers, binary coded decimal numbers, alphanumeric characters,
control information or any other binary coded information.
3. The operations performed on the information stored in the registers: The operations
performed on data stored in registers are called micro operations. Examples are shift, count,
add, clear and load
4. The control functions that initiate the sequence of operations: The control functions
that initiate the sequence of operations consists of timing signals that sequence the operations
one at a time.

Register transfer language (Computer hardware description language)


A register transfer language is a symbolic notation for registers, for specifying operations
on the contents of registers, and for specifying control functions. A statement in a register
transfer language consists of a control function and a list of micro-operations.
Micro-Operation: An operation performed on data stored in registers. It is an elementary
operation that can be performed in parallel during one clock pulse period. The result of the
operation may replace the previous binary information of a register or may be transferred to
another register. Examples: shift, count, clear, add and load.
A micro-operation requires one clock pulse for execution if the operation is done in
parallel. In a serial computer, a micro-operation requires a number of clock pulses equal to the
word time of the system.
Types of Binary Informations
The micro-operations performed depend on the type of data kept in the registers. The
binary information in a register can be classified into three categories:
 Numerical data such as binary numbers or binary-coded decimal numbers.
 Non-numerical data such as alphanumeric characters or other binary-coded symbols.
 Instruction codes, addresses and other control information used to specify the data
processing requirements in the system

Types of Micro-Operations in digital system


 Interregister transfer micro-operation: Do not change the information content when the
binary information moves from one register to another
 Arithmetic operation: Perform arithmetic on numbers stored in registers.
 Logic microoperation: Perform operations such as AND and OR on individual pairs of
bits stored in registers.
 Shift microoperation: Specify operations for shift registers.

INTER REGISTER TRANSFER

Computer registers are designated by capital letters (sometimes followed by numerals) to
denote the function of the register. [Example: R1 - Processor Register, MAR - Memory Address
Register (holds an address for a memory unit), PC - Program Counter, IR - Instruction Register,
SR - Status Register]. The cells or flip-flops of an n-bit register are numbered in sequence from 1 to n
(or from 0 to n-1), starting either from the left or from the right.
The register can be represented in 4 ways:
 Rectangular box with name of the register inside,
 Each individual cell is assigned a letter with a subscript number,
 The numbering of cells from right to left can be marked on top of the box, as in the 12-
bit Memory Buffer Register (MBR).
 A 16-bit register may be partitioned into two parts: bits 1 to 8 are assigned the letter L
(for low) and bits 9 to 16 the letter H (for high).

Figure: Four representations of a register —
a) Register A: a rectangular box with the name A inside;
b) Individual cells: A8 A7 A6 A5 A4 A3 A2 A1;
c) Numbering of cells: the 12-bit MBR with cells numbered 12 down to 1 marked on top of the box;
d) Portions of a register: the 16-bit PC partitioned into PC(H) (bits 16-9) and PC(L) (bits 8-1).

Registers can be specified in a register transfer language with a declaration statement.


For example: Registers in the above figure can be defined with declaration statement such as
DECLARE REGISTER A(8), MBR(12), PC(16)
DECLARE SUBREGISTER PC(L) = PC(1-8), PC(H) = PC(9-16).
Information transfer from one register to another is described by a replacement
operator: A ← B. This statement denotes a transfer of the content of register B into register A
and this transfer happens in one clock cycle. After the operation, the content of the B (source)
does not change. The content of the A (destination) will be lost and replaced by the new data
transferred from B.
Conditional transfer occurs only under a control condition: The condition that
determines when the transfer is to occurs called a control function. A control function is a
Boolean function that can be equal to 1 or 0. The control function is included with the statement
as follows
x’T1: A ← B
The control function is terminated with a colon. It symbolizes the requirement that the
transfer operation be executed by the hardware only when the Boolean function x’T1 = 1, i.e.,
when variable x = 0 and timing variable T1 = 1.
Hardware implementation of a controlled transfer: x’T1: A ← B is as follows

The outputs of register B are connected
to the input of register A and the number
of lines in this connection is equal to the
number of bits in the registers. Register
A must have a load control input so that
it can be enabled when the control function is 1. It is assumed that register A has an additional
input that accepts continuous synchronized clock pulses. The control function is generated by
means of an inverter and an AND gate. It is also assumed that the control unit that generates the
timing variable T1 is synchronized with the same clock pulses that are applied to register A. The
control function stays on during one clock pulse period (when the timing variable is equal to 1)
and the transfer occurs during the next transition of the clock pulse.
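The inverter-plus-AND-gate control function can be mimicked in a few lines (the register values are illustrative):

```python
def controlled_transfer(A, B, x, T1):
    """Return the new value of A after one clock pulse, under the
    statement x'T1: A <- B."""
    load = (not x) and T1          # inverter + AND gate form x'T1
    return B if load else A        # load input of A enabled only when 1

A, B = 5, 9
A = controlled_transfer(A, B, x=1, T1=1)   # x = 1: x'T1 = 0, no transfer
A = controlled_transfer(A, B, x=0, T1=1)   # x'T1 = 1: A <- B
```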
Destination register receives information from two sources but not at the same time. Consider,
T1 : C ← A
T5 : C ← B
The first line states that the contents of
register A are to be transferred to register C
when timimg variable T1 occurs. The second
statement uses the same destination register C
as the first but with a different source register
and a different timing variable. The
connections of two source registers to the
same destinationr register cannot be done direcly but requires a multiplexer circuit to select
between the two possibe paths. The block diagram of the circuit that implements the two
statement is shown in the figure. For registers with four bits each, we need a quadruple 2 to 1
line multiplexer inorder to select either A or B. When T5 =1, register B is selected but when
T1=1, register A is selected (beacsue T5 must be 0 when T1 is 1). The multiplexer and the load
input of register C are enabled everytime T1 and T5 occurs. This causes a transfer of information
from the selected ouce register to destination regeister.

The basic symbols of Register Transfer Logic are

Symbol Description Examples


Letter (and Numerals) Denotes a Register A, MDR, R2
Subscript Denotes a bit of a Register A2, B6
Parenthesis ( ) Denotes a portion of Register PC(H), MBR (OP)
Arrow ← Denotes transfer of information A ←B
Colon : Terminates a control function X’T0:
Comma , Separates two micro-operations A ← B, B ← A
Square Brackets [ ] Specifies an address for memory transfer MBR ← M [ MAR ]
Bus transfer
A typical digital computer has many registers, and paths must be provided to transfer
information from one register to another. The number of wires will be excessive if separate lines
are used between each register and all other registers in the system. A more efficient scheme for
transferring information between registers in a multiple-register configuration is a common bus
system.
A bus structure consists of a set of common lines, one for each bit of a register, through
which binary information is transferred one at a time. Control signals determine which register
is selected by the bus during each particular register transfer.
One way of constructing a common bus system is with multiplexers. The multiplexers
select the source register whose binary information is then placed on the bus.

The diagram shows that the bits in the same significant position in each register are
connected to the data inputs of one multiplexer to form one line of the bus. Thus, MUX 0
multiplexes the four 0 bits of the registers, MUX 1 multiplexes the four 1 bits of the registers,
and similarly for the other two bits.
Memory Transfer
The transfer of information from a memory word to the outside environment is called a
read operation. The transfer of new information to be stored into the memory is called write
operation. A memory word will be symbolized by the letter M.
The read operation is a
transfer from the selected
memory register M into MBR
(memory buffer register).
Read: MBR←M
Write operation is the transfer from MBR to the selected memory register M.
Write: M ← MBR
It is necessary to specify the address of M when writing memory transfer operations.
This is done by enclosing the address in square brackets following the letter M.
Consider a memory unit that
receives the address from a register, called
the address register, symbolized by AR. The
data are transferred to another register,
called the data register, symbolized by DR.
The memory read operation can be stated as
follows:
Read: DR ← M [AR]
This causes a transfer of information
into DR from the memory word M selected
by the address in AR. The memory write
operation transfers the content of a register
R1 to a memory word M selected by the
address in AR. The notation is:
Write: M [AR] ← R1
The block diagram shows the
memory unit that communicates with
multiple registers. The address to the memory unit comes from an address bus. Four registers are
connected to the bus and any one may supply an address. The output of the memory can go to
any one of four registers which are selected by a decoder. The data input to the memory come
from the data bus which selects one of four registers. A memory word is specified in such a
system by the symbol M followed by the register enclosed in square brackets. The contents of
the register within the square brackets specify the address for M.
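A small sketch of the Read and Write transfers, with a Python list standing in for an eight-word memory and AR supplying the address (the address and data values are invented):

```python
# Eight-word memory; AR holds an address, DR is the data register.
M = [0] * 8
AR, DR, R1 = 3, 0, 99

M[AR] = R1        # Write: M[AR] <- R1
DR = M[AR]        # Read:  DR <- M[AR]
```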

ARITHMETIC, LOGIC AND SHIFT MICRO OPERATION

Arithmetic Micro-Operation
The basic arithmetic micro-operations are:
 Addition,
 Subtraction,
 Increment,
 Decrement
 Arithmetic shift.
The increment and decrement micro-operations are implemented with a combinational
circuit or with a binary up-down counter, since these micro-operations perform the plus-one and
minus-one operations, respectively.
The arithmetic add micro-operation is defined by the statement
F ← A + B.
It states that the contents of register A are to be added to the contents of register B and
the sum transferred to register F. Implementing this statement requires three registers A, B and F
and a digital function that performs the addition operation, such as a parallel adder.

There must be a direct relationship between the
statements written in a register transfer language and the
registers and digital functions which are required for their
implementation.
Consider the statements
T2 : A ← A + B
T5 : A ← A + 1
Timing variable T2 initiates an operation to add the
contents of register B to the present contents of A with a
parallel adder. Timing variable T5 increments register A
with a counter. The transfer of the sum from the parallel adder into register A can be activated
with a load input in the register. Register A must be a counter with parallel load capability.
The parallel adder receives input information from registers A and B. The sum bits from
the parallel adder are applied to the inputs of A, and timing variable T2 loads the sum into
register A. Timing variable T5 increments register A by enabling its increment input.
Two other arithmetic operations, multiplication and division, are not included in the basic
set of micro-operations. They may be implemented by means of a combinational circuit, but in
general the multiplication operation is implemented with a sequence of add and shift micro-
operations, and division with a sequence of subtract and shift micro-operations.
Logic Micro-Operations
Logic micro-operations specify binary operations for strings of bits stored in registers.
These operations consider each bit of the register separately and treat them as binary variables.
For example, the exclusive-OR micro-operation with the contents of two registers A and B is
symbolized by the statement
F←A⊕B
It specifies a logic micro-operation that considers each pair of bits in the registers as a
binary variable. Let the content of register A is 1010 and the content of register B is 1100. The
exclusive-OR micro-operation stated above symbolizes the following logic computation:
1010 Content of A
1100 Content of B
0110 Content of F ← A ⊕ B
The content of F, after the execution of the micro-operation, is equal to the bit-by-bit
exclusive-OR operation on pairs of bits in A and B. The logic micro-operations are
seldom used in scientific computations, but they are very useful for bit manipulation of binary
data and for making logical decisions.
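The exclusive-OR computation above can be reproduced with Python's bitwise operators (the 4-bit mask keeps the result within register width):

```python
# Bit-by-bit exclusive-OR of the 4-bit register contents from the text.
A = 0b1010               # content of A
B = 0b1100               # content of B
F = (A ^ B) & 0b1111     # F <- A xor B, bit by bit
print(format(F, "04b"))  # prints: 0110
```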
Table: Logic and shift micro-operations.
The + symbol has two different meanings. When + occurs in a micro-operation, it denotes
arithmetic plus. When it occurs in a control or Boolean function, it denotes a logical OR
operation.
Example: T1 + T2 : A ← A + B, C ← D ∨ F
The + between T1 and T2 is an OR operation between two timing variables of a control
function, while the + between A and B specifies an add micro-operation.
Shift Micro-Operations
Shift micro-operations shift the contents of a register either left or right. These micro-
operations are generally used for serial transfer of data. They are also used along with
arithmetic, logic, and other data-processing operations.
There is no conventional symbol for the shift operation; here we adopt the symbols shl (shift
left) and shr (shift right).
Example: A ← shl A 1-bit shift to the left of register A
B ← shr B 1-bit shift to the right of register B
When the bits are shifted, the first flip-flop receives its binary information from the serial
input. During a shift-left operation the serial input transfers a bit into the rightmost position.
During a shift-right operation the serial input transfers a bit into the leftmost position. The
information transferred through the serial input determines the type of shift.
There are three types of shifts: logical, circular, and arithmetic.
Example:
A ← shl A, A1 ← An
Circular shift that transfers the leftmost bit from An into the rightmost flip-flop A1.
A ← shr A, An ← E
Shift-right operation with the leftmost flip-flop An receiving the value of the 1-bit register E.
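The three kinds of shift can be sketched on 4-bit register contents held as integers; the register width and sample value are assumptions for illustration.

```python
# 4-bit register model: the mask keeps results within register width.
N = 4
MASK = (1 << N) - 1

def shl(a, serial_in=0):
    """Logical shift left; the serial input enters the rightmost position."""
    return ((a << 1) & MASK) | serial_in

def shr(a, serial_in=0):
    """Logical shift right; the serial input enters the leftmost position."""
    return (a >> 1) | (serial_in << (N - 1))

def circular_shl(a):
    """Circular shift: the leftmost bit An re-enters at the rightmost A1."""
    return shl(a, serial_in=(a >> (N - 1)) & 1)

A = 0b1001
```

With A = 1001, shl gives 0010, shr gives 0100, and the circular left shift gives 0011: the information transferred through the serial input is exactly what distinguishes the shift types.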

PROCESSOR ORGANIZATION

The processor part of a computer, the CPU, is sometimes referred to as the data path of the
CPU because the processor forms the paths for the data transfers between the registers in the
unit. The various paths are said to be controlled by means of gates that open the required path
and close all others. A processor unit can be designed to fulfill the requirements of a set of data
paths for a specific application.
In a processor unit, the data paths are formed by means of buses and other common lines.
The control gates that formulate the given path are essentially multiplexers and decoders whose
selection lines specify the required path. The processing of information is done by one common
digital function whose data path can be specified with a set of common selection variables.
Bus Organization
A bus organization for four processor registers is shown in Figure. Each register is connected to
two multiplexers (MUX) to form input buses A and B. The selection lines of each multiplexer
select one register for the particular bus. The A and B buses are applied to a common arithmetic
logic unit. The function selected in the ALU determines the particular operation that is to be
performed.
The shift micro-operations are implemented in the shifter. The result of the micro-operation goes
through the output bus S into the inputs of all registers. The destination register that receives the
information from the output bus is selected by a decoder.
When enabled, this decoder activates one of the register load inputs to provide a transfer
path between the data on the S bus and the inputs of the selected destination register. The output
bus S provides the terminals for transferring data to an external destination. One input of
multiplexer A or B can receive data from the outside.
The control unit that supervises the processor bus system directs the information flow
through the ALU by selecting the various components in the unit.

For example, to perform the micro-operation:
R1 ← R2 + R3
The control must provide binary selection variables to the following selector inputs:
1. MUX A selector: to place the contents of R2 onto bus A.
2. MUX B selector: to place the contents of R3 onto bus B.
3. ALU function selector: to provide the arithmetic operation A + B.
4. Shift selector: for direct transfer from the output of the ALU onto output bus S (no
shift).
5. Decoder destination selector: to transfer the contents of bus S into R1.
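The five selector settings can be traced in a toy model of the bus organization (register contents are assumed; each numbered comment matches a selector in the list above):

```python
# Hypothetical register-file contents.
regs = {"R1": 0, "R2": 20, "R3": 22}

def micro_op(sel_a, sel_b, alu, dest):
    """One micro-operation through the bus-organized processor."""
    bus_a = regs[sel_a]                                 # 1. MUX A: R2 onto bus A
    bus_b = regs[sel_b]                                 # 2. MUX B: R3 onto bus B
    result = bus_a + bus_b if alu == "add" else bus_a   # 3. ALU function: A + B
    bus_s = result                                      # 4. shifter: direct transfer
    regs[dest] = bus_s                                  # 5. decoder loads destination

micro_op("R2", "R3", alu="add", dest="R1")              # R1 <- R2 + R3
```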
Scratchpad memory
The registers in a processor unit can be enclosed within a small memory unit. When included
in a processor unit, a small memory is sometimes called a scratchpad memory. The use of a small
memory is a cheaper alternative to connecting processor registers through a bus system. The
difference between the two systems is the manner in which information is selected for transfer
into the ALU. In a bus system, the
information transfer is selected by the
multiplexers that form the buses.
Processor unit that uses scratchpad
memory is shown in figure. Resource
register is selected from memory and
loaded into register A. A Second source
register is selected from memory and
loaded into register B. The information in
A and B is manipulated in the ALU and
shifter. Result of the operation is
transferred to a memory register
specifying its word address and activating
the memory-write input control.
Assume that the memory has eight words, so that an address must be specified with three bits. To perform the operation
R1 ← R2 + R3
The control must provide binary selection variable to perform the following sequence of
micro-operations
T1: A ← M[010] read R2 to register A
T2: B ← M[011] read R3 to register B
T3: M[001] ← A + B perform operation in ALU and transfer result to R1
Control function T1 must supply an address of 010 to the memory and activate the read and load A inputs. Control function T2 must supply an address of 011 to the memory and activate the read and load B inputs. Control function T3 must supply the function code to the ALU and shifter to perform an add operation, apply an address of 001 to the memory, select the output of the shifter for the MUX, and activate the memory-write input.
Some processors employ a two-port memory in order to overcome the delay caused when reading two source registers one after the other. A two-port memory has two separate address lines to select two words of memory simultaneously. The organization of a processor unit with a two-port scratchpad memory is shown in the figure.
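The T1–T3 scratchpad sequence for R1 ← R2 + R3 described above can be sketched as a small simulation. The addresses 010, 011 and 001 follow the text; the variable names and register contents are illustrative.

```python
# Minimal sketch of the scratchpad sequence for R1 <- R2 + R3.
M = [0] * 8          # 8-word scratchpad; R1..R7 live at addresses 1..7
M[2], M[3] = 20, 22  # R2 = 20, R3 = 22

# T1: A <- M[010]   (read R2 into register A)
A = M[0b010]
# T2: B <- M[011]   (read R3 into register B)
B = M[0b011]
# T3: M[001] <- A + B   (ALU add; shifter passes straight through)
M[0b001] = A + B

print(M[1])  # 42
```

Note how the two reads must happen in separate steps here, which is exactly the delay a two-port memory removes.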

Accumulator Register
An accumulator is a register for short-term, intermediate storage of arithmetic and logic
data in a computer's CPU (central processing unit). The most elementary use for an accumulator
is adding a sequence of numbers. The numerical value in the accumulator increases as each
number is added, exactly as it happens in a simple desktop calculator (but much faster, of
course). Once the sum has been determined, it is written to the main memory or to another
register.
The accumulator register in a processor unit is a multipurpose register capable of performing not only the add micro-operation but many other operations as well. The block diagram shows a processor unit that employs an accumulator register.
To form the sum of two numbers stored in processor registers, it is necessary to add them in the A register using the following sequence of micro-operations:
T1: A ← 0         Clear A
T2: A ← A + R1    Transfer R1 to A
T3: A ← A + R2    Add R2 to A

The sum formed in A may be used for further computation or may be transferred to a required destination.
Status Registers
The relative magnitudes of two numbers may be determined by subtracting one number from the other and then checking certain bit conditions in the resulting difference. These status bit conditions (often called condition-code bits or flag bits) are stored in a status register.
The status register here is a 4-bit register. Its four bits are C (carry), Z (zero), S (sign) and V (overflow). These bits are set or cleared as a result of an operation performed in the ALU.


 Bit C is set if the output carry of the ALU is 1.
 Bit S is set to 1 if the highest-order bit of the result at the output of the ALU is 1.
 Bit Z is set to 1 if the output of the ALU contains all 0's.
 Bit V is set if the exclusive-OR of carries C8 and C9 is 1, and cleared
otherwise. This is the condition for overflow when the numbers are in signed 2's-complement
representation. For an 8-bit ALU, V is set if the result is greater than
+127 or less than -128.
After an ALU operation, status bits can be checked to determine the relationship that
exist between the values of A and B.

If bit V is set after the addition of two signed numbers, it indicates an overflow condition. If Z is set after an exclusive-OR operation, it indicates that A = B. A single bit in A can be checked to determine if it is 0 or 1 by masking all bits except the bit in question and then checking the Z status bit.
The relative magnitudes of A and B can be checked by a compare operation. If A - B is performed for two unsigned binary numbers, the relative magnitudes of A and B can be determined from the values transferred to the C and Z bits. If Z = 1, we know that A = B, since A - B = 0. If Z = 0, then we know that A is not equal to B.
Similarly, C = 1 if A >= B and C = 0 if A < B. The following table lists the various conditions.
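The four flag definitions can be checked with a short sketch of an 8-bit addition. The bit definitions follow the text; the function name and operand choices are illustrative.

```python
# Sketch of setting the four status bits for an 8-bit ALU addition a + b.
def status_bits(a, b):
    """Return (C, Z, S, V) for the 8-bit addition a + b."""
    full = a + b
    result = full & 0xFF
    C = (full >> 8) & 1                       # carry out of the sign position
    Z = int(result == 0)                      # result is all 0's
    S = (result >> 7) & 1                     # sign = highest-order result bit
    c_into_sign = ((a & 0x7F) + (b & 0x7F)) >> 7
    V = c_into_sign ^ C                       # XOR of carries into/out of sign bit
    return C, Z, S, V

print(status_bits(100, 50))   # 150 > 127, so V = 1 (signed overflow)
print(status_bits(128, 128))  # result wraps to 0: C = 1 and Z = 1
```

Comparing A and B then amounts to calling this with `a` and the 2's complement of `b` and inspecting C and Z, as the text describes.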


ARITHMETIC LOGIC UNIT

An arithmetic logic unit (ALU) is a multioperation, combinational-logic digital function. It can perform a set of basic arithmetic operations and a set of logic operations. The ALU has a number of selection lines to select a particular operation in the unit. The selection lines are decoded within the ALU so that k selection variables can specify up to 2^k distinct operations. The figure shows the block diagram of a 4-bit ALU.

The design of a typical ALU will be carried out in three stages. First, the design of the
arithmetic section will be undertaken. Second, the design of the logic section will be considered.
Finally, the arithmetic section will be modified so that it can perform both arithmetic and logic
operations.

Design of Arithmetic Circuit


The basic component of the arithmetic section of an ALU is a parallel adder. A parallel
adder is constructed with a number of full-adder circuits connected in cascade. By controlling
the data inputs to the parallel adder, it is possible to obtain different types of arithmetic
operations.
The figure demonstrates the arithmetic operations obtained when one set of inputs to a
parallel adder is controlled externally. The number of bits in the parallel adder may be of any
value. The input carry Cin goes to the full-adder circuit in the least significant bit position. The
output carry Cout comes from the full-adder circuit in the most significant bit position.


The circuit that controls input B to provide these functions is called a true/complement, one/zero element. This circuit is illustrated in the following figure.
The two selection lines s1 and s0 control the input of each B terminal:

S1  S0  Yi
0   0   0
0   1   Bi
1   0   Bi'
1   1   1

The input A is applied directly to the 4-bit parallel adder while the input B is modified. The resultant arithmetic circuit is shown in the figure below.
A 4-bit arithmetic circuit that performs 8 arithmetic operations is shown in the following figure.


The function table for the arithmetic circuit is given below.

The 4 full-adder (FA) circuits constitute the parallel adder.


 The carry into the first stage is the input carry.
 The carry out of the fourth stage is the output carry.
 All other carries are connected internally from one stage to the next.
The selection variables are s1, s0, and Cin.
 Variables s1 and s0 control all of the B inputs to the full-adder circuits.
The A inputs go directly to the other inputs of the full adders.

The arithmetic operations implemented in the arithmetic circuit are listed in Table.
 The values of the Y inputs to the full-adder circuits are a function of selection
variables s1 and s0.
 Adding the value of Y in each case to the value of A plus the Cin value gives the
arithmetic operation in each entry.
 The arithmetic circuit of the above figure needs a combinational circuit in each stage
specified by the Boolean function:
Yi = s0 Bi + s1 Bi'
where i ranges over the stages and n is the number of bits in the arithmetic circuit.
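The behaviour of the arithmetic circuit follows directly from the Y selection: the parallel adder always computes A + Y + Cin. The following sketch models a 4-bit version; the function name is illustrative.

```python
# Behavioural sketch of the 4-bit arithmetic circuit: the B input passes
# through the true/complement, one/zero element (Yi = s0*Bi + s1*Bi'),
# then the parallel adder forms D = A + Y + Cin (mod 16).
MASK = 0xF  # 4-bit circuit

def arith(A, B, s1, s0, cin):
    if (s1, s0) == (0, 0):
        Y = 0                 # A + Cin (transfer / increment)
    elif (s1, s0) == (0, 1):
        Y = B                 # A + B (+ Cin)
    elif (s1, s0) == (1, 0):
        Y = (~B) & MASK       # A + B' (+ 1 gives subtraction)
    else:
        Y = MASK              # A + all-ones = A - 1
    return (A + Y + cin) & MASK

print(arith(5, 3, 0, 1, 0))  # A + B          -> 8
print(arith(5, 3, 1, 0, 1))  # A + B' + 1 = A - B -> 2
print(arith(5, 0, 1, 1, 0))  # A - 1          -> 4
```

Each of the 8 table entries is just one (s1, s0, Cin) combination fed into this single expression.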

Effect of Output Carry in the arithmetic circuit


Design of other Arithmetic Circuits

The design of any arithmetic circuit can be done by following the procedure outlined in the
previous example.
 Assuming that all operations in the set can be generated through a parallel adder

Steps in design
i. Start by obtaining a function diagram.
ii. Obtain a function table from the function diagram that relates the inputs of the full-adder
circuit to the external inputs.
iii. Obtain the combinational gates from the function table that must be added to each full-adder
stage.

This procedure is demonstrated in the following example.

Qn) Design an adder/subtractor circuit with one selection variable s and two inputs A and
B.

When s = 0, the circuit performs A+B


When s = 1, the circuit performs A-B by taking the 2's complement of B.
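One possible software model of the requested adder/subtractor, assuming a 4-bit width; the selection variable s both complements B (via XOR in hardware) and supplies the input carry:

```python
# Adder/subtractor with one selection variable s:
#   s = 0 -> A + B
#   s = 1 -> A + B' + 1 (2's-complement subtraction)
def add_sub(A, B, s, bits=4):
    mask = (1 << bits) - 1
    Y = B if s == 0 else (~B) & mask  # each B line is XORed with s
    return (A + Y + s) & mask         # s also feeds the input carry Cin

print(add_sub(9, 4, 0))  # 13
print(add_sub(9, 4, 1))  # 5
```

In gate terms this is the parallel adder with every Bi replaced by Bi ⊕ s and Cin tied to s.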


Design of Logic Circuit


The logic microoperations manipulate the bits of the operands separately and treat each bit as a binary variable. All 16 logic operations can be generated in one circuit and selected by means of four selection lines. Since all logic operations can be obtained by means of AND, OR, and NOT (complement) operations, it may be more convenient to employ a logic circuit with just these operations.
For three operations, two selection variables are needed. But two selection lines can select among four logic operations, so we also choose the exclusive-OR (XOR) function for the logic circuit to be designed in this and the next section.
The simplest and most straightforward way to design a logic circuit is shown in the figure given below. The diagram shows one typical stage, designated by subscript i. The circuit must be repeated n times for an n-bit logic circuit.


The four gates generate the four logic operations OR, XOR, AND, and NOT. The two selection variables of the multiplexer select one of the gates for the output. The function table lists the output logic generated as a function of the two selection variables.
The logic circuit can be combined with the arithmetic circuit to produce one arithmetic logic unit. Selection variables S1 and S0 can be made common to both sections provided we use a third selection variable, S2, to differentiate between the two. This configuration is illustrated in the figure below.

The outputs of the logic and arithmetic circuits in each stage go through a multiplexer
with selection variable S2.
When S2 = 0, the arithmetic output is selected,
when S2 = 1, the logic output is selected.
Although the two circuits can be combined in this manner, this is not the best way to design an ALU. A more efficient ALU can be obtained if we investigate the possibility of generating logic operations in an already available arithmetic circuit. This can be done by inhibiting all input carries into the full-adder circuits of the parallel adder. Consider the Boolean function that generates the output sum in a full-adder circuit:
F=X ⊕ Y ⊕ Cin
The input carry Cin in each stage can be made equal to 0 when a selection variable S2 is equal to 1. The result would be:
F=X ⊕ Y
This expression is valid because of the property of the XOR operation:
X⊕0 = X
Thus, with the input carry to each stage equal to 0, the full-adder circuits generate the
exclusive-OR operation.
Now refer the figure of arithmetic unit circuit.
 The value of Yi can be selected by means of the two selection variables to be
equal to either 0, Bi, Bi', or 1.
 The value of Xi is always equal to input Ai. The table given below shows the four
logic operations obtained when s2 = 1.
 This selection variable forces Ci to be equal to 0 while s1 and s0 choose a
particular value for Yi.

 The 4 logic operations obtained by this configuration are transfer, exclusive-OR,
equivalence, and complement.
The third entry is the equivalence operation because:
Fi = Ai ⊕ Bi' = Ai Bi + Ai' Bi'
The last entry in the table is the NOT or complement operation because:
Fi = Ai ⊕ 1 = Ai'
The table has one more column which lists the four logic operations we want to include in
the ALU.
 Two of these operations, XOR and NOT, are already available.
 It is possible to modify the arithmetic circuit further so that it will generate the logic
functions OR and AND instead of the transfer and equivalence functions.


Design of Arithmetic Logic Unit


A basic ALU with eight arithmetic operations and four logic operations can be designed with the details we already have.
We already have
 Three selection variables s2, s1, and s0 to select eight different operations
 The input carry Cin is used to select four additional arithmetic operations.
 With s2 = 0, selection variables s1 and s0 together with Cin will select the eight arithmetic
operations
 With s2 = 1, selection variables s1 and s0 will select the four logic operations OR, XOR,
AND, and NOT.

The design of an ALU is a combinational-logic problem.


 We can design one stage of the ALU and then duplicate it for the number of stages
required.
 There are six inputs to each stage: Ai, Bi, Ci, s2, s1 and s0.
 There are two outputs in each stage: output Fi and the carry out Ci+1
One can formulate a truth table with 64 entries and simplify the two output functions.
Here we choose to employ an alternate procedure that uses the availability of a parallel adder.

Design steps of ALU


1. Design the arithmetic section independent of the logic section.
2. Determine the logic operations obtained from the arithmetic circuit in step 1, assuming
that the input carries to all stages are 0.
3. Modify the arithmetic circuit to obtain the required logic operations.
 The solution to the first design step is shown in Arithmetic unit design.
 The solution to the second design step is presented in Logic unit design.
 The solution of the third step is carried out below.

From the table, we see that

 When s2 = 1, the input carry Ci in each stage must be 0.
 With s1 s0 = 00, each stage as it stands generates the function Fi = Ai [since Xi = Ai and
Yi = 0; refer to the diagram].

To change the output to an OR operation, we must change the input to each full-adder circuit from Ai to Ai + Bi. This can be accomplished by ORing Bi with Ai when s2 s1 s0 = 100.

The other selection variables that give an undesirable output occur when s2 s1 s0 = 110. The unit as it stands generates the equivalence operation, but we want to generate the AND operation,
Fi = Ai.Bi.
Let us investigate the possibility of ORing each input Ai with some Boolean function Ki in order to change Xi, since we cannot change Yi.


The function so obtained is then used for Xi when s2 s1 s0 = 110:
Xi = Ai + Ki
Careful inspection of the result reveals that if the variable Ki = Bi', we obtain the output:
Fi = (Ai + Bi') ⊕ Bi' = Ai Bi + Bi Bi' + Ai' Bi Bi' = Ai Bi
Two terms are equal to 0 because Bi.Bi' = 0. The result obtained is the AND operation as
required.
The conclusion is that, if Ai is ORed with Bi' when s2s1s0= 110, the output will generate the
AND operation.
The final ALU is shown in figure below Only the first two stages are drawn, but the
diagram can be easily extended to more stages. The inputs to each full-adder circuit are specified
by the Boolean functions:
Xi = Ai + S2 S1' S0' Bi + S2 S1 S0' Bi'
Yi = S0 Bi + S1 Bi'
Zi = S2' Ci
When S2 = 0, the three functions reduce to:
Xi = Ai
Yi = S0 Bi + S1 Bi’
Zi = Ci


which are the functions for the arithmetic circuit. The logic operations are generated
when S2 = 1. For S2 S1 S0 = 1 0 1 or 1 1 1, the functions reduce to:
Xi = Ai    Yi = S0 Bi + S1 Bi'    Zi = 0

The function table for the Arithmetic and Logic Unit is shown below. The 12 operations generated
in the ALU are summarized in Table.

The particular function is selected through s2, s1, s0, and Cin.
The arithmetic operations are identical to the ones listed for the arithmetic circuit.
The value of Cin for the four logic functions has no effect on the operation of the unit, and those entries are marked with don't-care X's.
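The stage equations above can be exercised in a behavioural sketch. This assumes each stage computes Fi = Xi ⊕ Yi ⊕ Zi with ripple carry; it is an illustrative model, not a gate-level design.

```python
# Behavioural sketch of a 4-bit ALU built from the stage equations:
#   Xi = Ai + s2*s1'*s0'*Bi + s2*s1*s0'*Bi'
#   Yi = s0*Bi + s1*Bi'
#   Zi = s2'*Ci
def alu(A, B, s2, s1, s0, cin, bits=4):
    F, c = 0, 0 if s2 else cin          # s2 = 1 inhibits all stage carries
    for i in range(bits):
        ai, bi = (A >> i) & 1, (B >> i) & 1
        xi = ai | (s2 & (1 ^ s1) & (1 ^ s0) & bi) | (s2 & s1 & (1 ^ s0) & (1 ^ bi))
        yi = (s0 & bi) | (s1 & (1 ^ bi))
        zi = 0 if s2 else c
        F |= (xi ^ yi ^ zi) << i        # sum output of the full adder
        c = (xi & yi) | (xi & zi) | (yi & zi)   # carry to the next stage
    return F

print(alu(0b0101, 0b0011, 0, 0, 1, 0))  # arithmetic A + B -> 8
print(alu(0b0101, 0b0011, 1, 0, 0, 0))  # logic A OR B     -> 7
print(alu(0b0101, 0b0011, 1, 1, 0, 0))  # logic A AND B    -> 1
print(alu(0b0101, 0b0011, 1, 0, 1, 0))  # logic A XOR B    -> 6
```

Note how the OR and AND cases fall out of the ORed-in Bi and Bi' terms in Xi, exactly as derived in the text.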
Design of Combinational Logic Shifter


The shift unit attached to the processor transfers the output of the ALU onto the output
bus. Shifter may function in four different ways.
1. The shifter may transfer the information directly without a shift.
2. The shifter may shift the information to the right.
3. The shifter may shift the information to the left.
4. In some cases no transfer is made from ALU to the output bus.
A shifter can be implemented as a bidirectional shift register with parallel load. The information from the ALU can be transferred to the register in parallel and then shifted to the right or left. In this configuration, one clock pulse is needed for the transfer into the shift register and another pulse is needed for the shift. A further clock pulse may be needed when the information is passed from the shift register to the destination register.

The number of clock pulses can be reduced if the shifter is implemented with a combinational circuit. A combinational-logic shifter can be constructed with multiplexers, as the figure above shows.
Shifter operation can be selected by two variables H1 H0
 If H1 H0 = 0 0 No shift is executed and the signals from F go directly to the S lines
 If H1 H0 = 0 1 Shift Right is executed
 If H1 H0 = 1 0 Shift Left is executed
 If H1 H0 = 1 1 No operations
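The four H1 H0 cases above can be modelled directly; the 4-bit width and function name are illustrative.

```python
# Combinational shifter sketch: H1 H0 select direct transfer, shift
# right, shift left, or no transfer, for a value F on its way to bus S.
def shifter(F, H1, H0, bits=4):
    mask = (1 << bits) - 1
    if (H1, H0) == (0, 0):
        return F                    # direct transfer, no shift
    if (H1, H0) == (0, 1):
        return (F >> 1) & mask      # shift right
    if (H1, H0) == (1, 0):
        return (F << 1) & mask      # shift left, MSB discarded
    return None                     # 1 1: no transfer onto bus S

print(shifter(0b0110, 0, 0))  # 6
print(shifter(0b0110, 0, 1))  # 3
print(shifter(0b0110, 1, 0))  # 12
```

Because it is combinational, the shifted result appears within one gate delay of the ALU output, with no extra clock pulses.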

PROCESSOR UNIT

A block diagram of a processor unit is shown in figure. It consists of seven registers R1


through R7 and a status register. The outputs of the seven registers go through two multiplexers
to select the inputs to the ALU.

Input data from an external source are also selected by the same multiplexers. The output
of the ALU goes through a shifter and then to a set of external output terminals. The output from
the shifter can be transferred to any one of the registers or to an external destination.
There are 16 selection variables in the unit, and their function is specified by a Control
Word. The 16-bit control word, when applied to the selection variables in the processor,
specifies a given microoperation. The Control Word is partitioned into 6 fields, with each field
designated by a letter name. All fields, except Cin, have a code of three bits.
The functions of all selection variables are specified in table below.


The 3 bits of A select a Source Register for the input to left side of the ALU.
 The B field is the same, but it selects the source information for the right input of the ALU.
 The D field selects a Destination Register.
 The F field, together with the bit in Cin, selects a Function for the ALU.
 The H field selects the type of Shift in the shifter unit.

The 3-bit binary code listed in the table specifies the code for each of the five fields A, B, D, F, and H. The register selected by A, B, or D is the one whose decimal number is equivalent to the binary number in the code. When the A or B field is 000, the corresponding multiplexer selects the external input data. When D = 000, no destination register is selected.
The three bits in the F field, together with the input carry Cin, provide the 12 operations
of the ALU as specified in above table. Note that there are two possibilities for F = A. In one
case the carry bit C is cleared, and in the other case it is set to 1.
A control word of 16 bits is needed to specify a microoperation for the processor unit.
 The most efficient way is to store the control words in a memory unit which functions as a control
memory.
 The sequence of control words is then read from the control memory, one word at a
time, to initiate the desired sequence of microoperations.
 This type of control organization is called Microprogramming.
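Decoding such a control word can be sketched as below. Only the field widths (3, 3, 3, 3, 1, 3) follow the text; the exact bit positions chosen here are an assumption for illustration.

```python
# Sketch of decoding a 16-bit control word into the six fields named in
# the text: A, B, D, F, Cin, H (bit layout assumed, widths 3+3+3+3+1+3).
def decode(word):
    return {
        "A":   (word >> 13) & 0b111,  # source register, left ALU input
        "B":   (word >> 10) & 0b111,  # source register, right ALU input
        "D":   (word >> 7)  & 0b111,  # destination register
        "F":   (word >> 4)  & 0b111,  # ALU function
        "Cin": (word >> 3)  & 0b1,    # input carry
        "H":   word         & 0b111,  # shifter operation
    }

cw = (2 << 13) | (3 << 10) | (1 << 7)   # A=R2, B=R3, D=R1, F=0, Cin=0, H=0
print(decode(cw)["D"])  # 1
```

A microprogram is then just a sequence of such words read from control memory, one per microoperation.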

DESIGN OF ACCUMULATOR

Some processor units distinguish one register from all others and call it an accumulator register. The block diagram of an accumulator that forms a sequential circuit is shown in the figure below.
The A register and the associated combinational circuit constitute a sequential circuit. The combinational circuit replaces the ALU but cannot


be separated from the register, since it is only the combinational-circuit part of a sequential
circuit. The A register is referred to as the accumulator register and is sometimes denoted by the
symbol AC. Here, accumulator refers to both the A register and its associated combinational
circuit.
The external inputs to the accumulator are the data inputs from B and the control
variables that determine the micro operations for the register. The next state of register A is a
function of its present state and of the external inputs.
The accumulator can also perform data-processing operations. A total of nine operations is considered here for the design of the accumulator circuit.

In all the listed microoperations, A is the source register and the B register is used as the second source register. The destination register is the accumulator register itself. For a complete accumulator there will be n stages.

Fig: 4-bit accumulator constructed by cascading 4 stages


The inputs and outputs of each stage can be connected in cascade to form a complete accumulator. Here we discuss the design of a 4-bit accumulator. The number on top of each block represents the bit position.
All blocks receive the 8 control variables P1 to P8 and the clock pulses from CP. The other six inputs and four outputs are the same as in the typical stage. The zero-detect chain is obtained by connecting the z variables in cascade, with the first block receiving a binary constant 1. The last stage produces the zero-detect variable Z.
The total number of terminals in the 4-bit accumulator is 25, including terminals for the A outputs. Incorporating two more terminals for the power supply, the circuit can be enclosed within one IC package having 27 or 28 pins.
The number of terminals for the control variables can be reduced from 9 to 4 if a decoder is inserted in the IC. In that case the IC pin count is reduced to 22, and the accumulator can be extended to 16 microoperations without adding external pins (that is, with 4 bits we can identify 16 operations).

ARITHMETIC ALGORITHMS - Algorithms for multiplication and division
(restoring method) of binary numbers — Array multiplier —Booth’s
multiplication algorithm
Module: 3 Pipelining – Basic Principles, classification of pipeline processors.
instruction and arithmetic pipelines (Design examples not required),
hazard detection and resolution.

MULTIPLICATION OF UNSIGNED NUMBERS

The product of two n-bit numbers is at most a 2n-bit number. Unsigned multiplication can be viewed as the addition of shifted versions of the multiplicand. Multiplication involves the generation of partial products, one for each digit in the multiplier. These partial products are then summed to produce the final product. When the multiplier bit is 0, the partial product is 0; when the multiplier bit is 1, the partial product is the multiplicand. The total product is produced by summing the partial products, with each successive partial product shifted one position to the left relative to the preceding partial product.
Multiplication of the two integers 13 and 11 is:

      1101      (13)  Multiplicand
    x 1011      (11)  Multiplier
      ----
      1101            Partial products
     1101
    0000
   1101
  --------
  10001111      (143) Product

Array Multiplier
Binary multiplication can be implemented in a combinational, two-dimensional logic array called an array multiplier.
 The main component in each cell is a full adder, FA.
 The AND gate in each cell determines whether a multiplicand bit mj is added to the
incoming partial-product bit, based on the value of the multiplier bit qi.
 Each row i, where 0 <= i <= 3, adds the multiplicand (appropriately shifted) to the
incoming partial product, PPi, to generate the outgoing partial product, PP(i+1), if
qi = 1.
 If qi = 0, PPi is passed vertically downward unchanged. PP0 is all 0's and PP4 is the
desired product. The multiplicand is shifted left one position per row by the diagonal
signal path.
(a) Array multiplication of positive binary operands (b) Multiplier cell

Disadvantages:
(1) An n bit by n bit array multiplier requires n2 AND gates and n(n-2) full adders and n
half adders.(Half aders are used if there are 2 inputs and full adder used if there are 3
inputs).
(2) The longest part of input to output through n adders in top row, n -1 adders in the
bottom row and n-3 adders in middle row. The longest in a circuit is called critical
path.
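The row-by-row behaviour of the array multiplier can be modelled in a few lines. This is a behavioural sketch, not a gate count; the function name is illustrative.

```python
# Behavioural sketch of the n x n array multiplier: row i adds the
# AND-gated, shifted multiplicand to the incoming partial product.
def array_multiply(m, q, n=4):
    pp = 0                              # PP0 is all 0's
    for i in range(n):
        qi = (q >> i) & 1               # multiplier bit gating the row
        if qi:
            pp += m << i                # row adds the shifted multiplicand
        # if qi == 0, PPi passes downward unchanged
    return pp                           # PPn is the desired product

print(array_multiply(0b1101, 0b1011))  # 13 * 11 = 143
```

In hardware all rows evaluate simultaneously in combinational logic; the loop here only mirrors the spatial cascade of rows.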
Sequential Circuit Multiplier
Multiplication is performed as a series of n conditional add-and-shift operations, such that if the given bit of the multiplier is 0, only a shift operation is performed, while if the given bit of the multiplier is 1, an addition of the partial products and a shift operation are performed.
The combinational array multiplier uses a large number of logic gates for multiplying
numbers. Multiplication of two n-bit numbers can also be performed in a sequential circuit that
uses a single n bit adder.
The block diagram in Figure shows the hardware arrangement for sequential
multiplication. This circuit performs multiplication by using single n-bit adder n times to
implement the spatial addition performed by the n rows of ripple-carry adders in Figure. Registers
A and Q are shift registers, concatenated as shown. Together, they hold partial product PPi while
multiplier bit qi generates the signal Add/Noadd. This signal causes the multiplexer MUX to
select 0 when qi = 0, or to select the multiplicand M when qi = 1, to be added to PPi to generate
PP(i + 1). The product is computed in n cycles. The partial product grows in length by one bit per
cycle from the initial vector, PP0, of n 0's in register A. The carry-out from the adder is stored in flip-flop C, shown at the left end of register A.
Algorithm:
(1) The multiplier and multiplicand are loaded into registers Q and M. Register A and
flip-flop C are cleared to 0.
(2) In each cycle it performs 2 steps:
(a) If LSB of the multiplier qi =1, control sequencer generates Add signal which
adds the multiplicand M with the register A and the result is stored in A.
(b) If qi =0, it generates Noadd signal to restore the previous value in register A.
(3) Right shift the registers C, A and Q by 1 bit
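The three steps above can be traced with a register-level sketch of the C, A, Q arrangement; 4-bit operands and the function name are illustrative.

```python
# Register-level sketch of the sequential multiplier: n cycles of
# conditional add (into C,A) followed by a right shift of C,A,Q.
def sequential_multiply(m, q, n=4):
    a, c = 0, 0
    mask = (1 << n) - 1
    for _ in range(n):
        if q & 1:                        # q0 drives the Add/Noadd signal
            s = a + m                    # MUX selects M; adder forms A + M
            a, c = s & mask, s >> n      # sum into A, carry-out into C
        caq = (c << (2 * n)) | (a << n) | q
        caq >>= 1                        # shift C, A, Q right one position
        c, a, q = 0, (caq >> n) & mask, caq & mask
    return (a << n) | q                  # the product sits in A,Q

print(sequential_multiply(0b1101, 0b1011))  # 13 * 11 = 143
```

Compared with the array multiplier, the same n additions are performed, but serially through a single adder over n clock cycles.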

MULTIPLICATION OF SIGNED NUMBERS

We now discuss multiplication of 2’s-complement operands, generating a double-length


product. The general strategy is still to accumulate partial products by adding versions of the
multiplicand as selected by the multiplier bits.
First, consider the case of a positive multiplier and a negative multiplicand. When we
add a negative multiplicand to a partial product, we must extend the sign-bit value of the
multiplicand to the left as far as the product will extend. The figure shows an example in which a 5-bit signed operand, −13, is the multiplicand. It is multiplied by +11 to get the 10-bit product,
−143. The sign extension of the multiplicand is shown in blue. The hardware discussed earlier
can be used for negative multiplicands if it is augmented to provide for sign extension of the
partial products.
 13 -> 1101
+13 -> 01101                 [for a +ve number, prepend a 0 sign bit]
-13 -> 10010 + 1 = 10011     [for a -ve number, take the 2's complement]

[To extend the sign bit: since the operands are 5-bit signed numbers, a 10-bit product is generated.
So, if a partial product's MSB is 1, add 1's to its left for sign extension;
if a partial product's MSB is 0, add 0's to its left.]

Example: Sign extension of negative multiplicand

[the product is 10 bits -> (2n)]

For a negative multiplier, a straightforward solution is to form the 2’s-complement of


both the multiplier and the multiplicand and proceed as in the case of a positive multiplier.
This is possible because complementation of both operands does not change the value or the sign
of the product.
[If the sign bit is 0 then the number is positive, If the sign bit is 1, then the number is negative]
The Booth Algorithm
Algorithm & Flowchart for Booth Multiplication
1. Multiplicand is placed in BR and Multiplier in QR
2. Accumulator register AC, Qn+1 are initialized to 0
3. Sequence counter SC is initialized to n (number of
bits).
4. Compare Qn and Qn+1 and perform the following
01 –> AC=AC+BR
10 –> AC=AC+BR’+1
00 –> No arithmetic operation
11-> No arithmetic operation
5. ASHR - Arithmetic Shift Right AC, QR
6. Decrement SC by 1; repeat from step 4 until SC = 0
The final product will be stored in AC, QR
Multiply -9 x -13 using Booth Algorithm

  9 = 1001              13 = 1101
 +9 = 01001            +13 = 01101
 -9 = 10110 + 1        -13 = 10010 + 1
    = 10111 (BR)           = 10011 (QR)
BR'+1 = 01000 + 1 = 01001

BR = 10111, BR'+1 = 01001

Qn Qn+1  Operation  AC     Q      Qn+1  SC
-------------------------------------------
         Initial    00000  10011  0     101
1  0     SUB        01001  10011  0     101
         ASHR       00100  11001  1     100
1  1     ASHR       00010  01100  1     011
0  1     ADD        11001  01100  1     011
         ASHR       11100  10110  0     010
0  0     ASHR       11110  01011  0     001
1  0     SUB        00111  01011  0     001
         ASHR       00011  10101  1     000

Resultant product in AC, Q = 00011 10101
= 2^6 + 2^5 + 2^4 + 2^2 + 2^0
= 117
==============
Multiply 13 x -6 using Booth Algorithm

 13 = 1101              6 = 0110
+13 = 01101 (BR)       +6 = 00110
                       -6 = 11001 + 1 = 11010 (QR)
BR'+1 = 10010 + 1 = 10011

BR = 01101, BR'+1 = 10011

Qn Qn+1  Operation  AC     Q      Qn+1  SC
-------------------------------------------
         Initial    00000  11010  0     101
0  0     ASHR       00000  01101  0     100
1  0     SUB        10011  01101  0     100
         ASHR       11001  10110  1     011
0  1     ADD        00110  10110  1     011
         ASHR       00011  01011  0     010
1  0     SUB        10110  01011  0     010
         ASHR       11011  00101  1     001
1  1     ASHR       11101  10010  1     000

[13 x -6 gives a -ve product, so the resultant product's 2's complement should be determined]
Resultant product in AC, Q = 11101 10010
2's complement = 00010 01101 + 1 = 00010 01110
= 2^6 + 2^3 + 2^2 + 2^1 = 78
Product = -78
==================
Multiply -11 x 8 using Booth Algorithm

 11 = 1011              8 = 1000
+11 = 01011            +8 = 01000 (QR)
-11 = 10100 + 1 = 10101 (BR)
BR'+1 = 01010 + 1 = 01011

BR = 10101, BR'+1 = 01011

Qn Qn+1  Operation  AC     Q      Qn+1  SC
-------------------------------------------
         Initial    00000  01000  0     101
0  0     ASHR       00000  00100  0     100
0  0     ASHR       00000  00010  0     011
0  0     ASHR       00000  00001  0     010
1  0     SUB        01011  00001  0     010
         ASHR       00101  10000  1     001
0  1     ADD        11010  10000  1     001
         ASHR       11101  01000  0     000

[-11 x 8 gives a -ve product, so the resultant product's 2's complement should be determined]
Resultant product in AC, Q = 11101 01000
2's complement = 00010 10111 + 1 = 00010 11000
= 2^6 + 2^4 + 2^3 = 88
Product = -88
=========
Multiply each of the following pairs of signed 2's-complement numbers using Booth's algorithm. In each case assume A is the multiplicand and B is the multiplier.
A = 010111   B = 110110

Answer:
A = 010111                        B = 110110
[sign bit is 0, therefore +ve]    [sign bit is 1, therefore -ve; find the 2's complement]
A = +23 [010111]                  2's complement of 10110 is 01001 + 1 = 01010 => 10
                                  Therefore, B = -10 [110110]

Multiply +23 x -10

+23 = 010111 (BR)       BR'+1 = 101000 + 1 = 101001
-10 = 110110 (QR)

BR = 010111, BR'+1 = 101001

Qn Qn+1  Operation  AC      Q       Qn+1  SC
---------------------------------------------
         Initial    000000  110110  0     0110
0  0     ASHR       000000  011011  0     0101
1  0     SUB        101001  011011  0     0101
         ASHR       110100  101101  1     0100
1  1     ASHR       111010  010110  1     0011
0  1     ADD        010001  010110  1     0011
         ASHR       001000  101011  0     0010
1  0     SUB        110001  101011  0     0010
         ASHR       111000  110101  1     0001
1  1     ASHR       111100  011010  1     0000

[+23 x -10 gives a -ve product, so the resultant product's 2's complement should be determined]
Resultant product in AC, Q = 111100 011010
2's complement = 000011 100101 + 1 = 000011 100110
= 2^7 + 2^6 + 2^5 + 2^2 + 2^1 = 230
Product = -230
=============
Features of Booth's Algorithm:

 Works equally well for both negative and positive multipliers.
 Handles signed multiplication of the given numbers directly.
 Speeds up the multiplication process.
Booth Recoding of a Multiplier:

In general, in the Booth algorithm, -1 times the shifted multiplicand is selected when moving
from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the
multiplier is scanned from right to left. The case where the LSB of the multiplier is 1 is
handled by assuming that an implied 0 lies to its right.

 In the worst-case multiplier, the number of addition and subtraction operations is large.
 In an ordinary multiplier, a 0 indicates no operation, but there are still addition and
subtraction operations to be performed.
 For a good multiplier the Booth algorithm works well, because the majority of bits are 0s.
 A good multiplier consists of blocks/sequences of 1s.

The Booth algorithm achieves efficiency in the number of additions required when the
multiplier has a few large blocks of 1s. The speed gained by skipping over 1s depends on the
data. On average, the speed of multiplication with the Booth algorithm is the same as with
normal multiplication.
• Best case – a long string of 1s (skipping over 1s)
• Worst case – 0s and 1s alternating
• The transformation 011…110 to 100…0(-1)0 is called skipping over 1s.
INTEGER DIVISION
The figure shows examples of decimal division and binary division of the same values.
Consider the decimal version first. The 2 in the quotient is determined by the following
reasoning: First, we try to divide 13 into 2, and it does not work. Next, we try to divide 13
into 27. We go through the trial exercise of multiplying 13 by 2 to get 26, and, observing
that 27 - 26 = 1 is less than 13, we enter 2 as the quotient and perform the required
subtraction.

Dividend = 274
Divisor = 13
Quotient=21
Remainder =1
The next digit of the dividend, 4, is brought down, and we finish by deciding that 13 goes
into 14 once and the remainder is 1. We can discuss binary division in a similar way, with the
simplification that the only possibilities for the quotient bits are 0 and 1.
A circuit that implements division by this longhand method operates as follows: It
positions the divisor appropriately with respect to the dividend and performs a subtraction. If the
remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by
another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by
adding back the divisor, and the divisor is repositioned for another subtraction. This is called the
restoring division algorithm.
Restoring Division
Figure shows a logic circuit arrangement that implements the restoring division algorithm
just discussed. An n-bit positive divisor is loaded into register M and an n-bit positive dividend
is loaded into register Q at the start of the operation. Register A is set to 0. After the division is
complete, the n-bit quotient is in register Q and the remainder is in register A.
The required subtractions are facilitated by using 2’s-complement arithmetic. The extra
bit position at the left end of both A and M accommodates the sign bit during subtractions. The
following algorithm performs restoring division.
Do the following three steps n times:
1. Shift A and Q left one bit position.
2. Subtract M from A, i.e., compute A - M, and place the answer back in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A (that is, restore A); otherwise,
set q0 to 1.
Example: Dividend = 8 [1000], Divisor = 3 [11]

M = 00011
M'+1 = 11100 + 1 = 11101

Quotient = 2 [0010], Remainder = 2 [00010]
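The three-step loop above can be sketched as a bit-level simulation. This is a sketch assuming n-bit positive operands, with registers named A, Q, M as in the text (the function name is ours):

```python
def restoring_divide(dividend, divisor, n):
    """Restoring division of two n-bit positive integers.

    A and M carry one extra (sign) bit at the left end, as in the text.
    Returns (quotient, remainder) = (Q, A) after n iterations.
    """
    a, q, m = 0, dividend, divisor
    sign_mask = 1 << n                   # the extra sign bit of A
    width = (sign_mask << 1) - 1         # A and M are n+1 bits wide
    for _ in range(n):
        # 1. Shift A and Q left one bit position (MSB of Q enters A)
        a = ((a << 1) | (q >> (n - 1))) & width
        q = (q << 1) & ((1 << n) - 1)
        # 2. Subtract M from A (2's-complement arithmetic)
        a = (a - m) & width
        # 3. Test the sign of A
        if a & sign_mask:                # negative: restore A, set q0 = 0
            a = (a + m) & width
        else:                            # non-negative: set q0 = 1
            q |= 1
    return q, a

print(restoring_divide(8, 3, 5))      # (2, 2) as in the example
print(restoring_divide(274, 13, 9))   # (21, 1) as in the decimal example
```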
PIPELINING
Pipelining is a technique of decomposing a sequential process into sub operations, with each
sub process being executed in a special dedicated segment that operates concurrently with
all other segments. A pipeline can be visualized as a collection of processing segments
through which binary information flows. Each segment performs partial processing dictated
by the way the task is partitioned. The result obtained from the computation in each segment
is transferred to the next segment in the pipeline. The final result is obtained after the data
have passed through all segments.
A pipeline processor may process each instruction in 4 steps:
F Fetch: Read the instruction from the memory
D Decode: Decode the instruction and fetch the source operands
E Execute: Perform the operation specified by the instruction
W Write: Store the result in the destination location.
In figure (a), four instructions progress at any given time. This means that four distinct
hardware units are needed, as in figure (b). These units must be capable of performing their
tasks simultaneously without interfering with one another. Information is passed from one
unit to the next through a storage buffer. As an instruction progresses through the pipeline,
all the information needed by the stages downstream must be passed along.
Pipeline Organization
The simplest way of viewing the pipeline structure is to imagine that each segment
consists of an input register followed by a combinational circuit. The register holds the data
and the combinational circuit performs the sub operation in the particular segment. The
output of the combinational circuit is applied to the input register of the next segment. A
clock is applied to all registers after enough time has elapsed to perform all segment
activity. In this way the information flows through the pipeline one step at a time.

Example demonstrating the pipeline organization:

Suppose we want to perform the combined multiply and add operations with a stream of numbers:

Ai*Bi + Ci for i = 1, 2, 3, ..., 7

Each sub operation is to be implemented in a segment within a pipeline. Each segment has one
or two registers and a combinational circuit, as shown in the figure. R1 through R5 are
registers that receive new data with every clock pulse. The multiplier and adder are
combinational circuits. The sub operations performed in each segment of the pipeline are as
follows:

R1 <- Ai, R2 <- Bi          Input Ai and Bi
R3 <- R1*R2, R4 <- Ci       Multiply and input Ci
R5 <- R3+R4                 Add Ci to the product
The five registers are loaded with new data every clock pulse.
The first clock pulse transfers A1 and B1 into R1 and R2. The second clock pulse
transfers the product of R1 and R2 into R3 and C1 into R4. The same clock pulse transfers
A2 and B2 into R1 and R2. The third clock pulse operates on all three segments
simultaneously. It places A3 and B3 into R1 and R2, transfers the product of R1 and R2 into
R3, transfers C2 into R4, and places the sum of R3 and R4 into R5. It takes three clock
pulses to fill up the pipe and retrieve the first output from R5. From there on, each clock
produces a new output and moves the data one step down the pipeline. This happens as long
as new input data flow into the system.
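The clock-by-clock behaviour just described can be sketched in software. This is an illustrative simulation only (the function name and the use of None for an empty register are our conventions; A, B and C are assumed to have equal length):

```python
def run_pipeline(A, B, C):
    """Cycle-accurate sketch of the three-segment pipeline computing
    Ai*Bi + Ci. R1..R5 model the pipeline registers; all load on the
    same clock edge, so next values are computed before any update."""
    r1 = r2 = r3 = r4 = None
    results = []
    for clock in range(len(A) + 3):          # extra pulses drain the pipe
        nr5 = r3 + r4 if r3 is not None else None      # segment 3: add
        nr3 = r1 * r2 if r1 is not None else None      # segment 2: multiply
        nr4 = C[clock - 1] if 1 <= clock <= len(C) else None
        nr1 = A[clock] if clock < len(A) else None     # segment 1: inputs
        nr2 = B[clock] if clock < len(B) else None
        r1, r2, r3, r4, r5 = nr1, nr2, nr3, nr4, nr5
        if r5 is not None:                   # R5 holds a finished result
            results.append(r5)
    return results

print(run_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))   # [11, 18, 27]
```

As the text says, the first result appears only after three clock pulses fill the pipe; after that, one result emerges per pulse.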
Four Segment Pipeline

The general structure of a four-segment pipeline is shown in the figure. The operands are
passed through all four segments in a fixed sequence. Each segment consists of a
combinational circuit Si that performs a sub operation on the data stream flowing through the
pipe. The segments are separated by registers Ri that hold the intermediate results between
the stages. Information flows between adjacent stages under the control of a common clock
applied to all the registers simultaneously.
Space Time Diagram

The behavior of a pipeline can be illustrated with a space-time diagram. This is a diagram
that shows the segment utilization as a function of time. In the figure, the horizontal axis
displays the time in clock cycles and the vertical axis gives the segment number. The diagram
shows six tasks T1 through T6 executed in four segments. Initially, task T1 is handled by
segment 1. After the first clock, segment 2 is busy with T1, while segment 1 is busy with
task T2. Continuing in this manner, the first task T1 is completed after the fourth clock
cycle. From then on, the pipe completes a task every clock cycle.
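A space-time diagram of this kind can be generated programmatically; a small sketch (task numbering starts at T1, and the function name is ours):

```python
def space_time(n_tasks, k_segments):
    """Build the space-time diagram: row i lists which task occupies
    segment i+1 in each clock cycle (None while the segment is empty)."""
    cycles = k_segments + n_tasks - 1        # total clock cycles needed
    diagram = []
    for seg in range(k_segments):
        row = []
        for clock in range(cycles):
            task = clock - seg               # task entering this segment
            row.append(task + 1 if 0 <= task < n_tasks else None)
        diagram.append(row)
    return diagram

for row in space_time(6, 4):                 # six tasks, four segments
    print(row)
```

Row 4 of the output shows T1 finishing in the fourth cycle and one task completing every cycle thereafter, matching the diagram described above.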
KTU - CST202 - Computer Organization and Architecture Module: 3
Consider the case where a k-segment pipeline with a clock cycle time tp is used to execute n
tasks. The first task T1 requires a time equal to k*tp to complete its operation, since there
are k segments in the pipe. The remaining n-1 tasks emerge from the pipe at the rate of one
task per clock cycle, and they will be completed after a time equal to (n-1)*tp. Therefore,
to complete n tasks using a k-segment pipeline requires k+(n-1) clock cycles.

Consider a non-pipelined unit that performs the same operation and takes a time equal to tn
to complete each task. The total time required for n tasks is n*tn. The speedup of pipeline
processing over an equivalent non-pipelined processing is defined by the ratio

S = n*tn / ((k+n-1)*tp)

As the number of tasks increases, n becomes much larger than k-1, and k+n-1 approaches the
value of n. Under this condition the speedup ratio becomes

S = tn/tp

If we assume that the time it takes to process a task is the same in the pipeline and
non-pipelined circuits, we will have tn = k*tp. With this assumption the speedup ratio
reduces to

S = k*tp/tp = k
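These formulas are easy to check numerically; a small sketch (function name ours):

```python
def pipeline_speedup(n, k, tp, tn):
    """Speedup S = n*tn / ((k + n - 1)*tp) of a k-segment pipeline with
    clock cycle tp over a non-pipelined unit taking tn per task."""
    return (n * tn) / ((k + n - 1) * tp)

# With tn = k*tp (same total work per task), S approaches k as n grows:
for n in (10, 100, 1000):
    print(round(pipeline_speedup(n, 4, 20, 80), 3))   # 3.077, 3.883, 3.988
```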
CLASSIFICATION OF PIPELINE PROCESSORS

1. Arithmetic Pipelining: The arithmetic logic units of a computer can be segmented for
pipeline operations in various data formats.
2. Instruction Pipelining: The execution of a stream of instructions can be pipelined by
overlapping the execution of the current instruction with the fetch, decode and execution of
subsequent instructions. This technique is known as instruction lookahead.
3. Processor Pipelining: Pipeline processing of the same data stream by a cascade of
processors, each of which processes a specific task. The data stream passes through the first
processor with the results stored in a memory block which is also accessible by the second
processor. The second processor then passes the refined results to the third, and so on.
ARITHMETIC PIPELINES

An arithmetic pipeline divides an arithmetic operation into sub operations for execution in
the pipeline segments. Pipelined arithmetic units are usually found in very high speed
computers. They are used to implement floating point operations, multiplication of fixed
point numbers, and similar computations encountered in scientific problems.
Pipeline Unit for Floating Point Addition and Subtraction:

The inputs to the floating point adder pipeline are two normalized floating point binary
numbers: X = A*2^a, Y = B*2^b

A and B are two fractions that represent the mantissas, and a and b are the exponents. The
floating point addition and subtraction can be performed in four segments. Registers are
placed between the segments to store intermediate results. The sub operations that are
performed in the four segments are:

1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
The exponents are compared by subtracting them to determine their difference. The larger
exponent is chosen as the exponent of the result. The exponent difference determines how many
times the mantissa associated with the smaller exponent must be shifted to the right. This
produces an alignment of the two mantissas.
The two mantissas are added or subtracted in segment 3. The result is normalized in segment
4. When an overflow occurs, the mantissa of the sum or difference is shifted to the right and
the exponent is incremented by one. If underflow occurs, the number of leading zeroes in the
mantissa determines the number of left shifts of the mantissa and the number that must be
subtracted from the exponent.
[Overflow – when the result of an arithmetic operation is finite but larger in magnitude than
the largest floating-point number which can be stored at the given precision. Underflow –
when the result of an arithmetic operation is smaller in magnitude than the smallest
floating-point number which can be stored.]
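The four segments can be sketched functionally. This is a simplified scalar model only (mantissas held as Python floats normalized to [0.5, 1); not a bit-accurate hardware description):

```python
def fp_add_pipeline(a_mant, a_exp, b_mant, b_exp):
    """Four-segment floating point addition sketch for X = A*2^a and
    Y = B*2^b. Each numbered step mirrors one pipeline segment."""
    # Segment 1: compare the exponents (by subtraction)
    diff = a_exp - b_exp
    exp = max(a_exp, b_exp)
    # Segment 2: align the mantissa with the smaller exponent (right shifts)
    if diff >= 0:
        b_mant = b_mant / (2 ** diff)
    else:
        a_mant = a_mant / (2 ** -diff)
    # Segment 3: add the mantissas
    mant = a_mant + b_mant
    # Segment 4: normalize the result so that 0.5 <= |mantissa| < 1
    while abs(mant) >= 1:                  # overflow: shift right, exp += 1
        mant /= 2
        exp += 1
    while mant != 0 and abs(mant) < 0.5:   # underflow: shift left, exp -= 1
        mant *= 2
        exp -= 1
    return mant, exp

# 0.75*2^2 + 0.5*2^1 = 3 + 1 = 4 = 0.5*2^3
print(fp_add_pipeline(0.75, 2, 0.5, 1))    # (0.5, 3)
```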
INSTRUCTION PIPELINE

An instruction pipeline operates on a stream of instructions by overlapping the fetch,
decode, and execute phases of the instruction cycle. An instruction pipeline reads
consecutive instructions from memory while previous instructions are being executed in other
segments. This causes the instruction fetch and execute phases to overlap and perform
simultaneous operations.

Consider a computer with an instruction fetch unit and an instruction execute unit designed
to provide a two-segment pipeline. The instruction fetch segment can be implemented by means
of a first-in first-out (FIFO) buffer. Whenever the execution unit is not using memory, the
control increments the program counter and uses its address value to read consecutive
instructions from memory. The instructions are inserted into the FIFO buffer so that they can
be executed on a first-in first-out basis. Thus an instruction stream can be placed in a
queue, waiting for decoding and processing by the execution segment.
Four Segment Instruction Pipeline

In general the computer needs to process each instruction with the following sequence of
steps (6 steps in 4 segments).
The figure shows the operation of the instruction pipeline. The clock in the horizontal axis
is divided into steps of equal duration. The four segments are represented in the diagram
with abbreviated symbols:

1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.

Here the instruction is fetched (FI) on the first clock cycle in segment 1. It is decoded
(DA) on the second clock cycle, the operands are fetched (FO) on the third clock cycle, and
finally the instruction is executed (EX) in the fourth cycle. Here the fetch and decode
phases overlap due to pipelining. By the time the first instruction is being decoded, the
next instruction is fetched by the pipeline.
In the case of the third instruction we see that it is a branch instruction. Here, when it is
being decoded, the 4th instruction is fetched simultaneously. But as it is a branch
instruction, it may point to some other instruction when it is decoded. Thus the fourth
instruction is kept on hold until the branch instruction is executed. When it gets executed,
the fourth instruction is copied back and the other phases continue as usual. In the absence
of a branch instruction, each segment operates on different instructions.
PIPELINE CONFLICTS:
1. Resource Conflicts: They are caused by access to memory by two segments at the same time.
Most of these conflicts can be resolved by using separate instruction and data memories.

2. Data Dependency: These conflicts arise when an instruction depends on the result of a
previous instruction, but this result is not yet available.

3. Branch Difficulties: They arise from branch and other instructions that change the value
of the PC.
PIPELINE HAZARDS DETECTION AND RESOLUTION

Pipeline hazards are caused by resource usage conflicts among various instructions in the
pipeline. Such hazards are triggered by inter-instruction dependencies: when successive
instructions overlap their fetch, decode and execution through a pipeline processor,
inter-instruction dependencies may arise that prevent the sequential data flow in the
pipeline.

For example, an instruction may depend on the results of a previous instruction. Until the
completion of the previous instruction, the present instruction cannot be initiated into the
pipeline. In other instances, two stages of a pipeline may need to update the same memory
location. Hazards of this sort, if not properly detected and resolved, could result in an
interlock situation in the pipeline or produce unreliable results by overwriting.

There are three classes of data-dependent hazards, according to various data update patterns:

1. Write After Read hazards (WAR)
2. Read After Write hazards (RAW)
3. Write After Write hazards (WAW)

Note that Read After Read does not pose a problem because nothing is changed.
We use resource object to refer to working registers, memory locations and special flags. The
contents of these resource objects are called data objects. Each instruction can be
considered a mapping from a set of data objects to a set of data objects. The domain D(I) of
an instruction I is the set of resource objects whose data objects may affect the execution
of instruction I. The range R(I) of an instruction I is the set of resource objects whose
data objects may be modified by the execution of instruction I. Obviously, the operands to be
used in an instruction execution are retrieved (read) from its domain and the results will be
stored (written) in its range.

Consider the execution of two instructions I and J in a program. Instruction J appears after
instruction I in the program. There may be no instructions, or other instructions, between I
and J. The latency between the two instructions is a very subtle matter. Instruction J may
enter the execution pipe before or after the completion of the execution of instruction I.
Improper timing and data dependencies may create hazardous situations:

1. A RAW hazard between the two instructions I and J may occur when J attempts to read some
data object that has been modified by I.
2. A WAR hazard may occur when J attempts to modify some data object that is read by I.
3. A WAW hazard may occur if both I and J attempt to modify the same data object.
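The three conditions above can be written directly as set intersections on the domain and range sets; a sketch (the function name and the register-name encoding are illustrative):

```python
def detect_hazards(domain_i, range_i, domain_j, range_j):
    """Necessary conditions for hazards between instruction I and a later
    instruction J, expressed on the sets D(I), R(I), D(J), R(J)."""
    hazards = []
    if range_i & domain_j:       # J reads an object that I writes
        hazards.append("RAW")
    if domain_i & range_j:       # J writes an object that I reads
        hazards.append("WAR")
    if range_i & range_j:        # both I and J write the same object
        hazards.append("WAW")
    return hazards

# I: R1 <- R2 + R3   followed by   J: R4 <- R1 + R5  (J reads I's result)
print(detect_hazards({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}))  # ['RAW']
```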
The necessary conditions for these hazards are stated as follows:

Possible hazards are listed in the table. Recognizing the existence of possible hazards,
computer designers wish to detect each hazard and then to resolve it efficiently. Hazard
detection can be done in the instruction fetch stage of a pipeline processor by comparing the
domain and the range of the incoming instruction with those of the instructions being
processed in the pipe. Should any of the conditions in equation 3.18 be detected, a warning
signal can be generated to prevent the hazard from taking place. Another approach is to allow
the incoming instruction through the pipe and distribute the detection to all the potential
pipeline stages. This distributed approach offers better flexibility at the expense of
increased hardware control. Note that the necessary conditions in equation 3.18 may not be
sufficient conditions.
Once a hazard is detected, the system should resolve the interlock situation. Consider the
instruction sequence {.. I, I+1, .... J, J+1, ...} in which a hazard has been detected
between the current instruction J and a previous instruction I. A straightforward approach is
to stop the pipe and to suspend the execution of instructions J, J+1, J+2, ... until
instruction I has passed the point of resource conflict. A more sophisticated approach is to
suspend only instruction J and continue the flow of instructions J+1, J+2, ... down the pipe.
Of course, the potential hazards due to the suspension of J should be continuously checked as
instructions J+1, J+2, ... move ahead of J. Multi-level hazard detection may be encountered,
requiring more complex control mechanisms to resolve a stack of hazards.
In order to avoid RAW hazards, IBM engineers developed a short-circuiting approach which
gives a copy of the data object to be written directly to the instruction waiting to read the
data. This concept was generalized into a technique known as data forwarding, which forwards
multiple copies of the data to as many waiting instructions as may wish to read it. A data
forwarding chain can be established in some cases. The internal forwarding and
register-tagging techniques are helpful in resolving logic hazards in pipelines.
KTU - CST202 - Computer Organization and Architecture Module: 4
Module: 4 CONTROL LOGIC DESIGN: Control organization, Hardwired control, microprogram
control, control of processor unit, micro-program sequencer, micro-programmed CPU
organization, horizontal and vertical micro instructions.
CONTROL LOGIC DESIGN
The process of logic design is a complex undertaking. The binary information found in a
digital system is stored in processor or memory registers, and it can be either data or control
information.
The logic design of a digital system is a process for deriving the digital circuits that
perform data processing and the digital circuits that provide control signals. The timing for all
registers in a synchronous digital system is controlled by a master-clock generator. The clock
pulses are applied to all flip-flops and registers in the system, including the flip-flops and
registers in the control unit.
Two representations which are helpful in the design of systems that need a control are timing
diagrams and flowcharts. A timing diagram clarifies the timing sequence and other
relationships among the various control signals in the system. A flowchart is a convenient
way to specify the sequence of procedural steps and decision paths for an algorithm.
The design of control logic cannot be separated from the algorithmic development
necessary for solving a design problem. Moreover, the control logic is directly related to the
data-processor part of the system that it controls.
CONTROL ORGANIZATION

Once a control sequence has been established, the sequential system that implements the
control operations must be designed. Since the control is a sequential circuit, it can be
designed by a sequential logic procedure.

Disadvantages of sequential control logic are:
 Large number of states
 Excessive number of flip-flops and gates
 Design methods use state and excitation tables, but in practice they are cumbersome

The goal of control logic design should be the development of a circuit that implements the
desired control sequence in a logical and straightforward manner. Designers use specialized
methods for control logic design which can be considered an extension of the classical
sequential logic method combined with the register transfer method.

We consider four methods of control organization:
 One flip-flop per state method
 Sequence register and decoder method
 PLA control
 Micro-program control
The first two methods result in a circuit that must use SSI and MSI circuits for the
implementation. A control unit implemented with SSI and MSI devices is said to be a hard-wired
control. If any alterations or modifications are needed, the circuits must be rewired to fulfill the
new requirements.
The PLA or micro-program control uses an LSI device such as a Programmable Logic Array or a
Read-Only Memory. Any alterations or modifications in a micro-program control can be easily
achieved without wiring changes by removing the ROM from its socket and inserting another ROM
programmed to fulfill the new specifications.
One Flip-Flop per State Method

This method uses one flip-flop per state in the control sequential circuit. Only one
flip-flop is set at any particular time; all others are cleared. A single bit is made to
propagate from one flip-flop to the other under the control of decision logic. In such an
array, each flip-flop represents a state and is activated only when the control bit is
transferred to it.

In this method, the maximum number of flip-flops is used. Example: a sequential circuit with
12 states requires a minimum of four flip-flops because 2^3 < 12 < 2^4, but this control
circuit uses 12 flip-flops, one for each state.

The advantage of this method is the simplicity with which it can be designed. This type of
controller can be designed by inspection from the state diagram that describes the control
sequence. It also offers other advantages, like savings in design effort, an increase in
operational simplicity, and a potential decrease in the combinational circuits required to
implement the complete sequential circuit. The disadvantage is that this method increases
system cost, since more flip-flops are used.

The figure shows the configuration of a four-state sequential control logic that uses four
D-type flip-flops: one flip-flop per state Ti, i = 0, 1, 2, 3.
At any given time interval between two clock pulses, only one flip-flop is equal to 1; all
others are 0. The transition from the present state to the next state is a function of the
present Ti that is 1 and certain input conditions.

The next state is manifested when the previous flip-flop is cleared and a new one is set.
Each of the flip-flops is connected to the data processing section of the digital system to
initiate certain micro-operations. The control outputs are a function of the T's and external
inputs. These outputs may also initiate micro-operations.

If the control circuit does not need external inputs for its sequencing, the circuit reduces
to a straight shift register with a single bit shifted from one position to the next. If the
control sequence must be repeated over and over again, the control reduces to a ring counter.

A ring counter is a shift register with the output of the last flip-flop connected to the
input of the first flip-flop. In a ring counter a single bit continuously shifts from one
position to the next in a circular manner. For this reason, this method is also called the
ring counter controller.
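The ring-counter behaviour described above is easy to demonstrate; a small sketch (function name ours):

```python
def ring_counter(n_states, n_pulses):
    """One-flip-flop-per-state control with no external inputs reduces to
    a ring counter: a single 1 circulating through n flip-flops."""
    flops = [1] + [0] * (n_states - 1)       # T0 set, all others cleared
    history = [tuple(flops)]
    for _ in range(n_pulses):
        flops = [flops[-1]] + flops[:-1]     # shift the single 1 circularly
        history.append(tuple(flops))
    return history

for state in ring_counter(4, 4):             # four states, four pulses
    print(state)                             # the 1 returns to T0
```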
Sequence Register and Decoder Method

This method uses a register to sequence the control states. The register is decoded to
provide one output for each state. For n flip-flops in the sequence register, the circuit
will have 2^n states and the decoder will have 2^n outputs. For example, a 4-bit register can
be in any one of 16 states, and a 4 x 16 decoder will have 16 outputs, one for each state of
the register. Both the sequence register and the decoder are MSI devices.

The figure shows the configuration of a four-state sequential control logic. The sequence
register has two flip-flops and the decoder establishes separate outputs for each state in
the register. The transition to the next state in the sequence register is a function of the
present state and the external input conditions.

If the control circuit does not need external inputs, the sequence register reduces to a
counter that continuously sequences through the four states, so this is also called the
counter-decoder method.
PLA Control

The external sequence register establishes the present state of the control circuit. The PLA
outputs determine which micro-operations should be initiated, depending on the external input
conditions and the present state of the sequence register. At the same time, other PLA
outputs determine the next state of the sequence register.

The sequence register is external to the PLA if the unit implements only combinational
circuits. Some PLAs include not only gates but also flip-flops within the unit. Such a PLA
implements a sequential circuit by specifying the links that must be connected to the
flip-flops in the same manner that the gate links are specified.
Micro-program Control

The purpose of the control unit is to initiate a series of sequential steps of
micro-operations. At any given time certain operations are to be initiated while all others
remain idle. The control variables at any given time can be represented by a string of 1's
and 0's called a control word. The control words can be programmed to initiate the various
components in the system in an organized manner.

A control unit whose control variables are stored in a memory is called a micro-programmed
control unit. Each control word of memory is called a microinstruction, and a sequence of
microinstructions is called a micro-program.

Control memory is usually ROM, since an alteration of the micro-program is seldom needed. The
use of a micro-program involves placing all control variables in words of the ROM for use by
the control unit through successive read operations. The content of the word in the ROM at a
given address specifies the micro-operations for the system.

Dynamic micro-programming permits a micro-program to be loaded initially from the computer
console or from an auxiliary memory such as a magnetic disk. Writable control memory (WCM)
can be used for writing but is used mostly for reading. A ROM, PLA or WCM, when used in a
control unit, is referred to as a control memory.

The control memory address register specifies the control word read from control memory. The
ROM operates as a combinational circuit with the address value as the input and the
corresponding word as the output. The content of the specified word remains on the output
wires as long as the address value remains in the address register.

If the address register changes while the ROM word is still in use, then the word out of the
ROM should be transferred to a buffer register. If the change in address and ROM word can
occur simultaneously, no buffer register is needed. The word read from memory represents a
microinstruction. The microinstruction specifies one or more micro-operations for the
components of the system.

Once these operations are executed, the control unit must determine its next address. The
location of the next microinstruction may be the next one in sequence, or it may be located
somewhere else in the control memory. Some bits of the microinstruction are used to control
the generation of the address of the next microinstruction. The next address may also be a
function of external input conditions. The next address is computed in the next-address
generator circuit and then transferred into the control address register to read the next
microinstruction.
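The read–execute–next-address loop just described can be sketched in software. The microinstruction format below (a control word plus a next field, using a dictionary to branch on an external input) is purely illustrative, not any real machine's format:

```python
def run_microprogram(control_memory, external_input):
    """Sketch of a micro-programmed control unit: CAR addresses the
    control memory; each microinstruction carries a control word plus
    next-address information for the next-address generator."""
    car = 0                                   # control address register
    issued = []
    while True:
        micro = control_memory[car]           # read the microinstruction
        issued.append(micro["control_word"])  # signals to the data processor
        nxt = micro["next"]
        if nxt is None:                       # end of the micro-program
            break
        if isinstance(nxt, dict):             # branch on an external input
            car = nxt[external_input]
        else:
            car = nxt
    return issued

rom = [
    {"control_word": "T0: load",  "next": 1},
    {"control_word": "T1: add",   "next": {0: 2, 1: 0}},  # conditional
    {"control_word": "T2: store", "next": None},
]
print(run_microprogram(rom, 0))   # ['T0: load', 'T1: add', 'T2: store']
```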
HARDWIRED CONTROL

The control hardware can be viewed as a state machine that changes from one state to another
in every clock cycle, depending on the contents of the instruction register, the condition
codes and the external inputs. The outputs of the state machine are the control signals. The
sequence of operations carried out by this machine is determined by the wiring of the logic
elements, hence the name "hardwired". The control logic derived in this section is a
hardwired control using the one flip-flop per state method. The design of hardwired control
is carried out in 5 consecutive steps:
1. The problem is stated.
2. An initial equipment configuration is assumed.
3. An algorithm is formulated.
4. The data processor part is specified.
5. The control logic is designed.
Statement of the Problem

The problem here is to implement with hardware the addition and subtraction of two
fixed-point binary numbers represented in sign-magnitude form. Complement arithmetic may be
used, provided the final result is in sign-magnitude form.

The addition of two numbers stored in registers of finite length may result in a sum that
exceeds the storage capacity of the register by one bit. The extra bit is said to cause an
overflow. The circuit must provide a flip-flop for storing a possible overflow bit.

Equipment Configuration
The two signed binary numbers to be added or subtracted contain n bits. The magnitudes
of the numbers contain k = n - 1 bits and are stored in registers A and B. The sign bits are stored
in flip-flops As and Bs.

Figure shows the registers and associated equipment. The ALU performs the arithmetic
operations and the 1-bit register E serves as the overflow flip-flop. The output carry from the
ALL) is transferred to E.
It is assumed that the two numbers and their signs have been transferred to their
respective registers and that the result of the operation is to be available in registers A and As.
Two input signals in the control specify the add (qa) and subtract (qs) operations. Output variable
x indicates the end of the operation.
The control logic communicates with the outside environment through the input and
output variables. Control recognizes input signal qa or qs and provides the required operation.
Upon completion of the operation, control informs the external environment with output x that
the sum or difference is in registers A and As and that the overflow bit is in E.

Derivation of the Algorithm


Designate the magnitude of the two numbers by A and B. When the numbers are added
or subtracted algebraically, there are eight different conditions to consider, depending on the sign
of the numbers and the operation performed. The eight conditions may be expressed in a
compact form as follows:


(±A) ± (±B)
If the arithmetic operation specified is subtraction, we change the sign of B and add. This
is evident from the relations:
(±A) − (+B) = (±A) + (−B)
(±A) − (−B) = (±A) + (+B)
This reduces the number of possible conditions to four, namely:
(±A) + (±B)
The four possible combinations are:
When the signs of A and B are the same:
(+A) + (+B) = +(A + B)
(−A) + (−B) = −(A + B)
When the signs of A and B are not the same:
(+A) + (−B) = +(A − B) if A > B, or −(B − A) if B > A
(−A) + (+B) = −(A − B) if A > B, or +(B − A) if B > A
The flowchart shows how we can implement sign-magnitude addition and subtraction with the
equipment shown in the previous diagram.
For more details about the flowchart, refer to the textbook "Digital Logic and Computer
Design" by M. Morris Mano; page 418.
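The four sign-combination cases above can be condensed into a small Python model (an illustrative sketch only, assuming a 7-bit magnitude register; the function name `add_sub` and its return tuple are inventions for this sketch, not part of the hardware design):

```python
# Software model of sign-magnitude add/subtract: signs are kept separately
# from k-bit magnitudes, and subtraction is turned into addition by
# flipping the sign of B, exactly as in the relations above.

K = 7                       # magnitude width (k = n - 1 bits), assumed
MASK = (1 << K) - 1         # largest storable magnitude

def add_sub(a_sign, a_mag, b_sign, b_mag, subtract):
    """Return (sign, magnitude, overflow) of (±A) ± (±B)."""
    if subtract:                       # (±A) - (±B) = (±A) + (∓B)
        b_sign ^= 1
    if a_sign == b_sign:               # like signs: add magnitudes
        s = a_mag + b_mag
        return a_sign, s & MASK, int(s > MASK)   # overflow on carry out
    # unlike signs: subtract the smaller magnitude from the larger
    if a_mag >= b_mag:
        return a_sign, a_mag - b_mag, 0
    return b_sign, b_mag - a_mag, 0
```

For example, (+5) + (−3) gives +2 because the signs differ and A > B, and adding two magnitudes whose sum exceeds the 7-bit register sets the overflow bit, mirroring flip-flop E.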


Data Processor Specification


Figure (a) shows the data-processor with the required control variables. This ALU has
four selection variables, as shown in the diagram. The variable L loads the output of the ALU
into register A and also loads the output carry into E. Variables y, z, and w complement B,
complement A, and clear E, respectively.

Figure a. Data Processor Register and ALU

The block diagram of the control logic is shown in figure (b). The control receives five
inputs: two from the external environment and three from the data-processor. To simplify the
design, we define a new variable
S = As⊕ Bs
This variable gives the result of the comparison between the two sign bits. The exclusive-
OR operation is equal to 1 if the two signs are not the same, and it is equal to 0 if the signs are
both positive or both negative. The control provides an output x for the external circuit. It also
selects the operations in the ALU through the four selection variables S2, S1, S0, and Cin. The
other four outputs go to registers in the data-processor as specified in the diagram.

Figure b. Control Block Diagram


Control State Diagram


The design of a hard-wired control is a sequential-logic problem. As such, it may be
convenient to formulate the state diagram of the sequential control. The function boxes in a
flowchart may be considered as states of the sequential circuit, and the decision boxes as next-
state conditions.
The micro-operations that must be executed at a given state are specified within the
function box. The conditions for the next state transition are specified inside the decision box or
in the directed lines between two function boxes. Consequently, different designers may produce
different state diagrams for the same flowchart, and each may be a correct representation of the
system. The control state diagram and the corresponding register-transfer operations are derived
in below figure.
We start by assigning an initial state, To, to the sequential controller. We then determine
the transition to other states T1, T2, T3, and so on. For each state, we determine the micro-
operations that must be initiated by the control circuit. The figure below shows the sequence of
register transfers. [Refer flowchart]

[Refer ALU Table in page 23 of Module 2]


Design of Hardwired Control


The control can be designed using the classical sequential-logic procedure. This
procedure requires a state table with eight states, four inputs, and nine outputs. The sequential
circuit to be derived from such a state table will not be easy to obtain because of the large
number of variables.
The circuit obtained by using this method may have a minimum number of gates, but it
will have an irregular pattern and will be difficult to analyze if a malfunction occurs. These
difficulties are removed if the control is designed by the one flip-flop per state method.
A control organization that uses one flip-flop per state has the convenient characteristic
that the circuit can be derived directly from the state diagram by inspection. No state or
excitation tables are needed if D flip-flops are employed.
Remember that the next state of a D flip-flop is a function of the D input and is
independent of the present state. Since the method requires one flip-flop for each state, we
choose eight D flip-flops and label their outputs T0, T1, T2, ..., T7. The condition for setting a
given flip-flop is specified in the state diagram.
For example, flip-flop T2 is set with the next clock pulse if T1 = 1 or if T0 = 1 and qa = 1.
This condition can be defined with the Boolean function:
DT2 = qaT0 + T1
where DT2 designates the D input of flip-flop T2. If there is more than one directed line
going into a state, all conditions must be ORed. Using this procedure for the other flip-flops, we
obtain the input functions given in the table below. [Refer flowchart and the above table]
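The D-input condition DT2 = qaT0 + T1 can be checked with a short Python fragment. This is a software sketch covering only the T0–T2 fragment of the state machine; the equations for DT0 and DT1 are assumptions modelled on the state behaviour described here, not copied from a table:

```python
# One-flip-flop-per-state next-state computation (illustrative fragment).
# T is a dict of current flip-flop outputs; qa/qs are the external requests.

def next_state(T, qa, qs):
    D = {}
    D["T0"] = T["T0"] and not (qa or qs)   # assumed: stay in T0 until a request
    D["T1"] = T["T0"] and qs               # assumed: subtract request enters T1
    D["T2"] = (T["T0"] and qa) or T["T1"]  # DT2 = qa*T0 + T1 (from the text)
    return D
```

With T0 set and qa raised, only T2 is set on the next clock, matching the condition stated above.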

Initially, flip-flop T0 is set and all others are cleared. At any given time, only one D input
is in the 1 state while all others are maintained at 0. The next clock pulse sets the flip-flop whose
D input is 1 and clears all others.
The circuit for the control logic is not drawn but can be easily obtained from the Boolean
functions in table. The circuit can be constructed with eight D flip-flops, seven AND gates, six
OR gates, and four inverters. Note that five control outputs are taken directly from the flip-flop
outputs.


MICROPROGRAM CONTROL

In a microprogrammed control, the control variables that initiate micro-operations are
stored in memory. The control memory is usually a ROM, since the control sequence is
permanent and needs no alteration. The control variables stored in memory are read one at a
time to initiate the sequence of micro-operations for the system.

The words stored in a control memory are micro instructions and each microinstruction
specifies one or more microoperations for the components in the system. Once these
microoperations are executed the control unit must determine its next address. Therefore, a
few bits of micro instruction are used to control the generation of the address for the next
microinstruction. Thus, a micro instruction contains bits for initiating microoperations and
bits that determine the next address for the control memory itself. In addition to the control
memory, a microprogram control unit must include special circuits for selecting the next
address as specified by the micro instruction.

The digital system considered here is too small for a microprogram controller; in practice,
hardwired control would be more efficient for it. The microprogram control organization is more
suitable in large and complicated systems.

A state in control memory is represented by the address of a microinstruction. An address
for control memory specifies a control word, just as a state in a sequential circuit specifies a set
of micro-operations. Since there are 8 states in the control, we choose a control memory with 8
words having addresses 0 through 7. The address of the control memory corresponds to the
subscript number under the T's in the state diagram.

Inspection of the state diagram reveals that the address sequencing in the microprogram
control must have the following capabilities
1. Provision for loading an external address as a result of the occurrence of external signals
qa and qs.
2. Provision for sequencing consecutive addresses
3. Provision for choosing between two addresses as a function of present values of the status
variables S and E.

Hardware Configuration

The organisation of the microprogram control is shown in figure.


 The control memory is an 8-word by 14-bit ROM.
 The first 9 bits of a microinstruction word contain the control variables that initiate the
micro-operations.
 The last five bits provide information for selecting the next address.
 The control address register CAR holds the address for the control memory.
 The register receives an input value when its load control is enabled; otherwise it is
incremented by 1.
 Bits 10, 11, and 12 of a microinstruction contain an address for CAR.
 Bits 13 and 14 select an input for a multiplexer.


Bit 1 provides the initial state condition denoted by variable x and also enables an external
address when qs or qa is equal to 1.
When x = 1, the address field of the microinstruction must be 000.
Then if qs = 1, address 001 is applied to the input of CAR,
but if qa = 1, address 010 is applied to CAR.
If both qs and qa are 0, the zero address from bits 10, 11, and 12 is applied to the input of
CAR. In this way the control memory stays at address 0 until an external variable is enabled.

The multiplexer has four inputs that are selected with bits 13 and 14 of the microinstruction.
The functions of the multiplexer select bits are tabulated below.


 If bits 13 and 14 are 00, a multiplexer input that is equal to 0 is selected. The output of
the multiplexer is 0 and the increment input to CAR is enabled. This configuration
increments CAR to choose the next address in sequence.
 An input of 1 is selected by the multiplexer when bits 13 and 14 are equal to 01. The
output of the multiplexer is 1 and the external input is loaded into CAR.
 Status variable S is selected when bits 13 and 14 are equal to 10. If S = 1, the output of
the multiplexer is 1 and the address bits of the microinstruction are loaded into CAR
(provided x = 0). If S = 0, the output of the multiplexer is 0 and CAR is incremented.
 With bits 13 and 14 equal to 11, status variable E is selected. The address field is loaded
into CAR if E = 1, but CAR is incremented if E = 0.
 Thus, the multiplexer allows the control to choose between two addresses depending on
the value of the selected status bit.
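The next-address selection described above can be condensed into a few lines of Python. This is a behavioural sketch only: the 3-bit CAR wraps modulo 8, and for select 01 the address passed in is understood to be the external address rather than the microinstruction's address field:

```python
# Next-address logic for the CAR: bits 13-14 select a multiplexer input,
# and the mux output decides load-address versus increment.

def next_car(car, sel, addr, S, E):
    """sel is the 2-bit mux select (bits 13 and 14); addr is the address
    to load (address field, or external address when sel == 0b01)."""
    mux = {0b00: 0,     # constant 0: always increment
           0b01: 1,     # constant 1: always load
           0b10: S,     # status variable S
           0b11: E}[sel]
    return addr if mux else (car + 1) % 8   # 3-bit CAR wraps at 8
```

So with sel = 10 the control goes to the address field only when S = 1, and otherwise simply steps to the next word in sequence.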

Microprogram

Once the configuration of a microprogram control unit is established, the designer's task is to
generate the microcode for the control memory. This code generation is called
microprogramming: the process that determines the bit configuration for each and every
word in control memory. Let us derive the microprogram for the adder-subtractor example.

The control memory has 8 words and each word contains 14 bits. To microprogram the control
memory, we must determine the bit values of each of the eight words. The register transfer
method can be adopted for developing a microprogram. The micro-operation sequence can
be specified with register transfer statements.

Instead of a control function, we specify an address with each register transfer statement. The
address associated with each symbolic statement corresponds to the address where the micro
instruction is to be stored in memory. The sequencing from one address to the next can be
indicated by means of conditional control statements. This type of statement can specify the
address to which control goes depending on status conditions. Once the symbolic micro
program is established, it is possible to translate the register transfer statements to their
equivalent binary form.

Microprogram in symbolic form is given in the table. [Refer previous flowchart]


The symbolic designation is a convenient method for developing the microprogram in a way
that people can read and understand. But this is not the way the microprogram is stored
in control memory. The symbolic microprogram must be translated to binary, which is the
form that goes into memory. The translation is done by dividing the bits of each
microinstruction into their functional parts, called fields. Here we have three functional parts.

 Bits 1 through 9 specify the control word for initiating the micro operations.
 Bits 10 through 12 specify address field
 Bits 13 and 14 select multiplexer input.

For each microinstruction listed in symbolic form we must choose the appropriate bits in the
corresponding microinstruction fields. The binary form of the micro program is given in
table. [Refer MUX table, symbolic form table]

The addresses for the ROM control memory are listed in binary. The content of each word of
ROM is also given in binary. The first nine bits in each ROM word give the control word
that initiates the specified micro-operations. These bit values are taken directly from figure
10.9 (b). The last five bits in each word are derived from the conditional control
statements in the symbolic program.

 At address 000, we have 01 for the select field. This allows an external address to be
loaded into CAR if qs or qa is equal to 1. Otherwise address 000 is transferred to CAR.
 In address 001, the microinstruction select field is 01 and the address field is 010. From
the table in figure 10.10, we find that the clock pulse that initiates the micro-operation
Bs ← Bs′ (the complement of Bs) also transfers the address field into CAR.
 The next microinstruction out of ROM will be the one stored in address 010. The select
field at address 001 could have been chosen to be 00. This would have caused CAR to
increment and go to address 010.

 Inspection of the select-field bits 13 and 14 shows that when these two bits are equal
to 01, the address field gives the next address.
 When these two bits are 10, status variable S is selected, and when they are 11, status
variable E is selected.


 In the last two cases the next address is the one specified in the address field if the
selected status bit is 1. If the selected status bit is 0, the next address is the next one in
sequence, because CAR is incremented.

Advantages of Micro programmed control


 It simplifies the design of the control unit. Thus it is both cheaper and less error-prone
to implement.
 Control functions are implemented in software rather than hardware.
 The design process is orderly and systematic.
 More flexible; can be changed to accommodate new system specifications or to correct
design errors quickly and cheaply.
 Complex functions such as floating-point arithmetic can be realized efficiently.
Disadvantages
 A microprogrammed control unit is somewhat slower than a hardwired control unit,
because time is required to access the microinstructions from the control memory.
 The flexibility is achieved at some extra hardware cost due to the control memory and
its access circuitry.
Microinstructions
The words stored in a control memory are microinstructions, and each microinstruction
specifies one or more micro-operations for the components in the system.
Example: Control Sequence for instruction Add (R3), R1:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, Wait for the MFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, Wait for MFC
6. MDRout, Select Y, Add, Zin
7. Zout, R1in, End
The microinstruction for the above control sequence can be expressed as follows.


Once these micro-operations are executed, the control unit must determine its next
address. Therefore, a few bits of the microinstruction are used to control the generation of the
address for the next microinstruction. Thus, a microinstruction contains bits for initiating micro-
operations and bits that determine the next address for the control memory itself.
Techniques of grouping of Control Signals
The grouping of control signals can be done using either of two techniques:
o Vertical organization
o Horizontal organization
Horizontal Micro-Instructions
The scheme of forming a microinstruction by assigning one bit position to each control signal
is called a horizontal microinstruction.
Example: 011101001101001110
In a horizontal microinstruction every bit in the control field is attached to a control line.
Horizontal microinstructions represent several micro-operations that are executed at the same
time. In extreme cases, each horizontal microinstruction controls all the hardware resources
of the system.
Vertical Micro-Instructions
The length of the horizontal microinstruction can be reduced by another method known as
vertical microinstructions. This is possible because most signals are not needed
simultaneously and many of them are mutually exclusive.
Example:


In a vertical microinstruction, a code is used for each action to be performed, and a
decoder translates this code into individual control signals. The vertical microinstruction
resembles the conventional machine language format, comprising one operation and a few
operands. As opposed to horizontal microinstructions, the vertical microinstruction represents
a single micro-operation.
Advantage
 Fewer bits are required in the microinstruction.
Disadvantage
 Vertical approach results in slower operations speed.
Comparison between Horizontal and Vertical Organization
Horizontal:
- Long formats.
- Ability to express a high degree of parallelism.
- Little encoding of the control information.
- Useful when higher operating speed is desired.
Vertical:
- Short formats.
- Limited ability to express parallel micro-operations.
- Considerable encoding of the control information.
- Slower operating speeds.
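The difference between the two organizations can be illustrated in Python for 16 mutually exclusive control signals. This is an idealized sketch, not tied to any particular machine; the function names are inventions for this example:

```python
# Horizontal: one bit per control signal (one-hot control word).
# Vertical: an encoded field, with a decoder regenerating the control lines.

N_SIGNALS = 16          # assumed number of mutually exclusive signals

def horizontal(signal):
    """16-bit one-hot control word: bit i drives control line i directly."""
    return 1 << signal

def vertical(signal):
    """Encoded control field: only 4 bits are needed for 16 signals."""
    return signal

def decode(code):
    """The decoder's output equals the corresponding horizontal word."""
    return 1 << code
```

The vertical field needs only log2(16) = 4 bits where the horizontal word needs 16, which is exactly the format-length trade-off listed above.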

Micro-program Sequencer

The microprogram sequencer is a control unit that performs the task of microprogram
sequencing. Two important factors must be considered while designing the microprogram
sequencer:
o The size of the microinstruction
o The address generation time


The microprogram sequencer is attached to the control memory. It inspects certain bits in the
microinstruction to determine the next address for control memory. A typical sequencer has the
following address sequencing capabilities:
1. Increments the present address of control memory
2. Branches to an address specified in the bits of the microinstruction
3. Branches to a given address if a specified status bit is equal to 1
4. Transfers control to a new address as specified by an external source
5. Has a facility for subroutine calls and returns
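The five sequencing capabilities listed above can be sketched as a small Python class. This is an illustrative model only; the operation mnemonics (INC, JMP, BRT, EXT, CALL, RET) are invented for this sketch and do not correspond to any real sequencer's input codes:

```python
# Behavioural sketch of a microprogram sequencer with a return-address stack.

class Sequencer:
    def __init__(self):
        self.car = 0          # control address register
        self.stack = []       # subroutine return addresses

    def step(self, op, branch_addr=0, status=0, ext_addr=0):
        if op == "INC":                      # 1. increment CAR
            self.car += 1
        elif op == "JMP":                    # 2. unconditional branch
            self.car = branch_addr
        elif op == "BRT":                    # 3. branch if status bit set
            self.car = branch_addr if status else self.car + 1
        elif op == "EXT":                    # 4. load external address
            self.car = ext_addr
        elif op == "CALL":                   # 5a. subroutine call
            self.stack.append(self.car + 1)  #     push return address
            self.car = branch_addr
        elif op == "RET":                    # 5b. subroutine return
            self.car = self.stack.pop()
        return self.car
```

The stack register file and stack pointer mentioned below correspond to the `stack` list here: CALL pushes the return address, RET pops it back into CAR.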
The block diagram of the microprogram sequencer is shown in the figure below. It consists
of a multiplexer that selects an address from four sources and routes it into a control address
register (CAR).
The output from CAR provides the address for the control memory. The contents of CAR
are incremented and applied to the multiplexer and to the stack register file.
The register selected in the stack is determined by the stack pointer. Inputs (I0–I2) specify
the operation for the sequencer, and input T is the test input for a status bit. Initially the address
register is cleared to zero, and the clock pulse synchronizes the loading into the registers.
For more details about this, refer to the textbook "Digital Logic and Computer Design" by
M. Morris Mano; pages 446–449.

Micro-programmed CPU Organization

A digital computer consists of a central processing unit (CPU), a memory unit, and input-
output devices. The CPU can be divided into two distinct and interactive sections: the
processing section, which manipulates data, and the control section, which controls all the
units of the computer. A microprogram sequencer is useful for constructing a microprogram
control for the CPU.


Micro-programmed computer
A computer CPU can use the microprogram sequencer. A microprogrammed computer
consists of
1. Memory unit: stores instructions and data supplied by the user through an input device
2. Two processor units: a data processor, which manipulates data, and an address processor,
which manipulates the address information received from memory
3. A microprogram sequencer
4. A control memory
5. Other digital functions
An instruction extracted from the memory unit during the fetch cycle goes into the
instruction register. A code transformation constitutes a mapping function that is needed to
convert the operation-code bits of an instruction into a starting address for the control memory;
it is implemented with a ROM or PLA.
The mapping concept provides flexibility for adding instructions or micro-operations to
control memory as the need arises. The address generated by the code-transformation mapping
function is applied to the external address (EXA) input of the sequencer.
The microprogram control unit consists of
1. The sequencer: generates the next address
2. A control memory: stores the microinstructions and reads the next microinstruction
while the present microinstruction is being executed in the other units of the CPU
3. A multiplexer: selects one of the many status bits and applies it to the T (test) input of
the sequencer. One input of the multiplexer is always 1, to provide an unconditional
branch operation
4. A pipeline register: speeds up the control operation. It allows the next address to be
generated and the output of control memory to change while the control word in the
pipeline register initiates the micro-operations given by the present microinstruction.
It is not always necessary; the output of control memory can go directly to the control
inputs of the various units in the CPU


Micro-programmed Computer Organization

The microinstruction format contains six fields. The first three fields (I, SL, BRA) provide
information to the sequencer to determine the next address for control memory. [The I field
(3 bits) supplies input information for the sequencer; the SL field selects a status bit for the
multiplexer; the BRA field is the address field of the microinstruction and supplies a branch
address (BRA) to the sequencer.]
The next three fields (MC, PS, DF) control micro-operations in the processor and the
memory units. [The memory control (MC) field controls the address processor and the read
and write operations in the memory unit. The processor select (PS) field controls the operations
in the data processor unit. The data field (DF) is used to introduce constants into the processor.]
Output from the data field may be used to set up control registers and introduce data into
processor registers.

KTU - CST202 [Computer Organization and Architecture] Module: 5

I/O ORGANIZATION – Accessing I/O devices – interrupts – interrupt hardware – Direct
Memory Access.
Module: 5 THE MEMORY SYSTEM – Basic concepts – Semiconductor RAMs – Memory system
considerations – Semiconductor ROMs – Content Addressable memory – Cache
Memory – Mapping Functions.
ACCESSING I/O DEVICES

A simple arrangement to connect I/O devices to a computer is to use a single bus


structure. It consists of three sets of lines to carry
❖ Address
❖ Data
❖ Control Signals.
When the processor places a particular address on the address lines, the device that
recognizes this address responds to the command issued on the control lines.
The processor requests either a read or a write operation, and the requested data are
transferred over the data lines.
When I/O devices and memory share the same address space, the arrangement is called
memory-mapped I/O.

Single Bus Structure
[Figure: a single bus connecting the processor, memory, and I/O devices 1 … n]
Eg:-
Move DATAIN, R0    Reads the data from DATAIN into processor register R0.
Move R0, DATAOUT   Sends the contents of register R0 to location DATAOUT.
DATAIN    Input buffer associated with the keyboard.
DATAOUT   Output data buffer of a display unit / printer.

Fig: I/O Interface for an Input Device
[Figure: the address, data, and control lines of the bus connect to the address decoder,
control circuits, and data & status registers that together form the I/O interface of the
input device]

Address Decoder: It enables the device to recognize its address when the address appears
on the address lines.
Data register: It holds the data being transferred to or from the processor.
Status register: It contains information relevant to the operation of the I/O device.
The address decoder, the data and status registers, and the control circuitry required to
coordinate I/O transfers constitute the device's interface circuit.
For an input device, the SIN status flag is used: SIN = 1 when a character is entered at
the keyboard, and SIN = 0 once the character is read by the processor. For an output
device, the SOUT status flag is used in a similar way.

Eg:
DIRQ    Interrupt request for display.
KIRQ    Interrupt request for keyboard.
KEN     Keyboard enable.
DEN     Display enable.
SIN, SOUT    Status flags.
The data from the keyboard are made available in the DATAIN register, and the data sent to
the display are stored in the DATAOUT register.

Program:

        Move #LINE, R0
WAITK   TestBit #0, STATUS
        Branch=0 WAITK
        Move DATAIN, R1
WAITD   TestBit #1, STATUS
        Branch=0 WAITD
        Move R1, DATAOUT
        Move R1, (R0)+
        Compare #$0D, R1
        Branch≠0 WAITK
        Move #$0A, DATAOUT
        Call PROCESS
EXPLANATION:
This program reads a line of characters from the keyboard and stores it in a
memory buffer starting at location LINE.
Then it calls the subroutine PROCESS to process the input line.
As each character is read, it is echoed back to the display.
Register R0 is used as a pointer to the buffer and is updated using the autoincrement
addressing mode, so that successive characters are stored in successive memory
locations.
Each character is checked to see if it is the carriage return (CR) character, which has
the ASCII code 0D (hex).
If it is, a line feed character (0A hex) is sent to move the cursor one line down on the
display, and the subroutine PROCESS is called. Otherwise, the program loops back to
wait for another character from the keyboard.
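The polling behaviour of the program above can be sketched in Python. This is an illustrative model, not processor code; the keyboard and display objects and their `sin`, `sout`, `datain`, and `dataout` methods are hypothetical stand-ins for the STATUS, DATAIN, and DATAOUT registers:

```python
# Software model of program-controlled I/O: busy-wait on the status flags,
# echo each character, and collect characters until a carriage return.

CR, LF = 0x0D, 0x0A          # ASCII carriage return and line feed

def read_line(keyboard, display):
    line = []
    while True:
        while not keyboard.sin():        # WAITK: poll SIN until a key arrives
            pass
        ch = keyboard.datain()           # read the character
        while not display.sout():        # WAITD: poll SOUT until display ready
            pass
        display.dataout(ch)              # echo the character to the display
        line.append(ch)                  # store it in the buffer
        if ch == CR:
            display.dataout(LF)          # move the cursor one line down
            return line
```

The two inner `while` loops correspond to the WAITK and WAITD loops of the assembly program: the processor does nothing but test a flag until the device is ready.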

PROGRAM CONTROLLED I/O

Here the processor repeatedly checks a status flag to achieve the required
synchronization between the processor and the I/O device, i.e., the processor polls the device.

There are two other mechanisms to handle I/O operations:
Interrupt – synchronization is achieved by having the I/O device send a special signal over
the bus when it is ready for a data transfer operation.
DMA – described below.

DMA:
It is a technique used for high-speed I/O devices.
Here, the device transfers data directly to or from the memory without
continuous involvement by the processor.

INTERRUPTS

When a program enters a wait loop, it repeatedly checks the device status; during
this period, the processor performs no useful computation.
The interrupt-request line sends a hardware signal, called the interrupt signal, to
the processor.
On receiving this signal, the processor can instead perform useful work during the
waiting period.
The routine executed in response to an interrupt request is called the Interrupt
Service Routine (ISR).
The interrupt mechanism resembles a subroutine call.

Fig:Transfer of control through the use of interrupts

The processor first completes the execution of instruction i. Then it loads the
PC (program counter) with the address of the first instruction of the ISR.
After the execution of the ISR, the processor has to come back to instruction i + 1.
Therefore, when an interrupt occurs, the current contents of the PC, which point to
instruction i + 1, are put in temporary storage in a known location.
A return-from-interrupt instruction at the end of the ISR reloads the PC from that
temporary storage location, causing execution to resume at instruction i + 1.
When the processor is handling an interrupt, it must inform the device that its
request has been recognized so that the device can remove its interrupt-request signal.
This may be accomplished by a special control signal called the interrupt-
acknowledge signal.
The task of saving and restoring the information can be done automatically by the
processor.
The processor saves only the contents of the program counter and the status register,
i.e., it saves only the minimal amount of information needed to maintain the integrity
of the program execution.

Saving registers also increases the delay between the time an interrupt request is
received and the start of execution of the ISR. This delay is called the
interrupt latency.
Generally, a long interrupt latency is unacceptable.
The concept of interrupts is used in Operating System and in Control Applications,
where processing of certain routines must be accurately timed relative to external
events. This application is also called as real-time processing.

Interrupt Hardware:

Fig:An equivalent circuit for an open drain bus used to implement a common
interrupt request line

A single interrupt-request line may be used to serve n devices. All devices are
connected to the line via switches to ground.
To request an interrupt, a device closes its associated switch; the voltage on the INTR
line then drops to 0.
If all the interrupt-request signals (INTR1 to INTRn) are inactive, all switches are
open and the voltage on the INTR line is equal to Vdd.
When a device requests an interrupt, the value of INTR is the logical OR of the
requests from the individual devices:

(ie) INTR = INTR1 + INTR2 + … + INTRn

The complemented name INTR is used for the signal on the common line, because it
is active in the low-voltage state.
An open-collector (bipolar circuits) or open-drain (MOS circuits) gate is used to drive
the INTR line.
The output of an open-collector or open-drain gate is equivalent to a switch to
ground that is open when the gate's input is in the 0 state and closed when the gate's
input is in the 1 state.
Resistor R is called a pull-up resistor because it pulls the line voltage up to the
high-voltage state when the switches are open.
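The pull-up/open-drain behaviour can be modelled in a couple of lines of Python (voltages are idealized to 1 for Vdd and 0 for ground; the function name is an invention for this sketch):

```python
# Model of the open-drain interrupt-request line: each requesting device
# closes a switch to ground, so the line is low (active) whenever at least
# one request is present, and pulled up to Vdd otherwise.

V_DD, GND = 1, 0

def intr_line(requests):
    """requests: list of per-device INTR signals (1 = requesting).
    Returns the line voltage."""
    return GND if any(requests) else V_DD
```

This is exactly the wired-OR: the line value reflects INTR1 + INTR2 + … + INTRn, with the active state being the low voltage.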

Enabling and Disabling Interrupts:

The arrival of an interrupt request from an external device causes the processor to
suspend the execution of one program and start the execution of another, because the
interrupt may alter the sequence of events to be executed.
INTR remains active during the execution of the interrupt service routine.
Mechanisms are needed to prevent the infinite loop that would occur if the still-active
INTR signal caused successive interruptions.
The following is a typical scenario.

The device raises an interrupt request.

The processor interrupts the program currently being executed.
Interrupts are disabled by changing the control bits in the PS (Processor Status
register).
The device is informed that its request has been recognized and, in response, it
deactivates the INTR signal.
Interrupts are enabled again, and execution of the interrupted program is resumed.
Edge-triggered:

The processor has a special interrupt-request line for which the interrupt-handling
circuit responds only to the leading edge of the signal. Such a line is said to be edge-
triggered.

Handling Multiple Devices:

When several devices request interrupts at the same time, several questions arise:

➢ How can the processor recognize the device requesting an interrupt?

➢ Given that different devices are likely to require different ISRs, how can
the processor obtain the starting address of the appropriate routine in each
case?
➢ Should a device be allowed to interrupt the processor while another
interrupt is being serviced?
➢ How should two or more simultaneous interrupt requests be handled?

Polling Scheme:

If two devices have activated the interrupt request line, the ISR for the selected
device (first device) is completed and then the second request can be serviced.
The simplest way to identify the interrupting device is to have the ISR poll all the
I/O devices; the first device encountered with its IRQ bit set is the device to be serviced.
IRQ (Interrupt Request) -> when a device raises an interrupt request, the IRQ bit in its
status register is set to 1.

Advantage: It is easy to implement.


Disadvantage: time is spent interrogating the IRQ bits of devices that may not be
requesting any service.
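The polling scheme above can be sketched as a simple loop. This is an illustrative sketch, not real device code; it assumes each device exposes a status register in which bit 0 is the IRQ bit.

```python
# A minimal sketch of the polling scheme, assuming bit 0 of each device's
# status register is its IRQ bit (an assumption for illustration).
def poll_devices(status_registers):
    """Return the index of the first device whose IRQ bit is set, else None."""
    for device_id, status in enumerate(status_registers):
        if status & 0x1:        # IRQ bit set: this device requested an interrupt
            return device_id    # the first device found is the one serviced
    return None                 # no device is requesting service

# Devices 0 and 2 idle, devices 1 and 3 requesting: device 1 is serviced first.
print(poll_devices([0b0, 0b1, 0b0, 0b1]))  # prints 1
```

Note the drawback from the text is visible here: every device before the requester is interrogated even though it needs no service.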

Vectored Interrupt:

Here the device requesting an interrupt may identify itself to the processor by
sending a special code over the bus & then the processor start executing the ISR.
The code supplied by the device indicates the starting address of the ISR for
that device.
The code length ranges from 4 to 8 bits.
The location pointed to by the interrupting device is used to store the starting
address of the ISR.
The processor reads this address, called the interrupt vector, and loads it into the PC.
The interrupt vector may also include a new value for the Processor Status register.
When the processor is ready to receive the interrupt vector code, it activates the
interrupt acknowledge (INTA) line.
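The dispatch step can be sketched as a table lookup. The device codes and ISR starting addresses below are hypothetical values chosen only for illustration.

```python
# A minimal sketch of vectored-interrupt dispatch: the code supplied by the
# device indexes a vector table holding ISR starting addresses. All the
# numeric values here are hypothetical.
VECTOR_TABLE = {
    0x40: 0x1000,   # device code -> starting address of its ISR
    0x44: 0x1040,
    0x48: 0x1080,
}

def dispatch(device_code):
    """Return the interrupt vector (new PC value) for the requesting device."""
    return VECTOR_TABLE[device_code]

print(hex(dispatch(0x44)))  # prints 0x1040
```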

Interrupt Nesting:
Multiple Priority Scheme:

In multiple level priority scheme, we assign a priority level to the processor that
can be changed under program control.
The priority level of the processor is the priority of the program that is currently
being executed.
The processor accepts interrupts only from devices that have priorities higher than
its own.
At the time the execution of an ISR for some device is started, the priority of the
processor is raised to that of the device.
This action disables interrupts from devices at the same level of priority or lower.
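The acceptance rule above can be sketched in a few lines. This is a behavioural sketch, assuming the priority is a small integer and that a request wins only if it is strictly higher than the processor's current priority.

```python
# A minimal sketch of the multiple-priority rule: a request is accepted only if
# the device's priority is strictly higher than the processor's current
# priority, which is then raised for the duration of the ISR.
class Processor:
    def __init__(self, priority=0):
        self.priority = priority

    def request_interrupt(self, device_priority):
        if device_priority > self.priority:   # only strictly higher priorities win
            saved = self.priority
            self.priority = device_priority   # raised while the ISR runs
            return saved                      # caller restores this on return
        return None                           # rejected: same level or lower

cpu = Processor(priority=2)
print(cpu.request_interrupt(2))  # same level: rejected, prints None
print(cpu.request_interrupt(5))  # higher level: accepted, prints 2
```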

Privileged Instruction:

The processor priority is usually encoded in a few bits of the Processor Status
word. It can also be changed by program instructions that write into the PS.
These are called privileged instructions, and they can be executed only
when the processor is in supervisor mode.
The processor is in supervisor mode only when executing OS routines.
It switches to the user mode before beginning to execute application program.

Privileged Exception:

A user program cannot accidentally or intentionally change the priority of the
processor and disrupt the system operation.

An attempt to execute a privileged instruction while in user mode leads to a
special type of interrupt called a privileged exception.

Fig: Implementation of interrupt priority using individual interrupt request and
acknowledge lines

Each of the interrupt request lines is assigned a different priority level.

Interrupt requests received over these lines are sent to a priority arbitration circuit
in the processor.
A request is accepted only if it has a higher priority level than that currently
assigned to the processor.

Simultaneous Requests:
Daisy Chain:

The interrupt request line INTR is common to all devices. The interrupt
acknowledge line INTA is connected in a daisy chain fashion such that INTA
signal propagates serially through the devices.
When several devices raise an interrupt request, INTR is activated and the
processor responds by setting the INTA line to 1. This signal is first received by device 1.
Device 1 passes the signal on to device 2 only if it does not require any service.
If device 1 has a pending request for interrupt, it blocks the INTA signal and
proceeds to put its identification code on the data lines.
Therefore, the device that is electrically closest to the processor has the highest
priority.

Merits:
It requires fewer wires than the individual connections.
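The propagation of INTA through the chain can be sketched as a scan in chain order. This is a behavioural sketch, not hardware timing; device 0 is assumed to be electrically closest to the processor.

```python
# A minimal sketch of daisy-chain grant propagation: the INTA signal travels
# from the processor through the devices in order, and the first device with a
# pending request blocks it and wins.
def daisy_chain_grant(pending):
    """pending: list of bools, one per device, index 0 closest to the processor."""
    for device_id, wants_service in enumerate(pending):
        if wants_service:
            return device_id   # blocks INTA and puts its ID on the data lines
    return None                # grant propagates off the end unused

# Devices 1 and 3 both request; the one closer to the processor wins.
print(daisy_chain_grant([False, True, False, True]))  # prints 1
```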

Arrangement of Priority Groups:

Here the devices are organized in groups & each group is connected at a different
priority level.
Within a group, devices are connected in a daisy chain.

Controlling Device Requests:

KEN -> Keyboard Interrupt Enable
DEN -> Display Interrupt Enable
KIRQ / DIRQ -> Keyboard / Display unit requesting an interrupt

There are two mechanisms for controlling interrupt requests.

At the device end, an interrupt enable bit in a control register determines whether
the device is allowed to generate an interrupt request.
At the processor end, either an interrupt enable bit in the PS (Processor Status) or
a priority structure determines whether a given interrupt request will be accepted.

Initiating the Interrupt Process:

Load the starting address of ISR in location INTVEC (vectored interrupt).


Load the address LINE into a memory location PNTR. The ISR will use this
location as a pointer to store the input characters in memory.
Enable keyboard interrupts by setting bit 2 in register CONTROL to 1.
Enable interrupts in the processor by setting to 1, the IE bit in the processor status
register PS.

Execution of the ISR:

Read the input characters from the keyboard input data register. This causes
the interface circuit to remove its interrupt request.
Store the characters in a memory location pointed to by PNTR & increment
PNTR.
When the end of the line is reached, disable keyboard interrupts and inform the
main program.
Return from interrupt.

Exceptions:

An interrupt is an event that causes the execution of one program to be suspended
and the execution of another program to begin.
The term exception is used to refer to any event that causes an interruption.

Kinds of exception:

❖ Recovery from errors


❖ Debugging
❖ Privileged Exception

Recovery From Errors:


Computers use error-checking codes in main memory, which allow detection of
errors in the stored data.
If an error occurs, the control hardware detects it and informs the processor by
raising an interrupt.
The processor also interrupts the program if it detects an error or an unusual
condition while executing an instruction; i.e., it suspends the program being
executed and starts an exception-service routine.
This routine takes appropriate action to recover from the error.

Debugging:

System software includes a program called a debugger, which helps find errors in a
program.
The debugger uses exceptions to provide two important facilities.
They are
❖ Trace
❖ Breakpoint

Trace Mode:

When the processor is in trace mode, an exception occurs after the execution of every
instruction, using the debugging program as the exception-service routine.
The debugging program examines the contents of registers, memory locations, etc.
On return from the debugging program, the next instruction in the program being
debugged is executed.
The trace exception is disabled during the execution of the debugging program.

Break point:

Here the program being debugged is interrupted only at specific points selected by
the user.

An instruction called the trap (or software interrupt) is usually provided for this
purpose.
While debugging, the user may interrupt the program execution after an instruction i.
When the program is executed and reaches that point, the debugger examines the
memory and register contents.

Privileged Exception:

To protect the OS of a computer from being corrupted by user programs, certain
instructions can be executed only when the processor is in supervisor mode.
An attempt to execute such an instruction in user mode raises a privileged
exception.
When the processor is in user mode, it will not execute these instructions; it
executes them only when it is in supervisor mode.

DIRECT MEMORY ACCESS

A special control unit may be provided to allow the transfer of a large block of data
at high speed directly between an external device and main memory, without
continuous intervention by the processor. This approach is called DMA.
DMA transfers are performed by a control circuit called the DMA Controller.
To initiate the transfer of a block of words , the processor sends,

➢ Starting address
➢ Number of words in the block
➢ Direction of transfer.
When a block of data is transferred, the DMA controller increments the memory
address for successive words, keeps track of the number of words, and informs
the processor by raising an interrupt signal.
While the DMA transfer is taking place, the program that requested the transfer cannot
continue, but the processor can be used to execute another program.
After the DMA transfer is completed, the processor returns to the program that requested
the transfer.
Fig: Registers in a DMA interface

Status & Control register (bits 31, 30, 1 and 0 hold the IRQ, IE, Done and R/W flags)
Starting Address register
Word Count register

R/W determines the direction of transfer. When
R/W = 1, the DMA controller reads data from memory to the I/O device.
R/W = 0, the DMA controller performs a write operation.
Done flag = 1: the controller has completed transferring a block of data and is
ready to receive another command.
IE = 1: causes the controller to raise an interrupt (Interrupt Enable) after it has
completed transferring the block of data.
IRQ = 1: indicates that the controller has requested an interrupt.
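Decoding these flags is ordinary bit manipulation. This sketch assumes the bit positions of the register figure (bit 31 IRQ, bit 30 IE, bit 1 Done, bit 0 R/W); it is an interpretation for illustration, not a real device driver.

```python
# A sketch of decoding the DMA status/control register, assuming bit 31 = IRQ,
# bit 30 = IE, bit 1 = Done, bit 0 = R/W (an assumed layout for illustration).
RW, DONE, IE, IRQ = 1 << 0, 1 << 1, 1 << 30, 1 << 31

def decode_status(reg):
    return {
        "read_mem_to_io":    bool(reg & RW),    # R/W = 1: read memory -> I/O
        "done":              bool(reg & DONE),  # block transfer completed
        "interrupt_enable":  bool(reg & IE),
        "interrupt_request": bool(reg & IRQ),
    }

# A completed transfer with interrupts enabled: Done = IE = IRQ = 1.
print(decode_status(DONE | IE | IRQ))
```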

Fig: Use of DMA controllers in a computer system

A DMA controller connects a high-speed network to the computer bus. The disk
controller, which controls two disks, also has DMA capability and provides two DMA channels.
To start a DMA transfer of a block of data from main memory to one of the disks,
the program writes the address and the word count information into the registers of the
corresponding channel of the disk controller.
When the DMA transfer is completed, this is recorded in the status and control registers
of the DMA channel, i.e., Done bit = IRQ = IE = 1.

Cycle Stealing:

Requests by DMA devices for using the bus have higher priority than
processor requests.
Top priority is given to high speed peripherals such as ,
➢ Disk
➢ High speed Network Interface and Graphics display device.

Since the processor originates most memory access cycles, the DMA controller
can be said to steal memory cycles from the processor.
This interleaving technique is called cycle stealing.

Burst Mode:
The DMA controller may be given exclusive access to the main memory to
transfer a block of data without interruption. This is known as Burst/Block Mode

Bus Master:
The device that is allowed to initiate data transfers on the bus at any given time is
called the bus master.

Bus Arbitration:

It is the process by which the next device to become the bus master is selected and
the bus mastership is transferred to it.
Types:
There are 2 approaches to bus arbitration. They are,

➢ Centralized arbitration ( A single bus arbiter performs arbitration)


➢ Distributed arbitration (all devices participate in the selection of next bus
master).

Centralized Arbitration:

Here the processor is the bus master and it may grant bus mastership to one of its
DMA controllers.
A DMA controller indicates that it needs to become the bus master by activating
the Bus Request line (BR), which is an open-drain line.
The signal on BR is the logical OR of the bus requests from all devices connected
to it.
When BR is activated, the processor activates the Bus Grant signal (BG1),
indicating to the DMA controllers that they may use the bus when it becomes free.
This signal is connected to all devices using a daisy chain arrangement.
If a DMA controller requests the bus, it blocks the propagation of the grant signal
to subsequent devices, and it indicates to all devices that it is using the bus by
activating the open-collector line Bus Busy (BBSY).

Fig:A simple arrangement for bus arbitration using a daisy chain



Fig: Sequence of signals during transfer of bus mastership for the devices

The timing diagram shows the sequence of events for the devices connected to the
processor.
DMA controller 2 requests and acquires bus mastership and later releases the bus.
During its tenure as bus master, it may perform one or more data transfers.
After it releases the bus, the processor resumes bus mastership.

Distributed Arbitration:
It means that all devices waiting to use the bus have equal responsibility in carrying out
the arbitration process.

Fig:A distributed arbitration scheme

Each device on the bus is assigned a 4 bit id.


When one or more devices request the bus, they assert the Start-Arbitration signal
and place their 4-bit ID numbers on four open-collector lines, ARB0 to ARB3.
A winner is selected as a result of the interaction among the signals transmitted over
these lines.
The net outcome is that the code on the four lines represents the request that has the
highest ID number.
The drivers are of the open-collector type. Hence, if the input to one driver is equal to 1,
the bus line will be in the low-voltage state ('0') regardless of the inputs to the other
drivers connected to the same line.
Example
Assume two devices A and B have IDs 5 (0101) and 6 (0110); the resulting code on
the arbitration lines is the OR of the two, 0111.
Each device compares the pattern on the arbitration lines to its own ID, starting from
the MSB.
If it detects a difference at any bit position, it disables its drivers at that bit position
and all lower-order positions.
It does this by placing '0' at the input of these drivers.
In our example, A detects a difference on line ARB1, hence it disables its drivers on
lines ARB1 and ARB0.
This causes the pattern on the arbitration lines to change to 0110, which means that
B has won the contention.
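The contention above can be simulated in software, settling the wired-OR arbitration lines one bit at a time from the MSB down. This is a behavioural simulation of the scheme, not a model of electrical timing.

```python
# A software simulation of distributed arbitration: the lines carry the wired-OR
# of the IDs still contending; a device drops out at the first bit position
# where its ID differs from the line pattern, so the highest ID wins.
def arbitrate(ids, width=4):
    line = 0
    for i in reversed(range(width)):          # settle bits MSB-first
        # devices whose higher-order bits still match the line keep driving it
        contenders = [d for d in ids if (d >> (i + 1)) == (line >> (i + 1))]
        if any(d & (1 << i) for d in contenders):
            line |= 1 << i
    return line                               # equals the highest requesting ID

# The example from the notes: A = 0101 and B = 0110 contend; B (6) wins.
print(bin(arbitrate([0b0101, 0b0110])))  # prints 0b110
```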

MEMORY SYSTEM - INTRODUCTION


Programs and the data they operate on reside in the memory of the computer. The execution
speed of a program depends on how fast data and instructions can be transferred between memory
and processor. There are three major types of memory: cache, primary and secondary
memory.
A good memory would be fast, large and inexpensive. Unfortunately, it is impossible to meet all
three of these requirements simultaneously. Increased speed and size are achieved at increased cost.
BASIC CONCEPTS:
A memory unit is a collection of cells, each capable of storing one bit of
information. Information is stored in groups of bits called bytes or words. The maximum size of the
memory that can be used in any computer is determined by the addressing scheme.

Word length is the number of bits that can be transferred to or from the memory; it can be
determined from the width of the data bus. If the data bus is n bits wide, the word length of the
computer system is n bits.
Memory access time: the time that elapses between the initiation of an operation
and its completion.
Memory cycle time: the minimum time delay required between the initiation of two
successive memory operations.
Compared to the processor, the main memory unit is very slow, so every transfer
between memory and processor takes a long time and the processor has to wait.
To bridge this speed gap, a cache memory is placed between the main memory and
the processor.
In the memory hierarchy, speed will decrease and size will increase from top to bottom level.An
important design issue is to provide a computer system with as large and fast a memory as possible,
within a given cost target.
Random Access Memory (RAM) is a memory system in which any location can be accessed for
a read or write operation in some fixed amount of time that is independent of the location's address.
Two techniques increase the effective size and speed of the memory: cache memory (to
increase the effective speed) and virtual memory (to increase the effective size).

Connection of a memory to a processor


The processor reads data from the memory by loading the address of the location into the MAR
register and setting the R/W line to 1. Upon receiving the MFC signal, the processor loads the data on
the data lines into the MDR register.
The processor writes data into a memory location by loading the data into MDR. It indicates that a
write operation is involved by setting the R/W line to 0. If MAR is k bits long and MDR is n bits long,
then the memory unit may contain up to 2^k addressable locations. Memory access can be synchronized
by using a clock.

SEMICONDUCTOR RAM MEMORIES

Semiconductor memories are available in a wide range of speeds. Their cycle times range from
100ns to less than 10 ns.
When first introduced in the late 1960s, they were much more expensive than the magnetic-core
memories they replaced. Because of rapid advances in VLSI (Very Large Scale Integration) technology,
the cost of semiconductor memories has dropped dramatically. As a result, they are now used almost
exclusively in implementing memories.

Internal Organization of Memory Chips


Memory cells are usually organized in the form of an array, in which each cell is capable of storing
one bit of information.

Each row of cells constitutes a memory word, and all cells of a row are connected to a common
line referred to as the word line, which is driven by the address decoder on the chip. The cells in each
column are connected to a Sense/Write circuit by two bit lines. The Sense/Write circuits are connected
to the data input/output lines of the memory chip.
During read operation, these circuits sense or read the information stored in the cells selected by
a word line and transmit this information to the output data lines. During a write operation, the
sense/write circuits receive input information and store it in the cells of the selected word.
Two control lines, R/W and CS, are provided in addition to the address and data lines. The
Read/Write input specifies the required operation, and the CS input selects a given chip in a multichip
memory system.
Static Memories (SRAM)
Static memories are memories that consist of circuits capable of retaining their state as long
as power is applied. Two transistor inverters are cross-connected to implement a basic flip-flop. The cell
is connected to one word line and two bit lines by transistors T1 and T2. When the word line is at ground
level, the transistors are turned off and the latch retains its state.

Most of the static RAMs are built using MOS (Metal Oxide Semiconductor) technology, but
some are built using bipolar technology. If the cell is in state 1/0, the signal on b is high/low and signal
on bit line b’ is low/high.
Read operation: in order to read the state of the SRAM cell, the word line is activated to close switches
T1 and T2. Sense/Write circuits at the bottom monitor the state of b and b'.
Write operation: during the write operation, the state of the cell is set by placing the appropriate
value on bit line b and its complement on b', and then activating the word line. This forces the cell into
the corresponding state. The major advantage of SRAMs is that they can be accessed very quickly by
the processor. The major disadvantages are that SRAMs are expensive and volatile: if the power
is interrupted, the cell's contents are lost, so continuous power is needed for the cell to retain its state.

Dynamic Memories (DRAM)


Static RAMs are fast, but the cost is too high because their cells require several transistors.Less
expensive RAMs can be implemented if simpler cells are used. Such cells don't retain their state
indefinitely; hence they are called dynamic RAMs
Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state
indefinitely; they need to be periodically refreshed. A dynamic memory cell consists of a capacitor C
and a transistor T.
Information is stored in a dynamic memory cell in the form of charge on a capacitor, and this
charge can be maintained for only tens of milliseconds. Since the cell is required to store information
for a much longer time, its contents must be periodically refreshed by restoring the capacitor charge
to its full value.
Read operation: the transistor is turned on and a sense amplifier checks the voltage of the
capacitor. If the voltage is below a threshold value, the capacitor is discharged and it represents
logical '0'; if the voltage is above the threshold value, the capacitor is recharged to its full
voltage and it represents logical '1'.
Write operation: the transistor is turned on and a voltage is applied to or removed from the
bit line.
Asynchronous Dynamic RAM:
In asynchronous dynamic RAM, the timing of the memory device is controlled
asynchronously. A specialized memory controller circuit provides the necessary control signals, RAS
and CAS, which govern the timing. The processor must take into account the delay in the response of
the memory.

In the diagram above, we can see that there are two extra elements with two extra lines attached
to them: the Row Address Latch is controlled by the RAS (or Row Address Strobe) pin, and the Column
Address Latch is controlled by the CAS (or Column Address Strobe) pin.
Read Operation:
1. The row address is placed on the address pins via the address bus.
2. The RAS pin is activated, which places the row address onto the Row Address Latch.
3. The Row Address Decoder selects the proper row to be sent to the sense amps.

4. The Write Enable (not pictured) is deactivated, so the DRAM knows that it's not being written
to.
5. The column address is placed on the address pins via the address bus.
6. The CAS pin is activated, which places the column address on the Column Address Latch.
7. The CAS pin also serves as the Output Enable, so once the CAS signal has stabilized the sense
amps, the data from the selected row and column is placed on the Data Out pin so that it can
travel over the data bus back into the system.
8. RAS and CAS are both deactivated so that the cycle can begin again.
Write Operation:
1. In the write operation, the information on the data lines is transferred to the selected circuits.
For this, Write Enable is activated.
Fast Page Mode
Suppose we want to access consecutive bytes in the selected row. This can be done
without having to reselect the row, by adding a latch at the output of the sense circuits in each column.
All the latches are loaded when the row is selected. Different column addresses can then be applied to
select and place different bytes on the data lines. A consecutive sequence of column addresses can be
applied under the control of the CAS signal, without reselecting the row.
This methodology allows a block of data to be transferred at a much faster rate than random
accesses.A small collection/group of bytes is usually referred to as a block.This transfer capability is
referred to as the fast page mode feature. This mode of operation is useful when there is requirement for
fast transfer of data (Eg: Graphical Terminals)

Synchronous DRAM’s
Operation is directly synchronized with processor clock signal. The outputs of the sense circuits
are connected to a latch. During a Read operation, the contents of the cells in a row are loaded onto the
latches. During a refresh operation, the contents of the cells are refreshed without changing the contents
of the latches.
Data held in the latches correspond to the selected columns are transferred to the output.

For a burst mode of operation, successive columns are selected using a column address counter
and the clock. The CAS signal need not be generated externally. New data is placed on the data lines
during the rising edge of the clock.
Memory latency is the time it takes to transfer a word of data to or from memory.
Memory bandwidth is the number of bits or bytes that can be transferred in one second.
Double Data Rate SDRAM
DDR-SDRAM is a faster version of SDRAM. Standard SDRAM performs all actions on the
rising edge of the clock signal. DDR SDRAM accesses the cell array in the same way, but transfers
data on both edges of the clock, so the bandwidth is essentially doubled for long burst transfers.
To make it possible to access the data at a high enough rate, the cell array is organized in two
banks. Each bank can be accessed separately. Consecutive words of a given block are stored in
different banks. Such interleaving of words allows simultaneous access to two words that are
transferred on successive edges of the clock.
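The interleaving described above can be sketched as a simple address mapping. Using the low-order address bit to select the bank is an illustrative choice, not a statement about any particular chip.

```python
# A minimal sketch of two-bank interleaving: consecutive word addresses map to
# alternate banks (here the low-order address bit selects the bank), so two
# consecutive words can be fetched from different banks and transferred on
# successive clock edges.
def bank_of(word_address):
    return word_address & 1   # bank 0 holds even words, bank 1 holds odd words

# Words 0..3 of a block alternate between the two banks.
print([bank_of(a) for a in range(4)])  # prints [0, 1, 0, 1]
```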
Static RAM                 | Dynamic RAM
More expensive             | Less expensive
No refresh needed          | Needs periodic refresh
High power                 | Less power
Less storage capacity      | Higher storage capacity
MOS transistors            | Transistor & capacitor
Faster                     | Slower
More reliable              | Less reliable

Structure of Larger Memories


Let us discuss how memory chips may be connected to form a much larger memory.
Static Memory Systems
Consider implementing a memory unit of 2M words of 32 bits each using 512K x 8 static memory
chips. Each column consists of 4 chips, and each chip implements one byte position. A chip is selected
by setting its chip-select control line to 1. The selected chip places its data on the data output lines;
the outputs of the other chips are in the high-impedance state. 21 bits are needed to address a 32-bit
word: the high-order 2 bits select the row, by activating the corresponding Chip Select signals, and the
remaining 19 bits access specific byte locations inside each selected chip.
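The address split in the 2M x 32 example can be worked out directly; this is a sketch of the decoding described above, with the 2-bit/19-bit split taken from the example.

```python
# A sketch of decoding the 21-bit word address in the 2M x 32 example: the
# high-order 2 bits choose which row of chips is enabled (via Chip Select),
# and the low-order 19 bits address a location inside each 512K x 8 chip.
def decode_address(addr21):
    row = (addr21 >> 19) & 0b11          # selects one of the 4 rows of chips
    offset = addr21 & ((1 << 19) - 1)    # location inside each chip
    return row, offset

print(decode_address((2 << 19) | 123))  # prints (2, 123): row 2, offset 123
```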
Dynamic Memory Systems
Large dynamic memory systems can be implemented using DRAM chips in a similar way to
static memory systems. Placing large memory systems directly on the motherboard will occupy a large
amount of space. Also, this arrangement is inflexible since the memory system cannot be expanded
easily.

Packaging considerations have led to the development of larger memory units known as SIMMs
(Single In-line Memory Modules) and DIMMs (Dual In-line Memory Modules). A memory module is
an assembly of memory chips on a small board that plugs vertically into a single socket on the
motherboard, occupying less space and allowing easy expansion by replacement.

MEMORY SYSTEM CONSIDERATIONS

The choice of a RAM chip for a given application depends on several factors: cost, speed,
power, size, etc. SRAMs are faster, more expensive and smaller; DRAMs are slower, cheaper and
larger.
If speed is the primary requirement, static RAMs are the most appropriate choice; they are mostly
used in cache memories. If cost is the prioritized factor, dynamic RAMs are chosen; they are used for
implementing computer main memories.
Refresh overhead:
All dynamic memories have to be refreshed. In a DRAM, the period for refreshing all rows is
16 ms, whereas in an SDRAM it is 64 ms.
Eg: Suppose an SDRAM whose cells are organized in 8K (8192) rows, and 4 clock cycles are needed to
access (read) each row; then it takes 8192 x 4 = 32,768 cycles to refresh all rows. If the clock rate is
133 MHz, this takes 32,768/(133 x 10^6) = 246 x 10^-6 seconds (about 0.246 ms). With a typical
refreshing period of 64 ms, the refresh overhead is 0.246/64 = 0.0038, i.e., less than 0.4% of the total
time available for accessing the memory.
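The refresh-overhead arithmetic above can be checked step by step:

```python
# The refresh-overhead calculation from the example, worked numerically.
rows = 8192                      # 8K rows in the cell array
cycles_per_row = 4               # clock cycles needed to refresh one row
clock_hz = 133e6                 # 133 MHz clock
refresh_cycles = rows * cycles_per_row      # 32,768 cycles for all rows
refresh_time = refresh_cycles / clock_hz    # about 246 microseconds
overhead = refresh_time / 64e-3             # refresh period is 64 ms
print(refresh_cycles, round(overhead, 4))   # 32768 and roughly 0.0038
```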
Memory Controller
Dynamic memory chips use multiplexed address inputs to reduce the number of pins.
The address is divided into two parts: the high-order address bits and the low-order address bits. The
high-order bits select a row in the cell array and the low-order bits select a column. The address
selection is done under the control of the RAS and CAS signals, respectively, for the high-order and
low-order address bits.
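The multiplexing can be sketched as splitting the full address and presenting the two halves in turn. The 12-bit row / 10-bit column split is an assumption for illustration.

```python
# A sketch of address multiplexing: the controller splits a full address into
# row and column halves and strobes them in turn with RAS then CAS.
# The 12/10 bit split below is assumed for illustration.
def multiplex_address(addr, row_bits=12, col_bits=10):
    col = addr & ((1 << col_bits) - 1)                # low-order bits: column
    row = (addr >> col_bits) & ((1 << row_bits) - 1)  # high-order bits: row
    return [("RAS", row), ("CAS", col)]               # order of the strobes

print(multiplex_address(0b000000000011_0000000101))  # prints [('RAS', 3), ('CAS', 5)]
```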

READ ONLY MEMORY

SRAM and SDRAM chips are volatile: they lose their contents when the power is turned off. Many
applications need memory devices that retain their contents after the power is turned off.
For example, when a computer is turned on, the operating system must be loaded from the disk into
memory. The instructions that load the OS from the disk must not be lost when the power is turned
off, so they need to be stored in a non-volatile memory.
Non-volatile memory is read in the same manner as volatile memory. Since normal operation
involves only reading of stored data, this type of memory is called Read-Only Memory (ROM). The
data are written into a ROM when it is manufactured, making it a permanent memory.

At logic value '0': transistor T is connected to the ground point P; the transistor switch is closed
and the voltage on the bit line drops to nearly zero. At logic value '1': the transistor switch is open
and the bit line remains at the high voltage.
To read the state of the cell, the word line is activated. A sense circuit at the end of the bit line
generates the proper output value.
Types of ROM
Different types of non-volatile memory are
• PROM
• EPROM
• EEPROM
• Flash Memory

Programmable Read-Only Memory (PROM):


PROM allows the data to be loaded by the user. Programmability is achieved by inserting a
‘fuse’ at point P in a ROM cell. Before it is programmed, the memory contains all 0’s. The user can
insert 1’s at the required location by burning out the fuse at these locations using high-current pulse.
This process is irreversible.
PROMs provide flexibility and faster data access. They are less expensive because they can be
programmed directly by the user.
Erasable Reprogrammable Read-Only Memory (EPROM):
EPROM allows the stored data to be erased and new data to be loaded. In an EPROM cell, a
connection to ground is always made at ‘P’ and a special transistor is used, which has the ability to
function either as a normal transistor or as a disabled transistor that is always turned ‘off’.
During programming, an electrical charge is trapped in an insulated gate region. The charge is
retained for more than 10 years because it has no leakage path. To erase this charge, ultraviolet
light is passed through a quartz crystal window (lid); this exposure dissipates the charge. During
normal use, the quartz lid is sealed with a sticker.
An EPROM can be erased by exposing it to ultraviolet light for a duration of up to 40 minutes;
usually, an EPROM eraser performs this function.
Merits: It provides flexibility during the development phase of digital system. It is capable of
retaining the stored information for a long time.
Demerits: The chip must be physically removed from the circuit for reprogramming and its
entire contents are erased by UV light.

Electrically Erasable Programmable Read-Only Memory (EEPROM):


EEPROM is programmed and erased electrically. It can be erased and reprogrammed about ten
thousand times. Both erasing and programming take about 4 to 10 ms (millisecond). In EEPROM, any
location can be selectively erased and programmed. EEPROMs can be erased one byte at a time, rather
than erasing the entire chip. Hence, the process of reprogramming is flexible but slow.
Merits: it can be both programmed and erased electrically, and it allows selective erasing of cell
contents. Demerits: it requires different voltages for erasing, writing and reading the stored
data.
Flash memory:
Flash memory is a non-volatile memory chip used for storage and for transferring data between a
personal computer (PC) and digital devices. It has the ability to be electronically reprogrammed and
erased. It is often found in USB flash drives, MP3 players, digital cameras and solid-state drives.
Flash memory is a type of electrically erasable programmable read-only memory (EEPROM),
but may also be a standalone memory storage device such as a USB drive. EEPROM is a type of data
memory device using an electronic device to erase or write digital data. Flash memory is a distinct
type of EEPROM, which is programmed and erased in large blocks.
KTU - CST202 [Computer Organization and Architecture] Module: 5

Flash memory incorporates the use of floating-gate transistors to store data. A floating-gate
transistor, or floating-gate MOSFET (FGMOS), is similar to a MOSFET, the transistor used for
amplifying or switching electronic signals. Floating-gate transistors are electrically isolated and use a
floating node in direct current (DC). Flash memory is similar to the standard MOSFET, except that the
transistor has two gates instead of one.

SPEED, SIZE AND COST

A big challenge in the design of a computer system is to provide a sufficiently large memory,
with a reasonable speed, at an affordable cost.
Static RAM: Very fast, but expensive, because a basic SRAM cell has a complex circuit,
making it impossible to pack a large number of cells onto a single chip.
Dynamic RAM: Simpler basic cell circuit, hence much less expensive, but significantly slower
than SRAMs.
Magnetic disks: The storage provided by DRAMs is higher than that of SRAMs, but is still less
than what is necessary. Secondary storage such as magnetic disks provides a large amount of storage,
but is much slower than DRAMs.

CACHE MEMORIES

The processor is much faster than the main memory. As a result, the processor has to spend much of
its time waiting while instructions and data are being fetched from the main memory. This creates a
major obstacle towards achieving good performance, and the speed of the main memory cannot be
increased beyond a certain point.
Cache memory is a special very high-speed memory. It is used to speed up and synchronize with
a high-speed CPU. Cache memory is costlier than main memory or disk memory but more economical
than CPU registers. It is an extremely fast memory type that acts as a buffer between RAM and the
CPU. It holds frequently requested data and instructions so that they are immediately available to the
CPU when needed.
Cache memory is used to reduce the average time to access data from the main memory. The
cache is a smaller and faster memory which stores copies of the data from frequently used main memory
locations. A CPU typically contains several independent caches, which store instructions and data.
Cache memory is based on a property of computer programs known as "locality of reference".
Data can be prefetched into the cache before the processor needs it; this relies on predicting the
processor's future access requirements using the locality of reference.
Locality of Reference
Analysis of programs indicates that many instructions in localized areas of a program
are executed repeatedly during some period of time, while the others are accessed relatively less
frequently. These instructions may be the ones in a loop, a nested loop or a few procedures calling each
other repeatedly. This is called "locality of reference".
Temporal locality of reference:
A recently executed instruction is likely to be executed again very soon.
Spatial locality of reference:
Instructions with addresses close to a recently executed instruction are likely to be executed soon.
Basic Cache Operations
When the processor issues a Read request, a block of words is transferred from the main
memory to the cache, one word at a time. Subsequent references to the data in this block of words are
then found in the cache.
At any given time, only some blocks of the main memory are held in the cache. Which blocks
of the main memory are held in the cache is determined by a "mapping function".
When the cache is full and a new block of words needs to be transferred from the main memory,
some block of words in the cache must be replaced. This is determined by a "replacement algorithm".
Cache hit
Existence of a cache is transparent to the processor. The processor issues Read and Write
requests in the same manner. If the data is in the cache, it is called a Read or Write hit.
Read hit: The data is obtained from the cache.
Write hit: The cache has a replica of the contents of the main memory. In the write-through
protocol, the contents of the cache and the main memory are updated simultaneously. Alternatively,
only the contents of the cache are updated, and the block is marked as updated by setting a bit known
as the dirty bit or modified bit; the contents of the main memory are updated when this block is
replaced. This is the write-back or copy-back protocol.
Cache miss
If the data is not present in the cache, then a Read miss or Write miss occurs.
Read miss: The block of words containing the requested word is transferred from the memory. After
the block is transferred, the desired word is forwarded to the processor. The desired word may also be
forwarded to the processor as soon as it is transferred, without waiting for the entire block to be
transferred. This is called load-through or early restart.
Write miss: If the write-through protocol is used, the contents of the main memory are
updated directly. If the write-back protocol is used, the block containing the addressed word is first
brought into the cache, and then the desired word is overwritten with the new information.
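The two write policies described above can be sketched as follows. This is a minimal illustration, not any real hardware interface: plain Python dictionaries stand in for the cache, the main memory, and the per-block dirty bits.

```python
# Minimal sketch of the two write policies, using plain dictionaries as
# stand-ins for the cache, main memory, and per-block dirty bits.

def write_through(cache, memory, addr, value):
    cache[addr] = value
    memory[addr] = value            # cache and main memory updated together

def write_back(cache, dirty, addr, value):
    cache[addr] = value
    dirty[addr] = True              # main memory updated only on replacement

def replace_block(cache, dirty, memory, addr):
    if dirty.pop(addr, False):
        memory[addr] = cache[addr]  # copy back the modified block first
    cache.pop(addr, None)
```

Under write-through the memory is always current; under write-back it lags behind the cache until the dirty block is replaced.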

MAPPING FUNCTIONS

The mapping functions are used to map a particular block of main memory to a particular block
of cache. This mapping function is used to transfer the block from main memory to cache memory.
Mapping functions determine how memory blocks are placed in the cache.
Three mapping functions:
• Direct mapping.
• Associative mapping.
• Set-associative mapping.
Direct Mapping
A particular block of main memory can be brought only to a particular block of cache memory,
so this technique is not flexible.
The simplest way of associating main memory blocks with cache blocks is the direct mapping
technique. In this technique, block k of main memory maps into block k modulo m of the cache, where
m is the total number of blocks in the cache. In this example, the value of m is 128.
Example: Block j of the main memory maps to block j modulo 128 of the cache, i.e. blocks 0,
128 and 256 of main memory map to block 0 of the cache; blocks 1, 129 and 257 map to block 1; and
so on.
More than one memory block is mapped onto the same position in the cache. This may lead to
contention for cache blocks even if the cache is not full. The contention is resolved by allowing the
new block to replace the old block, leading to a trivial replacement algorithm.
The memory address is divided into three fields. The low-order 4 bits determine one of the 16
words in a block. When a new block is brought into the cache, the next 7 bits determine which cache
block this new block is placed in. The high-order 5 bits determine which of the possible 32 blocks that
map to this cache position is currently present in the cache. These are the tag bits.
This mapping methodology is simple to implement but not very flexible.
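The address split described above can be sketched as follows, assuming the 16-bit address layout of this example (5 tag bits, 7 block bits, 4 word bits):

```python
# Direct mapping: split a 16-bit address into (tag, cache block, word)
# for a cache of 128 blocks with 16 words per block.

def split_address(addr):
    word = addr & 0xF            # low-order 4 bits: one of 16 words in a block
    block = (addr >> 4) & 0x7F   # next 7 bits: which of 128 cache blocks
    tag = (addr >> 11) & 0x1F    # high-order 5 bits: tag
    return tag, block, word

def cache_block_for(memory_block):
    return memory_block % 128    # block j of memory maps to block j mod 128
```

Blocks 0, 128 and 256 all map to cache block 0, which is why the tag bits are needed to tell them apart.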
Associative mapping
In the associative mapping technique, a main memory block can potentially reside in any cache
block position. In this case, the main memory address is divided into two groups: the low-order bits
identify the location of a word within a block, and the high-order bits identify the block.
In the example here, 11 bits are required to identify a main memory block when it is resident in
the cache, so the high-order 11 bits are used as TAG bits and the low-order 5 bits are used to identify a
word within a block. The TAG bits of an address received from the CPU must be compared to the TAG
bits of each block of the cache to see if the desired block is present.
In associative mapping, any block of main memory can go to any block of the cache, so it has
complete flexibility, and a proper replacement policy must be used to replace a block from the cache
if the currently accessed block of main memory is not present in the cache.
It might not be practical to use this complete flexibility of the associative mapping technique
because of the searching overhead: the TAG field of the main memory address has to be compared with
the TAG field of every cache block.
In this example, there are 128 blocks in the cache and the size of the TAG is 11 bits. The whole
arrangement of the associative mapping technique is shown in the figure below.
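The tag comparison described above can be sketched as a linear search. This is only a software stand-in for what the hardware does in parallel; the 11-bit tag / 5-bit word split of this example is assumed.

```python
# Associative mapping: the high-order 11 bits of a 16-bit address form the
# tag, which must be compared against the tag of every cache block.

def lookup(cache_tags, addr):
    """cache_tags: 128 entries, each a resident tag or None.
    Returns the index of the matching block, or None on a miss."""
    tag = addr >> 5                     # low-order 5 bits select the word
    for i, t in enumerate(cache_tags):  # hardware compares all tags in parallel
        if t == tag:
            return i
    return None
```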
Set-Associative mapping
This mapping technique is intermediate to the previous two techniques. Blocks of the cache are
grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific
set. Therefore, the flexibility of associative mapping is reduced from full freedom to a set of specific
blocks.
This also reduces the searching overhead, because the search is restricted to the blocks of one
set instead of all the blocks of the cache. The contention problem of direct mapping is also eased by
having a few choices for block replacement.
Consider the same cache memory and main memory organization as in the previous example,
but organize the cache with 4 blocks in each set. The TAG field of the associative mapping technique is
divided into two groups: one is termed the SET field and the other the TAG field. Since each set
contains 4 blocks, the total number of sets is 32. The main memory address is grouped into three parts:
the low-order 5 bits identify a word within a block; since there are 32 sets in total, the next 5 bits
are used to identify the set; and the high-order 6 bits are used as TAG bits.
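The 4-way set-associative split described above can be sketched as follows, assuming the layout of this example (6 tag bits, 5 set bits, 5 word bits in a 16-bit address):

```python
# 4-way set-associative split of a 16-bit address: 6 tag bits,
# 5 set bits (32 sets), 5 word bits (32 words per block).

def split_address(addr):
    word = addr & 0x1F               # word within the block
    set_index = (addr >> 5) & 0x1F   # the search is confined to this set
    tag = (addr >> 10) & 0x3F        # compared against the 4 tags of the set
    return tag, set_index, word
```

Only the four tags of the selected set need to be compared, instead of all 128 as in fully associative mapping.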

Replacement Algorithms
When the cache is full, a replacement algorithm is needed to replace a cache block with
a new block. To achieve high speed, such algorithms are implemented in hardware.
Three types of replacement algorithm are used in cache memories:
• Random replacement policy.
• First in first Out (FIFO) replacement policy
• Least recently used (LRU) replacement policy.
Random replacement policy
This is a very simple algorithm in which the block to be overwritten is chosen at random: any
cache line may be replaced by random selection. Despite its simplicity, this algorithm has been found
to be very effective in practice.
First in first out (FIFO)
In this algorithm, the cache block that has been resident in the cache for the longest time is
replaced. With this technique there is no need for any update when a hit occurs; when a miss occurs,
the new block is put into an empty block and the counter values are incremented by one.
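The FIFO policy can be sketched with a queue recording the order in which blocks were loaded. This is a minimal software model; real implementations use hardware counters.

```python
from collections import deque

# FIFO replacement sketch: evict the block that was loaded earliest.
def access(cache, order, block, capacity):
    if block in cache:
        return "hit"                    # no bookkeeping update on a hit
    if len(cache) >= capacity:
        cache.discard(order.popleft())  # evict the oldest-loaded block
    cache.add(block)
    order.append(block)
    return "miss"
```

Note that a hit does not reorder the queue: eviction order depends only on load time, not on how recently a block was used.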
Least recently used (LRU)


In LRU, the cache block that has gone unreferenced for the longest time is replaced. A counter
can be associated with each block: when a hit occurs, the counter of the referenced block is set to 0
and counters with lower values are incremented by 1; when a miss occurs, the block with the highest
counter value is replaced, the new block's counter is set to 0, and all other counters are incremented
by 1.
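The LRU policy can be sketched in software with an ordered dictionary that keeps blocks in recency order, standing in for the per-block counters a hardware implementation would use.

```python
from collections import OrderedDict

# LRU replacement sketch: the OrderedDict keeps blocks ordered from
# least recently used (front) to most recently used (back).

def access(cache, block, capacity):
    if block in cache:
        cache.move_to_end(block)        # mark as most recently used
        return "hit"
    if len(cache) >= capacity:
        cache.popitem(last=False)       # evict the least recently used block
    cache[block] = True
    return "miss"
```

Unlike FIFO, a hit here does update the ordering, so a frequently reused block is never the eviction victim.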

CONTENT ADDRESSABLE MEMORY (CAM)/ ASSOCIATIVE MEMORY


Many data-processing applications require the search of items in a table stored in memory. An
assembler program searches the symbol address table in order to extract the symbol’s binary
equivalent. An account number may be searched in a file to determine the holder’s name and account
status.
The established way to search a table is to store all items where they can be addressed in
sequence. The search procedure is a strategy for choosing a sequence of addresses, reading the content
of memory at each address, and comparing the information read with the item being searched until a
match occurs. The number of accesses to memory depends on the location of the item and the
efficiency of the search algorithm.


The time required to find an item stored in memory can be reduced considerably if the stored
data can be identified for access by the content of the data itself rather than by an address. A memory
unit accessed by content is called an associative memory or Content Addressable Memory
(CAM).
This type of memory is accessed simultaneously and in parallel on the basis of data content rather
than by a specific address or location. When a word is written into an associative memory, no address
is given. The memory is capable of finding an empty unused location to store the word. When a
word is to be read from an associative memory, the content of the word, or part of the word, is
specified.
The memory locates all words which match the specified content and marks them for reading.
Because of its organization, the associative memory is uniquely suited to parallel searches by data
association. An associative memory is more expensive than a random access memory because each cell
must have storage capability as well as logic circuits for matching its content with an external
argument. For this reason, associative memories are used in applications where the search time is very
critical and must be very short.

HARDWARE ORGANIZATION
The block diagram of an associative memory consists of a memory array and logic for m
words with n bits per word. The argument register A and the key register K each have n bits, one for
each bit of a word.

Block Diagram of Associative Memory


The match register M has m bits, one for each memory word. Each word in memory is
compared in parallel with the content of the argument register.
The words that match the bits of the argument register set a corresponding bit in the match register.
After the matching process, those bits in the match register that have been set indicate that
their corresponding words have been matched. Reading is accomplished by a sequential access to
memory for those words whose corresponding bits in the match register have been set.
The key register provides a mask for choosing a particular field or key in the argument word.
The entire argument is compared with each memory word if the key register contains all 1's.
Otherwise, only those bits in the argument that have 1's in their corresponding positions of the key
register are compared. Thus the key provides a mask, or identifying piece of information, which
specifies how the reference to memory is made.

To illustrate with a numerical example, suppose that the argument register A and the key register K
have the bit configuration shown below. Only the three leftmost bits of A are compared with memory
words because K has 1's in these positions.

Word 2 matches the unmasked argument field because the three leftmost bits of the argument and the
word are equal.
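The masked compare performed in parallel by the CAM can be sketched as below. The 8-bit values are hypothetical, chosen only to mirror the example in which K unmasks the three leftmost bits.

```python
# Software sketch of the CAM's masked compare: only bit positions where
# the key register K holds 1 take part in the comparison.

def match_register(words, A, K):
    """Return the match bits M: 1 where (word & K) == (A & K)."""
    return [1 if (word & K) == (A & K) else 0 for word in words]
```

Setting K to all 1's compares the entire argument with each word; any 0 bits in K exclude those positions from the comparison.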
The relation between the memory array and the external registers in an associative memory is
shown in the figure below.
The cells in the array are marked by the letter C with two subscripts. The first subscript gives
the word number and the second specifies the bit position in the word. Thus cell Cij is the cell for bit
j in word i. A bit Aj in the argument register is compared with all the bits in column j of the array
provided that Kj = 1. This is done for all columns j = 1, 2, ..., n. If a match occurs between all the
unmasked bits of the argument and the bits in word i, the corresponding bit Mi in the match register
is set to 1. If one or more unmasked bits of the argument and the word do not match, Mi is cleared to 0.
Each cell contains a flip-flop storage element Fij and the circuits for reading, writing, and matching
the cell. The input bit is transferred into the storage cell during a write operation, and the stored bit
is read out during a read operation. The match logic compares the content of the storage cell with the
corresponding unmasked bit of the argument and provides an output for the decision logic that sets
the bit in Mi.
READ OPERATION
The matched words are read in sequence by applying a read signal to each word line whose
corresponding Mi bit is 1. In most applications, the associative memory stores a table with no two
identical items under a given key. In this case, only one word may match the unmasked argument
field. By connecting output Mi directly to the read line in the same word position (instead of through
the M register), the content of the matched word will be presented automatically at the output lines and no
special read command signal is needed. Furthermore, if we exclude words having a zero content, an
all-zero output will indicate that no match occurred and that the searched item is not available in
memory.
WRITE OPERATION
If the entire memory is loaded with new information at once prior to a search operation, then
the writing can be done by addressing each location in sequence. This makes the device a random-
access memory for writing and a content addressable memory for reading. The advantage here is that
the address for input can be decoded as in a random-access memory. Thus, instead of having m address
lines, one for each word in memory, the number of address lines can be reduced by the decoder to d
lines, where m = 2^d.
If unwanted words have to be deleted and new words inserted one at a time, there is a need
for a special register to distinguish between active and inactive words. This register, sometimes called
a tag register, has as many bits as there are words in the memory. For every active word stored
in memory, the corresponding bit in the tag register is set to 1. A word is deleted from memory by
clearing its tag bit to 0. Words are stored in memory by scanning the tag register until the first 0 bit
is encountered. This gives the first available inactive word and a position for writing a new word.
After the new word is stored in memory it is made active by setting its tag bit to 1. An unwanted word,
when deleted from memory, can be cleared to all 0's if this value is used to specify an empty location.
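The tag-register bookkeeping described above can be sketched as follows. Lists stand in for the tag register and the word store; this is an illustrative software model, not a hardware description.

```python
# Tag-register sketch: the first 0 bit marks the first inactive word,
# which receives the new word and is then made active.

def store_word(tags, memory, word):
    for i, active in enumerate(tags):
        if active == 0:          # first available inactive location
            memory[i] = word
            tags[i] = 1          # the new word is made active
            return i
    return None                  # no inactive word: memory is full

def delete_word(tags, memory, i):
    tags[i] = 0                  # deletion just clears the tag bit
    memory[i] = 0                # clear to all 0's to mark an empty location
```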
