CSC 323 Computer Architecture and Organization II 2ND SEMESTER
- *Harvard Architecture*:
- Separate buses for instruction and data transfer
- Parallel execution of instructions
_CPU Architecture_
_1. CPU Components_
- _Control Unit (CU)_:
- Retrieves and decodes instructions
- Generates control signals
- _Arithmetic Logic Unit (ALU)_:
- Performs arithmetic and logical operations
- _Registers_:
- Small amount of on-chip memory
- Stores data and instructions
- _Cache Memory_:
- Small, fast memory for frequently accessed data
_Memory Hierarchy_
_1. Types of Memory_
- _Main Memory (RAM)_:
- Volatile memory
- Stores data and programs
- _Secondary Storage (Disk Storage)_:
- Non-volatile memory
- Stores data and programs long-term
- _Cache Memory_:
- Small, fast memory
- Stores frequently accessed data
- _Register Memory_:
- Small, on-chip memory
- Stores data temporarily
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENTS
UNIT ONE: MEMORY SYSTEM
3.1 Main Memories
3.2 Auxiliary Memories
3.3 Memory Access Methods
3.4 Memory Mapping and Virtual Memories
3.5 Replacement Algorithms
3.6 Data Transfer Modes
3.7 Parallel Processing
3.8 Pipelining
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READING
UNIT 1: Introduction
1.0 Introduction
Computer Organization is concerned with the structure and behaviour of a computer system as
seen by the user. It acts as the interface between hardware and software. Computer architecture
refers to those attributes of a system visible to a programmer, or put another way, those
attributes that have a direct impact on the logical execution of a program. Computer
organization refers to the operational units and their interconnection that realize the architecture
specification.
Examples of architecture attributes include the instruction set, the number of bits used to represent various data types (e.g., numbers and characters), I/O mechanisms, and techniques for addressing memory.
Examples of organization attributes include those hardware details transparent to the
programmer, such as control signals, interfaces between the computer and peripherals, and the
memory technology used.
As an example, it is an architectural design issue whether a computer will have a multiply
instruction. It is an organizational issue whether that instruction will be implemented by a
special multiply unit or by a mechanism that makes repeated use of the add unit of the system.
The organization decision may be based on the anticipated frequency of use of the multiply
instruction, the relative speed of the two approaches, and the cost and physical size of a special
multiply unit.
Historically, and still today, the distinction between architecture and organization has been an
important one. Many computer manufacturers offer a family of computer models, all with the
same architecture but with differences in organization. Consequently, the different models in
the family have different price and performance characteristics. Furthermore, an architecture
may survive many years, but its organization changes with changing technology.
1.2. Structure and Function
A computer is a complex system; contemporary computers contain millions of elementary
electronic components. How, then, can one clearly describe them? The key is to recognize the
hierarchical nature of most complex systems. A hierarchical system is a set of interrelated subsystems, each of the latter, in turn, hierarchical in structure until we reach some lowest level of elementary subsystems.
The hierarchical nature of complex systems is essential to both their design and their
description. The designer need only deal with a particular level of the system at a time. At each
level, the system consists of a set of components and their interrelationships. The behavior at
each level depends only on a simplified, abstracted characterization of the system at the next
lower level. At each level, the designer is concerned with structure and function:
Structure: The way in which the components are interrelated.
Function: The operation of each individual component as part of the structure.
In terms of description, we have two choices: starting at the bottom and building up to a complete description, or beginning with a top view and decomposing the system, describing its structure and function, and proceeding to successively lower layers of the hierarchy. The approach taken in this course follows the latter.
1.2.1 Function
In general terms, there are four main functions of a computer:
Data processing
Data storage
Data movement
Control
Access time (latency): For random-access memory, this is the time it takes to perform a read
or write operation. That is, the time from the instant that an address is presented to the memory
to the instant that data have been stored or made available for use. For non-random-access
memory, access time is the time it takes to position the read-write mechanism at the desired
location.
Memory cycle time: This concept is primarily applied to random-access memory and consists of the access time plus any additional time required before a second access can commence.
2.0 OBJECTIVES
At the end of this module, the user should be able to discuss in detail:
1. Memory types and their functionalities
2. The history of memory devices
3. Modes to augment processing
4. Access methods
5. Pipelining
3.1 MAIN MEMORIES
The main memory is the central storage unit in a computer system. It is a relatively large and
fast memory used to store programs and data during the computer operation. The principal
technology used for the main memory is based on semiconductor integrated circuits. Integrated
circuit RAM chips are
available in two possible operating modes, static and dynamic. The static RAM consists
essentially of internal flip-flops that store the binary information. The stored information
remains valid as long as power is applied to the unit. The dynamic RAM stores the binary
information in the form of electric charges that are applied to capacitors. The capacitors are
provided inside the chip by MOS transistors. The stored charge on the capacitors tends to
discharge with time and the capacitors must be periodically recharged by refreshing the
dynamic memory. Refreshing is done by cycling through the words every few milliseconds to
restore the decaying charge. The dynamic RAM offers reduced power consumption and larger
storage capacity in a single memory chip. The static RAM is easier to use and has shorter read
and write cycles. Most of the main memory in a general-purpose computer is made up of RAM
integrated circuit chips, but a portion of the memory may be constructed with ROM chips.
Originally, RAM was used to refer to a random-access memory, but now it is used to designate
a read/write memory to distinguish it from a read-only memory, although ROM is also random
access. RAM is used for storing the bulk of the programs and data that are subject to change.
ROM is used for storing programs that are permanently resident in the computer and for tables
of constants that do not change in value once the production of the computer is completed.
Among other things, the ROM portion of main memory is needed for storing an initial program
called a bootstrap loader. The bootstrap loader is a program whose function is to start the
computer software operating when power is turned on. Since RAM is volatile, its contents are
destroyed when power is turned off. The contents of ROM remain unchanged after power is
turned off and on again. The startup of a computer consists of turning the power on and starting
the execution of an initial program. Thus when power is turned on, the hardware of the
computer sets the program counter to the first address of the bootstrap loader. The bootstrap
program loads a portion of the operating system from disk to
main memory and control is then transferred to the operating system, which prepares the
computer for general use.
3.2 AUXILIARY MEMORIES
The high-speed storage devices used for main memory are very expensive and hence the cost per bit of storage is very high. Moreover, the storage capacity of the main memory is limited. Often it is
necessary to store hundreds of millions of bytes of data for the CPU to process. Therefore,
additional memory is required in all the computer systems. This memory is called auxiliary
memory or secondary storage. In this type of memory the cost per bit of storage is low.
However, the operating speed is slower than that of the primary memory. Most widely used
secondary storage devices are magnetic tapes, magnetic disks and floppy disks.
Secondary storage is not directly accessible by the CPU. The computer usually uses its input/output channels to access secondary storage, transferring the desired data via an intermediate area in primary storage.
Magnetic tape is wound on reels (or spools). These may be used on their own, as open-reel
tape, or they may be contained in some sort of magnetic tape cartridge for protection and ease
of handling. Early computers used open-reel
tape, and this is still sometimes used on large computer systems although it has been widely
superseded by cartridge tape. On smaller systems, if tape is used at all it is normally cartridge
tape.
Figure 1.2: Magnetic Tape
Magnetic tape is used in a tape transport (also called a tape drive, tape deck, tape unit, or
MTU), a device that moves the tape over one or more magnetic heads. An electrical signal is
applied to the write head to record data as a magnetic pattern on the tape; as the recorded tape
passes over the read head it generates an electrical signal from which the stored data can be
reconstructed. The two heads may be combined into a single read/write head. There may also
be a separate erase head to erase the magnetic pattern remaining from previous use of the tape.
Most magnetic-tape formats have several separate data tracks running the length of the tape.
These may be recorded simultaneously, in which case, for example, a byte of data may be
recorded with one bit in each track (parallel recording); alternatively, tracks may be recorded
one at a time (serial recording) with the byte written serially along one track.
Magnetic tape has been used for offline data storage, backup, archiving, data interchange,
and software distribution, and in the early days (before disk storage was available) also as
online backing store. For many of these purposes it has been superseded by magnetic or optical
disk or by online communications. For example, although tape is a non-volatile medium, it
tends to deteriorate in long-term storage and so needs regular attention (typically an annual
rewinding and inspection) as well as a controlled environment. It is therefore being superseded
for archival purposes by optical disk.
Magnetic tape is still extensively used for backup; for this purpose, interchange standards are
of minor importance, so proprietary cartridge-tape formats are widely used.
Magnetic tapes are used for large computers like mainframe computers where large volume
of data is stored for a longer time. In PCs also you can use tapes in the form of cassettes.
The cost of storing data on tapes is low. Tapes consist of magnetic materials that store data permanently. A tape is a plastic film, 12.5 mm to 25 mm wide and 500 m to 1200 m long, coated with magnetic material. The deck is connected to the central processor and information is fed into or read from the tape through the processor. It is similar to a cassette tape recorder.
In PCs, the most commonly used optical storage technology is called Compact Disk Read-
Only Memory (CD-ROM).
A standard CD-ROM disk can store up to 650 MB of data, or about 70 minutes of audio.
Once data is written to a standard CD-ROM disk, the data cannot be altered or overwritten.
CD-ROM Speeds and Uses
Storage capacity: one CD can store about 600 to 700 MB (600,000 to 700,000 KB). For comparison, a common A4 sheet of paper can hold an amount of information, in the form of printed characters, that would require about 2 kB of space on a computer. So one CD can store about the same amount of text as 300,000 such A4 sheets.
Yellow Book standard
The basic technology of CD-ROM remains the same as that for CD audio, but CD-ROM
requires greater data integrity, because a corrupt bit that is not noticeable during audio playback
becomes intolerable with computer data.
So CD-ROM (Yellow Book) dedicates more bits to error detection and correction than CD
audio (Red Book).
Data is laid out in a format known as ISO 9660.
Advantages in comparison with other information carriers:
The information density is high.
The cost of information storage per information unit is low.
The disks are easy to store, to transport and to mail.
Random access to information is possible.
Advantages of CD-ROM networking
Easier access to a range of CD-ROMs.
Ideally, access from the user’s own workstation in the office or at home.
Simultaneous access by several users to the same data.
Better security avoids damage to discs and equipment.
Less personnel time needed to provide disks to users.
Automated, detailed registration of usage statistics to support management.
Disadvantages of CD-ROM networking
Costs of the network software and computer hardware.
Increased charges imposed by the information suppliers.
Need for expensive, technical expertise to select, set up, manage, and maintain the network
system.
Technical problems when the CD-ROM product is not designed for use in the network.
The network software component for the workstation side must be installed on each microcomputer before it can be used to access the CD-ROMs.
3.3 MEMORY ACCESS METHODS
Direct access
Random access
Associative access
3.4 MEMORY MAPPING AND VIRTUAL MEMORIES
Virtual memory is one of the great ideas in computer systems. A major reason for its success is
that it works silently and automatically, without any intervention from the application
programmer. Since virtual memory works so well behind the scenes, why would a programmer
need to understand it? There are several reasons.
• Virtual memory is central. Virtual memory pervades all levels of computer systems, playing
key roles in the design of hardware exceptions, assemblers, linkers, loaders, shared objects,
files, and processes. Understanding virtual memory will help you better understand how
systems work in general.
• Virtual memory is powerful. Virtual memory gives applications powerful capabilities to
create and destroy chunks of memory, map chunks of memory to portions of disk files, and
share memory with other processes. For example, did you know that you can read or modify the
contents of a disk file by reading and writing memory locations? Or that you can load the
contents of a file into memory without doing any explicit copying? Understanding virtual
memory will help you harness its powerful capabilities in your applications.
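As an illustration of the file-mapping capability just mentioned, here is a minimal sketch using Python's standard mmap module; the file name data.bin is hypothetical, and the file must already exist and be non-empty:

import mmap

# Map an existing file into memory and treat it like a byte array.
with open("data.bin", "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:   # length 0 maps the whole file
        first = mm[0]                      # reading memory reads the file
        mm[0] = (first + 1) % 256          # writing memory updates the file

Reading mm[0] and assigning to it are plain memory operations; the operating system's virtual memory machinery performs the file I/O behind the scenes.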
3.5 REPLACEMENT ALGORITHMS
3.5.5 The Clock Page Replacement Algorithm
The clock page replacement algorithm keeps all page frames on a circular list, in the form of a clock, with a hand pointing to the oldest page. When a page fault occurs, the page being pointed to by the hand is inspected.
If its R bit is 0, the page is evicted, the new page is inserted into the clock in its place, and the
hand is advanced one position. If R is 1, it is cleared and the hand is advanced to the next page.
This process is repeated until a page is found with R = 0. Not surprisingly, this algorithm is
called clock. It differs from second chance only in the implementation.
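A minimal Python sketch of the algorithm just described; the frame contents and names below are illustrative, not part of the text:

def clock_replace(frames, hand, new_page):
    """Each frame holds a (page, R bit) pair; `hand` is the clock hand.
    Evict the first page with R == 0 and insert new_page in its place."""
    while True:
        page, r_bit = frames[hand]
        if r_bit == 0:
            frames[hand] = (new_page, 1)      # evict and insert the new page
            return (hand + 1) % len(frames)   # advance the hand one position
        frames[hand] = (page, 0)              # clear R (second chance)
        hand = (hand + 1) % len(frames)

frames = [("A", 1), ("B", 0), ("C", 1)]
hand = clock_replace(frames, 0, "D")   # evicts B, the first page with R == 0
print(frames, hand)                    # [('A', 0), ('D', 1), ('C', 1)] 2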
3.5.6 The Least Recently Used (LRU) Page Replacement Algorithm
A good approximation to the optimal algorithm is based on the observation that pages that have
been heavily used in the last few instructions will probably be heavily used again in the next
few. Conversely, pages that have not been used for ages will probably remain unused for a long
time. This idea suggests a realizable algorithm: when a page fault occurs, throw out the page
that has been unused for the longest time. This strategy is called LRU (Least Recently Used)
paging. Although LRU is theoretically realizable, it is not cheap. To fully implement LRU, it is
necessary to maintain a linked list of all pages in memory, with the most recently used page at
the front and the least recently used page at the rear. The difficulty is that the list must be
updated on every memory reference. Finding a page in the list, deleting it, and then moving it to
the front is a very time-consuming operation, even in hardware (assuming that such hardware
could be built).
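The linked-list bookkeeping described above can be sketched in software. The following minimal Python illustration keeps the most recently used page at the front of a list; the per-reference update it performs is exactly the costly operation the text warns about:

class LRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = []                # front = most recent, rear = least recent

    def reference(self, page):
        if page in self.pages:
            self.pages.remove(page)    # update on every memory reference
        elif len(self.pages) == self.capacity:
            self.pages.pop()           # page fault: evict the rear (LRU) page
        self.pages.insert(0, page)     # move/insert the page at the front

lru = LRU(3)
for p in [1, 2, 3, 1, 4]:              # referencing 4 evicts page 2
    lru.reference(p)
print(lru.pages)                       # [4, 1, 3]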
3.6 DATA TRANSFER MODES
The DMA mode of data transfer reduces CPU’s overhead in handling I/O operations. It also
allows parallelism in CPU and I/O operations. Such parallelism is necessary to avoid wastage
of valuable CPU time while handling I/O devices whose speeds are much slower as compared
to CPU. The concept of DMA operation can be extended to relieve the CPU further from
getting involved with the execution of I/O operations. This gives rise to the development of a special-purpose processor called the Input-Output Processor (IOP) or I/O channel. The Input-Output Processor (IOP) is just like a CPU that handles the details of I/O operations. It is equipped with more facilities than those available in a typical DMA controller.
1. Programmed I/O: This results from I/O instructions written in the computer program. Each data item transfer is initiated by an instruction in the program. Usually, the transfer is between a CPU register and memory. In this case it requires constant monitoring of the peripheral devices by the CPU.
Example of Programmed I/O: In this case, the I/O device does not have direct access to the
memory unit. A transfer from I/O device to memory requires the execution of several
instructions by the CPU, including an input instruction to transfer the data from device to the
CPU and a store instruction to transfer the data from the CPU to memory. In programmed I/O, the CPU stays in the program loop until the I/O unit indicates that it is ready for data transfer. This is a time-consuming process since it needlessly keeps the CPU busy. This situation can be
avoided by using an interrupt facility. This is discussed below.
2. Interrupt-initiated I/O: In the above case we saw that the CPU is kept busy unnecessarily. This situation can very well be avoided by using an interrupt-driven method for data transfer, in which the interrupt facility and special commands are used to inform the interface to issue an interrupt request signal whenever data is available from any device. In the meantime the CPU can proceed with any other program execution. The interface meanwhile keeps monitoring the device. Whenever it determines that the device is ready for data transfer, it initiates an interrupt request signal to the computer. Upon detection of an external interrupt signal the CPU momentarily stops the task it was performing, branches to the service program to process the I/O transfer, and then returns to the task it was originally performing.
Note: Both methods, programmed I/O and interrupt-driven I/O, require the active intervention of the processor to transfer data between memory and the I/O module, and any data transfer must traverse a path through the processor. Thus, both these forms of I/O suffer from two inherent drawbacks.
• The I/O transfer rate is limited by the speed with which the processor can test and service a
device.
• The processor is tied up in managing an I/O transfer; a number of instructions must be
executed for each I/O transfer.
3. Direct Memory Access: The data transfer between a fast storage medium such as a magnetic disk and the memory unit is limited by the speed of the CPU. Thus we can allow the peripherals to communicate with each other directly using the memory buses, removing the intervention of the CPU. This type of data transfer technique is known as DMA or direct memory access.
During DMA the CPU is idle and it has no control over the memory buses. The DMA
controller takes over the buses to manage the transfer directly between the I/O devices and the
memory unit.
Bus Request: It is used by the DMA controller to request the CPU to relinquish the control
of the buses.
Bus Grant: It is activated by the CPU to inform the external DMA controller that the buses
are in high impedance state and the requesting DMA can take control of the buses. Once the
DMA has taken the control of the buses it transfers the data. This transfer can take place in
many ways.
The quest for higher-performance digital computers seems unending. In the past two decades,
the performance of microprocessors has enjoyed an exponential growth. The growth of
microprocessor speed/performance by a factor of 2 every 18 months (or about 60% per year) is
known as Moore's law.
This growth is the result of a combination of two factors:
Increase in complexity (related both to higher device density and to larger size) of VLSI
chips, projected to rise to around 10 M transistors per chip for microprocessors, and 1B for
dynamic random-access memories (DRAMs), by the year 2000
Introduction of, and improvements in, architectural features such as on-chip cache memories,
large instruction buffers, multiple instruction issue per cycle, multithreading, deep pipelines,
out-of-order instruction execution, and branch prediction.
These aspects are captured by a figure of merit often used in connection with parallel
processors: the computation speed-up factor with respect to a uniprocessor. The ultimate
efficiency in parallel systems is to achieve a computation speed-up factor of p with p
processors. Although in many cases this ideal cannot be achieved, some speed-up is generally
possible. The actual gain in speed depends on the architecture used for the system and the
algorithm run on it. Of course, for a task that is (virtually) impossible to perform on a single
processor in view of its excessive running time, the computation speed-up factor can rightly be
taken to be larger than p
or even infinite. This situation, which is the analogue of several men moving a heavy piece of
machinery or furniture in a few minutes, whereas one of them could not move it at all, is
sometimes referred to as parallel synergy.
A major issue in devising a parallel algorithm for a given problem is the way in which the
computational load is divided between the multiple processors. The most efficient scheme often
depends both on the problem and on the parallel machine’s architecture.
Example
Consider the problem of constructing the list of all prime numbers in the interval [1, n] for a
given integer n > 0. A simple algorithm that can be used for this computation is the sieve of
Eratosthenes. Start with the list of numbers 1, 2, 3, 4, ... , n represented as a “mark” bit-vector
initialized to 1000 . . . 00. In each step, the next unmarked number m (associated with a 0 in
element m of the mark bit-vector) is a prime. Find this element m and mark all multiples of m
beginning with m². When m² > n, the computation stops and all unmarked elements are prime
numbers.
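A sequential Python sketch of the sieve just described (a parallel version would divide the marking work among the p processors):

def sieve(n):
    """Sieve of Eratosthenes: return the primes in [2, n]."""
    mark = [False] * (n + 1)          # False = unmarked
    mark[0:2] = [True, True]          # 0 and 1 are marked from the start
    m = 2
    while m * m <= n:                 # stop when m^2 > n
        if not mark[m]:               # the next unmarked number is a prime
            for multiple in range(m * m, n + 1, m):
                mark[multiple] = True # mark multiples, beginning with m^2
        m += 1
    return [i for i in range(2, n + 1) if not mark[i]]

print(sieve(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]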
3.8 PIPELINING
There exist two basic techniques to increase the instruction execution rate of a processor. These
are to increase the clock rate, thus decreasing the instruction execution time, or alternatively to
increase the number of instructions that can be executed simultaneously. Pipelining and
instruction-level parallelism are examples of the latter technique. Pipelining owes its origin to
car assembly lines. The idea is to have more than one instruction being processed by the
processor at the same time. Similar to the assembly line, the success of a pipeline depends upon
dividing the execution of an instruction among a number of subunits (stages), each performing
part of the required operations. A possible division is to consider instruction fetch (F),
instruction decode (D), operand fetch (F), instruction execution (E), and store of results (S) as
the subtasks needed for the execution of an instruction. In this case, it is possible to have up to
five instructions in the pipeline at the same time, thus increasing instruction throughput and reducing the overall execution time.
Pipeline system is like the modern day assembly line setup in factories. For example in a car
manufacturing industry, huge assembly lines are setup and at each point, there are robotic arms
to perform a certain task, and then the car moves on ahead to the next arm.
Types of Pipeline:
It is divided into 2 categories:
Arithmetic Pipeline- Arithmetic pipelines are found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, etc.
Instruction Pipeline- Instruction pipelines overlap the fetch, decode and execute phases of successive instructions, as described above.
Pipeline Conflicts
There are some factors that cause the pipeline to deviate its normal performance. Some of
these factors are given below:
Timing Variations: Not all stages take the same amount of time. This problem generally occurs in instruction processing, where different instructions have different operand requirements and thus different processing times.
Data Hazards: When several instructions are in partial execution, a problem arises if they reference the same data. We must ensure that a later instruction does not attempt to access data before the current instruction has finished with it, because this would lead to incorrect results.
Branching: In order to fetch and execute the next instruction, we must know what that instruction is. If the present instruction is a conditional branch whose result will lead us to the next instruction, then the next instruction may not be known until the current one is processed.
Interrupts: Interrupts inject unwanted instructions into the instruction stream and affect the execution of instructions.
Data Dependency: It arises when an instruction depends upon the result of a previous
instruction but this result is not yet available.
Advantages of Pipelining
The cycle time of the processor is reduced.
It increases the throughput of the system
It makes the system reliable.
Disadvantages of Pipelining
The design of pipelined processor is complex and costly to manufacture.
Instruction latency is increased.
Pipelining refers to the technique in which a given task is divided into a number of subtasks
that need to be performed in sequence. Each subtask is performed by a given functional unit.
The units are connected in a serial fashion and all of them operate simultaneously. The use of
pipelining improves the performance compared to the traditional sequential execution of tasks.
Figure 3.20 shows an illustration of the basic difference between executing four subtasks of a
given instruction (in this case fetching F, decoding D, execution E, and writing the results W)
using pipelining and sequential processing.
Figure 3.20: Pictorial representation of a simple pipelining example
It is clear from the figure that the total time required to process three instructions (I1, I2, I3) is
only six time units if four-stage pipelining is used as compared to 12 time units if sequential
processing is used. A possible saving of up to 50% in the execution time of these three
instructions is obtained. In order to formulate some performance measures for the goodness of a
pipeline in processing a series of tasks, a space-time chart (called a Gantt chart) is used.
As can be seen from Figure 3.20, 13 time units are needed to finish executing 10 instructions
(I1 to I10). This is to be compared to 40 time units if sequential processing is used (ten
instructions each requiring four time units).
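Both counts follow from a simple relation: with k stages and n instructions, an ideal pipeline needs k + (n - 1) time units, against n * k for sequential processing. A quick illustrative check in Python:

def pipeline_time(n_instructions, n_stages):
    # the first instruction takes n_stages units; thereafter one completes per unit
    return n_stages + (n_instructions - 1)

def sequential_time(n_instructions, n_stages):
    return n_instructions * n_stages

print(pipeline_time(3, 4), sequential_time(3, 4))     # 6 vs 12
print(pipeline_time(10, 4), sequential_time(10, 4))   # 13 vs 40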
4.0 CONCLUSION
Computer memory is central to the operation of a modern computer system; it stores data or
program instructions on a temporary or permanent basis for use in a computer. However, there
is an increasing gap between the speed of memory and the speed of microprocessors. In this module, various memory management and optimization techniques are reviewed to reduce this gap, including hardware designs of the memory organization, such as the memory hierarchy and cache design; memory management techniques, ranging from replacement algorithms to optimization techniques; and virtual memory strategies, from a primitive bare-machine approach to paging and segmentation strategies.
5.0 SUMMARY
This module studied the memory system of a computer, starting with the organisation of its main memory, which in some simple systems is the only form of data storage, and proceeding to more complex systems and the additional components they carry. Cache systems, which aim at speeding up access to primary storage, were also studied, and there was a greater focus on virtual memory systems, which make possible the transparent use of secondary storage by the processor as if it were main memory.
6.0 TUTOR MARKED ASSIGNMENT
1. Consider the execution of 500 instructions on a five-stage pipeline machine. Compute the
speed-up due to the use of pipelining given that the probability of an instruction being a branch
is p = 0.3? What must be the value of p and the expected number of branch instructions such
that a speed-up of at least 4 is possible? What must be the value of p such that a speed-up of at
least 5 is possible? Assume that each stage takes one cycle to perform its task.
2. A computer system has a three-stage pipeline consisting of a Fetch unit (F), a Decode unit
(D), and an Execute (E) unit. Determine (using the space–time chart) the time required to
execute 20 sequential instructions using two-way interleaved memory if all three units require
the use of the memory simultaneously.
3. What is the average instruction processing time of a five-stage instruction pipeline for 36
instructions if conditional branch instructions occur as follows: I5, I7, I10, I25, I27. Use both
the space–time chart and the analytical model.
4. Parallelism in everyday life: Discuss the various forms of parallelism used to speed up the following process: shopping at a supermarket.
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READING
1.0 INTRODUCTION
In computing, a memory address is a reference to a specific memory location used at various
levels by software and hardware. Memory addresses are fixed-length sequences of digits
conventionally displayed and manipulated as unsigned integers. This numerical view is based on the features of the CPU, as well as on the use of memory as an array, a view supported by various programming languages. There are many ways to locate data and instructions in primary
memory and these methods are called “memory address modes”.
Memory address modes determine the method used within the program to access data either
from the Cache or the RAM.
2.0 OBJECTIVES
The objectives of this module include:
To ensure students have adequate knowledge of memory addressing systems
To study the elements of the memory hierarchy carefully
To analyze virtual memory systems
UNIT ONE
3.1 MEMORY ADDRESSING
3.1.1 What is memory addressing mode?
Memory addressing mode is the method by which an instruction operand is specified. One of
the functions of a microprocessor is to execute a sequence of instructions or programs stored in
a computer memory (register) in order to perform a particular task. The way the operands are
chosen during program execution is dependent on the addressing mode of the instruction. The
addressing mode specifies a rule for interpreting or modifying the address field of the
instruction before the operand is actually referenced. This technique is used by the computers to
give programming versatility to the user by providing such facilities as pointers to memory,
counters for loop control, indexing of data, and program relocation, as well as to reduce the number of bits in the addressing field of the instruction.
However, there are basic requirements for an operation to take effect. First, there must be an operator to indicate what action to take, and second, there must be an operand on which the action is performed. For instance, if the numbers 5 and 2 are to be added to produce a result, this could be expressed numerically as 5 + 2. In this expression, our operator is (+), addition, and the numbers 5 and 2 are our operands. It is important to tell the machine in a
microprocessor how to get the operands to perform the task. The data stored in the operation
code is the operand value or the result. A word that defines the address of an operand that is
stored in memory is the effective address. The availability of the addressing modes gives the
experienced assembly language programmer flexibility for writing programs that are more
efficient with respect to the number of instructions and execution time.
3.1.2 Modes of addressing
a. Direct Addressing Mode
In this mode, the effective address is equal to the address part of the instruction. The operand resides in memory and its address is given directly by the address field of the instruction. In a branch-type instruction the address field specifies the actual branch address.
b. Indirect Addressing Mode
In this mode, the address field of the instruction gives the address where the effective address is stored in memory. Control fetches the instruction from memory and uses its address part to access memory again to read the effective address. A few addressing modes require that the address field of the instruction be added to the content of a specific register in the CPU. The effective address in these modes is obtained from the following computation:
Effective address = address part of instruction + content of CPU register.
The CPU register used in the computation may be the program counter, an index register, or a
base register. In each case we have a different addressing mode which is used for a different
application.
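The modes above can be sketched as a small simulation; the memory contents and numeric values below are invented purely for illustration:

memory = {100: 42, 200: 100, 300: 7}        # address -> contents

def direct(address_field):
    # direct mode: the address field IS the effective address
    return address_field

def indirect(address_field):
    # indirect mode: the address field holds the address of the effective address
    return memory[address_field]

def register_based(address_field, register_value):
    # effective address = address part of instruction + content of CPU register
    return address_field + register_value

print(memory[direct(100)])                  # operand 42
print(memory[indirect(200)])                # memory[200] = 100, so operand 42
print(memory[register_based(250, 50)])      # 250 + 50 = 300, operand 7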
c. Immediate Addressing Mode
In this mode the operand is specified in the instruction itself. In other words, an immediate-
mode instruction has an operand field rather than an address field. The operand field contains
the actual operand to be used in conjunction with the operation specified in the instruction.
Immediate-mode instructions are useful for initializing registers to a constant value. It was
mentioned previously that the address field of an instruction may specify either a memory word
or a processor register. When the address field specifies a processor register, the instruction is
said to be in the register mode.
d. Register Indirect Addressing Mode
In this mode the instruction specifies a register in the CPU whose contents give the address of
the operand in memory. In other words, the selected register contains the address of the operand
rather than the operand itself. Before using a register indirect mode instruction, the programmer
must ensure that the memory address of the operand is placed in the processor register with a
previous instruction. A reference
to the register is then equivalent to specifying a memory address. The advantage of a register
indirect mode instruction is that the address field of the instruction uses fewer bits to select a
register than would have been required to specify a memory address directly.
e. Indexed Addressing Mode
In this mode the content of an index register is added to the address part of the instruction to
obtain the effective address. The index register is a special CPU register that contains an index
value. The address field of the instruction defines the beginning address of a data array in
memory. Each operand in the array is stored in memory relative to the beginning address. The
distance between the beginning address and the address of the operand is the index value stored
in the index register. Any operand in the array can be accessed with the same instruction
provided that the index register contains the correct index value. The index register can be
incremented to facilitate access to consecutive operands. Note that if an index type instruction
does not include an address field in its format, the instruction converts to the register indirect
mode of operation. Some computers dedicate one CPU register to function solely as an index
register. This register is involved implicitly when the index-mode instruction is used. In
computers with many processor registers, any one of the CPU registers can contain the index
number. In such a case the register must be specified explicitly in a register field within the
instruction format.
f. Auto Increment Mode and Auto Decrement Mode
This is similar to the register indirect mode except that the register is incremented or
decremented after (or before) its value is used to access memory. When the address stored in
the register refers to a table of data in memory, it is necessary to increment or decrement the
register after every access to the table. This can be achieved by using the increment or
decrement instruction. However, because it is such a common requirement, some computers
incorporate a special mode that automatically increments or decrements the content of the
register after data access. The address field of an instruction is used by the control
unit in the CPU to obtain the operand from memory. Sometimes the value given in the address
field is the address of the operand, but sometimes it is just an address from which the address of
the operand is calculated. To differentiate among the various addressing modes it is necessary
to distinguish between the address part of the instruction and the effective address used by the
control when executing the instruction. The effective address is defined to be the memory
address obtained from the computation dictated by the given addressing mode. The effective
address is the address of the operand in a computational type instruction. It is the address where
control branches in response to a branch-type instruction.
g. Relative Addressing Mode:
In this mode the content of the program counter is added to the address part of the instruction in
order to obtain the effective address. The address part of the instruction is usually a signed
number which can be either positive or negative. When this number is added to the content of
the program counter, the result produces an effective address whose position in memory is
relative to the address of the next instruction. For instance, let’s assume that the program
counter contains the number 682 and the address part of the instruction contains the number 21.
The instruction at location 682 is read from memory during the fetch phase and the program
counter is then incremented by one to 683. The effective address computation for the relative
address mode is 683 + 21 = 704. This is 21 memory locations forward from the address of the
next instruction. Relative addressing is often used with branch-type instructions when the
branch address is in the area surrounding the instruction word itself. It results in a shorter
address field in the instruction format since the relative address can be specified with a smaller
number of bits compared to the number of bits required to designate the entire memory address.
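The computation in the example can be checked with a few lines of Python; the values 682 and 21 come from the example above, and the backward branch is an added illustration:

def relative_effective_address(pc, address_field):
    # the PC has already been incremented past the current instruction;
    # the signed address field is added to it
    return pc + address_field

pc_after_fetch = 682 + 1                               # PC incremented to 683
print(relative_effective_address(pc_after_fetch, 21))  # 704, as in the text
print(relative_effective_address(pc_after_fetch, -5))  # a backward branch: 678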
3.1.4 Advantages of addressing modes
Some instruction set architectures, for instance Intel x86 and its successors, have a load effective address instruction. This performs a calculation of the effective operand address but, rather than acting on that memory location, loads the address that would have been accessed into a register. This can be useful when passing the address of an array element to a subroutine. It can also be a somewhat tricky way of performing more additions than usual in one instruction; for example, using such an instruction with the addressing mode "base + index + offset" allows one to add two registers and a constant together in a single instruction.
UNIT TWO
3.2 ELEMENTS OF MEMORY HIERARCHY
3.2.1 What is memory hierarchy?
Memory is one of the important units in any computer system. It serves as storage for all the processed and unprocessed data and programs in a computer system. However, because most computer users store large amounts of files in their computers, the use of a single memory device in a computer system has become inefficient and unsatisfactory: a single memory cannot contain all the files needed by the computer users, and when the memory is large, it decreases the speed of the processor and the general performance of the computer system.
Therefore, to curb these challenges, the memory unit must be divided into smaller memories for more storage, speedy program execution and enhanced processor performance. The recently accessed files or programs must be placed in the fastest memory. Since memory with large capacity is cheap but slow, and memory with smaller capacity is fast but costly, the organization of smaller memories that hold the recently accessed files or programs closer to the CPU is termed the memory hierarchy. These memories become successively larger as they move away from the CPU.
The strength and performance of memory hierarchy can be measured using the model below;
Memory_Stall_Cycles = IC * Mem_Refs * Miss_Rate * Miss_Penalty
Where,
IC = Instruction Count
Mem_Refs = Memory References per Instruction
Miss_Rate = Fraction of Accesses that are not in the cache
Miss_Penalty = Additional time to service the Miss
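A worked example of the model in Python; all numbers below are assumed for illustration and do not come from the text:

ic = 1_000_000        # instruction count
mem_refs = 1.4        # memory references per instruction
miss_rate = 0.02      # fraction of accesses that miss in the cache
miss_penalty = 100    # additional cycles to service a miss

memory_stall_cycles = ic * mem_refs * miss_rate * miss_penalty
print(memory_stall_cycles)   # 2,800,000 stall cycles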
The memory hierarchy system encompasses all the storage devices used in a computer system. It ranges from the cache memory, which is smaller in size but faster in speed, to the relatively large auxiliary memory, which is bigger in size but slower in speed. The smaller the size of the memory, the costlier it becomes.
The elements of the memory hierarchy include:
a. Cache memory,
b. Main memory and
c. Auxiliary memory
The cache memory is the fastest and smallest memory. It is easily accessible by the CPU because it is closer to the CPU. Cache memory is very costly compared to the main memory and the auxiliary memory.
The main memory, also known as primary memory, communicates directly with the CPU. It also communicates with the auxiliary memory through the I/O processor. During program
execution, the files that are not currently needed by the CPU are often moved to the auxiliary
storage devices in order to create space in the main memory for the currently needed files to be
stored. The main memory is made up of Random Access Memory (RAM) and Read Only
Memory (ROM).
The auxiliary memory is very large in size and relatively slow in speed. It includes the magnetic tapes and magnetic disks, which are used for the storage and backup of removable files. The auxiliary memories store programs that are not currently needed by the CPU. They are very cheap compared to both the cache and main memories.
3.3.2 Paging
In memory management, paging can be described as a storage mechanism that allows the operating system (OS) to retrieve processes from
secondary storage into the main memory in the form of pages. It is a function of memory
management where a computer will store and retrieve data from a device’s secondary storage to
the primary storage. Memory management is a crucial aspect of any computing device, and
paging specifically is important to the implementation of virtual memory.
In the Paging method, the main memory is divided into small fixed-size blocks of physical
memory, which are called frames. The size of a frame should be kept the same as that of a page
to have maximum utilization of the main memory and to avoid external fragmentation. Paging
is used for faster access to data, and it is a logical concept. For instance, suppose the main memory size is 16 KB and the frame size is 1 KB. The main memory will then be divided into a collection of 16 frames of 1 KB each. Suppose there are 4 separate processes in the system, A1, A2, A3, and A4, of 4 KB each. Each process is divided into pages of 1 KB each, so that the operating system can store one page in one frame. At the beginning, all the frames are empty, so the pages of the processes are stored in a contiguous way. A typical paging process is presented in Figure 2.3.1 below.
[Figure: main memory shown as a collection of sixteen 1 KB frames (1 frame = 1 KB; frame size = page size), with the pages of processes A1, A2, A3 and A4 mapped into frames.]
Figure 2.3.1: Paging process
From the above diagram you can see that A2 and A4 are moved to the waiting state after some time. Therefore, eight frames become empty, and other pages can be loaded into those empty blocks. The process A5, of size 8 pages (8 KB), is waiting in the ready queue.
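The frame lookup that paging performs can be sketched as a simple address translation; the page-table contents below are invented for illustration, using the 1 KB pages of the example above:

PAGE_SIZE = 1024                       # 1 KB pages, as in the example

page_table = {0: 5, 1: 2, 2: 7, 3: 0}  # hypothetical: page number -> frame number

def translate(virtual_address):
    page = virtual_address // PAGE_SIZE
    offset = virtual_address % PAGE_SIZE
    frame = page_table[page]           # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

print(translate(2 * PAGE_SIZE + 100))  # page 2 -> frame 7: 7 * 1024 + 100 = 7268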
3.3.2.1. Paging Protection
The paging process can be protected by inserting an additional bit, called the valid/invalid bit, into each page table entry. Memory protection in paging is achieved by associating protection bits with each page. These bits are associated with each page table entry and specify protection on the corresponding page.
3.3.2.2. Advantages and Disadvantages of Paging
Advantages
The following are the advantages of using Paging method:
a. It avoids external fragmentation.
Disadvantages
The following are the disadvantages of using Paging method
a. May cause Internal fragmentation
b. Page tables consume additional memory.
c. Multi-level paging may lead to memory reference overhead.
Disadvantages of Multiprogramming:
a. Long jobs have to wait for a long time
b. Tracking all processes is sometimes difficult
c. CPU scheduling is required
d. Efficient memory management is required
e. User interaction is not possible during program execution
Full read and write privileges are given to a program when it is executing its own instructions.
Write protection is useful for sharing system programs such as utility programs and other library
routines. These system programs are stored in an area of memory where they can be shared by
many users. They
can be read by all programs, but no writing is allowed. This protects them from being changed
by other programs. The execute-only condition protects programs from being copied. It restricts
the segment to be referenced only during the instruction fetch phase but not during the execute
phase. Thus it allows the users to execute the segment program instructions but prevents them
from reading the instructions as data for the purpose of copying their content. Portions of the
operating system will reside in memory at any given time. These system programs must be
protected by making them inaccessible to unauthorized users. The operating system protection
condition is placed in the descriptors of all operating system programs to prevent the occasional
user from accessing operating system segments.
3.3.8 Hierarchical memory systems
In computer system design, a memory hierarchy is used to enhance the organization of memory so as to minimize access time. It was developed based on a program behaviour known as locality of reference. A hierarchical memory system is a collection of storage units or devices. The memory unit stores binary information in the form of bits. Generally, memory/storage is classified into 2 categories:
External Memory or Secondary Memory: This is permanent (non-volatile) storage and does not lose any data when power is switched off. It is made up of magnetic disk, optical disk and magnetic tape, i.e. peripheral storage devices which are accessible by the processor via an I/O module.
Internal Memory or Primary Memory: This memory is volatile in nature; it loses its data when power is switched off. It is made up of main memory, cache memory and CPU registers. This is directly accessible by the processor.
Properties of Hierarchical Memory Organization
There are three important properties for maintaining consistency in the memory hierarchy. These three properties are:
Inclusion
Coherence and
Locality.
4.0 CONCLUSION
The memory hierarchy system encompasses all the storage devices used in a computer system. It ranges from the fastest but smallest cache memory, through the slower but larger main memory, to the slowest but largest auxiliary memory. A memory element is a set of storage devices that stores binary information in bits. These include registers, cache memory, main memory, magnetic disk and magnetic tape. This set of storage devices can be classified into two categories: primary memory and secondary memory.
5.0 SUMMARY
Memory addresses act just like the indexes of a normal array. The computer can access any
address in memory at any time (hence the name "random access memory"). It can also group
bytes together as it needs to form larger variables, arrays, and structures. Memory hierarchy is the hierarchy of memory and storage devices found in a computer system. It ranges from the slowest but high-capacity auxiliary memory to the fastest but low-capacity cache memory; the hierarchy is employed to balance this speed/capacity trade-off.
6.0 TUTOR MARKED ASSIGNMENT
1. What are the properties of hierarchical memory organization?
2. Explain the concept of memory protection
3. How do you perform address mapping using segments?
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR MARKED ASSIGNMENT
7.0 REFERENCES/FURTHER READING
1.0 INTRODUCTION
Control Unit is the part of the computer’s central processing unit (CPU), which directs the
operation of the processor. It was included as part of the Von Neumann Architecture by John von
Neumann. It is the responsibility of the
Control Unit to tell the computer’s memory, arithmetic/logic unit and input and output devices
how to respond to the instructions that have been sent to the processor. It fetches internal
instructions of the programs from the main memory to the processor instruction register, and
based on this register contents, the control unit generates a control signal that supervises the
execution of these instructions. A control unit works by receiving input information, which it converts into control signals that are then sent to the central processor. The computer’s
processor then tells the attached hardware what operations to perform. The functions that a
control unit performs are dependent on the type of CPU because the architecture of CPU varies
from manufacturer to manufacturer. Examples of devices that require a CU are:
Central Processing Units (CPUs)
Graphics Processing Units (GPUs)
UNIT ONE
HARDWARE CONTROL
3.1.1 Hardwired Control Unit
A hardwired control is a mechanism of producing control signals using Finite State Machines
(FSM) appropriately. It is designed as a sequential logic circuit. The final circuit is constructed
by physically connecting the components such as gates, flip flops, and drums. Hence, it is named
a hardwired controller. In the Hardwired control unit, the control signals that are important for
instruction execution control are generated by specially designed hardware logical circuits, in
which we cannot modify the signal generation method without physical change of the circuit
structure. The operation code of an instruction contains the basic data for control signal
generation. In the instruction decoder, the operation code is decoded. The instruction decoder
constitutes a set of many decoders that decode different fields of the instruction opcode. As a
result, a few output lines going out from the instruction decoder obtain active signal values. These
output lines are connected to the inputs of the matrix that generates control signals for executive
units of the computer.
This matrix implements logical combinations of the decoded signals from the instruction opcode
with the outputs from the matrix that generates signals representing consecutive control unit
states and with signals coming from the
outside of the processor, e.g. interrupt signals. The matrices are built in a similar way to programmable logic arrays.
3.1.2 Design of a hardwired Control Unit
Control signals for an instruction execution have to be generated not in a single time point but
during the entire time interval that corresponds to the instruction execution cycle. Following the
structure of this cycle, the suitable sequence of internal states is organized in the control unit. A
number of signals generated by the control signal generator matrix are sent back to inputs of the
next control state generator matrix. This matrix combines these signals with the timing signals,
which are generated by the timing unit based on the rectangular patterns usually supplied by the
quartz generator. When a new instruction arrives at the control unit, the control unit is in the initial state of new instruction fetching. Instruction decoding allows the control unit to enter the first state relating to execution of the new instruction, which lasts as long as the timing signals and other input signals, such as flags and state information of the computer, remain unaltered. A change of any of the earlier mentioned signals stimulates a change of the control unit state. This causes a new respective input to be generated for the control signal generator matrix. When an external
signal appears (e.g. an interrupt), the control unit enters the next control state, which is the state concerned with the reaction to this external signal (e.g. interrupt processing).
Figure 3.2.1: Hardwired Control Unit
The values of flags and state variables of the computer are used
to select suitable states for the instruction execution cycle. The last states in the cycle are control
states that commence fetching the next instruction of the program: sending the program counter
content to the main memory address buffer register and next, reading the instruction word to the
instruction register of computer. When the ongoing instruction is the stop instruction that ends
program execution, the control unit enters an operating system state, in which it waits for a next
user directive.
Advantages of Hardwired Control Unit:
1. Because of the use of combinational circuits to generate signals, the hardwired control unit is fast.
2. The delay in the generation of control signals depends on the number of gates.
3. It can be optimized to produce a fast mode of operation.
4. It is faster than a microprogrammed control unit.
3.2.1 Microprogrammed Control Unit
The fundamental difference between these unit structures and the structure of the hardwired control unit is the existence of the control store, which is used for storing words containing encoded
control signals mandatory for instruction execution. In microprogrammed control units,
subsequent instruction words are fetched into the instruction register in a normal way. However,
the operation code of each instruction is not directly decoded to enable immediate control signal
generation but it comprises the initial address of a microprogram contained in the control store.
Figure 3.2.2: Single level control store
With a single-level control store: In this, the instruction opcode from the instruction register is
sent to the control store address register. Based on this address, the first microinstruction of a
microprogram that interprets execution of this instruction is read to the microinstruction register.
This microinstruction contains in its operation part encoded control signals, normally as few bit
fields. In a set microinstruction field decoders, the fields are decoded. The microinstruction also
contains the address of the next microinstruction of the given instruction microprogram and a
control field used to control activities of the microinstruction address generator.
The last mentioned field decides the addressing mode (addressing operation) to be applied to the
address embedded in the ongoing microinstruction. In microinstructions along with conditional
addressing mode, this address is refined by using the processor condition flags that represent the
status of computations in the current program. The last microinstruction in the microprogram of the given instruction is the microinstruction that fetches the next instruction from the main
memory to the instruction register.
With a two-level control store: In a control unit with a two-level control store, besides the control memory for microinstructions, a nano-instruction memory is included. In such a control unit, microinstructions do not contain encoded control signals. The operation part of a microinstruction contains the address of the word in the nano-instruction memory which contains the encoded control signals. The nano-instruction memory contains all combinations of control signals that appear in the microprograms that interpret the complete instruction set of a given computer, written once in the form of nano-instructions.
Figure 3.2.3: Two-level control store
In this way, repeated storage of identical operation parts of microinstructions is avoided. The microinstruction word can then be much shorter than with a single-level control store, which gives a much smaller microinstruction memory (in bits) and, as a result, a much smaller overall control memory. The microinstruction memory controls the selection of consecutive microinstructions, while the control signals themselves are generated on the basis of the nanoinstructions. In nanoinstructions, control signals are frequently encoded using the one-bit-per-signal method, which eliminates decoding.
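A brief, hypothetical sketch of the two-level scheme follows; the store contents and bit patterns are invented. The point it illustrates is that several microinstructions can share one nanoinstruction word, which is where the saving in control-memory size comes from.

# Two-level scheme: microinstructions hold only an index into a
# nanostore that stores each distinct control-signal combination
# exactly once (1 bit per signal, so no decoding is needed).

NANO_STORE = [         # every distinct control-signal word, stored once
    0b1100,            # e.g. PC_to_MAR | MEM_READ
    0b0011,            # e.g. MDR_to_IR | PC_INC
    0b1010,            # e.g. REG_to_ALU | ALU_ADD
]

# Microinstructions shrink to (nano_addr, next_micro_addr). Note that
# micro words 0 and 3 share nanostore word 0: that sharing is the saving.
MICRO_STORE = [(0, 1), (1, 2), (2, 3), (0, 0)]

addr = 0
for _ in range(4):
    nano_addr, nxt = MICRO_STORE[addr]
    print(f"micro={addr} -> nano word {NANO_STORE[nano_addr]:04b}")
    addr = nxt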
3.2.2 DIFFERENCES BETWEEN HARDWIRED AND MICROPROGRAMMED
CONTROL
Advantages of the Microprogrammed Control Unit
Microprogrammed control offers the following advantages:
- It allows a more orderly and systematic design of the control unit.
- It is simpler to debug and change.
- It retains the underlying structure of the control function.
- It makes the design of the control unit much simpler; hence it is inexpensive and less error-prone.
- It is more flexible.
By contrast, the following points characterize horizontal microprogramming, in which the control word holds one bit per control signal (a short sketch follows this list):
- The control signals are represented in decoded binary format, that is, 1 bit per control signal. For example, if 53 control signals are present in the processor, then 53 bits are required. More than one control signal can be enabled at a time.
- It uses a longer control word.
- It is used in parallel processing applications and allows a higher degree of parallelism: if the degree is n, then n control signals can be enabled at a time.
- It requires no additional hardware (decoders), and is therefore faster than vertical microprogramming.
- It is more flexible than vertical microprogramming.
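The contrast between the two encodings can be sketched as follows; the four signal names and the field widths are illustrative assumptions. In the horizontal word each bit directly drives one signal, so any subset can be active at once; in the vertical word an encoded field selects exactly one signal through a decoder.

# Horizontal vs. vertical control words, with invented signal names.

SIGNALS = ["PC_INC", "MEM_READ", "ALU_ADD", "REG_WRITE"]

def horizontal_decode(word: int) -> list[str]:
    # bit i of the word directly drives SIGNALS[i]; no decoder needed
    return [s for i, s in enumerate(SIGNALS) if word & (1 << i)]

def vertical_decode(field: int) -> list[str]:
    # a 2-bit encoded field selects exactly one signal via a decoder
    return [SIGNALS[field]]

print(horizontal_decode(0b0101))   # ['PC_INC', 'ALU_ADD'] in parallel
print(vertical_decode(0b10))       # ['ALU_ADD'] only, one at a time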
Easing of global timing issues: In a synchronous system such as a synchronous microprocessor, the system clock, and thus system performance, is dictated by the slowest (critical) path. Most portions of a circuit must therefore be carefully optimized to achieve the highest clock rate, including rarely used portions of the system. Since many asynchronous systems operate at the speed of the circuit path currently in operation, rarely used portions of the circuit can be left unoptimized without adversely affecting system performance.
Both synchronous and asynchronous transmission have their benefits and limitations. Asynchronous transmission is used for sending small amounts of data, while synchronous transmission is used for sending bulk amounts of data. Thus, both synchronous and asynchronous transmission are essential to the overall process of data transmission.
3.3.7 Emerging application areas
Beyond more classical design targets, a number of novel application areas have recently emerged
where asynchronous design is poised to make an impact.
Large-scale heterogeneous system integration. In multi- and many-core processors and systems-on-chip (SoCs), some level of asynchrony is inevitable in the integration of heterogeneous components. Typically, there are several distinct timing domains, which are glued together using an asynchronous communication fabric. There has been much recent work on asynchronous and mixed synchronous-asynchronous systems.
Ultra-low-energy systems and energy harvesting.
Asynchronous design is also playing a crucial role in the design of systems that operate in regimes where energy availability is extremely limited. Fine-grain adaptation, in which the datapath latency can vary subtly for each input sample, is not possible in a fixed-rate synchronous design. In a recent in-depth case study by Chang et al., focusing on ultra-low-energy 8051 microcontroller cores with voltage scaling, it was shown that under extreme process, voltage, and temperature (PVT) variations, a synchronous core requires its delay margins to be increased by a factor of 12×, while a comparable asynchronous core can operate at actual speed.
Continuous-time digital signal processors (CTDSPs).
Another intriguing direction is the development of continuous-time digital signal processors, where input samples are generated at irregular rates by a level-crossing analog-to-digital converter, depending on the actual rate of change of the input waveform. An early specialized approach, using finely discretized sampling, demonstrated a 10× power reduction.
Alternative computing paradigms.
Finally, there is increasing interest in asynchronous circuits as the organizing backbone of systems based on emerging computing technologies, such as cellular nanoarrays and nanomagnetics, where highly robust asynchronous approaches are crucial to mitigating timing irregularities.
3.3.8 Asynchronous Datapaths and Data Transfer
The internal operations in an individual unit of a digital system are synchronized using clock
pulse. It means clock pulse is given to all registers within a unit. And all data transfer among
internal registers occurs simultaneously during the occurrence of the clock pulse. Now, suppose
any two units of a digital system are designed independently, such as CPU and I/O interface. If
the registers in the I/O interface share a common clock with CPU registers, then transfer between
the two units is said to be synchronous. But in most cases, the internal timing in each unit is
independent of each other, so each uses its private clock for its internal registers. In this case, the
two units are said to be asynchronous to each other, and if data transfer occurs between them,
this data transfer is called Asynchronous Data Transfer. In other words, the two units are said
to be asynchronous to each other. CPU and I/O device must coordinate for data transfers.
But, the Asynchronous Data Transfer between two independent units requires that control signals
be transmitted between the communicating units so that the time can be indicated at which they
send data.
Two methods can achieve this asynchronous way of data transfer:
Strobe control: one of the units supplies a strobe pulse to indicate to the other unit when the transfer is to occur.
Handshaking: each data item being transferred is accompanied by a control signal that indicates the presence of data on the bus; the unit receiving the data item responds with another control signal to acknowledge receipt of the data.
The strobe pulse and handshaking methods of asynchronous data transfer are not restricted to I/O transfers; they are used extensively wherever data must be transferred between two independent units. In what follows, the transmitting unit is the source and the receiving unit is the destination.
The strobe control method of data transfer uses a single control signal for each transfer. This control line, known as the strobe, may be activated by either the source unit or the destination unit, depending on which one initiates the transfer:
Source-initiated strobe
Destination-initiated strobe
SOURCE-INITIATED STROBE: The data bus carries the binary information from the source unit to the destination unit, as shown below.
Figure 3.3.3: Strobe control method
The strobe is a single line that informs the destination unit when a valid data word is available on the bus.
Figure 3.3.4: Source-initiated strobe
The source unit first places the data on the bus.
After a brief delay to ensure that the data settle to a steady value, the source activates the strobe pulse.
The information on the data bus and the strobe signal remain in the active state for a period long enough to allow the destination unit to receive the data.
The source removes the data from the bus a brief period after it disables its strobe pulse (a sketch of this sequence follows below).
DESTINATION-INITIATED STROBE: Here the destination unit activates the strobe pulse, informing the source that it should supply the data.
The source unit responds by placing the requested binary information on the data bus.
The data must be valid and remain on the bus long enough for the destination unit to accept it.
The falling edge of the strobe pulse can be used again to trigger a destination register.
The source removes the data from the bus after a predetermined time interval.
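The ordering of events in a source-initiated strobe transfer can be traced with a simple Python timeline; the "bus" is just a dictionary and the timing is program order, which is enough to show that the data are placed before the strobe is raised and removed only after it is dropped. All names are invented for the example.

# A timeline sketch of the source-initiated strobe transfer.

bus = {"data": None, "strobe": 0}

def source_initiated_transfer(word: int) -> int:
    bus["data"] = word        # 1. source places data on the bus
    bus["strobe"] = 1         # 2. after data settles, raise the strobe
    latched = bus["data"]     # 3. destination latches while strobe active
    bus["strobe"] = 0         # 4. source drops the strobe...
    bus["data"] = None        # 5. ...then removes the data
    return latched

print(source_initiated_transfer(0x2A))   # -> 42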
The disadvantage of the strobe method is that a source-initiated transfer has no way of knowing whether the destination unit has actually received the data; similarly, a destination-initiated transfer has no way of knowing whether the source unit has actually placed the data on the data bus.
The handshaking mechanism solves this problem by introducing a second control signal that provides a reply to the unit that initiates the transfer.
The two handshaking lines are data valid, which is generated by the source unit, and data accepted, which is generated by the destination unit.
The data accepted signal is activated by the destination unit after it accepts the data from the bus.
The source unit then disables its data valid signal, which invalidates the data on the bus.
The destination unit then disables its data accepted signal, and the system returns to its initial state.
In the destination-initiated case, the source unit does not place the data on the bus until it receives the ready for data signal from the destination unit.
The handshaking procedure then follows the same pattern as in the source-initiated case. The sequence of events is almost the same in both cases, except that the data accepted signal of the source-initiated case is replaced by the ready for data signal.
Figure 3.3.7: Destination-initiated transfer using handshaking
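The four-step handshake sequence described above can be traced with a short, purely illustrative Python sketch; the signal names follow the text (data valid, data accepted), while everything else is an assumption of the example.

# The two-wire handshake as an explicit sequence of signal transitions.

def handshake_transfer(word: int) -> int:
    data, valid, accepted = None, False, False
    data, valid = word, True        # source: place data, raise 'data valid'
    print(f"valid={valid} accepted={accepted} data={data}")
    latched, accepted = data, True  # destination: latch data, raise 'data accepted'
    print(f"valid={valid} accepted={accepted} data={data}")
    data, valid = None, False       # source: drop 'data valid', data now invalid
    accepted = False                # destination: drop 'data accepted' -> initial state
    print(f"valid={valid} accepted={accepted} data={data}")
    return latched

print(handshake_transfer(7))        # -> 7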
Advantages of Asynchronous Data Transfer
Asynchronous data transfer in computer organization has the following advantages:
o It is more flexible, and devices can exchange information at their own pace. In addition, individual data characters are complete in themselves, so that even if one packet is corrupted, its predecessors and successors are not affected.
o It does not require complex processing by the receiving device, and inconsistency in the rate of data transfer does not result in a big crisis, since the device can keep up with the data stream.
A disadvantage is that a large portion of the transmitted data is used for control and identification header bits and thus carries no helpful information related to the transmitted data. This invariably means that more data packets need to be sent.
4.0 CONCLUSION
In this module, we have discussed in detail the implementation of control units. We started with the implementation of the hardwired control unit. The implementation of the logic micro-operations has also been discussed, leading to the logical construction of a simple arithmetic-logic-shift unit; this unit revolves around the basic ALU, with the help of the units constructed for the implementation of micro-operations. We have also discussed arithmetic processors and the organization of control units. Various types of control units, such as hardwired, Wilkes and microprogrammed control units, have been discussed. The key to such control units is the microinstruction, whose types and formats are briefly described in this unit. Finally, the function of a microprogrammed unit, that is, microprogram execution, has also been discussed. The control unit is the key to the optimized performance of a computer. A more detailed study can be made by going through the suggested readings.
5.0 SUMMARY
The control unit is the part of the computer's central processing unit (CPU) that directs the operation of the processor. It was included as part of the von Neumann architecture by John von Neumann. It is the responsibility of the control unit to tell the computer's memory, arithmetic/logic unit and input and output devices how to respond to the instructions that have been sent to the processor. It fetches the instructions of a program from the main memory into the processor's instruction register and, based on the contents of this register, generates the control signals that supervise the execution of those instructions.
6.0 TUTOR MARKED ASSIGNMENT
1. What is a control unit?
2. Explain the functional requirements of a control unit.
3. What are the inputs to a control unit?
4. Describe the different types of control unit, with diagrams.