Unit III - Bocs 2020 Final
Introduction
Peripheral devices are the input and output devices attached to a computer system to add
functionality.
Usually, the word peripheral refers to a device external to the computer case, but devices located
inside the computer case are also technically peripherals.
Devices that exist outside the computer case are called external peripherals, or auxiliary
components, and devices that are inside the case, such as internal hard drives or CD-ROM drives,
are called internal peripherals.
Other new devices such as digital watches, smartphones and tablet computers have interfaces
which allow them to be used as a peripheral by a full computer, though they are not host-dependent
as other peripheral devices are.
Types of Peripherals
1. Input peripherals:
They allow user input, from the outside world to the computer.
• Keyboard
• Card reader
• Touch screen
• Light pen
• Mouse
2. Output peripherals:
They allow information output from the computer to the outside world.
• CRT
• Plotter
• Speakers
• Flash drive
• Disk drive
• Smartphone
• CD/DVD drive
3. Input-Output peripherals:
They allow both input (from the outside world to the computer) and output (from the computer
to the outside world).
Example:
• Touch screen
• Modem
Introduction
One distinguishing factor among input devices, and also among output devices, is their data
processing rate, defined as the average number of characters that can be processed by the device
per second.
For example, while the data processing rate of an input device such as the keyboard is about 10
characters (bytes)/second, a scanner sends data at a rate of about 200,000 characters/second.
Similarly, while a laser printer can output data at a rate of about 100,000 characters/second, a
graphic display can output data at a rate of about 30,000,000 characters/second.
Striking a character on the keyboard of a computer will cause a character (in the form of an ASCII
code) to be sent to the computer.
The amount of time that passes before the next character is sent to the computer will depend on
the skill of the user and sometimes even on the user's speed of thinking.
It is often the case that the user knows what he/she wants to input, but sometimes they need to
think before pressing the next key on the keyboard.
Therefore, input from a keyboard is slow and bursty in nature, and it would be a waste of time for
the computer to spend its valuable time waiting for input from slow input devices.
A mechanism is therefore needed whereby a device will have to interrupt the processor asking for
attention whenever it is ready.
This is called interrupt-driven communication between the computer and I/O devices.
A typical disk should be capable of transferring data at rates exceeding several million
bytes/second.
It would be a waste of time to transfer such data byte by byte or even word by word.
Therefore, it is always the case that data is transferred in the form of blocks, that is, as entire
groups of consecutive words, such as a whole program or a whole disk sector.
It is also necessary to provide a mechanism that allows a disk to transfer this huge volume of data
without the intervention of the CPU.
This allows the CPU to perform other useful operation(s) while a huge amount of data is being
transferred between the disk and the memory.
The figure below shows a simple arrangement for connecting the processor and the memory in a
given computer system to an input device, for example, a keyboard and an output device such as
a graphic display.
A single bus consisting of the required address, data, and control lines is used to connect the
system's components in the figure above.
We are concerned with the way the processor and the I/O devices exchange data.
It has been indicated in the introduction part that there exists a big difference in the rate at which
the processor can process information and those of input and output devices.
One simple way to accommodate this speed difference is to have the input device, for example, a
keyboard, deposit the character struck by the user in a register (input register), which indicates the
availability of that character to the processor.
When the input character has been taken by the processor, this will be indicated to the input device
in order to proceed and input the next character, and so on.
Similarly, when the processor has a character to output (display), it deposits it in a specific register
dedicated for communication with the graphic display (output register).
When the character has been taken by the graphic display, this is indicated to the processor such
that it can proceed and output the next character and so on.
This simple way of communication between the processor and I/O devices, called I/O protocol,
requires the availability of the input and output registers.
In a typical computer system, there are a number of input registers, each belonging to a specific
input device.
There are also a number of output registers, each belonging to a specific output device.
In addition, a mechanism according to which the processor can address those input and output
registers must be adopted.
1) The first possible I/O arrangement is to have dedicated input and output instructions.
The execution of an input instruction at an input device address will cause the character
stored in the input register of that device to be transferred to a specific register in the CPU.
Similarly, the execution of an output instruction at an output device address will cause the
character stored in a specific register in the CPU to be transferred to the output register of
that output device.
In this case the address and data lines from the CPU can be shared between the memory
and the I/O devices.
A separate control line will, however, be needed, because of the need for executing input
and output instructions. This arrangement is called shared I/O.
In a typical computer system, there exists more than one input and more than one output
device.
Therefore, there is a need to have address decoder circuitry for device identification.
There is also a need for a status register for each input and output device.
The status of an input device, whether it is ready to send data to the processor, should be
stored in the status register of that device.
Similarly, the status of an output device, whether it is ready to receive data from the
processor, should be stored in the status register of that device.
Input (output) registers, status registers, and address decoder circuitry represent the main
components of an I/O interface (module).
The main advantage of the shared I/O arrangement is the separation between the memory
address space and that of the I/O devices.
Its main disadvantage is the need to have special input and output instructions in the
processor instruction set.
2) The second possible I/O arrangement is to deal with input and output registers as if they
are regular memory locations.
In this case, a read operation from the address corresponding to the input register of an
input device, for example, Read Device 6, is equivalent to performing an input operation
from the input register in Device #6.
Similarly, a write operation to the address corresponding to the output register of an output
device, for example, Write Device 9, is equivalent to performing an output operation into
the output register in Device #9.
The main advantage of the memory-mapped I/O is the use of the read and write instructions
of the processor to perform the input and output operations, respectively.
The main disadvantage of the memory-mapped I/O is the need to reserve a certain part of
the memory address space for addressing I/O devices, that is, a reduction in the available
memory address space.
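As an illustration of the memory-mapped arrangement, the following minimal C sketch accesses the input register of Device #6 and the output register of Device #9 with ordinary load and store operations. The register addresses and names are hypothetical, chosen only for the example; on a real system they would come from the machine's memory map.

```c
#include <stdint.h>

/* Hypothetical addresses reserved in the memory address space. */
#define DEV6_IN  ((volatile uint8_t *)0xFF06u) /* input register, Device #6  */
#define DEV9_OUT ((volatile uint8_t *)0xFF09u) /* output register, Device #9 */

/* An ordinary read from the mapped address is the input operation. */
uint8_t input_from_device6(void)
{
    return *DEV6_IN;
}

/* An ordinary write to the mapped address is the output operation. */
void output_to_device9(uint8_t ch)
{
    *DEV9_OUT = ch;
}
```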
INPUT-OUTPUT INTERFACES
Input-output interface provides a method for transferring information between internal storage and
external I/O devices.
Peripherals connected to a computer need special communication links for interfacing them with
the central processing unit.
The purpose of the communication link is to resolve the differences that exist between the central
computer and each peripheral.
The data transfer rate of peripherals is usually slower than the transfer rate of the CPU, and
consequently, a synchronization mechanism may be needed.
Data codes and formats in peripherals differ from the word format in the CPU and memory.
The operating modes of peripherals are different from each other and each must be
controlled so as not to disturb the operation of other peripherals connected to the CPU.
To resolve these differences, computer systems include special hardware components between the
CPU and peripherals to supervise and synchronize all input and output transfers.
These components are called interface units because they interface between the processor bus and
the peripheral device.
In addition, each device may have its own controller that supervises the operations of the particular
mechanism in the peripheral.
A typical communication link between the processor and several peripherals is shown in the figure
below.
The I/O bus consists of data lines, address lines, and control lines.
The magnetic disk, printer, and terminal are employed in practically any general-purpose
computer.
Each interface decodes the address and control received from the I/O bus, interprets them for the
peripheral, and provides signals for the peripheral controller.
It also synchronizes the data flow and supervises the transfer between peripheral and processor.
Each peripheral has its own controller that operates the particular electromechanical device.
For example, the printer controller controls the paper motion, the print timing, and the selection of
printing characters.
A controller may be housed separately or may be physically integrated with the peripheral.
The I/O bus from the processor is attached to all peripheral interfaces.
To communicate with a particular device, the processor places a device address on the address
lines.
Each interface attached to the I/O bus contains an address decoder that monitors the address lines.
When the interface detects its own address, it activates the path between the bus lines and the
device that it controls.
All peripherals whose addresses do not correspond to the address on the bus disable their interfaces.
At the same time that the address is made available in the address lines, the processor provides a
function code in the control lines.
The interface selected responds to the function code and proceeds to execute it.
The function code is referred to as an I/O command and is in essence an instruction that is executed
in the interface and its attached peripheral unit.
The interpretation of the command depends on the peripheral that the processor is addressing.
They are classified as control, status, data output, and data input.
1) A control command is issued to activate the peripheral and to inform it what to do.
For example, a magnetic tape unit may be instructed to backspace the tape by one record,
to rewind the tape, or to start the tape moving in the forward direction.
The particular control command issued depends on the peripheral, and each peripheral
receives its own distinguished sequence of control commands, depending on its mode of
operation.
2) A status command is used to test various status conditions in the interface and the
peripheral.
For example, the computer may wish to check the status of the peripheral before a transfer
is initiated.
During the transfer, one or more errors may occur which are detected by the interface.
These errors are designated by setting bits in a status register that the processor can read at
certain intervals.
3) A data output command causes the interface to respond by transferring data from the bus
into one of its registers.
Consider, for example, a tape unit: the computer starts the tape moving by issuing a control
command.
The processor then monitors the status of the tape by means of a status command.
When the tape is in the correct position, the processor issues a data output command.
The interface responds to the address and command and transfers the information from the
data lines in the bus to its buffer register.
The interface then communicates with the tape controller and sends the data to be stored
on tape.
4) A data input command is the opposite of the data output command.
In this case the interface receives an item of data from the peripheral and places it in its
buffer register.
The processor checks if data are available by means of a status command and then issues
a data input command.
The interface places the data on the data lines, where they are accepted by the processor.
In addition to communicating with I/O, the processor must communicate with the memory unit.
Like the I/O bus, the memory bus contains data, address, and read/write control lines.
There are three ways that computer buses can be used to communicate with memory and I/O:
1) Use two separate buses, one for memory and the other for I/O.
2) Use one common bus for both memory and I/O but have separate control lines for each.
3) Use one common bus for memory and I/O with common control lines.
In the first method, the computer has independent sets of data, address, and control buses, one for
accessing memory and the other for I/O.
This is done in computers that provide a separate I/O processor (IOP) in addition to the central
processing unit (CPU).
The memory communicates with both the CPU and the IOP through a memory bus.
The IOP communicates also with the input and output devices through a separate I/O bus with its
own address, data and control lines.
The purpose of the IOP is to provide an independent pathway for the transfer of information
between external devices and internal memory.
Many computers use one common bus to transfer information between memory or I/O and the
CPU.
The distinction between a memory transfer and I/O transfer is made through separate read and
write lines.
The CPU specifies whether the address on the address lines is for a memory word or for an
interface register by enabling one of two possible read or write lines.
The I/O read and I/O write control lines are enabled during an I/O transfer.
The memory read and memory write control lines are enabled during a memory transfer.
This configuration isolates all I/O interface addresses from the addresses assigned to memory and
is referred to as the isolated I/O method for assigning addresses in a common bus.
In the isolated I/O configuration, the CPU has distinct input and output instructions, and each of
these instructions is associated with the address of an interface register.
When the CPU fetches and decodes the operation code of an input or output instruction, it places
the address associated with the instruction into the common address lines.
At the same time, it enables the I/O read (for input) or I/O write (for output) control line.
This informs the external components that are attached to the common bus that the address in the
address lines is for an interface register and not for a memory word.
On the other hand, when the CPU is fetching an instruction or an operand from memory, it places
the memory address on the address lines and enables the memory read or memory write control
line.
This informs the external components that the address is for a memory word and not for an I/O
interface.
The isolated I/O method isolates memory and I/O addresses so that memory address values are
not affected by interface address assignment, since each has its own address space.
The other alternative is to use the same address space for both memory and I/O.
This is the case in computers that employ only one set of read and write signals and do not
distinguish between memory and I/O addresses.
The computer treats an interface register as being part of the memory system.
The assigned addresses for interface registers cannot be used for memory words, which reduces
the available memory address range.
The CPU can manipulate I/O data residing in interface registers with the same instructions that are
used to manipulate memory words.
Each interface is organized as a set of registers that respond to read and write requests in the normal
address space.
Typically, a segment of the total address space is reserved for interface registers, but in general,
they can be located at any address as long as there is not also a memory word that responds to the
same address.
Computers with memory-mapped I/O can use memory-type instructions to access I/O data.
It allows the computer to use the same instructions for either input-output transfers or for memory
transfers.
The advantage is that the load and store instructions used for reading and writing from memory
can be used to input and output data from I/O registers.
In a typical computer, there are more memory-reference instructions than I/O instructions.
With memory-mapped I/O all instructions that refer to memory are also available for I/O.
ASYNCHRONOUS DATA TRANSFER
The internal operations in a digital system are synchronized by means of clock pulses supplied by
a common pulse generator.
Clock pulses are applied to all registers within a unit and all data transfers among internal registers
occur simultaneously during the occurrence of a clock pulse.
Two units, such as a CPU and an I/O interface, are designed independently of each other.
If the registers in the interface share a common clock with the CPU registers, the transfer between
the two units is said to be synchronous.
In most cases, the internal timing in each unit is independent from the other in that each uses its
own private clock for internal registers.
In that case, the two units are said to be asynchronous to each other.
Asynchronous data transfer between two independent units requires that control signals be
transmitted between the communicating units to indicate the time at which data is being
transmitted.
One way of achieving this is by means of a strobe pulse supplied by one of the units to indicate to
the other unit when the transfer has to occur.
Another method commonly used is to accompany each data item being transferred with a control
signal that indicates the presence of data in the bus.
The unit receiving the data item responds with another control signal to acknowledge receipt of
the data.
The strobe pulse method and the handshaking method of asynchronous data transfer are not
restricted to I/O transfers.
In fact, they are used extensively on numerous occasions requiring the transfer of data between
two independent units.
In the general case we consider the transmitting unit as the source and the receiving unit as the
destination.
For example, the CPU is the source unit during an output or a write transfer and it is the destination
unit during an input or a read transfer.
It is customary to specify the asynchronous transfer between two independent units by means of a
timing diagram that shows the timing relationship that must exist between the control signals and
the data in buses.
The sequence of control during an asynchronous transfer depends on whether the transfer is
initiated by the source or by the destination unit.
1. Strobe Control Method
The strobe control method of asynchronous data transfer employs a single control line to
time each transfer.
A strobe pulse is supplied by one unit to indicate to the other unit when the transfer has to
occur.
The strobe can be activated by either the source or the destination unit.
Typically, the bus has multiple lines to transfer an entire byte or word.
The strobe is a single line that informs the destination unit when a valid data word
is available in the bus.
As shown in the timing diagram of Fig. (b), the source unit first places the data on
the data bus.
After a brief delay to ensure that the data settle to a steady value, the source activates
the strobe pulse.
The information on the data bus and the strobe signal remain in the active state for
a sufficient time period to allow the destination unit to receive the data.
Often, the destination unit uses the falling edge of the strobe pulse to transfer the
contents of the data bus into one of its internal registers.
The source removes the data from the bus a brief period after it disables its strobe
pulse.
Actually, the source does not have to change the information on the data bus.
The fact that the strobe signal is disabled indicates that the data bus does not contain
valid data.
Thus, new valid data will be available only after the strobe is enabled again.
In this case, the strobe may be a memory-write control signal from the CPU to a
memory unit.
The CPU places the word on the data bus and informs the memory unit, which is
the destination.
A strobe transfer initiated by the destination unit is also possible.
In this case the destination unit activates the strobe pulse, informing the source to
provide the data.
The source unit responds by placing the requested binary information on the data
bus.
The data must be valid and remain in the bus long enough for the destination unit
to accept it.
The falling edge of the Strobe pulse can be used again to trigger a destination
register.
The source removes the data from the bus after a predetermined time interval.
In many computers the strobe pulse is actually controlled by the clock pulses in the
CPU.
The CPU is always in control of the buses and informs the external units how to
transfer data.
For example, the strobe of the figure above could be a memory-write control signal from
the CPU to a memory unit.
The source, being the CPU, places a word on the data bus and informs the memory
unit, which is the destination, that this is a write operation.
Similarly, the strobe can be a memory-read control signal from the CPU to a memory unit.
In that case the destination, the CPU, initiates the read operation to inform the memory,
which is the source, to place a selected word on the data bus.
The transfer of data between the CPU and an interface unit is similar to the strobe
transfer just described.
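The following toy C model summarizes the source-initiated strobe sequence just described. The flags stand in for the data bus and strobe line, and the delays are placeholders for the settling and hold times; everything here is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint8_t data_bus;  /* stands in for the data lines  */
static volatile bool strobe;       /* stands in for the strobe line */

static void brief_delay(void)      /* placeholder for settling time */
{
    for (volatile int i = 0; i < 1000; i++)
        ;
}

void source_initiated_strobe(uint8_t item)
{
    data_bus = item;   /* place the data on the data bus             */
    brief_delay();     /* let the data settle to a steady value      */
    strobe = true;     /* activate the strobe pulse                  */
    brief_delay();     /* hold long enough for the destination       */
    strobe = false;    /* falling edge: destination latches the bus  */
    brief_delay();     /* data may now be removed from the bus       */
}
```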
2. Handshaking Method
The disadvantage of the strobe method is that the source unit that initiates the transfer has
no way of knowing whether the destination unit has actually received the data item that
was placed in the bus.
Similarly, a destination unit that initiates the transfer has no way of knowing whether the
source unit has actually placed the data on the bus.
The handshake method solves this problem by introducing a second control signal that
provides a reply to the unit that initiates the transfer.
One control line is in the same direction as the data flow in the bus from the source to the
destination.
It is used by the source unit to inform the destination unit whether there are valid data in
the bus.
The other control line is in the other direction from the destination to the source.
It is used by the destination unit to inform the source whether it can accept data.
The sequence of control during the transfer depends on the unit that initiates the transfer.
Figure below shows the data transfer procedure when initiated by the source.
The two handshaking lines are data valid, which is generated by the source unit, and data
accepted, generated by the destination unit.
The timing diagram shows the exchange of signals between the two units.
The sequence of events listed in part (c) shows the four possible states that the system can
be in at any given time.
The source unit initiates the transfer by placing the data on the bus and enabling its data
valid signal.
The data accepted signal is activated by the destination unit after it accepts the data from
the bus.
The source unit then disables its data valid signal, which invalidates the data on the bus.
The destination unit then disables its data accepted signal and the system goes into its initial
state.
The source does not send the next data item until after the destination unit shows its
readiness to accept new data by disabling its data accepted signal.
This scheme allows arbitrary delays from one state to the next and permits each unit to
respond at its own data transfer rate.
The destination - initiated transfer using handshaking lines is shown in figure below.
Note that the name of the signal generated by the destination unit has been changed from data
accepted to ready for data to reflect its new meaning.
The source unit in this case does not place data on the bus until after it receives the ready
for data signal from the destination unit.
From there on, the handshaking procedure follows the same pattern as in the source-
initiated case.
Note that the sequence of events in both cases would be identical if we consider the ready
for data signal as the complement of data accepted.
In fact, the only difference between the source-initiated and the destination-initiated
transfer is in their choice of initial state.
It consists of two signals:
READY FOR DATA: if ON, requests that data be placed on the data bus.
DATA VALID: if ON, indicates that the data on the data bus is valid; otherwise the data is invalid.
The handshaking scheme provides a high degree of flexibility and reliability because the
successful completion of a data transfer relies on active participation by both units.
If one unit is faulty, the data transfer will not be completed.
Such an error can be detected by means of a timeout mechanism, which produces an alarm
if the data transfer is not completed within a predetermined time.
The timeout is implemented by means of an internal clock that starts counting time when
the unit enables one of its handshaking control signals.
If the return handshake signal does not respond within a given time period, the unit assumes
that an error has occurred.
The timeout signal can be used to interrupt the processor and hence execute a service
routine that takes appropriate error recovery action.
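A minimal C model of the source-initiated handshake with a timeout is sketched below. The two flags stand in for the data valid and data accepted control lines, and the loop counter stands in for the internal clock that implements the timeout; the names and counts are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint8_t data_bus;      /* the shared data lines        */
static volatile bool data_valid;       /* driven by the source unit    */
static volatile bool data_accepted;    /* driven by the destination    */

#define TIMEOUT_COUNT 1000000u         /* stands in for the timer      */

/* Returns true on success, false if the destination never responds. */
bool source_send(uint8_t item)
{
    uint32_t t = 0;

    data_bus = item;                   /* place data on the bus        */
    data_valid = true;                 /* enable data valid            */

    while (!data_accepted)             /* wait for the reply handshake */
        if (++t > TIMEOUT_COUNT)
            return false;              /* destination unit is faulty   */

    data_valid = false;                /* invalidate the data          */

    while (data_accepted)              /* wait for the initial state   */
        if (++t > TIMEOUT_COUNT)
            return false;

    return true;                       /* ready for the next item      */
}
```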
The transfer of data between two units may be done in parallel or serial.
In parallel data transmission, each bit of the message has its own path and the total message is
transmitted at the same time.
This means that an n-bit message must be transmitted through n separate conductor paths.
In serial data transmission, each bit in the message is sent in sequence one at a time.
This method requires the use of one conductor and a common ground.
Serial transmission is slower but is less expensive since it requires only one conductor.
In synchronous transmission, the two units share a common clock frequency and bits are
transmitted continuously at the rate dictated by the clock pulses.
In long-distance serial transmission, each unit is driven by a separate clock of the same frequency.
Synchronization signals are transmitted periodically between the two units to keep their clocks in
step with each other.
In asynchronous transmission, binary information is sent only when it is available and the line
remains idle when there is no information to be transmitted.
A serial asynchronous data transmission technique used in many interactive terminals employs
special bits that are inserted at both ends of the character code.
With this technique, each character consists of three parts: a start bit, the character bits, and stop
bits.
The convention is that the transmitter rests at the 1-state when no characters are transmitted.
The first bit, called the start bit, is always a 0 and is used to indicate the beginning of a character.
The last bit called the stop bit is always a 1.
A transmitted character can be detected by the receiver from knowledge of the transmission rules:
1) When a character is not being sent, the line is kept in the 1- state.
2) The initiation of a character transmission is detected from the start bit, which is always 0.
3) The character bits always follow the start bit.
4) After the last bit of the character is transmitted, a stop bit is detected when the line returns
to the 1-state for at least one bit time.
As an illustration, consider the serial transmission of a terminal whose transfer rate is 10 characters
per second.
Each transmitted character consists of a start bit, eight information bits and two stop bits, for a
total of 11 bits.
Ten characters per second means that each character takes 0.1s for transfer.
Since there are 11 bits to be transmitted, it follows that the bit time is 9.09 ms.
Ten characters per second with an 11- bit format has a transfer rate of 110 baud.
The baud rate is defined as the rate at which serial information is transmitted and is equivalent to
the data transfer in bits per second.
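The arithmetic of this example can be checked with a short C program; the 10 characters/second rate and the 11-bit frame (start bit, eight information bits, two stop bits) are taken from the text above.

```c
#include <stdio.h>

int main(void)
{
    const double chars_per_sec = 10.0;     /* terminal transfer rate    */
    const int bits_per_char = 1 + 8 + 2;   /* start + data + stop bits  */

    double char_time_s = 1.0 / chars_per_sec;               /* 0.1 s    */
    double bit_time_ms = char_time_s / bits_per_char * 1e3; /* ~9.09 ms */
    double baud = chars_per_sec * bits_per_char;            /* 110 b/s  */

    printf("bit time = %.2f ms, rate = %.0f baud\n", bit_time_ms, baud);
    return 0;
}
```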
Every time a key is pressed, the terminal sends 11 bits serially along a wire.
To print a character in the printer, an 11-bit message must be received along another wire.
The transmitter accepts an 8-bit character from the computer and proceeds to send a serial 11-bit
message into the printer line.
The receiver accepts a serial 11 - bit message from the keyboard line and forwards the 8-bit
character code into the computer.
Integrated circuits are available which are specifically designed to provide the interface between
computer and similar interactive terminals.
The figure below shows the block diagram of an asynchronous communication interface.
The interface is initialized for a particular mode of transfer by means of a control byte that is loaded
into its control register.
The control register defines the baud rate, the number of bits in each character, whether to
generate and check parity, and the number of stop bits.
The transmitter register accepts a data byte from the CPU through the data bus.
The CPU can select the receiver register to read the byte through the data bus.
The bits in the status register are used for input and output flags and for recording certain errors
that may occur during the transmission.
The CPU can read the status register to check the status of the flag bits and to determine if any
errors have occurred.
The chip select and the read and write control lines communicate with the CPU.
The chip select (CS) input is used to select the interface through the address bus.
The register select (RS) input is associated with the read (RD) and write (WR) controls.
The register selected is a function of the RS value and the RD and WR status, as listed in the table
accompanying the diagram.
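As a sketch of how such an interface might be initialized, the C fragment below builds one control byte and stores it in the control register. The bit layout and constant names are hypothetical; every real chip defines its own assignments.

```c
#include <stdint.h>

/* Hypothetical control-byte layout for the asynchronous interface. */
#define CTRL_BAUD_110    0x01u        /* bits 1-0: baud-rate select       */
#define CTRL_8_DATA_BITS (0x3u << 2)  /* bits 3-2: bits per character     */
#define CTRL_PARITY_EN   (1u << 4)    /* bit 4: generate and check parity */
#define CTRL_2_STOP_BITS (1u << 5)    /* bit 5: two stop bits             */

static volatile uint8_t control_reg;  /* stand-in for the chip register   */

/* One control byte fixes the mode of transfer for the whole session. */
void init_async_interface(void)
{
    control_reg = CTRL_BAUD_110 | CTRL_8_DATA_BITS |
                  CTRL_PARITY_EN | CTRL_2_STOP_BITS;
}
```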
MODES OF TRANSFER
Binary information received from an external device is usually stored in memory for later
processing.
Information transferred from the central computer into an external device originates in the memory
unit.
The CPU merely executes the I/O instructions and may accept the data temporarily, but the
ultimate source or destination is the memory unit.
Data transfer between the central computer and I/O devices may be handled in a variety of modes.
Some modes use the CPU as an intermediate path; others transfer the data directly to and from
the memory unit.
Data transfer to and from peripherals may be handled in one of three possible modes:
1) Programmed I/O
2) Interrupt-initiated I/O
3) Direct memory access (DMA)
In this section, we present the main hardware components required for communications
between the processor and I/O devices.
The way according to which such communication take place (protocol) is also indicated.
The protocol has to be programmed in the form of routines that run under the control of the
CPU.
Consider, for example an input operation from device 6 (could be the keyboard) in the case
of shared I/O arrangement.
Let us assume that there are eight different I/O devices connected to the processor as shown
in figure below;
1) The processor executes an input instruction from device 6, for example, INPUT 6.
The effect of executing this instruction is to send the device number to the address
decoder circuitry in each input device in order to identify the specific input device
to be involved.
In this case, only the output of the address decoder in Device #6 will be enabled, while the
outputs of all other decoders will be disabled.
2) The buffers (in the figure we assumed that there are eight such buffers) holding the
data in the specified input device (Device #6) will be enabled by the output of the
address decoder circuitry.
3) The data output of the enabled buffers will be available on the data bus.
4) The instruction decoding circuitry will gate the data available on the data bus into
a particular register in the CPU, normally the accumulator.
Output operations can be performed in a way similar to the input operation explained
above.
The only difference will be that the data moves from a specific CPU register to the output
register in the specified output device.
The I/O operations performed in this manner are called programmed I/O.
A complete instruction fetch, decode and execute cycle will have to be executed for every
input and output operation.
Programmed I/O is useful in cases whereby one character at a time is to be transferred, for
example, keyboard and character mode printers.
One point that was overlooked in the above description of the programmed I/O is how to
handle the substantial speed difference between I/O devices and the processor.
A mechanism should be adopted in order to ensure that a character sent to the output
register of an output device, such as a screen, is not overwritten by the processor (due to
the processor’s high speed) before it is displayed and that a character available in the input
register of a keyboard is read only once by the processor.
This brings up the issue of the status of the input and output devices.
A mechanism that can be implemented requires the availability of a status bit (Bin) in
the interface of each input device and a status bit (Bout) in the interface of each output device.
Whenever an input device such as a keyboard has a character available in its input register,
it indicates that by setting Bin = 1.
When the program sees that Bin = 1, it will interpret that to mean a character is available
in the input register of the device.
Reading such character will require executing the protocol explained above.
Whenever the character is read, the program can reset Bin = 0, thus avoiding multiple
reads of the same character.
In a similar manner, the processor can deposit a character in the output register of an output
device such as a screen only when Bout = 0.
It is only after the screen has displayed the character that it resets Bout = 0, indicating to the
program that monitors Bout that the screen is ready to receive the next character.
The process of checking the status of I/O devices in order to determine their readiness for
receiving and/or sending characters is called software I/O polling.
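Software I/O polling can be pictured with the C sketch below, which echoes one keyboard character to the screen. The registers are hypothetical stand-ins for the device interfaces; bit 0 of each status register plays the role of Bin and Bout as described above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the device interface registers. */
static volatile uint8_t kbd_status;  /* bit 0 = Bin:  1 = character ready */
static volatile uint8_t kbd_data;
static volatile uint8_t scr_status;  /* bit 0 = Bout: 0 = screen ready    */
static volatile uint8_t scr_data;

void echo_one_character(void)
{
    while ((kbd_status & 1u) == 0)   /* poll until Bin = 1                */
        ;
    uint8_t ch = kbd_data;           /* read the character exactly once   */
    kbd_status &= (uint8_t)~1u;      /* reset Bin = 0: character consumed */

    while ((scr_status & 1u) != 0)   /* poll until Bout = 0: screen ready */
        ;
    scr_data = ch;                   /* deposit the character; the        */
                                     /* interface holds Bout = 1 until    */
                                     /* the screen has displayed it       */
}
```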
In the figure, each of the N I/O devices has access to the interrupt line INR.
Upon recognizing the arrival of a request (called interrupt Request) on INR, the processor
polls the devices to determine the requesting device.
The priority of the requesting device will determine the order in which addresses are put
on the polling lines: the address of the highest-priority device is put first, followed by the
next priority, and so on down to the lowest-priority device.
In addition to I/O polling, two other mechanisms can be used to carry out I/O operations:
interrupt-driven I/O and direct memory access (DMA).
Interrupt-Driven I/O
It is often necessary to have the normal flow of a program interrupted, for example, to react
to abnormal events, such as power failure.
An interrupt can also be used in time-sharing systems to allocate CPU time among different
programs.
The instruction sets of modern CPUs often include instructions that mimic the different
actions of hardware interrupts.
When the CPU is interrupted, it is required to discontinue its current activity, attend to the
interrupting condition (serve the interrupt) and then resume its activity from wherever it
stopped.
Discontinuity of the processor’s current activity requires finishing executing the current
instruction, saving the processor status (mostly in the form of pushing register values onto
a stack), and transferring control (jump) to what is called the interrupt service routine (ISR).
The service offered to an interrupt will depend on the source of the interrupt.
For example, if the interrupt is due to power failure, then the action taken will be to save
the values of all processor registers and pointers such that resumption of correct operation
can be guaranteed upon power return.
In the case of an I/O interrupt, serving an interrupt means to perform the required data
transfer.
Upon finishing serving an interrupt, the processor restores its original status by popping the
relevant values from the stack.
Once the processor returns to the normal state, it can enable sources of interrupt again.
One important point that was overlooked in the above scenario is the issue of serving
multiple interrupts, for example, the occurrence of yet another interrupt while the processor
is currently serving an interrupt.
Response to the new interrupt will depend upon the priority of the newly arrived interrupt
with respect to that of the interrupt being currently served.
If the newly arrived interrupt has priority less than or equal to that of the currently served
one, then it can wait until the processor finishes serving the current interrupt.
If, on the other hand, the newly arrived interrupt has priority higher than that of the
currently served interrupt, for example, power failure interrupt occurring while serving an
I/O interrupt, then the processor will have to push its status onto the stack and serve the
higher priority.
Correct handling of multiple interrupts in terms of storing and restoring the correct
processor status is guaranteed due to the way the push and pop operations are performed.
For example, to serve the first interrupt, STATUS 1 will be pushed onto the stack.
Upon receiving the second interrupt, STATUS 2 will be pushed onto the stack.
Upon serving the second interrupt, STATUS 2 will be popped out of the stack and upon
serving the first interrupt, STATUS 1 will be popped out of the stack.
It is possible to have the interrupting device identify itself to the processor by sending a
code following the interrupt request.
The code sent by a given I/O device can represent its I/O address or the memory address
location of the start of the ISR for that device.
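Vectored interrupt dispatch can be sketched in C as a table of ISR entry points indexed by the code supplied by the interrupting device. All names here are hypothetical; on a real CPU the status save/restore and the jump through the vector table are done in hardware and assembly.

```c
typedef void (*isr_t)(void);

static void keyboard_isr(void)   { /* move the character, reset Bin  */ }
static void power_fail_isr(void) { /* save registers and pointers    */ }

/* Table of ISR start addresses, indexed by the device code. */
static isr_t vector_table[256] = { 0 };

void install_isr(unsigned char code, isr_t isr)
{
    vector_table[code] = isr;
}

void dispatch_interrupt(unsigned char device_code)
{
    /* processor status has already been pushed onto the stack */
    isr_t isr = vector_table[device_code];
    if (isr != 0)
        isr();     /* serve the interrupt                            */
    /* processor status is popped and normal execution resumes      */
}
```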
Direct Memory Access (DMA)
The main idea of direct memory access (DMA) is to enable peripheral devices to cut out
the “middle man” role of the CPU in data transfer.
It allows peripheral devices to transfer data directly from and to memory without the
intervention of the CPU.
Having peripheral devices access memory directly allows the CPU to do other work,
which leads to improved performance, especially in the case of large transfers.
The DMA controller is a piece of hardware that controls one or more peripheral devices.
It allows devices to transfer data to or from the system’s memory without the help of the
processor.
In a typical DMA transfer, some event notifies the DMA controller that data needs to be
transferred to or from memory.
Both the DMA controller and the CPU use the memory bus, and only one of them can use the
memory at the same time.
The DMA controller then sends a request to the CPU asking its permission to use the bus.
The CPU returns an acknowledgment to the DMA controller granting it bus access.
The figure below shows two control signals in the CPU that facilitate the DMA transfer.
The bus request (BR) input is used by the DMA controller to request the CPU to relinquish
control of the buses.
The DMA can now take control of the bus to independently conduct memory transfers.
When the transfer is completed the DMA relinquishes its control of the bus to the CPU.
Processors that support DMA provide one or more input signals that the bus requester can
assert to gain control of the bus and one or more output signals that the CPU asserts to
indicate it has relinquished the bus.
The figure below shows how the DMA controller shares the CPU’s memory bus.
Typical setup parameters include the address of the source area, the address of the
destination area, the length of the block, and whether the DMA controller should generate
a processor interrupt once the block transfer is completed.
A DMA controller has an address register, a word counter register and a control register.
The address register contains an address that specifies the memory location of the data to
be transferred.
It is typically possible to have the DMA controller automatically increment the address
register after each word transfer, so that the next transfer will be from the next memory
location.
Direct memory access data transfer can be performed in burst mode or single cycle mode.
In burst mode, the DMA controller keeps control of the bus until all the data has
been transferred to (from) memory from (to) the peripheral device.
This mode of transfer is needed for fast devices where data transfer cannot be
stopped until the entire transfer is done.
In single-cycle mode (cycle stealing), the DMA controller relinquishes the bus after
each transfer of one word.
This minimizes the amount of time that the DMA controller keeps the CPU from
controlling the bus, but it requires that the bus request/acknowledge sequence be
performed for every single transfer.
The single-cycle mode is preferred if the system cannot tolerate more than a few
cycles of added interrupt latency, or if peripheral devices can buffer very large
amounts of data, which in burst mode would cause the DMA controller to tie up the
bus for an excessive amount of time.
A typical transfer then proceeds as follows:
1) The DMA controller requests the memory bus and the CPU grants it.
2) Data is moved (increasing the address in memory, and reducing the count of words
to be moved).
3) When the word count reaches zero, the DMA controller informs the CPU of the termination
by means of an interrupt.
4) The CPU regains access to the memory bus.
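A hypothetical register block for such a DMA channel, and the CPU-side setup that starts a transfer, might look like the C sketch below; the names and bit assignments are illustrative only.

```c
#include <stdint.h>

/* Hypothetical register block of a simple DMA channel. */
typedef struct {
    volatile uint32_t address;   /* memory location of the next word     */
    volatile uint32_t count;     /* words remaining in the block         */
    volatile uint32_t control;   /* direction, mode, interrupt enable    */
} dma_channel_t;

#define DMA_DIR_TO_MEM (1u << 0) /* device -> memory                     */
#define DMA_BURST_MODE (1u << 1) /* 0 selects single-cycle (cycle steal) */
#define DMA_IRQ_ON_END (1u << 2) /* interrupt the CPU when count hits 0  */

/* After this setup the block moves without further CPU intervention:
 * address is incremented and count decremented after each word.       */
void dma_start(dma_channel_t *ch, uint32_t start_addr, uint32_t nwords)
{
    ch->address = start_addr;
    ch->count   = nwords;
    ch->control = DMA_DIR_TO_MEM | DMA_BURST_MODE | DMA_IRQ_ON_END;
}
```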
Each channel has associated with it an address register and a count register.
To initiate a data transfer the device sets up the DMA channel’s address and count registers
together with the direction of the data transfer, read or write.
While the transfer is taking place, the CPU is free to do other things.
Some devices have a fixed DMA channel, while others are more flexible, and the device
driver can simply pick a free DMA channel to use.
Advantages
1) DMA speeds up memory operations by bypassing the involvement of the CPU.
2) For each transfer, only a few clock cycles are required.
Disadvantages
1) Cache coherence problem can be seen when DMA is used for data transfer.
INPUT-OUTPUT PROCESSOR (IOP)
The IOP is similar to a CPU except that it is designed to handle the details of I/O processing.
Unlike the DMA controller that must be set up entirely by the CPU, the IOP can fetch and execute
its own instructions.
The block diagram of a computer with two processors is shown in the figure below.
The memory unit occupies a central position and can communicate with each processor by means
of direct memory access.
The CPU is responsible for processing data needed in the solution of computational tasks.
The IOP provides a path for transfer of data between various peripheral devices and the memory
unit.
The data formats of peripheral devices differ from memory and CPU data formats.
The IOP must structure data words from many different sources.
For example, it may be necessary to take four bytes from an input device and pack them into one
32-bit word before the transfer to memory.
Data are gathered in the IOP at the device rate and bit capacity while the CPU is executing its own
program.
After the input data are assembled into a memory word, they are transferred from IOP directly into
memory by "stealing" one memory cycle from the CPU.
Similarly, an output word transferred from memory to the IOP is directed from the IOP to the
output device at the device rate and bit capacity.
The communication between the IOP and the devices attached to it is similar to the program control
method of transfer.
The way by which the CPU and IOP communicate depends on the level of sophistication included
in the system.
In most computer systems, the CPU is the master while the IOP is a slave processor.
The CPU is assigned the task of initiating all operations, but I/O instructions are executed in the
IOP.
CPU instructions provide operations to start an I/O transfer and also to test I/O status conditions
needed for making decisions on various I/O activities.
The IOP, in turn, typically asks for CPU attention by means of an interrupt.
It also responds to CPU requests by placing a status word in a prescribed location in memory to
be examined later by a CPU program.
When an I/O operation is desired, the CPU informs the IOP where to find the I/O program and
then leaves the transfer details to the IOP.
PRIORITY INTERRUPT
A priority interrupt is a system that decides the priority at which various devices, which
generate interrupt signals at the same time, will be serviced by the CPU.
The system has authority to decide which conditions are allowed to interrupt the CPU, while some
other interrupt is being serviced.
Generally, devices with high-speed transfer, such as magnetic disks, are given high priority, and
slow devices, such as keyboards, are given low priority.
When two or more devices interrupt the computer simultaneously, the computer services the device
with the higher priority first.
Types of Interrupts:
1) Hardware Interrupts
When the signal to the processor comes from an external device or hardware, the interrupt
is known as a hardware interrupt.
Let us consider an example: when we press any key on our keyboard to do some action,
then this pressing of the key will generate an interrupt signal for the processor to perform
certain action.
a) Maskable Interrupt
Maskable interrupts are hardware interrupts that can be delayed when a higher-priority
interrupt has occurred at the same time.
b) Non Maskable Interrupt
Non-maskable interrupts are hardware interrupts that cannot be delayed and must be
processed by the processor immediately.
2) Software Interrupts
The interrupt that is caused by any internal system of the computer system is known as a
software interrupt.
a) Normal Interrupt
The interrupts that are caused by software instructions are called normal software
interrupts.
b) Exception
Unplanned interrupts which are produced during the execution of some program are
called exceptions, such as division by zero.
Daisy Chaining Priority
Daisy chaining is a way of deciding the interrupt priority which consists of a serial connection
of all the devices that generate an interrupt signal.
The device with the highest priority is placed at the first position followed by lower priority devices
and the device which has lowest priority among all is placed at the last in the chain.
If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level
state and enables the interrupt input in the CPU.
When there is no interrupt the interrupt line stays in high level state.
The CPU responds to the interrupt by enabling the interrupt acknowledge line.
The acknowledge signal passes on to the next device through the PO output only if device 1 is
not requesting an interrupt.
The following figure shows the block diagram for daisy chaining priority system.
Parallel Priority Interrupt
Parallel priority interrupt method uses a register whose bits are set separately by the interrupt signal
from each device.
In addition to the interrupt register, the circuit may include a mask register whose purpose is to
control the status of each interrupt request.
The mask register can be programmed to disable lower-priority interrupts while a higher-priority
device is being serviced.
It can also provide a facility that allows a higher-priority device to interrupt the CPU while a
lower-priority device is being serviced.
The priority logic for a system of four interrupt sources is shown in figure below;
It consists of an interrupt register whose individual bits are set by external conditions and cleared
by program instructions.
The mask register has the same number of bits as the interrupt register.
By means of program instructions, it is possible to set or reset any bit in the mask register.
Each interrupt bit and its corresponding mask bit are applied to an AND gate to produce the four
inputs to a priority encoder.
In this way, an interrupt is recognized only if its corresponding mask bit is set to 1 by the program.
The priority encoder generates two bits of the vector address, which is transferred to the CPU.
Another output from the encoder sets an interrupt status flip-flop IST when an interrupt that is not
masked occurs.
The interrupt enable flip-flop IEN can be set or cleared by the program to provide an overall control
over the interrupt system.
The outputs of IST ANDed with IEN provide a common interrupt signal for the CPU.
The interrupt acknowledge INTACK signal from the CPU enables the bus buffers in the output
register and a vector address VAD is placed into the data bus.
Priority Encoder
The logic of the priority encoder is such that if two or more inputs arrive at the same time, the input
having the highest priority will take precedence.
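The masking and encoding logic just described can be mirrored in a few lines of C. Here input 0 is assumed to be the highest-priority source; the two returned bits correspond to the vector-address bits generated by the encoder, and -1 models the case where the IST flip-flop stays cleared.

```c
#include <stdint.h>

/* Only unmasked requests reach the priority encoder. */
static uint8_t unmasked(uint8_t interrupt_reg, uint8_t mask_reg)
{
    return interrupt_reg & mask_reg;   /* bitwise AND per source */
}

/* Priority encoder for four sources; input 0 has the highest priority.
 * Returns 0..3 (the two vector-address bits), or -1 when no unmasked
 * request is pending.                                                */
int priority_encode(uint8_t interrupt_reg, uint8_t mask_reg)
{
    uint8_t in = unmasked(interrupt_reg, mask_reg);
    if (in & 0x1u) return 0;           /* highest priority */
    if (in & 0x2u) return 1;
    if (in & 0x4u) return 2;
    if (in & 0x8u) return 3;           /* lowest priority  */
    return -1;
}
```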
BUSES
A bus, in computer terminology, represents a physical connection used to carry a signal from one
point to another.
The signal carried by a bus may represent address, data, control signal or power.
For example, the group of bus lines 1 to 16 in a given computer system may be used to carry the
address of memory locations, and are therefore identified as address lines.
Depending on the signal carried, there exist at least four types of buses: address, data, control,
and power.
Address buses carry addresses, data buses carry data, control buses carry control signals, and
power buses carry the power-supply/ground voltage.
The size (number of lines) of the address, data and control bus varies from one system to another.
Consider for example, the bus connecting a CPU and memory in a given system, called the CPU
bus.
Suppose the size of the memory in that system is 512M words and each word is 32 bits.
In such a system, the size of the address bus should be log2(512 × 2^20) = 29 lines, the size of
the data bus should be 32 lines, and at least one control line (R/W) should exist in that
system.
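The address-bus calculation generalizes to any memory size: the number of address lines is the ceiling of log2 of the number of addressable words. A short C check of the 512M-word example:

```c
#include <stdio.h>

/* Smallest number of address lines that can address `words` locations. */
static unsigned address_lines(unsigned long long words)
{
    unsigned bits = 0;
    while ((1ULL << bits) < words)
        bits++;
    return bits;
}

int main(void)
{
    unsigned long long words = 512ULL << 20;            /* 512M words */
    printf("%u address lines\n", address_lines(words)); /* prints 29  */
    return 0;
}
```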
In addition to carrying control signals, a control bus can carry timing signals.
These signals are used to determine the exact timing for data transfer to and from a bus.
That is, they determine when a given computer system component, such as the processor, memory,
or an I/O device, can place data on the bus and when it can receive data from the bus.
A bus is synchronous if data transfer over the bus is controlled by a bus clock.
The clock acts as the timing reference for all bus signals.
A bus is asynchronous if data transfer over the bus is based on the availability of data and not on
a clock.
To understand the difference between synchronous and asynchronous, let us consider the case
when a master such as a CPU or DMA is the source of data to be transferred to a slave such as an
I/O device.
Bus Protocol
A bus is a communication channel shared by many devices and hence rules are needed to be
established in order for communication to happen correctly.
Design of a bus architecture involves several tradeoffs related to the width of the data bus, data
transfer size, bus protocols, clocking etc.
Depending on whether the bus transactions are controlled by a clock or not, buses are classified
into synchronous and asynchronous buses.
Depending on whether the data bits are sent on parallel wires or multiplexed onto one single wire,
there are parallel and serial buses.
Control of the bus communication in the presence of multiple devices necessitates defined
procedures called arbitration schemes.
In this section, different kinds of buses and arbitration schemes are described.
Synchronous Buses
In synchronous buses, the steps of data transfer take place at fixed clock cycles.
Everything is synchronized to bus clock and clock signals are made available to both master and
slave.
A cycle starts at one rising edge of the clock and ends at the next rising edge, which is the
beginning of the next cycle.
A transfer may take multiple bus cycles depending on the speed parameters of the bus and the two
ends of the transfer.
One scenario would be that on the first clock cycle, the master puts the address on the address bus,
puts data on the data bus, and asserts the appropriate control lines.
The slave recognizes its address on the address bus in the first cycle and reads the new values from
the bus in the second cycle.
However, when connecting devices with varying speeds to a synchronous bus, the slowest device
will determine the speed of the bus.
Also, the synchronous bus length could be limited to avoid clock-skewing problems.
A memory read transaction on the synchronous bus typically proceeds as illustrated in the figure
below.
During the first clock cycle, the CPU places the address of the location it wants to read on the
address lines of the bus.
Later during the same clock cycle, once the address lines have stabilized, the READ request is
asserted by the CPU.
Many times, some of these control signals are active low and asserting the signal means they are
pulled low.
A few clock cycles are needed for the memory to perform accessing of the requested location.
In a simple non-pipelined bus, these appear as wait states, and the data is placed on the bus by the
memory after two or three wait cycles.
The CPU then releases the bus by deasserting the READ control signal.
The write transaction is similar except that the processor is the data source and the WRITE signal
is asserted.
Different bus architectures synchronize bus operations with respect to the rising edge, the falling
edge, or the level of the clock signal.
Asynchronous Buses
The master asserts the data-ready line (point 1 in the figure).
When the slave sees the data-ready signal, it will assert the data-accept line (point 2 in the figure).
The rising of the data-accept line will trigger the falling of the data-ready line and the removal of
the data from the bus.
The falling of the data-ready line (point 3 in the figure) will trigger the falling of the data-accept
line (point 4 in the figure).
This handshaking, which is called fully interlocked, is repeated until the data is completely
transferred.
Handshaking is done to properly conduct the transmission of data between the sender and the
receiver.
For example, in an asynchronous read operation, the bus master puts the address and control
signals on the bus and asserts a synchronization signal.
The synchronization signal from the master prompts the slave to get synchronized, and once it has
accessed the data, it asserts its own synchronization signal.
The slave’s synchronization signal indicates to the processor that there is valid data on the bus,
and it reads the data.
The master then deasserts its synchronization signal, which indicates to the slave that the master
has read the data.
Note that there is no clock and that starting and ending of the data transfer are indicated by special
synchronization signals.
Synchronous buses are typically faster than asynchronous buses because there is no overhead to
establish a time reference for each transaction.
Another reason that helps the synchronous bus to operate fast is that the bus protocol is
predetermined and very little logic is involved.
However, synchronous buses are affected by clock skew and they cannot be very long.
But asynchronous buses work well even when they are long because clock skew problems do not
affect them.
Thus asynchronous buses can handle longer physical distances and higher number of devices.
Processor-memory buses are typically synchronous because the devices connected are fast, are
small in number and located in close proximity.
I/O buses are typically asynchronous because many peripherals need only slow data rates and are
physically situated far away.
Bus Arbitration
Bus arbitration is needed to resolve conflicts when two or more devices want to become the bus
master at the same time.
In short, arbitration is the process of selecting the next bus master from among multiple candidates.
In centralized arbitration schemes, a single arbiter is used to select the next master.
A simple form of centralized arbitration uses a bus request line, a bus grant line, and a bus busy
line.
Each of these lines is shared by the potential masters, which are daisy-chained in a cascade.
In the figure, each of the potential masters can submit a bus request at any time.
When a bus request is received at the central bus arbiter, it issues a bus grant by asserting the bus
grant line.
When the potential master that is closest to the arbiter (potential master 1) sees the bus grant
signal, it checks to see if it had made a bus request.
If yes, it takes over the bus and stops propagation of the bus grant signal any further.
If it has not made a request, it will simply pass the bus grant signal on to the next master to the
right (potential master 2), and so on.
Instead of using shared request and grant lines, multiple bus request and bus grant lines can be
used.
In one scheme, each master will have its own independent request and grant line, as shown in the
figure below.
The central arbiter can employ any number of priority levels.
For each priority level, there is a bus request and a bus grant line.
In this scheme, each device is attached to the daisy chain of one priority level.
If the arbiter receives multiple bus requests from different levels, it grants the bus to the level with
the highest priority.
The figure below shows an example of four devices included in two priority levels.
Potential master 1 and potential master 3 are daisy chained in level 1 and potential master 2 and
potential master 4 are daisy chained in level 2.
Decentralized Arbitration
In decentralized arbitration schemes, priority- based arbitration is usually used in a distributed
fashion.
Each potential master has a unique arbitration number, which is used in resolving conflicts when
multiple requests are submitted.
For example, a conflict can always be resolved in favor of the device with the highest arbitration
number.
The question now is how the highest arbitration number among the requesters is made available
to all the devices. When a device requests the bus, it places its arbitration number on a set of
shared arbitration lines.
Each device then compares the number appearing on those lines with its own arbitration number.
Eventually, the requester with the highest arbitration number will survive and be granted bus access.
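A hedged sketch of this self-selection process, assuming arbitration numbers are driven onto shared wired-OR lines and scanned from the most significant bit down (the function name and the 4-bit width are illustrative assumptions):

```python
# Decentralized (distributed) priority arbitration.
def decentralized_arbitrate(numbers, width=4):
    """Each requester drives its arbitration number onto shared
    wired-OR lines, scanning from the most significant bit down.
    A device that sees a 1 on a line where its own bit is 0 stops
    driving its lower-order bits. The value left on the lines is
    the highest arbitration number, and its owner wins the bus."""
    if not numbers:
        return None
    active = set(numbers)                         # devices still competing
    for bit in reversed(range(width)):            # MSB first
        if any((n >> bit) & 1 for n in active):   # wired-OR of this line
            active = {n for n in active if (n >> bit) & 1}
    return active.pop()                           # the surviving requester

print(decentralized_arbitrate([0b0101, 0b0011, 0b0110]))  # -> 6 (0b0110)
```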
BUS STRUCTURES
In computer architecture, a bus is a subsystem that transfers data between components inside a
computer, or between computers.
Early computer buses were literally parallel electrical wires with multiple connections, but modern
computer buses can use both parallel and bit serial connections.
To achieve a reasonable speed of operation, a computer must be organized so that all its units can
handle one full word of data at a given time.
When a word of data is transferred between units, all its bits are transferred in parallel, that is, the
bits are transferred simultaneously over many wires, or lines, one bit per line.
A group of lines that serves as a connecting path for several devices is called a bus.
In addition to the lines that carry the data, the bus must have lines for address and control purposes.
The simplest way to interconnect functional units is to use a single bus as shown below.
All units are connected to this bus.
Because the bus can be used for only one transfer at a time, only two units can actively use the bus
at any given time.
Bus control lines are used to arbitrate multiple requests for use of the bus.
The main virtue of the single bus structure is its low cost and its flexibility for attaching peripheral
devices.
Systems that contain multiple buses achieve more concurrency in operations by allowing two or
more transfers to be carried out at the same time.
Processor, memory, and input and output devices are connected by the system bus, which consists
of separate buses as shown below.
They are;
1) Address Bus
It is a unidirectional bus.
The address is sent from the CPU to memory and I/O ports, hence it is unidirectional.
2) Data Bus
Data bus is used to carry or transfer data to and from memory and I/O ports.
The processor can read data over the data lines from memory and I/O ports, and it can likewise
write data to memory and I/O ports.
3) Control Bus
Control bus is used to carry control signals in order to regulate the control activities.
The CPU sends control signals on the control bus to enable the outputs of addressed
memory devices or port devices. Typical control signals include:
Reset (RST)
Ready (RDY)
Hold (HLD)
Hold acknowledge (HLDA)
Some electromechanical devices, such as keyboards and printers are relatively slow.
Memory and processor units operate at electronic speeds, making them the fastest parts of a
computer.
Because all these devices must communicate with each other over a bus, an efficient transfer
mechanism that is not constrained by the slow devices and that can be used to smooth out the
differences in timing among processors, memories and external devices is necessary.
A common approach is to include buffer registers with the devices to hold the information during
transfers.
To illustrate this technique, consider the transfer of an encoded character from a processor to a
character printer.
The processor sends the character over the bus to the printer buffer.
Since the buffer is an electronic register, this transfer requires relatively little time.
Once the buffer is loaded, the printer can start printing without further intervention by the
processor.
The bus and the processor are no longer needed and can be released for other activity.
The printer continues printing the character in its buffer and is not available for further transfers
until the process is completed.
Thus, buffer registers smooth out timing differences among processors, memories, and I/O devices.
They prevent a high speed processor from being locked to a slow I/O device during a sequence of
data transfers.
This allows the processor to switch rapidly from one device to another, interleaving its processing
activity with data transfers involving several I/O devices.
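The effect of the buffer register can be shown with a small timing model. Everything below (the function name, the time units, and the two timing constants) is a made-up illustrative assumption, not data from the text:

```python
# How a one-character buffer register decouples the processor from a printer.
BUS_TRANSFER = 1        # time units to load the printer's buffer register
PRINT_TIME   = 100      # time units the printer needs per character

def transfer(chars):
    clock = 0
    buffer_free_at = 0                      # when the buffer is free again
    for ch in chars:
        clock = max(clock, buffer_free_at)  # wait only if the buffer is busy
        clock += BUS_TRANSFER               # fast load into the buffer
        buffer_free_at = clock + PRINT_TIME # printer works on its own
        # between transfers the processor and the bus are free for other work
    return clock

print(transfer("HI"))  # the second transfer waits for the first print to end
```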
The figure below shows traditional bus configuration and the other figure shows high speed bus
configuration.
The traditional bus connection uses three buses: local bus, system bus and expanded bus.
The high-speed bus configuration uses a high-speed bus along with the three buses used in the
traditional configuration.
The bus supports connection to high speed LANs such as Fiber Distributed Data Interface (FDDI),
video and graphics workstation controllers, as well as interface controllers to local peripherals.
Auxiliary Memory
Auxiliary storage, also known as auxiliary memory or secondary storage, is non-volatile memory
that supplements the main memory.
The term non-volatile means that it stores and retains the programs and data even after the
computer is turned off.
Auxiliary storage devices allow the computer to record information semi-permanently so it can
be read later by the same computer or by another computer.
Auxiliary storage devices are also useful in transferring data or programs from one computer to
another.
They also function as backup devices which allow users to back up valuable information.
So even if by some accident the computer crashes and the stored data is unrecoverable, we can
restore it from the back-ups.
The most common types of auxiliary storage devices are magnetic tapes, magnetic disks, floppy
disks, hard disks etc.
Based on the type of access, auxiliary storage media are classified as:
1) Sequential access media
2) Random (direct) access media
In case of sequential access media, the data stored in the media can only be read in sequence and
to get to a particular point on the media, we have to go through all the preceding points.
In contrast, disks are random access also called direct access media because a disk drive can access
any point at random without passing through intervening points.
Other examples of direct access media are floppy diskettes, optical disks etc.
Early forms of auxiliary storage included punched paper tape, punched cards, and magnetic drums.
Since the 1980s, the most common forms of auxiliary storage have been magnetic disks, magnetic
tapes, and optical discs.
a) Flash memory
Flash memory is a non-volatile memory chip used for storing and transferring data
between a personal computer (PC) and digital devices.
It is a type of electrically erasable programmable read-only memory (EEPROM) often
found in USB flash drives, MP3 players, digital cameras, smartphones, tablet
computers, and solid-state drives.
Flash devices are highly portable and more reliable than many other storage devices.
b) Optical disc
The familiar compact disk (CD), used in audio systems was the first practical
application of this technology.
Soon after, the optical technology was adapted to computer environment to provide
high – capacity read only storage referred to as CD – ROM.
The first generation of CDs was developed in the mid-1980s by the Sony and Philips
companies, which also published a complete specification for these devices.
CD Technology:
The optical technology that is used for CD systems is based on a laser light source.
Physical indentations in the surface are arranged along the tracks on the disk.
They reflect the focused beam towards a photo detector which detects the stored binary
patterns.
The laser emits a coherent light beam that is sharply focused on the surface of the disk.
Coherent light consists of synchronized waves that have the same wavelength.
If a coherent light beam is combined with another beam of the same kind, and the two
beams are in phase, then the result will be a brighter beam.
But if the waves of the two beams are 180 degrees out of phase, they will cancel each
other.
Thus, if a photodetector is used to detect the beams, it will detect a bright spot in the
first case and a dark spot in the second case.
The disk is made of polycarbonate plastic coated with a reflective aluminum layer; the
surface of this plastic is programmed to store data by indenting it with pits.
The laser source and the photodetector are positioned below the polycarbonate plastic.
The emitted beam travels through this plastic, reflects off the aluminum layer, and
travels back toward the photodetector.
Note that from the laser side, the pits actually appear as bumps with respect to the lands.
The figure below shows what happens as the laser beam scans across the disk and
encounters a transition from a pit to a land.
Three different positions of the laser source and the detector are shown as would occur
when the disk is rotating.
When the light reflects solely from the pit, or solely from the land, the detector will see
the reflected beam as a bright spot.
But a different situation arises when the beam moves through an edge where a pit
changes to a land, or vice versa: part of the beam then reflects from the pit and part
from the land.
Because the pit depth is about one quarter of the light's wavelength, the wave reflected
from the pit travels an extra half wavelength and is therefore 180 degrees out of phase
with the wave reflected from the land, so the two cancel each other.
Hence, at the pit-land and land-pit transitions, the detector will not see a reflected beam
and will detect a dark spot.
The figure below depicts several transitions between land and pits.
Each transition, detected as a dark spot, is taken to denote a binary 1, and the flat
portions represent 0s; the resulting detected binary pattern is shown in the figure.
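A small sketch of this decoding rule, assuming the surface is sampled once per bit cell as 'P' (pit) or 'L' (land); this string representation is an illustrative assumption, not the actual drive electronics:

```python
# Each pit-land or land-pit transition (a dark spot) denotes a 1;
# the flat stretches between transitions denote 0s.
def decode(surface):
    bits = []
    previous = surface[0]
    for sample in surface[1:]:
        bits.append(1 if sample != previous else 0)  # transition -> dark spot
        previous = sample
    return bits

print(decode("PPPLLPLLLL"))  # -> [0, 0, 1, 0, 1, 1, 0, 0, 0]
```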
c) Magnetic disks
The storage medium in a magnetic disk system consists of one or more disks mounted
on a common spindle.
The disks are placed in a rotary drive so that the magnetized surfaces move in close
proximity to the read/write heads, as shown below.
Each head consists of a magnetic yoke and a magnetizing coil as shown below
Digital information can be stored on the magnetic film by applying current of suitable
polarity to the magnetizing coil.
This causes the magnetization of the film in the area immediately underneath the head
to switch to a direction parallel to the applied field.
The same head can be used for reading the stored information.
In this case, changes in the magnetic field in the vicinity of the head, caused by the
movement of the film relative to the yoke, induce a voltage in the coil, which now
serves as a sense coil.
The polarity of this voltage is monitored by the control circuitry to determine the state
of magnetization of the film.
Only changes in the magnetic field under the head can be sensed during the read
operation.
Therefore, if the binary states 0 and 1 are represented by two opposite states of
magnetization, a voltage is induced in the head only at 0-to-1 and at 1-to-0 transitions
in the bit stream.
A long string of 0s or 1s causes an induced voltage only at the beginning and end of
the string.
This makes it difficult for the read circuitry to stay synchronized over long runs of
identical bits, so the modern approach is to combine the clocking information with the data.
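One classic way of combining the clock with the data is phase (Manchester) encoding, in which every bit cell contains a transition in the middle, so the read circuitry never loses synchronization. The sketch below shows one common convention and is illustrative, not necessarily the specific scheme the text has in mind:

```python
# Phase (Manchester) encoding: clock and data combined in one signal.
def manchester_encode(bits):
    """Encode each bit as a pair of half-cell levels:
    1 -> high-then-low, 0 -> low-then-high (one of two conventions)."""
    signal = []
    for b in bits:
        signal += ([1, 0] if b else [0, 1])  # guaranteed mid-cell transition
    return signal

print(manchester_encode([1, 1, 1, 0]))  # -> [1, 0, 1, 0, 1, 0, 0, 1]
```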
d) Magnetic tapes
Magnetic tape is a magnetically coated strip of plastic on which data can be encoded.
Tapes for computers are similar to the tapes used to store music.
Accessing data on tapes, however, is much slower than accessing data on disks.
Because tapes are so slow, they are generally used only for long term storage and
backup.
Cache Memory
Cache memory is a very high speed memory which stores frequently requested copies of the data
and instructions so that they are immediately available to the CPU when needed.
Cache memory acts as a buffer between RAM and the CPU and is used to reduce the average time
to access data from the Main memory.
Cache Performance
When CPU refers to memory and finds the data or instruction within the Cache Memory,
it is known as cache hit.
If the desired data or instruction is not found in the cache memory and CPU refers to the
main memory to find that data or instruction, it is known as a cache miss.
If h is the hit ratio, Tc is the time to access the cache memory, and Tm is the time to
access the main memory, then the average memory access time can be written as:
Tavg = h * Tc + (1 - h) * Tm
It is used for bridging the speed mismatch between the fastest CPU and the main
memory.
It does not let the CPU performance suffer due to the slower speed of the main memory.
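A quick worked instance of the Tavg formula above; the hit ratio and access times used here are illustrative assumptions, not values from the text:

```python
# Average memory access time: Tavg = h * Tc + (1 - h) * Tm
def average_access_time(h, Tc, Tm):
    """h: hit ratio, Tc: cache access time, Tm: main-memory access time."""
    return h * Tc + (1 - h) * Tm

# e.g. a 95% hit ratio, a 1 ns cache, and a 100 ns main memory:
print(average_access_time(0.95, 1, 100))  # -> 5.95 ns
```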
Execution of Program
Whenever any program has to be executed, it is first loaded in the main memory.
The portion of the program that is most probably going to be executed in the near future is kept
in the cache memory.
This allows the CPU to access the most probably needed portions of the program from the cache
memory at a faster speed.
1) Step-01:
Whenever CPU requires any word of memory, it is first searched in the CPU registers.
Case-01:
If the required word is found in the CPU registers, it is read from there.
Case-02:
If the required word is not found in the CPU registers, Step-02 is followed.
2) Step-02:
When the required word is not found in the CPU registers, it is searched in the cache
memory.
Tag directory of the cache memory is used to search whether the required word is present
in the cache memory or not.
Case-01:
When the CPU refers to memory and finds the word in cache, it is said to produce
a hit.
If the word is not found in the cache, it is in main memory, and it counts as a miss.
The ratio of the number of hits to the total number of CPU references to memory (hits
plus misses) is the hit ratio.
A high hit ratio verifies the validity of the locality of reference property.
Case-02:
If the required word is not found in the cache memory, this is known as Cache miss
and Step-03 is followed.
3) Step-03:
When the required word is not found in the cache memory, it is searched in the main
memory.
Page Table is used to determine whether the required page is present in the main memory
or not.
Case-01:
If the page containing the required word is found in the main memory, the page is
mapped from the main memory to the cache memory.
Case-02:
If the page containing the required word is not found in the main memory, a page
fault occurs.
The page containing the required word is mapped from the secondary memory to
the main memory.
Then, the page is mapped from the main memory to the cache memory.
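The three-step search can be summarized in a schematic sketch. Everything here (the dictionaries standing in for the registers, cache, page table, main memory, and disk, plus the page size) is an illustrative assumption, not a real memory system:

```python
PAGE_SIZE = 4   # illustrative page size

def load_page(disk, main_memory, page):
    """Map the page containing the word from secondary memory to main memory."""
    start = page * PAGE_SIZE
    for a in range(start, start + PAGE_SIZE):
        main_memory[a] = disk.get(a, 0)

def fetch_word(address, registers, cache, main_memory, page_table, disk):
    if address in registers:                 # Step-01: CPU registers
        return registers[address]
    if address in cache:                     # Step-02: cache (tag directory)
        return cache[address]                # cache hit
    page = address // PAGE_SIZE              # Step-03: main memory
    if not page_table.get(page):             # page not resident: page fault
        load_page(disk, main_memory, page)
        page_table[page] = True
    word = main_memory[address]
    cache[address] = word                    # map the word into the cache
    return word

disk = {a: a * 10 for a in range(16)}
registers, cache, main_memory, page_table = {}, {}, {}, {}
print(fetch_word(5, registers, cache, main_memory, page_table, disk))  # miss path -> 50
print(fetch_word(5, registers, cache, main_memory, page_table, disk))  # cache hit -> 50
```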
The smallest cache memory, called the primary cache, is placed closest to the CPU, on the
processor chip.
Secondary cache – This is placed between the primary cache and the rest of the memory.
Example
Three level cache organization consists of three cache memories of different size
organized at three different levels as shown below
Size (L1 Cache) < Size (L2 Cache) < Size (L3 Cache) < Size (Main Memory)
Locality of reference
It refers to a phenomenon in which a computer program tends to access the same set of memory
locations over a particular time period.
In other words, Locality of Reference refers to the tendency of the computer program to access
instructions whose addresses are near one another.
The property of locality of reference is mainly shown by loops and subroutine calls in a program.
1) In case of loops in a program, the central processing unit repeatedly refers to the set of
instructions that constitute the loop.
2) In case of subroutine calls, the same set of instructions is fetched from memory every
time the subroutine is called.
3) References to data items also get localized, meaning the same data item is referenced
again and again.
There are two forms of locality that determine how data or instructions are fetched from main
memory and stored in the cache memory.
1) Temporal Locality
Temporal locality means that the data or instruction currently being fetched may be
needed again soon.
So we should store that data or instruction in the cache memory, so that we can avoid
searching the main memory again for the same data.
When CPU accesses the current main memory location for reading required data or
instruction, it also gets stored in the cache memory which is based on the fact that same
data or instruction may be needed in near future.
If some data is referenced, then there is a high probability that it will be referenced again
in the near future.
2) Spatial Locality
Spatial locality means that instructions or data located near the current memory location
being fetched may be needed soon.
Here we are talking about nearly located memory locations while in temporal locality we
were talking about the actual memory location that was being fetched.
Cache mapping is a technique by which the contents of main memory are brought into the cache
memory.
The basic idea is a mapping between the cache addresses and the main memory addresses that
refer to the same unit of information.
Main memory is divided into equal size partitions called as blocks or frames.
Cache memory is divided into partitions having same size as that of blocks called as lines.
During cache mapping, block of main memory is simply copied to the cache.
1) Direct Mapping
A particular block of main memory can map only to a particular line of the cache.
The line number of cache to which a particular block can map is given by
Cache line number = (Main Memory Block Address) Modulo (Number of lines in
Cache)
If the ith block of main memory is placed in the jth line of the cache memory, then
j = i modulo (number of lines in the cache).
Example
If the cache has n lines, then block 'j' of main memory can map to line number (j mod n) only
of the cache.
The possibility of using a random-access memory for the cache is investigated in Fig.
The nine least significant bits constitute the index field and the remaining six bits form the
tag field.
The number of bits in the index field is equal to the number of address bits required to
access the cache memory.
In the general case, there are 2^k words in cache memory and 2^n words in main memory.
The n-bit memory address is divided into two fields: k bits for the index field and n − k bits
for the tag field.
The direct mapping cache organization uses the n-bit address to access the main memory
and the k-bit index to access the cache.
The internal organization of the words in the cache memory is as shown in Fig. b above.
Each word in cache consists of the data word and its associated tag.
When a new word is first brought into the cache, the tag bits are stored alongside the data
bits.
When the CPU generates a memory request, the index field is used for the address to access
the cache.
The tag field of the CPU address is compared with the tag in the word read from the cache.
If the two tags match, there is a hit and the desired data word is in cache.
If there is no match, there is a miss and the required word is read from main memory.
It is then stored in the cache together with the new tag, replacing the previous value.
The disadvantage of direct mapping is that the hit ratio can drop considerably if two or
more words whose addresses have the same index but different tags are accessed
repeatedly.
However, this possibility is minimized by the fact that such words are relatively far apart
in the address range (multiples of 512 locations in this example).
To see how the direct-mapping organization operates, consider the numerical example
shown in Fig above.
The word at address zero is presently stored in the cache (index = 000, tag = 00, data =
1220).
Suppose that the CPU now wants to access the word at address 02000.
The cache tag is 00 but the address tag is 02, which does not produce a match.
Therefore, the main memory is accessed and the data word 5670 is transferred to the CPU.
The cache word at index address 000 is then replaced with a tag of 02 and data of 5670.
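A sketch that mirrors the numerical example above, using the same octal values (15-bit addresses, a 9-bit index, a 6-bit tag); the dictionary-based cache is an illustrative stand-in for the real hardware:

```python
# Direct-mapped cache lookup with a 9-bit index and 6-bit tag.
INDEX_BITS = 9

cache = {0: (0o00, 0o1220)}     # index -> (tag, data); word at address 0
main_memory = {0o00000: 0o1220, 0o02000: 0o5670}

def read(address):
    index = address & ((1 << INDEX_BITS) - 1)   # low 9 bits
    tag = address >> INDEX_BITS                 # remaining 6 bits
    if index in cache and cache[index][0] == tag:
        return cache[index][1]                  # hit
    data = main_memory[address]                 # miss: go to main memory
    cache[index] = (tag, data)                  # replace the previous word
    return data

print(oct(read(0o00000)))  # hit:  0o1220
print(oct(read(0o02000)))  # miss: tag 02 replaces tag 00 at index 000
print(oct(read(0o02000)))  # now a hit: 0o5670
```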
The direct-mapping example just described uses a block size of one word.
The same organization but using a block size of 8 words is shown in below Fig.
The index field is now divided into two parts: the block field and the word field.
The block number is specified with a 6-bit field and the word within the block is specified
with a 3-bit field.
The tag field stored within the cache is common to all eight words of the same block.
Every time a miss occurs, an entire block of eight words must be transferred from main
memory to cache memory.
Although this takes extra time, the hit ratio will most likely improve with a larger block
size because of the sequential nature of computer programs.
A disadvantage of direct mapping is that a main memory block can map only to one particular
line of the cache.
Thus, a new incoming block will always replace the existing block (if any) in that particular
line; we may have to replace a cache memory block even when other blocks in the cache are
empty.
In direct mapping, the main memory address is divided into three parts:
i. Tag
ii. Line number
iii. Word/block offset
2) Associative Mapping
To avoid high conflict miss, any block of main memory can be placed anywhere in cache
memory.
A block of main memory can map to any line of the cache that is freely available at that
moment.
This makes fully associative mapping faster and more flexible than direct mapping.
The associative memory stores both the address and content (data) of the memory word.
This permits any location in cache to store any word from main memory.
The diagram below shows three words presently stored in the cache.
The address value of 15 bits is shown as a five-digit octal number and its corresponding
12-bit word is shown as a four-digit octal number.
A CPU address of 15 bits is placed in the argument register and the associative memory is
searched for a matching address.
If the address is found, the corresponding 12-bit data is read and sent to the CPU; if no match
occurs, the main memory is accessed for the word.
The address-data pair is then transferred to the associative cache memory.
If the cache is full, an address - data pair must be displaced to make room for a pair that is
needed and not presently in the cache.
The decision as to what pair is replaced is determined from the replacement algorithm that
the designer chooses for the cache.
A simple procedure is to replace cells of the cache in round-robin order whenever a new
word is requested from main memory.
Example
Thus, any block of main memory can map to any line of the cache.
Had all the cache lines been occupied, then one of the existing blocks would have to
be replaced.
Replacement algorithm suggests the block to be replaced if all the cache lines are
occupied.
Thus, a replacement algorithm such as FIFO (First In First Out) or LRU (Least
Recently Used) is employed.
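A minimal sketch of associative mapping with FIFO replacement; the capacity and the addresses used are illustrative assumptions:

```python
# Fully associative cache with FIFO (round-robin) replacement.
from collections import OrderedDict

CAPACITY = 3                          # illustrative number of cache lines
cache = OrderedDict()                 # address -> data, kept in arrival order

def access(address, main_memory):
    if address in cache:              # associative search over all lines
        return cache[address]
    if len(cache) == CAPACITY:        # all lines occupied:
        cache.popitem(last=False)     # displace the oldest pair (FIFO)
    cache[address] = main_memory[address]
    return cache[address]

memory = {a: a * 100 for a in range(8)}
for a in [1, 2, 3, 4]:
    access(a, memory)                 # the access to 4 evicts address 1
print(list(cache))                    # -> [2, 3, 4]
```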
In associative mapping, we divide the main memory addressing space into two parts:
i. Tag
ii. Word/block offset
Advantage
0 % conflict miss.
Disadvantage
Expensive comparison logic, since the tags of all cache lines must be searched in parallel.
3) Set-Associative Mapping
Cache lines are grouped into sets where each set contains k number of lines.
Set size is always a power of 2; for example, if the cache has 2 blocks per set, it is
called 2-way set associative.
A particular block of main memory can map to only one particular set of the cache.
However, within that set, the memory block can map to any cache line that is freely
available.
Mapping Technique
The set of the cache to which a particular block of the main memory can map is
given by:
Cache set number = (Main Memory Block Address) Modulo (Number of sets in Cache)
A new block from main memory can be placed anywhere within that set.
Example
Here, for a cache with 3 sets:
Block ‘j’ of main memory can map to set number (j mod 3) only of the
cache.
Within that set, block ‘j’ can map to any cache line that is freely available
at that moment.
If all the cache lines are occupied, then one of the existing blocks will have
to be replaced.
It was mentioned previously that the disadvantage of direct mapping is that two
words with the same index in their address but with different tag values cannot
reside in cache memory at the same time.
Each data word is stored together with its tag and the number of tag-data items in
one word of cache is said to form a set.
An example of a set-associative cache organization for a set size of two is shown
in Fig.
Each index address refers to two data words and their associated tags.
Each tag requires six bits and each data word has 12 bits, so the word length of the
cache is 2(6 + 12) = 36 bits.
A cache of 512 words can accommodate 1024 words of main memory, since each
word of cache contains two data words.
The octal numbers listed in above Fig. are with reference to the main memory
content illustrated in Fig.(a).
The words stored at addresses 01000 and 02000 of main memory are stored in cache
memory at index address 000.
Similarly, the words at addresses 02777 and 00777 are stored in cache at index
address 777.
When the CPU generates a memory request, the index value of the address is used
to access the cache.
The tag field of the CPU address is then compared with both tags in the cache to
determine whether a match occurs.
The comparison logic is done by an associative search of the tags in the set similar
to an associative memory search: thus the name “set-associative”.
The hit ratio will improve as the set size increases because more words with the
same index but different tag can reside in cache.
However, an increase in the set size increases the number of bits in the words of the
cache and requires more complex comparison logic.
In set associative mapping, we divide the main memory addressing space into three parts;
i. Word/block offset
Number of bits required to identify a particular word within a block (line).
ii. Set number
Number of bits required to identify the corresponding set inside the
cache where a main memory block will be placed.
iii. Tag
Number of bits required to compare two blocks which belong to the same set.
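A sketch of this three-way address split; the field widths below (2 offset bits and 3 set bits, i.e. 8 sets) are illustrative assumptions:

```python
# Splitting an address into tag / set number / word offset.
OFFSET_BITS, SET_BITS = 2, 3

def split(address):
    offset = address & ((1 << OFFSET_BITS) - 1)
    set_number = (address >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (OFFSET_BITS + SET_BITS)
    return tag, set_number, offset

# Two addresses with the same set number but different tags can now
# coexist in the same set (one per way):
print(split(0b1_001_10))   # -> (1, 1, 2)
print(split(0b0_001_10))   # -> (0, 1, 2)  same set, different tag
```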
Associative Memory
Many data-processing applications require the search of items in a table stored in memory.
They use object names or numbers to identify the location of the named or numbered object
within a memory space.
For example, an account number may be searched in a file to determine the holder’s name and
account status.
To search an object, the number of accesses to memory depends on the location of the object and
the efficiency of the search algorithm.
Moreover, searches can be done on an entire word or on a specific field within a word.
The time required to find an object in memory can be reduced considerably if objects are selected
based on their contents, not on their locations.
A memory unit addressed by the content is called an associative memory or content addressable
memory (CAM).
This type of memory is accessed simultaneously and in parallel on the basis of data content rather
than by specific address or location.
An associative memory is more expensive than a random access memory because each cell must
have storage capability as well as logic circuits for matching its content with an external argument.
For this reason, associative memories are used in applications where the search time is very critical
and must be very short.
In general, the items held in special kinds of memories such as caches and the translation
structures used for virtual memory are addressed by content and not by address location.
Hardware Organization
The argument register (A) and key register (K) each have n bits per word.
The match register M has m bits, one for each memory word.
Each word in memory is compared in parallel with the content of the argument register.
The words that match with the word stored in the argument register set corresponding bits in the
match register.
After the matching process, those bits in the match register that have been set indicate the fact that
their corresponding words have been matched.
Therefore, reading can be accomplished by a sequential access to memory for those words whose
corresponding bits in the match register have been set.
The key register provides a mask for choosing a particular field or bits in the argument word.
Only those bits in the argument register having 1’s in their corresponding position of the key
register are compared.
For example, suppose the argument register A and the key register K have the bit configurations
shown below.
Only the three rightmost bits of A are compared with the memory words, because K has 1’s
only in these positions.
A        11011010
K        00000111
Word 1   01010010    Match
Word 2   11011100    No match
The entire argument is compared with each memory word if the key register contains all 1’s.
The figure below shows the associative memory with the cells of each register.
The cells in the memory array are marked by the letter C with two subscripts.
The first subscript gives the word number and the second subscript gives the bit position in the
word.
A bit Aj in the argument register is compared with all the bits in column j of the array provided
that Kj = 1.
If a match occurs between all the unmasked bits of the argument and the bits in word i, the
corresponding bit Mi in the match register is set to 1.
If one or more unmasked bits of the argument and the word do not match, Mi is cleared to 0.
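The masked parallel match can be sketched in one expression per word. The values of A, K, and the two words reuse the earlier example; the function name is an illustrative assumption:

```python
# Masked content match: M[i] = 1 iff word i agrees with the argument A
# on every bit position where the key register K has a 1.
def cam_match(A, K, words):
    return [0 if (word ^ A) & K else 1 for word in words]

A = 0b11011010
K = 0b00000111                    # compare only the three rightmost bits
words = [0b01010010, 0b11011100]
print(cam_match(A, K, words))     # -> [1, 0] (word 1 matches, word 2 does not)
```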
Virtual Memory
Introduction
Back in the days of 8-bit computers (and still today on microcontrollers), any application or
program running on the CPU has access to the entire physical memory, and it basically
assumes it is the only program running on that CPU.
If it writes to a particular address, say 4095, that address is in fact somewhere in the physical
RAM, and the write goes directly there; the same holds for any other address it uses.
There is a one-to-one relationship between the addressing of the physical RAM and the addresses
the computer program uses.
Physical Address (PA) – What the hardware uses to talk to RAM; the physical address space is
determined by how much RAM is installed.
The one-to-one relationship is fine when there is only one program running.
With virtual memory, each process instead gets its own virtual address space. For example, on a
32-bit machine, a process thinks it has 2^32 bytes of memory to play with, which is of course
4 GB, and the way it works is shown below.
When the process wants to access a particular address, there is a particular piece of hardware in
the CPU called the Memory Management Unit (MMU) and what it does is that it maps the virtual
address that the process thinks it is running in to an actual physical address somewhere in RAM
memory.
The job of partitioning up the memory is taken over by the operating system; the application
doesn’t need to worry about partitioning. It thinks it is the only application running, it can
write to whatever addresses in the memory it has been given, and it doesn’t need to care about
the addresses of other processes because it has its own virtual address space.
Without virtual memory, several problems arise when more than one program must share the
memory:
i. First of all, you have to decide where you are putting each program in the memory.
ii. Secondly, each program in the memory has to be careful not to overwrite the data of
the other programs running.
iii. Thirdly, programs can only address memory relative to their own position, for example:
• 10 bytes forward
• 15 bytes backward
• No use of an absolute address like 4095, because that address could actually belong
to another task.
For example, suppose you are trying to run two programs, and you allocate one chunk of
memory for one program and another chunk for the other.
When the first program exits and you try to load another one in its place, the new program
might fit exactly into the freed space, or it may be a bit smaller, so a gap is left.
And when you run the next one, it can’t fit in that gap, so it goes somewhere else in memory;
little gaps like this start to appear throughout memory. This is called memory fragmentation,
it is a real problem, and you eventually run out of usable memory.
To get round the problems where we have more than one program running, we have the technology
called virtual memory and each program or application that run on windows or other platforms
thinks it is the only program running and it has access to all of the address space.
The figure below shows a scenario where two programs are running and accessing memory.
Application 1 (App1) has an address space from 0 through to 5242880 which is about 5MB of
memory and we also have application 2 (App2) which is also about 5MB of memory.
And what we actually see is that even though each application’s addresses run from 0 to 5242880,
in the physical memory App1 might actually start at 5242880 and App2 actually starts at 10485760.
The virtual address 0 in both applications is actually mapped to different places in the physical
RAM.
Because there is mapping going on, the applications can be mapped to absolutely anywhere that
the operating system wants to put them.
The figure below is the refinement of the previous figure in which a program has been divided into
two parts and mapped in different memory spaces.
App1 has been divided into two parts; the first half of it is mapped into one region of physical
memory and the second part is mapped into a different region.
But App1 and App 2 don’t know anything about this, they think they are running in their address
space from 0 to the end of their program.
Virtual memory is a technique used to provide the illusion of a large main memory to the
computer user when, in actuality, it is not physically present.
The physical main memory is not as large as the address space spanned by an address issued by
the processor.
When a program does not completely fit into the main memory, the parts of it not currently being
executed are stored on secondary storage devices such as magnetic disks.
The operating system moves programs and data automatically between the main memory and
secondary storage and this process is known as swapping.
The figure below shows a typical organization that implements virtual memory.
Address Translation
The previous scenarios we have looked at are one to one mapping, every time you have particular
address, you have a table that gets looked up by the MMU and tells it where to put it in physical
RAM.
The challenge comes when we have large size programs and more than one program.
We need a lot of entries in the look up table to perform the mapping and that will create another
challenge as a lot of space will be used for mapping information.
A simple method for mapping virtual addresses into physical addresses is to assume that all
programs and data are composed of fixed-length units called pages.
The process of translating a virtual address into physical address is known as address translation.
To get around the issues of mapping every address individually and the space such a mapping
would consume, the physical memory is divided into fixed-length blocks called pages.
They constitute the basic unit of information that is moved between the main memory and the disk
whenever the translation mechanism determines that a move is required.
Pages should not be too small because the access time of a magnetic disk is much longer (several
milliseconds) than the access time of the main memory.
On the other hand, if pages are too large, it is possible that a substantial portion of a page may not
be used, yet this unnecessary data will occupy valuable space in the main memory.
When an application makes a request using a virtual address, the request actually goes to the
MMU; the MMU finds out which page the address is in and redirects it to the physical address
of that particular page.
The start of a page is easy, since pages can be looked up one-to-one; but what happens when the
address falls in the middle of a page? The answer, as described below, is to split the address into
a page number, which is translated through the page table, and an offset within the page, which
is carried over unchanged.
An area in the main memory that can hold one page is called a page frame.
The starting location of the page table is stored in a register called page table base register (PTBR).
By adding the virtual address page number to the content of this register, the address of the
corresponding entry in the page table is obtained.
Each entry in the page table also includes some control bits that describe the status of the page
while it is in the main memory.
One bit indicates the validity of the page, that is, whether the page is actually loaded in the main
memory.
Another bit indicates whether the page has been modified during its residency in the memory.
So the 12 low-order bits of the address (the offset within the page) are copied directly from
the virtual address to the physical address.
The remaining 20 bits are used as the page number to look up in the page table; when a page
table entry is found, it supplies the 20 bits for the other part of the physical address.
The combination of the page address and the offset gives you an actual physical address in RAM.
Now one interesting question is where are all these tables (page table) held?
They are not held in the CPU but in RAM, and they are mapped into the cache memory in
the CPU.
CPUs are designed with a cache of recently looked-up addresses called the Translation
Lookaside Buffer (TLB): whenever an address gets translated, the translation is stored
in this cache.
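A hedged sketch of the translation path just described, combining the 20-bit page number / 12-bit offset split with a TLB-like cache of recent translations; all table contents below are illustrative assumptions:

```python
# MMU-style translation of a 32-bit virtual address.
OFFSET_BITS = 12

page_table = {0x00000: 0x00005, 0x00001: 0x003A7}  # virtual page -> frame
tlb = {}                                           # recently used entries

def translate(virtual_address):
    page = virtual_address >> OFFSET_BITS                # high 20 bits
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)  # low 12 bits
    if page in tlb:
        frame = tlb[page]                          # TLB hit: no table walk
    elif page in page_table:
        frame = page_table[page]
        tlb[page] = frame                          # cache the translation
    else:
        raise MemoryError("page fault")            # MMU raises a page fault
    return (frame << OFFSET_BITS) | offset         # frame bits + offset = PA

print(hex(translate(0x00001ABC)))  # -> 0x3a7abc
```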
The virtual memory mechanism bridges the gap between the main memory and the secondary
storage and is usually implemented in part by the software techniques.
As a program runs, the memory addresses that it uses to reference its data are logical addresses.
The real time translation to the physical address is performed in hardware by the CPU’s Memory
Management Unit (MMU).
MMU is a hardware device or circuit that supports virtual memory and paging by translating virtual
addresses into physical addresses
The MMU has two special registers that are accessed by the CPU’s control unit.
Data to be sent to main memory or retrieved from memory is stored in the Memory Data
Register (MDR).
The desired logical memory address is stored in the Memory Address Register (MAR).
The address translation is also called address binding and uses a memory mapping that is
programmed by the operating system.
The job of the operating system is to load the appropriate data into the MMU when a process is
started and to respond to the occasional page faults by loading the needed memory and updating
the memory map.
So what happens if the MMU can’t find an entry in its table for a particular virtual address?
In that case, the MMU raises a page fault and control passes back to the kernel to inform it that
the address could not be translated.
There are two possibilities:
i. The application is trying to access an address it is not allowed to access; it has not
been allocated that memory.
ii. The access is legitimate, but the page is not yet backed by physical RAM: the kernel
does not give the application the physical page of RAM until the application actually
starts to write to it.
Before memory addresses are loaded on to the system bus, they are translated to physical addresses
by the MMU.
Virtual memory fills the gap between the physical memory and the secondary memory (disk):
when one large program is being executed and it does not fit into the available physical memory,
parts of it (pages) are moved from the disk into the main memory when they are to be executed.
Some software routines are needed to manage this movement of program segments.
It is convenient to assemble the operating system routines into virtual address space, called system
space, which is separate from the virtual space in which user application programs reside.
This is arranged by providing a separate page table for each user program.
The MMU uses a page table base register to determine the address of the table to be used in the
translation process.
Hence, by changing the contents of this register, the operating system can switch from one space
to another.
The physical main memory is thus shared by the active pages of the system space and several
user spaces.
However, only the pages that belong to one of these spaces are accessible at any given time.
In any computer system in which independent user programs coexist in the main memory, the
notion of protection must be addressed.
No program should be allowed to destroy either the data or instructions of other programs in the
memory.
Recall that in the simplest case, the processor has two states, the supervisor state and the user state.
As the names suggest, the processor is usually placed in the supervisor state when the operating
system routines are being executed, and in the user state to execute user programs.
In the user state, some machine instructions (Access I/O devices, poll for I/O, perform DMA, catch
hardware interrupt, manipulate memory management, set up page tables, load/flush the TLB and
CPU caches etc.) cannot be executed.
These privileged instructions which include such operations as modifying the page table base
register, can only be executed while the processor is in the supervisor state.
Hence, a user program is prevented from accessing the page tables of other user spaces or of the
system space.
It is sometimes desirable for one application program to have access to certain pages belonging
to another program.
The operating system arranges this by causing these pages to appear in both spaces.
The shared pages will therefore have entries in two different page tables.
The control bits in each table entry can be set to control the access privileges granted to each
program.
For example, one program may be allowed to read and write a given page, while the other program
may be given only read access.
Virtual memory therefore has several advantages. First, each application is self-contained: it
doesn’t write over other applications’ memory space because it has its own virtual address space.
Secondly, it does not matter where the application is in memory, because the MMU does the
mapping between the virtual addresses and the physical addresses.
And thirdly, the application doesn’t need to be in one continuous block in memory; it can be
split up over many different parts, and it is the MMU, under control of the operating system,
that makes sure each address arrives at the right place in physical RAM. Hence we get rid of
the memory fragmentation problem.
Larger programs are divided into blocks, and each block is loaded into main memory only as
it is needed.
This allows larger applications to run on systems that do not offer enough physical RAM alone
to run them.
A disadvantage is that the pages swapped out to disk occupy storage space which might
otherwise be used for long-term data storage.