Computer Architecture

UNIT V - MEMORY AND CONTROL UNIT

Memory hierarchy - Memory technologies – Cache basics – Measuring and improving cache
performance - Virtual memory, TLBs - Input/output system, programmed I/O, DMA and
interrupts, I/O processors.

1. MEMORY HIERARCHY
The principle of locality
“States that programs access a relatively small portion of their address space at any instant of time.”
Eg: just as you access only a very small portion of a library’s collection at any one time. There are two different types of locality:
■ Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again
soon. If you recently brought a book to your desk to look at, you will probably need to look at it
again soon.
■ Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon. For example, if you are referring to a book, you are also likely to refer to books shelved near it (see the sketch below).
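To make the two kinds of locality concrete, here is a minimal C sketch (not from the original text): the variable sum is reused on every iteration (temporal locality), while the elements of a are accessed at consecutive addresses (spatial locality).

    #include <stdio.h>

    int main(void) {
        int a[1024];
        for (int i = 0; i < 1024; i++)
            a[i] = i;

        /* 'sum' is referenced on every iteration: temporal locality.
           a[0], a[1], a[2], ... lie at adjacent addresses: spatial locality,
           so a cache that fetches multi-word blocks serves most accesses. */
        long sum = 0;
        for (int i = 0; i < 1024; i++)
            sum += a[i];

        printf("sum = %ld\n", sum);
        return 0;
    }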
 A memory hierarchy consists of multiple levels of memory with different speeds and sizes. The
faster memories are more expensive per bit than the slower memories and thus are smaller.
 A structure that uses multiple levels of memories; as the distance from the processor increases, the
size of the memories and the access time both increase.

Fig. 1: The basic structure of the memory hierarchy

Today, there are three primary technologies used in building memory hierarchies.
 Main memory is implemented from DRAM (dynamic random access memory), while levels
closer to the processor (caches) use SRAM (static random access memory).
 The third technology, used to implement the largest and slowest level in the hierarchy, is
usually magnetic disk. (Flash memory is used instead of disks in many embedded devices)
 DRAM is less costly per bit than SRAM, although it is substantially slower. The price
difference arises because DRAM uses significantly less area per bit of memory, and DRAMs
thus have larger capacity.
 Because of the differences in cost and access time, it is advantageous to build memory as a hierarchy of levels. Fig. 1 shows the faster memory close to the processor and the slower, less expensive memory below it.
 The upper level—the one closer to the processor—is smaller and faster than the lower level,
since the upper level uses technology that is more expensive.
 As the figure below shows, the minimum unit of information that can be either present or not present in the two-level hierarchy is called a block or a line.

Fig. 2: Every pair of levels in the memory hierarchy can be thought of as having an upper and a lower level. We transfer an entire block when we copy something between levels.
 Hit: If the data requested by the processor appears in some block in the upper level, this is called a hit.
 Miss: If the data is not found in the upper level, the request is called a miss.
 The lower level in the hierarchy is then accessed to retrieve the block containing the requested
data.
 The hit rate, or hit ratio, is the fraction of memory accesses found in the upper level; it is often
used as a measure of the performance of the memory hierarchy.
 The miss rate (1 hit rate) is the fraction of memory accesses not found in the upper level.
Since performance is the major reason for having a memory hierarchy, the time to service hits and
misses is important:
 Hit time is the time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss.
 The miss penalty is the time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to the processor.
 Because the upper level is smaller and built using faster memory parts, the hit time will be much
smaller than the time to access the next level in the hierarchy, which is the major component of
the miss penalty.
 Programs exhibit both temporal locality, the tendency to reuse recently accessed data items, and
spatial locality, the tendency to reference data items that are close to other recently accessed items.
 Memory hierarchies take advantage of temporal locality by keeping more recently accessed data
items closer to the processor. Memory hierarchies take advantage of spatial locality by moving
blocks consisting of multiple contiguous words in memory to upper levels of the hierarchy.

Fig. 3: This diagram shows the structure of a memory hierarchy: as the distance from the
processor increases, so does the size.
 The above fig shows that a memory hierarchy uses smaller and faster memory technologies
close to the processor. Thus, accesses that hit in the highest level of the hierarchy can be
processed quickly. Accesses that miss go to lower levels of the hierarchy, which are larger but
slower.
 If the hit rate is high enough, the memory hierarchy has an effective access time
close to that of the highest (and fastest) level and a size equal to that of the lowest
level.
 In most systems, the memory is a true hierarchy, meaning that data cannot be present in level
i unless it is also present in level i + 1.

2. MEMORY TECHNOLOGY

SRAM Technology
 The first letter of SRAM stands for static. SRAMs don’t need to refresh and so the access time is
very close to the cycle time. SRAMs typically use six transistors per bit to prevent the information
from being disturbed when read.
 The dynamic nature of the circuits in DRAM requires data to be written back after being read—
hence the difference between the access time and the cycle time as well as the need to refresh.
 SRAM needs only minimal power to retain the charge in standby mode. SRAM designs are
concerned with speed and capacity, while in DRAM designs the emphasis is on cost per bit and
capacity.
 For memories designed in comparable technologies, the capacity of DRAMs is roughly 4–8 times
that of SRAMs. The cycle time of SRAMs is 8–16 times faster than DRAMs, but they are also 8–
16 times as expensive.

DRAM Technology
1) As early DRAMs grew in capacity, the cost of a package with all the necessary address lines was
an issue. The solution was to multiplex the address lines, thereby cutting the number of address
pins in half.
2) One-half of the address is sent first, called the row access strobe (RAS). The other half of the
address, sent during the column access strobe (CAS), follows it.
3) These names come from the internal chip organization, since the memory is organized as a
rectangular matrix addressed by rows and columns.

 Memory controllers include hardware to refresh the DRAMs periodically. This requirement means
that the memory system is occasionally unavailable because it is sending a signal telling every
chip to refresh. The time for a refresh is typically a full memory access (RAS and CAS) for each
row of the DRAM. Since the memory matrix in a DRAM is conceptually square, the number of steps in a refresh is usually the square root of the DRAM capacity.
 DRAM designers try to keep time spent refreshing to less than 5% of the total time. So far we
have presented main memory as if it operated like a Swiss train, consistently delivering the goods
exactly according to schedule.
 Although we have been talking about individual chips, DRAMs are commonly sold on small
boards called dual inline memory modules (DIMMs). DIMMs typically contain 4–16 DRAMs, and
they are normally organized to be 8 bytes wide (+ ECC) for desktop systems.

Improving Memory Performance inside a DRAM Chip


 To improve bandwidth, there have been a variety of evolutionary innovations over time.
 The first was timing signals that allow repeated accesses to the row buffer without another
row access time, typically called fast page mode. Such a buffer comes naturally, as each array
will buffer 1024–2048 bits for each access. Conventional DRAMs had an asynchronous interface
to the memory controller, and hence every transfer involved overhead to synchronize with the
controller.
 The second major change was to add a clock signal to the DRAM interface, so that the
repeated transfers would not bear that overhead. Synchronous DRAM (SDRAM) is the name of
this optimization. SDRAMs typically also had a programmable register to hold the number of
bytes requested, and hence can send many bytes over several cycles per request.
 The third major DRAM innovation to increase bandwidth is to transfer data on both the rising edge and the falling edge of the DRAM clock signal, thereby doubling the peak data rate. This optimization is called double data rate (DDR).

FLASH MEMORY
Flash is a type of Electrically Erasable Programmable Read-Only Memory (EEPROM). Most flash products include a controller to spread the writes by remapping blocks that have been written many times to less trodden blocks. This technique is called wear levelling. With wear levelling, personal mobile devices are very unlikely to exceed the write limits in the flash. Such wear levelling lowers the potential performance of flash, but it is needed unless higher-level software monitors block wear.

DISK MEMORY
 A magnetic hard disk consists of a collection of platters, which rotate on a spindle at 5400 to
15,000 revolutions per minute.
 The metal platters are covered with magnetic recording material on both sides, similar to the
material found on a cassette or videotape.
 To read and write information on a hard disk, a movable arm containing a small electromagnetic
coil called a read-write head is located just above each surface.
 The entire drive is permanently sealed to control the environment inside the drive, which, in turn,
allows the disk heads to be much closer to the drive surface.
 Each disk surface is divided into concentric circles, called tracks. There are typically tens of
thousands of tracks per surface.
 Each track is in turn divided into sectors that contain the information; each track may have
thousands of sectors. Sectors are typically 512 to 4096 bytes in size.

Seek
The process of positioning a read/write head over the proper track on a disk.
Rotational latency
Also called rotational delay. The time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time, since the average latency to the desired information is halfway around the disk. Disks rotate at 5400 RPM to 15,000 RPM. The average rotational latency at 5400 RPM is

    0.5 rotation / (5400 RPM / 60 seconds per minute) ≈ 0.0056 seconds = 5.6 ms
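The same arithmetic as a small C sketch (the RPM values are the ones quoted above):

    #include <stdio.h>

    /* Average rotational latency = half a rotation / rotation rate. */
    static double avg_rotational_latency_ms(double rpm) {
        double rotations_per_sec = rpm / 60.0;
        return 0.5 / rotations_per_sec * 1000.0;   /* in milliseconds */
    }

    int main(void) {
        printf("5400 RPM : %.2f ms\n", avg_rotational_latency_ms(5400.0));   /* ~5.56 ms */
        printf("15000 RPM: %.2f ms\n", avg_rotational_latency_ms(15000.0));  /* ~2.00 ms */
        return 0;
    }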

Difference between SRAM and DRAM (summarizing the points above):
 Refresh: SRAM needs no refresh, so its access time is very close to its cycle time; DRAM must be refreshed periodically, and data must be written back after a read.
 Density: SRAM uses about six transistors per bit; DRAM uses far less area per bit, so its capacity is roughly 4–8 times that of SRAM in a comparable technology.
 Speed and cost: SRAM cycle time is 8–16 times faster than DRAM, but SRAM is 8–16 times as expensive per bit.
 Use: SRAM is used for caches; DRAM is used for main memory.

3. CACHE BASICS
 The cache is one of the fastest and smallest levels of the memory hierarchy, sitting between the processor and main memory. It is built using SRAMs.
 Figure below shows such a simple cache, before and after requesting a data item that is not
initially in the cache.
 Before the request, the cache contains a collection of recent references X1, X2, …, Xn−1
 The processor requests a word Xn that is not in the cache. This request results in a miss, and the
word Xn is brought from memory into the cache.


 The simplest way to assign a location in the cache for each word in memory is to assign the cache
location based on the address of the word in memory.
 This cache structure is called direct mapped, since each memory location is mapped directly to
exactly one location in the cache.
 For example, almost all direct-mapped caches use this mapping to find a block:

    (Block address) modulo (Number of blocks in the cache)

 Thus, an 8-block cache uses the three lowest bits (8 = 2³) of the block address.
 For example, the figure below shows how the memory addresses between 1 (binary 00001) and 29 (binary 11101) map to locations 1 (binary 001) and 5 (binary 101) in a direct-mapped cache of eight words.
 A direct-mapped cache with eight entries showing the addresses of memory words between 0 and
31 that map to the same cache locations
 To know whether the data in the cache corresponds to a requested word we add a set of tags to
the cache.
 The tags contain the address information required to identify whether a word in the cache
corresponds to the requested word.
 The tag needs to contain only the upper portion of the address. In the example above, only the upper 2 of the 5 address bits are kept in the tag.
 The lower 3-bit index field of the address selects the block.
 The most common method is to add a valid bit to indicate whether an entry contains a valid
address.
 If the bit is not set, there cannot be a match for this block.
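A minimal sketch of this address split, assuming the 8-block cache and 5-bit block addresses of the example above: the low 3 bits form the index and the remaining upper bits form the tag.

    #include <stdio.h>

    #define NUM_BLOCKS 8     /* 8-block cache: 3-bit index, since 8 = 2^3 */
    #define INDEX_BITS 3

    int main(void) {
        /* Block addresses 1 (00001) and 29 (11101) from the example above;
           5 and 21 collide with 29 at the same index but differ in tag. */
        unsigned addrs[] = {1, 29, 5, 21};
        for (int i = 0; i < 4; i++) {
            unsigned index = addrs[i] % NUM_BLOCKS;   /* low 3 bits select the block */
            unsigned tag   = addrs[i] >> INDEX_BITS;  /* upper bits are kept as the tag */
            printf("block address %2u -> index %u, tag %u\n", addrs[i], index, tag);
        }
        return 0;
    }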
Accessing a Cache
 A sequence of nine memory references to an empty eight-block cache, including the action for
each reference.
 Figure below shows how the contents of the cache change on each miss.

• The index of a cache block, together with the tag contents of that block, uniquely specifies the
memory address of the word contained in the cache block.
• The total number of bits needed for a cache is a function of the cache size and the address size,
because the cache includes both the storage for the data and the tags.
Handling Cache Misses
• The control unit deals with cache misses: it must detect a miss and process it by fetching the requested data from memory.
• If the cache reports a hit, the computer continues using the data as if nothing happened.
• If the data is not present in the cache then it is a miss. The cache miss handling is done in
collaboration with the processor control unit and with a separate controller that initiates the
memory access and refills the cache.
• The processing of a cache miss creates a pipeline stall, in contrast to an interrupt, which would require saving the state of all the registers.
• To get the proper instruction into the cache, we instruct the lower level in the memory hierarchy to perform a read.
The steps to be taken on an instruction cache miss:
1. Send the original PC value (current PC – 4) to the memory.
2. Instruct main memory to perform a read and wait for the memory to complete its access.
3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the
upper bits of the address (from the ALU) into the tag field, and turning the valid bit on.
4. Restart the instruction execution at the first step, which will refetch the instruction, this time
finding it in the cache.
Handling writes
• Writes work somewhat differently.
• Suppose on a store instruction, we wrote the data only into the data cache (without changing main memory).
• Then, after the write into the cache, memory would have a different value from that in the cache.
In such a case, the cache and memory are said to be inconsistent.
• The simplest way to keep the main memory and the cache consistent is always to write the data
into both the memory and the cache. This scheme is called write-through.
• Write buffer: a queue that holds data while the data is waiting to be written to memory.
• After writing the data into the cache and into the write buffer, the processor can continue
execution.
• The alternative to a write-through scheme is a scheme called write-back.
• In a write-back scheme, when a write occurs, the new value is written only to the block in the cache. The modified block is written to main memory only when it is replaced (see the sketch below).
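A toy single-block sketch contrasting the two write policies (the variable names are invented for illustration): write-through updates memory on every store, while write-back defers the memory update, using a dirty flag, until the block is replaced.

    #include <stdio.h>
    #include <stdbool.h>

    static int memory_word;   /* stands in for main memory */
    static int cache_word;    /* stands in for the cached copy */
    static bool dirty;        /* write-back only: modified since it was filled? */

    static void write_through(int value) {
        cache_word = value;
        memory_word = value;          /* memory updated on every store */
    }

    static void write_back(int value) {
        cache_word = value;
        dirty = true;                 /* memory updated later, on replacement */
    }

    static void replace_block(void) {
        if (dirty) {                  /* flush the modified block first */
            memory_word = cache_word;
            dirty = false;
        }
        /* ...the new block would be loaded here... */
    }

    int main(void) {
        write_through(7);
        printf("write-through: cache=%d memory=%d\n", cache_word, memory_word);
        write_back(9);
        printf("write-back   : cache=%d memory=%d (inconsistent)\n", cache_word, memory_word);
        replace_block();
        printf("after replace: cache=%d memory=%d\n", cache_word, memory_word);
        return 0;
    }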

4. MEASURING AND IMPROVING CACHE PERFORMANCE


• This section looks at how to measure and analyze cache performance, and at two different techniques for improving it.
• One focuses on reducing the miss rate by reducing the probability that two different memory blocks will contend for the same cache location.
• The second technique reduces the miss penalty by adding an additional level to the hierarchy; this technique is called multilevel caching.

• Memory-stall clock cycles come primarily from cache misses. Stalls generated by reads and writes
can be quite complex.
• Memory-stall clock cycles can be defined as the sum of the stall cycles coming from reads plus those coming from writes:

    Memory-stall clock cycles = Read-stall cycles + Write-stall cycles
• The read-stall cycles can be defined in terms of the number of read accesses per program, the miss penalty in clock cycles for a read, and the read miss rate:

    Read-stall cycles = (Reads / Program) × Read miss rate × Read miss penalty
• Writes are more complicated. For a write-through scheme, we have two sources of stalls:
• Write misses, which usually require that we fetch the block before continuing the write.

• Write buffer stalls, which occur when the write buffer is full when a write occurs.
• Cycles stalled for writes equals the sum of these two:

    Write-stall cycles = ((Writes / Program) × Write miss rate × Write miss penalty) + Write buffer stalls
• Write-back schemes also have potential additional stalls arising from the need to write a cache
block back to memory when the block is replaced.

Calculating Cache Performance

Average memory access time (AMAT)

Average memory access time is the average time to access memory considering both hits and misses and the frequency of different accesses:

    AMAT = Hit time + Miss rate × Miss penalty
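A small sketch plugging illustrative numbers (not from the text) into the formulas above:

    #include <stdio.h>

    int main(void) {
        /* Illustrative values only. */
        double hit_time     = 1.0;    /* clock cycles */
        double miss_rate    = 0.05;   /* 5% of accesses miss */
        double miss_penalty = 20.0;   /* clock cycles */

        /* AMAT = Hit time + Miss rate x Miss penalty */
        double amat = hit_time + miss_rate * miss_penalty;
        printf("AMAT = %.2f cycles\n", amat);                /* 1 + 0.05*20 = 2.00 */

        /* Read-stall cycles = Reads x Read miss rate x Read miss penalty */
        double reads = 1e6, read_miss_rate = 0.04;
        printf("read stalls = %.0f cycles\n", reads * read_miss_rate * miss_penalty);
        return 0;
    }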

Reducing Cache Misses by More Flexible Placement of Blocks


• Direct mapped cache: a block can go in exactly one place in the cache. There is a direct mapping from any block address in memory to a single location in the upper level of the hierarchy. The position of a memory block in direct mapping is given by

    (Block address) modulo (Number of blocks in the cache)
• Fully associative – a scheme where a block can be placed in any location in the cache. Such a scheme is called fully associative because a block in memory may be associated with any entry in the cache. To find a given block in a fully associative cache, all the entries in the cache must be searched, because the block can be placed in any one of them.
• The middle range of designs between direct mapped and fully associative is called set associative.
• Set-associative cache - there are a fixed number of locations where each block can be placed.
A set-associative cache with n locations for a block is called an n-way set-associative cache.
• An n-way set-associative cache consists of a number of sets, each of which consists of n blocks.
Each block in the memory maps to a unique set in the cache given by the index field, and a block
can be placed in any element of that set.
• Thus, a set associative placement combines direct-mapped placement and fully associative
placement: a block is directly mapped into a set, and then all the blocks in the set are searched for
a match.
The position of a memory block in set-associative mapping is given by

    (Block address) modulo (Number of sets in the cache)
The figure below shows where block 12 may be placed in a cache with eight blocks total, according to the three block placement policies: direct mapped, set associative, and fully associative.

 In direct-mapped placement, there is only one cache block where memory block 12 can be found,
and that block is given by (12 modulo 8)=4.
• In a two-way set-associative cache, there would be four sets, and memory block 12 must be in set
(12 mod 4)=0; the memory block could be in either element of the set.
• In a fully associative placement, the memory block for block address 12 can appear in any of the
eight cache blocks.
• An 8-block cache configured as direct-mapped, 2-way set associative, 4-way set associative, &
fully associative
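The placement arithmetic for block 12 in the three schemes, as a minimal sketch:

    #include <stdio.h>

    int main(void) {
        unsigned block = 12, cache_blocks = 8;

        /* Direct mapped: exactly one possible location. */
        printf("direct mapped  : block %u\n", block % cache_blocks);    /* 12 mod 8 = 4 */

        /* Two-way set associative: 8 blocks / 2 ways = 4 sets. */
        unsigned sets = cache_blocks / 2;
        printf("2-way set assoc: set %u (either way)\n", block % sets); /* 12 mod 4 = 0 */

        /* Fully associative: any of the 8 blocks. */
        printf("fully assoc    : any of %u blocks\n", cache_blocks);
        return 0;
    }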

5. VIRTUAL MEMORY
 The main memory can act as a “cache” for the secondary storage, usually implemented with
magnetic disks. This technique is called virtual memory.
 The technique that automatically moves program and data blocks into the physical main memory when they are required for execution is called virtual memory.
 Virtual memory implements the translation of a program’s address space to physical addresses.
This translation process enforces protection of a program’s address space from other virtual
machines.
 The binary addresses that the processor issues, either for instructions or data, are called virtual (logical) addresses.
 The virtual address is translated into a physical address by a combination of hardware and software components. This address translation is done by the MMU (Memory Management Unit).
 When the desired data are in the main memory, they are fetched/accessed immediately.
 If the data are not in the main memory, the MMU causes the Operating system to bring the data
into memory from the disk. Transfer of data between disk and main memory is performed using
DMA scheme.
Fig: Virtual Memory Organisation

Address Translation:
 In address translation, all programs and data are composed of fixed-length units called pages.
 A page consists of a block of words that occupy contiguous locations in the main memory.
 Pages commonly range from 2K to 16K bytes in length.
 Virtual memory bridges the speed gap between the main memory and secondary storage; it is implemented largely by software techniques.
 Each virtual address generated by the processor contains a virtual page number (the high-order bits) and an offset (the low-order bits).
 The offset specifies the location of a particular byte (or word) within a page.
 Page Table: It contains the information about the main memory address where the page is stored
& the current status of the page.
 Page Frame: An area in the main memory that holds one page is called the page frame.
 Page Table Base Register: It contains the starting address of the page table.
Page table base register + virtual page number → gives the address of the corresponding entry in the page table, i.e., the starting address of the page if that page currently resides in memory.
Control Bits in Page Table:
The Control bits specifies the status of the page while it is in main memory.
Function:
 The control bit indicates the validity of the page, i.e., whether the page is actually loaded in the main memory.
 It also indicates whether the page has been modified during its residency in the memory; this information is needed to determine whether the page should be written back to the disk before it is removed from the main memory to make room for another page.
 The Page table information is used by MMU for every read & write access.
 The Page table is placed in the main memory but a copy of the small portion of the page table is
located within MMU.
 This small portion, or small cache, is called the Translation Lookaside Buffer (TLB).
 It consists of the page table entries that correspond to the most recently accessed pages, together with the virtual address of each entry.

Fig: Virtual Memory Address Translation

 In virtual memory, the address is broken into a virtual page number and a page offset.
 The figure below shows the translation of the virtual page number to a physical page number.
 The physical page number constitutes the upper portion of the physical address, while the page
offset, which is not changed, constitutes the lower portion.
 The number of bits in the page offset field determines the page size.

 In virtual memory systems, we locate pages by using a table that indexes the memory; this
structure is called a page table, and it resides in memory. Each program has its own page table,
which maps the virtual address space of that program to main memory.
 A valid bit is used in each page table entry. If the bit is 0, the page is not present in main memory and a page fault occurs. If the bit is 1, the page is in memory and the entry contains the physical page number.
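A minimal sketch of the translation just described, assuming 4 KB pages (a 12-bit offset) and a tiny illustrative page table; a cleared valid bit takes the page-fault path.

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_BITS 12                      /* assume 4 KB pages: 12-bit offset */
    #define NUM_PAGES 16                      /* tiny illustrative page table */

    struct pte { int valid; uint32_t ppn; };  /* valid bit + physical page number */

    static struct pte page_table[NUM_PAGES] = {
        [3] = {1, 7},                         /* virtual page 3 -> physical page 7 */
    };

    int main(void) {
        uint32_t va  = (3u << PAGE_BITS) | 0x1A4;    /* virtual page 3, offset 0x1A4 */
        uint32_t vpn = va >> PAGE_BITS;              /* high-order bits: page number */
        uint32_t off = va & ((1u << PAGE_BITS) - 1); /* low-order bits: offset */

        if (!page_table[vpn].valid) {
            printf("page fault on virtual page %u\n", vpn);  /* OS brings the page in */
        } else {
            uint32_t pa = (page_table[vpn].ppn << PAGE_BITS) | off;
            printf("VA 0x%05X -> PA 0x%05X\n", va, pa);
        }
        return 0;
    }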

Page Faults
 If the valid bit for a virtual page is 0, a page fault occurs. The operating system must be given
control.
 The operating system gets control, and it must find the page in the next level of the hierarchy
(usually flash memory or magnetic disk) and decide where to place the requested page in main
memory.
 The operating system usually creates the space on flash memory or disk for all the pages of a
process when it creates the process. This space is called the swap space.

Fig: A page fault: the data is brought in from disk storage

Making Address Translation Fast: the TLB (TRANSLATION-LOOKASIDE BUFFER)
 Modern processors include a special cache that keeps track of recently used translations. This
special address translation cache is traditionally referred to as a translation-lookaside buffer
(TLB), although it would be more accurate to call it a translation cache.

 On every reference, we look up the virtual page number in the TLB. If we get a hit, the physical
page number is used to form the address, and the corresponding reference bit is turned on.
 If the processor is performing a write, the dirty bit is set to 1.
 If a miss in the TLB occurs, we must determine whether it is a page fault or merely a TLB miss.
If the page exists in memory, then the TLB miss indicates only that the translation is missing.
 The processor can handle the TLB miss by loading the translation from the page table into the
TLB and then trying the reference again.
 If the page is not present in memory, the TLB miss indicates a true page fault. In this case, the
processor invokes the operating system using an exception.
 TLB misses can be handled either in hardware or in software.
 After a TLB miss occurs and the missing translation has been retrieved from the page table, we
will need to select a TLB entry to replace.
 Because the reference and dirty bits are contained in the TLB entry, we need to copy these bits
back to the page table entry when we replace an entry.
 Some systems use other techniques to approximate the reference and dirty bits, eliminating the
need to write into the TLB except to load a new table entry on a miss.
Some typical values for a TLB might be
■ TLB size: 16–512 entries
■ Block size: 1–2 page table entries (typically 4–8 bytes each)
■ Hit time: 0.5–1 clock cycle
■ Miss penalty: 10–100 clock cycles
■ Miss rate: 0.01%–1%
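A minimal sketch of the lookup sequence described above, assuming a small fully associative TLB; the entry layout and sizes are illustrative.

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 4

    struct tlb_entry {
        bool valid, dirty, ref;
        uint32_t vpn, ppn;
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Returns true on a TLB hit; on a miss the translation would be loaded
       from the page table (or a page fault raised) and the access retried. */
    static bool tlb_lookup(uint32_t vpn, bool is_write, uint32_t *ppn) {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                tlb[i].ref = true;                  /* reference bit turned on */
                if (is_write) tlb[i].dirty = true;  /* dirty bit set on a write */
                *ppn = tlb[i].ppn;
                return true;
            }
        }
        return false;   /* TLB miss: walk the page table, then retry */
    }

    int main(void) {
        tlb[0] = (struct tlb_entry){true, false, false, 3, 7};
        uint32_t ppn;
        return tlb_lookup(3, true, &ppn) ? 0 : 1;   /* hit: exit status 0 */
    }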

6. DIRECT MEMORY ACCESS


 A special control unit may be provided to allow the transfer of large blocks of data at high speed directly between an external device and the main memory, without continuous intervention by the processor. This approach is called DMA.
 DMA transfers are performed by a control circuit called the DMA controller.
 To initiate the transfer of a block of words, the processor sends:
i) the starting address
ii) the number of words in the block
iii) the direction of transfer.

 When a block of data is transferred, the DMA controller increments the memory address for successive words, keeps track of the number of words, and informs the processor by raising an interrupt signal.
 While the DMA transfer is taking place, the program that requested the transfer cannot continue, but the processor can be used to execute another program.
 After the DMA transfer is completed, the processor returns to the program that requested the transfer.

Registers in a DMA Interface


R/W → determines the direction of transfer.
o When R/W = 1, the DMA controller reads data from memory and transfers it to the I/O device.
o When R/W = 0, the DMA controller performs a write operation.
o Done flag = 1: the controller has completed transferring a block of data and is ready to receive another command.
o IE = 1: causes the controller to raise an interrupt (Interrupt Enable) after it has completed transferring the block of data.
o IRQ = 1: indicates that the controller has requested an interrupt.
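As an illustration, the registers above might be modelled as a memory-mapped structure in C; the field layout and bit positions below are hypothetical, not those of any real controller.

    #include <stdint.h>

    /* Hypothetical memory-mapped registers of one DMA channel. */
    struct dma_channel {
        volatile uint32_t status;      /* holds the R/W, Done, IE and IRQ bits */
        volatile uint32_t start_addr;  /* starting memory address of the block */
        volatile uint32_t word_count;  /* number of words in the block */
    };

    #define DMA_RW   (1u << 0)   /* 1 = read (memory -> I/O device), 0 = write */
    #define DMA_DONE (1u << 1)   /* block transfer completed */
    #define DMA_IE   (1u << 2)   /* raise an interrupt after the block is done */
    #define DMA_IRQ  (1u << 3)   /* controller has requested an interrupt */

    /* Program a channel: starting address, word count, direction. */
    static void dma_start_read(struct dma_channel *ch, uint32_t addr, uint32_t n) {
        ch->start_addr = addr;
        ch->word_count = n;
        ch->status     = DMA_RW | DMA_IE;   /* direction + interrupt enable */
    }

    int main(void) {
        struct dma_channel ch = {0};
        dma_start_read(&ch, 0x1000, 64);
        return (ch.status & DMA_IE) ? 0 : 1;
    }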
Use of DMA controllers in a computer system

 A DMA controller connects a high-speed network to the computer bus, and the disk controller for two disks also has DMA capability; it provides two DMA channels.
 To start a DMA transfer of a block of data from main memory to one of the disks, the program writes the address and the word count information into the registers of the corresponding channel of the disk controller.
 When the DMA transfer is completed, it will be recorded in the status and control registers of the DMA channel, i.e., Done bit = IRQ = IE = 1.
Cycle Stealing:
 Requests by DMA devices for using the bus have higher priority than processor requests.
 Top priority is given to high-speed peripherals such as disks, high-speed network interfaces and graphics display devices.
 Since the processor originates most memory access cycles, the DMA controller can be said to steal memory cycles from the processor. This interweaving technique is called cycle stealing.
Burst Mode: The DMA controller may be given exclusive access to the main memory to transfer a
block of data without interruption. This is known as Burst/Block Mode.
Bus Master: The device that is allowed to initiate data transfers on the bus at any given time is called the bus master.
Bus Arbitration:
It is the process by which the next device to become the bus master is selected and the bus
mastership is transferred to it.
 Types: There are 2 approaches to bus arbitration. They are:
i) Centralized arbitration (a single bus arbiter performs arbitration)
ii) Distributed arbitration (all devices participate in the selection of the next bus master).
Centralized Arbitration:
 Here the processor is the bus master and it may grant bus mastership to one of its DMA controllers.
 A DMA controller indicates that it needs to become the bus master by activating the Bus Request line (BR), which is an open-drain line.
 The signal on BR is the logical OR of the bus requests from all the devices connected to it. When BR is activated, the processor activates the Bus Grant signal (BG1), indicating to the DMA controllers that they may use the bus when it becomes free.
 This signal is connected to all devices using a daisy-chain arrangement.
 If a DMA controller requests the bus, it blocks the propagation of the grant signal to other devices, and it indicates to all devices that it is using the bus by activating an open-collector line, Bus Busy (BBSY).
A simple arrangement for bus arbitration using a daisy chain

Sequence of signals during transfer of bus mastership for the devices

 The timing diagram shows the sequence of events for the devices connected to the processor.
 DMA controller 2 requests and acquires bus mastership and later releases the bus.
 During its tenure as bus master, it may perform one or more data transfers.
 After it releases the bus, the processor resumes bus mastership.
Distributed Arbitration:
It means that all devices waiting to use the bus have equal responsibility in carrying out the
arbitration process.
Fig: A distributed arbitration scheme

 Each device on the bus is assigned a 4-bit ID. When one or more devices request the bus, they assert the Start-Arbitration signal and place their 4-bit ID numbers on the four open-collector lines, ARB0 to ARB3.
 A winner is selected as a result of the interaction among the signals transmitted over these lines.
 The net outcome is that the code on the four lines represents the request that has the highest ID
number.
 The drivers are of the open-collector type. Hence, if the input to one driver is equal to 1, the input to another driver connected to the same bus line can be 0 (i.e., the bus is in the low-voltage state).
 Eg: Assume two devices A and B have IDs 5 (0101) and 6 (0110); the code on the arbitration lines is the OR of the two patterns, 0111.
 Each device compares the pattern on the arbitration lines to its own ID, starting from the MSB.
 If it detects a difference at any bit position, it disables its drivers at that bit position and all lower-order positions. It does this by placing 0 at the input of those drivers.
 In our example, A detects a difference on line ARB1, hence it disables its drivers on lines ARB1 and ARB0. This causes the pattern on the arbitration lines to change to 0110, which means that B has won the contention (see the sketch below).
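A minimal simulation of this contention, reproducing the A (0101) versus B (0110) example: the wired-OR of the ID bits is examined from the MSB down, and a device that sees a 1 on a line it is not driving drops out, releasing its lower lines.

    #include <stdio.h>

    /* Simulates the open-collector arbitration lines: the contest converges
       so that the lines carry the highest requesting 4-bit ID. */
    static unsigned arbitrate(const unsigned *ids, int n) {
        unsigned active[8];
        for (int i = 0; i < n; i++) active[i] = ids[i];

        for (int bit = 3; bit >= 0; bit--) {        /* compare from MSB (ARB3) */
            unsigned line = 0;
            for (int i = 0; i < n; i++)
                line |= active[i] & (1u << bit);    /* wired-OR of this line */
            for (int i = 0; i < n; i++)
                if (line && !(active[i] & (1u << bit)))
                    active[i] = 0;                  /* mismatch: disable drivers */
        }
        unsigned winner = 0;
        for (int i = 0; i < n; i++)
            if (active[i] > winner) winner = active[i];
        return winner;
    }

    int main(void) {
        unsigned ids[] = {5, 6};   /* devices A (0101) and B (0110) from the text */
        printf("winner: device with ID %u\n", arbitrate(ids, 2));   /* B (6) wins */
        return 0;
    }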

7. INTERRUPTS
 An interrupt is an external event that causes the execution of one program to be suspended and the
execution of another program to begin.
 In program-controlled I/O, the processor continuously monitors the status of the device, and so cannot perform other useful work while waiting.
 An alternate approach is for the I/O device to alert the processor when it becomes ready: the interrupt-request line carries a hardware signal, called the interrupt signal, to the processor. On receiving this signal, the processor can perform useful work during what would otherwise be a waiting period.
 The routine executed in response to an interrupt request is called the Interrupt Service Routine (ISR). An interrupt resembles a subroutine call. The interrupt request uses a line in the bus called the interrupt request line.
Fig:Transfer of control through the use of interrupts

 The processor first completes the execution of instruction i. Then it loads the PC(Program
Counter) with the address of the first instruction of the ISR.
 After the execution of ISR, the processor has to come back to instruction i + 1.
 Therefore, when an interrupt occurs, the current contents of the PC, which point to instruction i + 1, are put in temporary storage in a known location.
 A return from interrupt instruction at the end of ISR reloads the PC from that temporary storage
location, causing the execution to resume at instruction i+1.
 When the processor is handling an interrupt, it must inform the device that its request has been recognized, so that the device removes its interrupt request signal.
 This may be accomplished by a special control signal called the interrupt acknowledge signal.
 The task of saving and restoring the information can be done automatically by the processor.
 The processor saves only the contents of program counter & status register (ie) it saves only the
minimal amount of information to maintain the integrity of the program execution.
 Saving registers also increases the delay between the time an interrupt request is received and the
start of the execution of the ISR. This delay is called the Interrupt Latency.
 Generally, a long interrupt latency is unacceptable. The concept of interrupts is used in operating systems and in control applications, where the processing of certain routines must be accurately timed relative to external events. This is also called real-time processing.

Interrupt Hardware:
Fig: An equivalent circuit for an open-drain bus used to implement a common interrupt request line.

 A single interrupt request line may be used to serve n devices.


 All devices are connected to the line via switches to ground. To request an interrupt, a device closes its associated switch, and the voltage on the INTR line drops to 0 (zero).
 If all the interrupt request signals (INTR1 to INTRn) are inactive, all switches are open and the
voltage on INTR line is equal to Vdd.
 When a device requests an interrupt, the value of INTR is the logical OR of the requests from the individual devices:
(i.e.) INTR = INTR1 + INTR2 + … + INTRn
INTR is the name of the signal on the common line; it is active in the low-voltage state.
 Open-collector (bipolar circuit) or open-drain (MOS circuit) gates are used to drive the INTR line.
 The output of an open-collector (or open-drain) gate is equivalent to a switch to ground that is open when the gate's input is in the 0 state and closed when the gate's input is in the 1 state.
 Resistor R is called a pull-up resistor because it pulls the line voltage up to the high-voltage state when the switches are open.
Enabling and Disabling Interrupts:
 The arrival of an interrupt request from an external device causes the processor to suspend the
execution of one program & start the execution of another because the interrupt may alter the
sequence of events to be executed.
 INTR is active during the execution of Interrupt Service Routine.
 There are 3 mechanisms to solve the problem of infinite loop which occurs due to successive
interruptions of active INTR signals.
The following are the typical scenario.
 The device raises an interrupt request.
 The processor interrupts the program currently being executed.
 Interrupts are disabled by changing the control bits in the PS (Processor Status register).
 The device is informed that its request has been recognized & in response, it deactivates the INTR
signal.
 Interrupts are enabled and execution of the interrupted program is resumed.
Edge-triggered:
 The processor has a special interrupt request line for which the interrupt-handling circuit responds only to the leading edge of the signal. Such a line is said to be edge-triggered.
Handling Multiple Devices:
 When several devices requests interrupt at the same time, it raises some questions. They are.
o How can the processor recognize the device requesting an interrupt?
o Given that the different devices are likely to require different ISR, how can the processor obtain
the starting address of the appropriate routines in each case?
o Should a device be allowed to interrupt the processor while another interrupt is being serviced?
o How should two or more simultaneous interrupt requests be handled?
Polling Scheme:
 If two devices have activated the interrupt request line, the ISR for the selected device (first
device) will be completed & then the second request can be serviced.
 The simplest way to identify the interrupting device is to have the ISR poll all the devices; the first device encountered with its IRQ bit set is the device to be serviced (a sketch follows below).
 IRQ (Interrupt Request) → when a device raises an interrupt request, the IRQ bit in its status register is set to 1.
Merit:
 It is easy to implement.
Demerit:
 Time is spent interrogating the IRQ bits of all the devices, even those that may not be requesting any service.
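A minimal sketch of such a polling ISR, with hypothetical memory-mapped status registers: devices are interrogated in a fixed order and the first one found with its IRQ bit set is serviced.

    #include <stdio.h>
    #include <stdint.h>

    #define NUM_DEVICES 4
    #define IRQ_BIT (1u << 0)      /* IRQ flag in each device's status register */

    /* Hypothetical device status registers (would be memory-mapped I/O). */
    static uint32_t status_reg[NUM_DEVICES];

    static void service_device(int d) { printf("servicing device %d\n", d); }

    /* Polling ISR: interrogate the devices in a fixed order; the first one
       encountered with its IRQ bit set is the device to be serviced. */
    static void isr_poll(void) {
        for (int d = 0; d < NUM_DEVICES; d++) {
            if (status_reg[d] & IRQ_BIT) {
                service_device(d);
                status_reg[d] &= ~IRQ_BIT;   /* device removes its request */
                return;
            }
        }
    }

    int main(void) {
        status_reg[2] |= IRQ_BIT;   /* pretend device 2 raised an interrupt */
        isr_poll();
        return 0;
    }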
Vectored Interrupt:
 Here the device requesting an interrupt identifies itself to the processor by sending a special code over the bus; the processor then starts executing the corresponding ISR.
 The code supplied by the device may indicate the starting address of the ISR for that device.
 The code length ranges from 4 to 8 bits. The location pointed to by the interrupting device is used to store the starting address of the ISR.
 The processor reads this address, called the interrupt vector, and loads it into the PC.
 The interrupt vector also includes a new value for the Processor Status Register.
 When the processor is ready to receive the interrupt vector code, it activates the interrupt acknowledge (INTA) line.
Interrupt Nesting:
Multiple Priority Scheme:
 In multiple level priority scheme, we assign a priority level to the processor that can be changed
under program control.
 The priority level of the processor is the priority of the program that is currently being executed.
 The processor accepts interrupts only from devices that have priorities higher than its own.
 At the time the execution of an ISR for some device is started, the priority of the processor is
raised to that of the device.
 This action disables interrupts from devices at the same level of priority or lower.
Privileged Instruction:
 The processor priority is usually encoded in a few bits of the Processor Status word.
It can also be changed by program instructions that write into the PS. These instructions are called privileged instructions.
 This can be executed only when the processor is in supervisor mode.
 The processor is in supervisor mode only when executing OS routines. It switches to the user
mode before beginning to execute application program.
Privileged Exception:
 User programs cannot accidentally or intentionally change the priority of the processor and disrupt the system operation.
 An attempt to execute a privileged instruction while in user mode, leads to a special type of
interrupt called the privileged exception.
Fig: Implementation of Interrupt Priority using individual Interrupt request acknowledge lines

 Each of the interrupt request lines is assigned a different priority level.


 Interrupt request received over these lines are sent to a priority arbitration circuit in the processor.
 A request is accepted only if it has a higher priority level than that currently assigned to the
processor.
Simultaneous Requests:
Daisy Chain:

 The interrupt request line INTR is common to all devices.
 The interrupt acknowledge line INTA is connected in a daisy chain fashion such that INTA signal
propagates serially through the devices.
 When several devices raise an interrupt request, the INTR line is activated and the processor responds by setting the INTA line to 1. This signal is received by device 1.
 Device 1 passes the signal on to device 2 only if it does not require any service.
 If device 1 has a pending request for an interrupt, it blocks the INTA signal and proceeds to put its identification code on the data lines. Therefore, the device that is electrically closest to the processor has the highest priority.
Merits:
 It requires fewer wires than the individual connections.
Arrangement of Priority Groups:
 Here the devices are organized in groups & each group is connected at a different priority level.
Within a group, devices are connected in a daisy chain.
 At the device end, an interrupt-enable bit in a control register determines whether the device is allowed to generate an interrupt request.
 At the processor end, either an interrupt-enable bit in the PS (Processor Status) register or a priority structure determines whether a given interrupt request will be accepted.
Initiating the Interrupt Process:
 Load the starting address of ISR in location INTVEC (vectored interrupt).
 Load the address LINE in a memory location PNTR. The ISR will use this location as a pointer to store the input characters in memory.
 Enable the keyboard interrupts by setting bit 2 in register CONTROL to 1.
Execution of ISR:
 Read the input characters from the keyboard input data register. This will cause the interface circuit to remove its interrupt request.
 Store the characters in a memory location pointed to by PNTR & increment PNTR.
 When the end of line is reached, disable keyboard interrupt & inform program main.
 Return from interrupt.

8. STANDARD I/O INTERFACE


 A standard I/O interface is required to connect an I/O device to the processor bus through an interface circuit.
 The processor bus is the bus defined by the signals on the processor chip itself.
 The devices that require a very high speed connection to the processor such as the main memory,
may be connected directly to this bus.
 The bridge connects two buses, which translates the signals and protocols of one bus into another.

 The bridge circuit introduces a small delay in data transfer between processor and the devices.
 We have 3 Bus standards. They are,
 PCI (Peripheral Component Interconnect)
 SCSI (Small Computer System Interface)
 USB (Universal Serial Bus)
SCSI INTERFACE
SCSI is available in a variety of interfaces. The first, still very common, was parallel SCSI (now
also called SPI), which uses a parallel bus design.
SCSI interfaces have often been included on computers from various manufacturers for use under
Microsoft Windows, Mac OS, Unix, Commodore Amiga and Linux operating systems, either
implemented on the motherboard or by the means of plug-in adaptors.
Short for Small Computer System Interface, SCSI is pronounced "scuzzy" and is one of the most commonly used interfaces for disk drives; the first version was completed in 1982.
SCSI-1 is the original SCSI standard, standardized as ANSI X3.131-1986. It uses an 8-bit parallel bus, transferring eight bits at a time at rates up to 5 MB/s.
SCSI-2 was approved in 1990, added new features such as Fast and Wide SCSI, and support for
additional devices.
SCSI-3 was approved in 1996 as ANSI X3.270-1996.
SCSI is a standard for parallel interfaces that transfers information eight or more bits at a time, faster than the average parallel interface. SCSI-2 and above support up to seven peripheral devices, such as a hard drive, CD-ROM, and scanner, attached to a single SCSI port on a system's bus. SCSI ports were designed for Apple Macintosh and Unix computers, but can also be used with PCs. Although SCSI was popular in the past, today many users are switching over to SATA drives.
SCSI connectors
Fig: Some of the most commonly found SCSI connectors on computers and devices.
 SCSI is used for connecting additional devices both inside and outside the computer box.
 SCSI bus is a high speed parallel bus intended for devices such as disk and video display.
 SCSI refers to the standard bus which is defined by ANSI (American National Standard Institute).
 The SCSI bus has several options. It may be:
 Narrow bus → 8 data lines; transfers 1 byte at a time.
 Wide bus → 16 data lines; transfers 2 bytes at a time.
 Single-Ended (SE) transmission → each signal uses a separate wire.
 HVD (High Voltage Differential) → uses 5 V (TTL levels).
 LVD (Low Voltage Differential) → uses 3.3 V.
 Because of these various options, a SCSI connector may have 50, 68 or 80 pins.
 The data transfer rate ranges from 5 MB/s to 160 MB/s, with later versions reaching 320 MB/s and 640 MB/s.
 The transfer rate depends on:
the length of the cable
the number of devices connected.
 To achieve a high transfer rate, the bus length should be limited to 1.6 m for SE signalling and 12 m for LVD signalling.
 The SCSI bus is connected to the processor bus through the SCSI controller.
 The data are stored on a disk in blocks called sectors.
 Each sector contains several hundred bytes. These data may not be stored in contiguous locations.
 The SCSI protocol is designed to retrieve the data in the first sector or any other selected sectors. Using the SCSI protocol, bursts of data are transferred at high speed.
 The controllers connected to the SCSI bus are of 2 types. They are:
Initiator
Target
Initiator:
It has the ability to select a particular target & to send commands specifying the operation to be
performed.
They are the controllers on the processor side.
Target:
The disk controller operates as a target.
It carries out the commands it receives from the initiator. The initiator establishes a logical connection with the intended target.
Steps:
 Consider a disk read operation; it has the following sequence of events.
 The SCSI controller, acting as an initiator, contends for control of the bus; after winning arbitration, it selects the target controller and hands over control of the bus to it.
 The target starts an output operation; in response to this, the initiator sends a command specifying the required read operation.
 The target, realizing that it needs to perform a disk seek operation, sends a message to the initiator indicating that it will temporarily suspend the connection between them.
 Then it releases the bus.
 The target controller sends a command to the disk drive to move the read head to the first sector involved in the requested read, and reads the data into a data buffer. When it is ready to begin transferring data to the initiator, the target requests control of the bus. After it wins arbitration, it reselects the initiator controller, thus restoring the suspended connection.
 The target transfers the contents of the data buffer to the initiator and then suspends the connection again. Data are transferred either 8 or 16 bits in parallel, depending on the width of the bus.
 As the initiator controller receives the data, it stores them in main memory using the DMA approach.
 The SCSI controller sends an interrupt to the processor to inform it that the requested operation
has been completed.
Bus Signals:-
 The bus has no address lines.
 Instead, it has data lines to identify the bus controllers involved in the selection / reselection /
arbitration process.
 For narrow bus, there are 8 possible controllers numbered from 0 to 7.
 For a wide bus, there are 16 controllers.
 Once a connection is established between two controllers, there is no further need for addressing, and the data lines are used to carry data.
SCSI bus signals:

Category                Name              Function
Data                    DB(0) to DB(7)    Data lines
                        DB(P)             Parity bit for the data bus
Phases                  BSY               Busy
                        SEL               Selection
Information type        C/D               Control / Data
                        MSG               Message
Handshake               REQ               Request
                        ACK               Acknowledge
Direction of transfer   I/O               Input / Output
Other                   ATN               Attention
                        RST               Reset

PCI:
 PCI defines an expansion bus on the motherboard.
 PCI is developed as a low cost bus that is truly processor independent.
 It supports high speed disk, graphics and video devices.
 PCI has plug and play capability for connecting I/O devices.
 To connect new devices, the user simply connects the device interface board to the bus.

Data Transfer:
 Data are transferred between the cache and the main memory in bursts of several words, stored in successive memory locations.
 When the processor specifies an address and requests a read operation from memory, the memory responds by sending a sequence of data words starting at that address.
 During write operation, the processor sends the address followed by sequence of data words to be
written in successive memory locations.
 PCI supports read and write operation.
 A read / write operation involving a single word is treated as a burst of length one.
 PCI has three address spaces. They are
 Memory address space
 I/O address space
 Configuration address space
 I/O address space → intended for use with the processor's I/O instructions.
 Configuration space → intended to give PCI its plug-and-play capability.
 PCI Bridge provides a separate physical connection to main memory.
 The master maintains the address information on the bus until data transfer is completed.
 At any time, only one device acts as bus master.
 A master is called an 'initiator' in PCI; it is either a processor or a DMA controller.
 The addressed device that responds to read and write commands is called a target.
 A complete transfer operation on the bus, involving an address and a burst of data, is called a 'transaction'.
Fig: Use of a PCI bus in a computer system

USB – Universal Serial Bus


 USB is used for connecting additional devices both inside and outside the computer box.
 USB uses serial transmission to suit the needs of equipment ranging from keyboards to game controls to internet connections.
 USB supports 3 speeds of operation. They are:
 Low speed (1.5 Mb/s)
 Full speed (12 Mb/s)
 High speed (480 Mb/s)
 The USB has been designed to meet the key objectives. They are,
 It provides a simple, low-cost and easy-to-use interconnection system that overcomes the difficulties due to the limited number of I/O ports available on a computer.
 It accommodates a wide range of data transfer characteristics for I/O devices, including telephone and Internet connections.
 It enhances user convenience through a 'Plug & Play' mode of operation.
Port Limitation:-
 Normally the system has only a few limited ports.
 To add new ports, the user must open the computer box to gain access to the internal expansion bus and install a new interface card.
 The user may also need to know how to configure the device and the software.
Merits of USB:-
 USB helps to add many devices to a computer system at any time without opening the computer
box.
Device Characteristics:-
 The kinds of devices that may be connected to a computer cover a wide range of functionality.
 The speed, volume and timing constraints associated with data transfers to and from devices vary significantly.
Eg 1: Keyboard → Since the event of pressing a key is not synchronized to any other event in a computer system, the data generated by the keyboard are called asynchronous.
 The rate of data generated by the keyboard depends upon the typing speed of the human operator, which is about 100 bytes/sec at most.
Plug & Play:-
 The main objective of USB is to provide a plug & play capability.
 The plug & play feature allows a new device to be connected at any time, while the system is in operation.
 The system should,
 Detect the existence of the new device automatically.
 Identify the appropriate device driver s/w.
 Establish the appropriate addresses.
 Establish the logical connection for communication.
USB Architecture:-
 USB has a serial bus format which satisfies the low-cost & flexibility requirements.
 Clock and data information are encoded together and transmitted as a single signal.
 Hence there are no limitations on clock frequency or distance arising from data skew, and it is possible to provide a high data transfer bandwidth by using a high clock frequency.
 To accommodate a large number of devices that can be added or removed at any time, the USB has a tree structure.
Fig: USB Tree Structure

 Each node of the tree has a device called a hub, which acts as an intermediate control point between the host and the I/O devices.
 At the root of the tree, the root hub connects the entire tree to the host computer.
 The leaves of the tree are the I/O devices being served.
