
MODULE –V

MEMORY ORGANIZATION
Contents
Memory organization:
• Memory hierarchy
• Main memory
• Auxiliary memory
• Associative memory
• Cache memory
• Virtual memory
Memory Hierarchy
• At the bottom of the hierarchy are the relatively slow magnetic tapes used to store
removable files. Next are the magnetic disks used as backup storage. The main memory
occupies a central position by being able to communicate directly with the CPU and with
auxiliary memory devices through an I/O processor.
• When programs not residing in main memory are needed by the CPU, they are brought
in from auxiliary memory. Programs not currently needed in main memory are
transferred into auxiliary memory to provide space for currently used programs and data.
• A special very-high-speed memory called a cache is sometimes used to increase the speed
of processing by making current programs and data available to the CPU at a rapid rate.
Memory Hierarchy
• The cache memory is employed in computer systems to compensate for the speed
differential between main memory access time and processor logic. CPU logic is usually
faster than main memory access time, with the result that processing speed is limited
primarily by the speed of main memory.

Figure :Memory hierarchy in a computer system.


Main Memory
• The main memory is the central storage unit in a computer system. It is a relatively large and fast
memory used to store programs and data during the computer operation. The principal
technology used for the main memory is based on semiconductor integrated circuits. Integrated
circuit RAM chips are available in two possible operating modes, static and dynamic.
• The static RAM consists essentially of internal flip-flops that store the binary information. The
stored information remains valid as long as power is applied to the unit. The dynamic RAM
stores the binary information in the form of electric charges that are applied to capacitors.
• The bootstrap loader is a program whose function is to start the computer software operating
when power is turned on. Since RAM is volatile, its contents are destroyed when power is turned
off. The contents of ROM remain unchanged after power is turned off and on again. The startup
of a computer consists of turning the power on and starting the execution of an initial program.
Main Memory
RAM and ROM Chips:
• A RAM chip is better suited for communication with the CPU if it has one or more control
inputs that select the chip only when needed. Another common feature is a bidirectional
data bus that allows the transfer of data either from memory to CPU during a read
operation, or from CPU to memory during a write operation. A bidirectional bus can be
constructed with three-state buffers.

Figure: Typical RAM chip.


Main Memory
• The block diagram of a RAM chip is shown in Fig. The capacity of the memory is 128 words of
eight bits (one byte) per word. This requires a 7-bit address and an 8-bit bidirectional data bus.
The read and write inputs specify the memory operation, and the two chip select (CS) control
inputs are for enabling the chip only when it is selected by the microprocessor.
• The read and write inputs are sometimes combined into one line labeled R/W. When the chip is
selected, the two binary states in this line specify the two operations of read or write.
• The function table listed in Fig.(b) specifies the operation of the RAM chip. The unit is in
operation only when CS1 = 1 and CS2 = 0. The bar on top of the second select variable indicates
that this input is enabled when it is equal to 0. If the chip select inputs are not enabled, or if they
are enabled but the read or write inputs are not enabled, the memory is inhibited and
its data bus is in a high-impedance state. When CS1 = 1 and CS2 = 0, the memory can be placed
in a write or read mode. When the WR input is enabled, the memory stores a byte from the data
bus into a location specified by the address input lines.
Main Memory
• When the RD input is enabled, the content of the selected byte is placed into the data bus.
The RD and WR signals control the memory operation as well as the bus buffers
associated with the bidirectional data bus.
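As a rough illustration (a Python sketch with invented names, not circuitry from the text), the function table can be modeled as follows: the chip responds only when CS1 = 1 and CS2 = 0, and an unselected chip leaves its data bus in the high-impedance state (represented here as None).

```python
class RamChip:
    def __init__(self, words=128):
        self.mem = [0] * words            # 128 words of one byte each

    def access(self, cs1, cs2, rd, wr, addr, data_in=None):
        if not (cs1 == 1 and cs2 == 0):   # chip not selected
            return None                   # data bus in high-impedance state
        if wr:                            # write: store byte from the data bus
            self.mem[addr] = data_in & 0xFF
        elif rd:                          # read: place the selected byte on the bus
            return self.mem[addr]
        return None                       # selected, but no operation requested

chip = RamChip()
chip.access(1, 0, rd=0, wr=1, addr=5, data_in=0x3C)
assert chip.access(1, 0, rd=1, wr=0, addr=5) == 0x3C
```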
Memory Address Map:
• Since a ROM can only be read, the data bus can only be in an output mode. The block diagram of a
ROM chip is shown in Fig. For the same-size chip, it is possible to have more bits of
ROM than of RAM, because ROM cells occupy less space than RAM cells. For this reason, the
diagram specifies a 512-byte ROM, while the RAM has only 128 bytes.
• The nine address lines in the ROM chip specify any one of the 512 bytes stored in it. The
two chip select inputs must be CS1 = 1 and CS2 = 0 for the unit to operate. Otherwise, the
data bus is in a high-impedance state. There is no need for a read or write control because
the unit can only read.
Main Memory

Figure: Typical ROM chip.


• To demonstrate with a particular example, assume that a computer system needs 512
bytes of RAM and 512 bytes of ROM. The requirement can be met with four 128 × 8 RAM
chips and one 512 × 8 ROM chip, assigned addresses as in the memory address map below.

TABLE: Memory Address Map for Microcomputer


Main Memory
Memory Connection to CPU
• RAM and ROM chips are connected to a CPU through the data and address buses. The low-
order lines in the address bus select the byte within the chips and other lines in the address
bus select a particular chip through its chip select inputs. The connection of memory chips to
the CPU is shown in Fig.
• This configuration gives a memory capacity of 512 bytes of RAM and 512 bytes of ROM. It
implements the memory map of Table 12-1. Each RAM receives the seven low-order bits of
the address bus to select one of 128 possible bytes. The particular RAM chip selected is
determined from lines 8 and 9 in the address bus.
• This is done through a 2 × 4 decoder whose outputs go to the CS1 inputs in each RAM chip.
Thus, when address lines 8 and 9 are equal to 00, the first RAM chip is selected. When 01,
the second RAM chip is selected, and so on. The RD and WR outputs from the
microprocessor are applied to the inputs of each RAM chip.
Main Memory
• The selection between RAM and ROM is achieved through bus line 10. The RAMs are
selected when the bit in this line is 0, and the ROM when the bit is 1. The other chip
select input in the ROM is connected to the RD control line for the ROM chip to be
enabled only during a read operation.
• Address bus lines 1 to 9 are applied to the input address of ROM without going through
the decoder. This assigns addresses 0 to 511 to RAM and 512 to 1023 to ROM. The data
bus of the ROM has only an output capability, whereas the data bus connected to the
RAMs can transfer information in both directions.
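The chip-select decoding just described can be sketched in Python (an illustration with assumed names; bit numbering follows the text, where lines 1 to 7 are the low-order byte address, lines 8 and 9 feed the 2 × 4 decoder, and line 10 distinguishes RAM from ROM):

```python
def decode(addr):                          # addr: a 10-bit CPU address, 0..1023
    if (addr >> 9) & 1 == 0:               # bus line 10 = 0: one of the RAM chips
        chip = (addr >> 7) & 0b11          # lines 8-9 select one of four RAM chips
        return ("RAM", chip, addr & 0x7F)  # lines 1-7 select the byte in the chip
    return ("ROM", 0, addr & 0x1FF)        # line 10 = 1: lines 1-9 address the ROM

assert decode(0)   == ("RAM", 0, 0)        # first byte of the first RAM chip
assert decode(200) == ("RAM", 1, 72)       # falls in the second RAM chip
assert decode(512) == ("ROM", 0, 0)        # addresses 512-1023 map to the ROM
```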
Main Memory

Figure: Memory connection to the CPU.


Auxiliary Memory
• The most common auxiliary memory devices used in computer systems are magnetic
disks and tapes.
Magnetic Disks:
• A magnetic disk is a circular plate constructed of metal or plastic coated with magnetized
material. Often both sides of the disk are used and several disks may be stacked on one
spindle with read/write heads available on each surface.
• Bits are stored in the magnetized surface in spots along concentric circles called tracks.
The tracks are commonly divided into sections called sectors.

Figure: Magnetic disk.


Auxiliary Memory
Magnetic Tape:
• A magnetic tape transport consists of the electrical, mechanical, and electronic
components to provide the parts and control mechanism for a magnetic-tape
unit. The tape itself is a strip of plastic coated with a magnetic recording
medium. Bits are recorded as magnetic spots on the tape along several tracks.
• Usually, seven or nine bits are recorded simultaneously to form a character
together with a parity bit. Read/write heads are mounted one in each track so
that data can be recorded and read as a sequence of characters.
Associative Memory
• A memory unit accessed by content is called an associative memory or content
addressable memory (CAM). An associative memory is more expensive than a random
access memory because each cell must have storage capability as well as logic circuits for
matching its content with an external argument. For this reason, associative memories are
used in applications where the search time is very critical and must be very short.
• Hardware Organization:
• It consists of a memory array and logic for m words with n bits per word. The argument
register A and key register K each have n bits, one for each bit of a word. The match
register M has m bits, one for each memory word. Each word in memory is compared in
parallel with the content of the argument register. The words that match the bits of the
argument register set a corresponding bit in the match register. After the matching process,
those bits in the match register that have been set indicate the fact that their corresponding
words have been matched.
Associative Memory

Figure: Block diagram of associative memory.

• The key register provides a mask for choosing a particular field or key in the argument
word. The entire argument is compared with each memory word if the key register
contains all 1's. Otherwise, only those bits in the argument that have 1's in their
corresponding position of the key register are compared.
Associative Memory
• As a numerical example, suppose that the argument register A and the key register K
have the bit configuration shown below. Only the three leftmost bits of A are
compared with memory words because K has 1's in these positions.

• Word 2 matches the unmasked argument field because the three leftmost bits of the
argument and the word are equal.
• The relation between the memory array and external registers in an associative
memory is shown in Fig.
• The cells in the array are marked by the letter C with two subscripts. The first
subscript gives the word number and the second specifies the bit position in the word.
Associative Memory
• Thus cell Cij is the cell for bit j in word i. A bit Aj in the argument register is compared
with all the bits in column j of the array, provided that Kj = 1. This is done for all columns
j = 1, 2, . . . , n.
• If a match occurs between all the unmasked bits of the argument and the bits in word i,
the corresponding bit Mi in the match register is set to 1. If one or more unmasked bits of
the argument and the word do not match, Mi is cleared to 0.

Figure: Associative memory of m words, n cells per word.


Associative Memory
• The internal organization of a typical cell Cij is shown in Fig. It consists of a flip-flop
storage element Fij and the circuits for reading, writing, and matching the cell. The input bit
is transferred into the storage cell during a write operation.
• The bit stored is read out during a read operation. The match logic compares the content of
the storage cell with the corresponding unmasked bit of the argument and provides an output
for the decision logic that sets the bit in Mi.

Match Logic

Figure: One cell of associative memory.


Associative Memory
• The match logic for each word can be derived from the comparison algorithm
for two binary numbers. First, we neglect the key bits and compare the
argument in A with the bits stored in the cells of the words. Word i is equal to
the argument in A if Aj = Fij for j = 1, 2, . . . , n. Two bits are equal if they are
both 1 or both 0. The equality of two bits can be expressed logically by the
Boolean function

xj = Aj Fij + Aj' Fij'

• where xj = 1 if the pair of bits in position j are equal; otherwise, xj = 0. For
word i to be equal to the argument in A we must have all xj variables equal to
1. This is the condition for setting the corresponding match bit Mi to 1. The
Boolean function for this condition is

Mi = x1 x2 x3 . . . xn

• and constitutes the AND operation of all pairs of matched bits in a word.
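A minimal Python sketch of this match logic (names assumed; registers are lists of 0/1 bits) computes xj for each unmasked position and ANDs them into Mi:

```python
def match_word(A, K, F_word):
    # xj = Aj·Fij + Aj'·Fij' is 1 exactly when the two bits agree; a masked
    # position (Kj = 0) never blocks the match.
    for a, k, f in zip(A, K, F_word):
        if k == 1 and a != f:            # unmasked bit disagrees -> xj = 0
            return 0
    return 1                             # all unmasked bits equal -> Mi = 1

A = [1, 0, 1, 0]                         # argument register
K = [1, 1, 1, 0]                         # key register: last bit masked off
memory = [[1, 0, 1, 1],                  # word 1: matches the unmasked field
          [0, 0, 1, 0]]                  # word 2: fails on the first bit
assert [match_word(A, K, w) for w in memory] == [1, 0]   # match register M
```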
Cache Memory
• If the active portions of the program and data are placed in a fast small memory, the
average memory access time can be reduced, thus reducing the total execution time of the
program. Such a fast small memory is referred to as a cache memory. It is placed between
the CPU and main memory as illustrated in Fig.
• The cache memory access time is less than the access time of main memory by a factor of
5 to 10. The cache is the fastest component in the memory hierarchy and approaches the
speed of CPU components.
• The fundamental idea of cache organization is that by keeping the most frequently
accessed instructions and data in the fast cache memory, the average memory access time
will approach the access time of the cache.

Figure: Example of cache memory.


Cache Memory
• The basic operation of the cache is as follows. When the CPU needs to access memory, the
cache is examined. If the word is found in the cache, it is read from the fast memory. If the
word addressed by the CPU is not found in the cache, the main memory is accessed to read the
word. A block of words containing the one just accessed is then transferred from main memory
to cache memory.
• The performance of cache memory is frequently measured in terms of a quantity called
hit ratio. When the CPU refers to memory and finds the word in cache, it is said to
produce a hit. If the word is not found in cache, it is in main memory and it counts as a miss.
• The ratio of the number of hits divided by the total CPU references to memory (hits plus
misses) is the hit ratio.
• For example, a computer with cache access time of 100 ns, a main memory access time of 1000
ns, and a hit ratio of 0.9 produces an average access time of 200 ns. This is a considerable
improvement over a similar computer without a cache memory, whose access time is 1000 ns.
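The worked example can be reproduced with a small calculation (one common model, assumed here, charges tc on a hit and tm + tc on a miss, since the cache is examined first):

```python
def avg_access_time(h, tc, tm):
    # h: hit ratio, tc: cache access time, tm: main-memory access time
    return h * tc + (1 - h) * (tm + tc)

print(avg_access_time(h=0.9, tc=100, tm=1000))   # -> 200.0 ns, as above
```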
Cache Memory
• The transformation of data from main memory to cache memory is referred to as a
mapping process. Three types of mapping procedures are of practical interest when
considering the organization of cache memory:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping
Associative Mapping
• The fastest and most flexible cache organization uses an associative memory. The
associative memory stores both the address and content (data) of the memory word. This
permits any location in cache to store any word from main memory. The diagram shows
three words presently stored in the cache.
• The address value of 15 bits is shown as a five-digit octal number and its corresponding
12-bit word is shown as a four-digit octal number. A CPU address of 15 bits is placed in
the argument register and the associative memory is searched for a matching address.
Cache Memory

Figure: Associative mapping cache (all numbers in octal).


• If the address is found, the corresponding 12-bit data is read and sent to the CPU. If no
match occurs, the main memory is accessed for the word. The address-data pair is then
transferred to the associative cache memory.
• If the cache is full, an address-data pair must be displaced to make room for a pair that is
needed and not presently in the cache.
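As a hedged sketch of associative mapping, a Python dict can stand in for the CAM, keyed by the full 15-bit address; FIFO order is assumed for the displaced pair, since the text leaves the replacement policy open:

```python
from collections import OrderedDict

class AssociativeCache:
    def __init__(self, size, main_memory):
        self.size, self.main, self.lines = size, main_memory, OrderedDict()

    def read(self, addr):
        if addr in self.lines:               # associative search: hit
            return self.lines[addr]
        word = self.main[addr]               # miss: access main memory
        if len(self.lines) == self.size:     # cache full: displace a pair
            self.lines.popitem(last=False)   # FIFO choice (an assumption)
        self.lines[addr] = word              # store the new address-data pair
        return word
```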
Cache Memory
Direct Mapping:
• Associative memories are expensive compared to random-access memories
because of the added logic associated with each cell. In the direct-mapping
organization, the CPU address of 15 bits is divided into two fields. The nine least
significant bits constitute the index field and the remaining six bits form the tag field.
• The figure shows that main memory needs an address that includes
both the tag and the index bits. The number of bits in the index field
is equal to the number of address bits required to access the cache
memory.

Figure: Addressing relationships between main and cache memories.
Cache Memory
• In the general case, there are 2^k words in cache memory and 2^n words in main
memory. The n-bit memory address is divided into two fields: k bits for the index
field and n - k bits for the tag field. The direct mapping cache organization uses
the n-bit address to access the main memory and the k-bit index to access the cache.
• Each word in cache consists of the data word and its associated tag.
When a new word is first brought into the cache, the tag bits are
stored alongside the data bits. When the CPU generates a memory
request, the index field is used for the address to access the cache.

Figure: Direct mapping cache organization.


Cache Memory
• The tag field of the CPU address is compared with the tag in the word read
from the cache. If the two tags match, there is a hit and the desired data
word is in cache. If there is no match, there is a miss and the required word
is read from main memory. It is then stored in the cache together with the
new tag, replacing the previous value. The disadvantage of direct mapping
is that the hit ratio can drop considerably if two or more words whose
addresses have the same index but different tags are accessed repeatedly.
• To see how the direct-mapping organization operates, consider the
numerical example shown in Fig. The word at address zero is presently
stored in the cache (index = 000, tag = 00, data = 1220). Suppose that the
CPU now wants to access the word at address 02000. The index address is
000, so it is used to access the cache. The two tags are then compared.
• The cache tag is 00 but the address tag is 02, which does not produce a
match. Therefore, the main memory is accessed and the data word 5670 is
transferred to the CPU. The cache word at index address 000 is then
replaced with a tag of 02 and data of 5670.
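The numerical example can be traced with a short sketch (assumed names; 9 index bits and 6 tag bits, each cache entry holding a tag-data pair):

```python
INDEX_BITS = 9

def read(cache, main_memory, addr):
    tag, index = addr >> INDEX_BITS, addr & ((1 << INDEX_BITS) - 1)
    entry = cache[index]
    if entry is not None and entry[0] == tag:   # tags match: hit
        return entry[1]
    word = main_memory[addr]                    # miss: read main memory
    cache[index] = (tag, word)                  # replace previous tag and data
    return word

cache = [None] * (1 << INDEX_BITS)
cache[0o000] = (0o00, 0o1220)                   # word at address 00000
main = {0o02000: 0o5670}
assert read(cache, main, 0o02000) == 0o5670     # tag 02 vs 00: a miss
assert cache[0o000] == (0o02, 0o5670)           # entry replaced, as described
```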
Cache Memory
• When the cache stores blocks of words rather than single words, the index
field is divided into two parts: the block field and the word field. In a
512-word cache there are 64 blocks of 8 words each, since 64 × 8 = 512. The
block number is specified with a 6-bit field and the word within the block is
specified with a 3-bit field.
• The tag field stored within the cache is common to all eight words of
the same block. Every time a miss occurs, an entire block of eight
words must be transferred from main memory to cache memory.

Figure: Direct mapping cache with block size of 8 words.
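The three-field split can be checked with a few lines (a sketch under the stated sizes: 6 tag bits, 6 block bits, 3 word bits):

```python
def split_block_address(addr, block_bits=6, word_bits=3):
    word  = addr & ((1 << word_bits) - 1)                 # word within the block
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag   = addr >> (word_bits + block_bits)              # remaining high bits
    return tag, block, word

# 15-bit address: high 6 bits tag, middle 6 bits block, low 3 bits word.
assert split_block_address(0b000010_000011_101) == (0b10, 0b11, 0b101)
```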
Cache Memory
Set-Associative Mapping:
• It was mentioned previously that the disadvantage of direct mapping is
that two words with the same index in their address but with different
tag values cannot reside in cache memory at the same time.
• A third type of cache organization, called set-associative mapping, is an
improvement over the direct mapping organization in that each word of
cache can store two or more words of memory under the same index
address. Each data word is stored together with its tag and the number
of tag-data items in one word of cache is said to form a set.

Figure: Two-way set-associative mapping cache.
Cache Memory
• The octal numbers listed in Fig. are with reference to the main
memory contents illustrated in Fig.(a). The words stored at addresses
01000 and 02000 of main memory are stored in cache memory at
index address 000. Similarly, the words at addresses 02777 and
00777 are stored in cache at index address 777.
• When the CPU generates a memory request, the index value of the
address is used to access the cache. The tag field of the CPU address
is then compared with both tags in the cache to determine if a match
occurs.
• The comparison logic is done by an associative search of the tags in the set,
similar to an associative memory search; thus the name "set-associative." The
hit ratio will improve as the set size increases because more words with the
same index but different tags can reside in cache.
Cache Memory
Write-through:
• The simplest and most commonly used procedure is to update main
memory with every memory write operation, with cache memory
being updated in parallel if it contains the word at the specified
address. This is called the write-through method. This method has
the advantage that main memory always contains the same data as
the cache.
Write-back:
• The second procedure is called the write-back method. In this
method only the cache location is updated during a write operation.
The location is then marked by a flag so that later when the word is
removed from the cache it is copied into main memory.
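The two policies can be contrasted in a brief sketch (assumed names; a dirty flag marks a write-back line that must be copied out when it leaves the cache):

```python
def write_through(cache, main, addr, value):
    main[addr] = value                   # main memory updated on every write
    if addr in cache:
        cache[addr] = (value, False)     # cache updated in parallel if present

def write_back(cache, main, addr, value):
    cache[addr] = (value, True)          # only the cache location is updated

def evict(cache, main, addr):
    value, dirty = cache.pop(addr)
    if dirty:                            # flagged line: copy into main memory
        main[addr] = value
```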
Virtual Memory
• Virtual memory is a concept used in some large computer systems
that permits the user to construct programs as though a large memory
space were available, equal to the totality of auxiliary memory. Each
address that is referenced by the CPU goes through an address
mapping from the so-called virtual address to a physical address in
main memory.
• Virtual memory is used to give programmers the illusion that they
have a very large memory at their disposal, even though the
computer actually has a relatively small main memory. A virtual
memory system provides a mechanism for translating program-
generated addresses into correct main memory locations.
Virtual Memory
Address Space And Memory Space
• An address used by a programmer will be called a virtual address, and
the set of such addresses the address space. An address in main
memory is called a location or physical address. The set of such
locations is called the memory space.
• Thus the address space is the set of addresses generated by programs
as they reference instructions and data; the memory space consists of
the actual main memory locations directly addressable for processing.
• As an illustration, consider a computer with a main-memory capacity
of 32K words (K = 1024). Fifteen bits are needed to specify a physical
address in memory since 32K = 2^15.
• Suppose that the computer has available auxiliary memory for storing
2^20 = 1024K words. Denoting the address space by N and the
memory space by M, we then have for this example N = 1024K and M
= 32K.
Virtual Memory

Figure : Relation between address and memory space in a virtual memory system.
• In a virtual memory system, programmers are told that they have the
total address space at their disposal. Moreover, the address field of the
instruction code has a sufficient number of bits to specify all virtual
addresses.
• In our example, the address field of an instruction code will consist of
20 bits but physical memory addresses must be specified with only 15
bits. Thus the CPU will reference instructions and data with a 20-bit
address, which must be mapped into a 15-bit physical address.
Virtual Memory
• A table is needed to map a virtual address of 20 bits to a physical
address of 15 bits. The mapping is a dynamic operation, which means
that every address is translated immediately as a word is referenced by
the CPU. The mapping table may be stored in a separate memory, as
shown in Fig., or in main memory.

Figure: Memory table for mapping a virtual address.


Virtual Memory
Address Mapping Using Pages
• Consider a computer with an address space of 8K and a memory space
of 4K. If we split each into groups of 1K words we obtain eight pages
and four blocks as shown in Fig. At any given time, up to four pages of
address space may reside in main memory in any one of the four blocks.
• The mapping from address space to memory space is facilitated if each
virtual address is considered to be represented by two numbers: a page
number address and a line within the page. In a computer with 2^p words
per page, p bits are used to specify a line address and the remaining
high-order bits of the virtual address specify the page number.
• In the example of Fig. a virtual address has 13 bits. Since each page
consists of 2^10 = 1024 words, the high-order three bits of a virtual
address will specify one of the eight pages and the low-order 10 bits
give the line address within the page. Note that the line address in
address space and memory space is the same; the only mapping
required is from a page number to a block number.
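The 13-bit split is a two-line computation (a sketch for the stated sizes: 3 page bits, 10 line bits):

```python
def split_virtual(va, line_bits=10):
    return va >> line_bits, va & ((1 << line_bits) - 1)   # page, line

assert split_virtual(0b101_0000000011) == (0b101, 3)      # page 5, line 3
```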
Virtual Memory

Figure: Address space and memory space split into groups of 1K words.
• The organization of the memory mapping table in a paged system is shown in
Fig. The memory-page table consists of eight words, one for each page. The
address in the page table denotes the page number and the content of the word
gives the block number where that page is stored in main memory. The table
shows that pages 1, 2, 5 and 6 are now available in main memory in blocks 3, 0,
1, and 2, respectively.
• A presence bit in each location indicates whether the page has been transferred
from auxiliary memory into main memory. A 0 in the presence bit indicates that
this page is not available in main memory. The CPU references a word in
memory with a virtual address of 13 bits. The three high-order bits of the virtual
address specify a page number and also an address for the memory-page table.
Virtual Memory

Figure: Memory table in a paged system.


• The content of the page table at the page number address is read out into the
memory table buffer register. If the presence bit is a 1, the block number thus
read is transferred to the two high-order bits of the main memory address register.
The line number from the virtual address is transferred into the 10 low-order
bits of the memory address register.
• A read signal to main memory transfers the content of the word to the main
memory buffer register ready to be used by the CPU. If the presence bit in
the word read from the page table is 0, it signifies that the content of the
word referenced by the virtual address does not reside in main memory.
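A hedged sketch of the translation (Python, assumed names): the page number indexes the table, an absent entry plays the role of presence bit = 0, and the block number replaces the page number in the physical address.

```python
page_table = {1: 3, 2: 0, 5: 1, 6: 2}       # resident pages, per the table above

def translate(va):
    page, line = va >> 10, va & 0x3FF       # 3-bit page number, 10-bit line
    if page not in page_table:              # presence bit = 0
        raise LookupError("page fault: page %d not in main memory" % page)
    return (page_table[page] << 10) | line  # block number joined with the line

assert translate((5 << 10) | 7) == (1 << 10) | 7   # page 5 resides in block 1
```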
Virtual Memory
Associative Memory Page Table
The page field in each word is compared with the page number in the virtual address.
If a match occurs, the word is read from memory and its corresponding block number
is extracted.
Consider again the case of eight pages and four blocks as in the example of Fig. We
replace the random access memory-page table with an associative memory of four
words as shown in Fig. Each entry in the associative memory array consists of two
fields. The first three bits specify a field for storing the page number. The last two bits
constitute a field for storing the block number. The virtual address is placed in the
argument register. The page number bits in the argument are compared with all page
numbers in the page field of the associative memory. If the page number is found, the
5-bit word is read out from memory. The corresponding block number, being in the
same word, is transferred to the main memory address register. If no match occurs, a
call to the operating system is generated to bring the required page from auxiliary
memory.
Figure: An associative memory
page table.
Contents
Pipeline:
• Parallel processing
• Pipelining-arithmetic pipeline
• Instruction pipeline
Multiprocessors:
• Characteristics of multiprocessors
• Interconnection structures
• Interprocessor arbitration
• Interprocessor communication and synchronization
Pipelining
• Pipelining is a technique of decomposing a sequential process
into suboperations, with each subprocess being executed in a
special dedicated segment that operates concurrently with all
other segments.
• Throughput: the amount of processing that can be
accomplished during a given interval of time.
Parallel Processing

Fig: Processor with Multiple Functional Units


Parallel Processing
• M. J. Flynn classified parallel processing based on the number of
instruction and data items that are manipulated simultaneously.
• Instruction stream
- Sequence of instructions read from memory.
• Data stream
- Operations performed on the data in the processor.
• Flynn's classification divides computers into four major groups:
Parallel Processing
SISD Computer Systems

• Characteristics
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time
- Parallel processing is achieved by means of multiple functional units or by
pipelining.
• Limitations
- Von Neumann bottleneck
- Maximum speed of the system is limited by the Memory Bandwidth
- Limitation on Memory Bandwidth
- Memory is shared by CPU and I/O
Parallel Processing
MISD Computer System

Characteristics

- There is no computer at present that can be classified as MISD.


Parallel Processing
SIMD Computer System

Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
Parallel Processing
MIMD Computer System

Characteristics

- Multiple processing units


- Execution of multiple instructions on multiple data.
- Multiprocessors and multicomputers are MIMD computers.
Parallel Processing
• Parallel processing is achieved by
1) Pipeline processing
2) Vector processing
3) Array processors
Pipelining:
• A technique of decomposing a sequential process into suboperations,
with each subprocess being executed in a special dedicated segment
that operates concurrently with all other segments.
Example
Ai * Bi + Ci for i = 1, 2, 3, ... , 7

R1 ← Ai, R2 ← Bi        Load Ai and Bi
R3 ← R1 * R2, R4 ← Ci   Multiply and load Ci
R5 ← R3 + R4            Add
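A small simulation (an illustration, not from the text) shows the three segments working concurrently: after the pipe fills, one result Ai * Bi + Ci emerges per clock.

```python
def pipeline(A, B, C):
    n, results = len(A), []
    seg1 = seg2 = None                    # inter-segment registers
    for clock in range(n + 2):            # n items plus 2 cycles to drain
        if seg2 is not None:
            results.append(seg2[0] + seg2[1])   # segment 3: R5 <- R3 + R4
        seg2 = (seg1[0] * seg1[1], seg1[2]) if seg1 else None  # segment 2
        seg1 = (A[clock], B[clock], C[clock]) if clock < n else None  # segment 1
    return results

assert pipeline([1, 2], [3, 4], [5, 6]) == [1*3 + 5, 2*4 + 6]
```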
Parallel Processing
Pipelining

Fig: Pipe Line Processing


Parallel Processing

Table: Content of register in pipeline


Parallel Processing
General Pipeline

Fig : Four-Segment Pipeline


Parallel Processing
General Pipeline

Fig: Space-Time Diagram for pipeline


Parallel Processing
Pipeline Speedup

n: number of tasks to be performed
k: number of segments

Conventional machine (non-pipelined):
tn: time required to complete each task
t1: time required to complete the n tasks
t1 = n * tn

Pipelined machine (k stages):
tp: clock cycle (time to complete each suboperation)
tk: time required to complete the n tasks
tk = (k + n - 1) * tp

Speedup:
Sk = t1 / tk = n * tn / ((k + n - 1) * tp)
Parallel Processing
Pipeline Speedup
Example
- 4-stage pipeline
- suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in a non-pipelined system: tn = 4 * 20 = 80 ns

Pipelined system:
tk = (k + n - 1) * tp = (4 + 100 - 1) * 20 = 2060 ns

Non-pipelined system:
t1 = n * tn = 100 * 80 = 8000 ns

Speedup:
Sk = 8000 / 2060 = 3.88

As n grows large, the speedup approaches k = 4; the 4-stage pipeline then
performs like a system with 4 identical functional units operating in parallel.
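The formula and the worked numbers check out directly:

```python
def speedup(n, k, tn, tp):
    return (n * tn) / ((k + n - 1) * tp)

print(speedup(n=100, k=4, tn=80, tp=20))    # -> 3.88..., as computed above
print(speedup(n=10**6, k=4, tn=80, tp=20))  # -> approaches k = 4 for large n
```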
Parallel Processing
General Pipeline

Fig: Multiple Functional Units

There are two areas of computer design where the pipeline
organization is applicable:
1. Arithmetic pipeline
2. Instruction pipeline
Arithmetic Pipeline
• The arithmetic pipeline divides an arithmetic operation into
suboperations for execution in the pipeline segments.
• The inputs for the floating-point adder pipeline:
X = A × 2^a
Y = B × 2^b
• Here A and B are the fractions that represent the mantissas and a
and b are the exponents.
• Floating-point addition and subtraction are divided into four
segments:
1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result
Arithmetic Pipeline
• The following numerical example may clarify the suboperations
performed in each segment. For simplicity, we use decimal numbers,
although Fig. refers to binary numbers. Consider the two normalized
floating-point numbers:
X = 0.9504 × 10^3
Y = 0.8200 × 10^2
• The two exponents are subtracted in the first segment to obtain 3 - 2 = 1.
The larger exponent 3 is chosen as the exponent of the result. The next
segment shifts the mantissa of Y to the right to obtain
X = 0.9504 × 10^3
Y = 0.0820 × 10^3
• This aligns the two mantissas under the same exponent. The addition of the
two mantissas in segment 3 produces the sum
Z = 1.0324 × 10^3
• The sum is adjusted by normalizing the result so that it has a fraction with a
nonzero first digit. This is done by shifting the mantissa once to the right
and incrementing the exponent by one to obtain the normalized sum
Z = 0.10324 × 10^4.
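The four segments can be followed in a decimal sketch (an illustration with assumed names, keeping each number as a mantissa-exponent pair):

```python
def fp_add(x, y):
    (ma, ea), (mb, eb) = x, y
    if ea < eb:                          # segment 1: compare exponents,
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)   # keep the larger one
    mb = mb / 10 ** (ea - eb)            # segment 2: align the smaller mantissa
    mz, ez = ma + mb, ea                 # segment 3: add the mantissas
    while mz >= 1.0:                     # segment 4: normalize the fraction
        mz, ez = mz / 10, ez + 1
    return mz, ez

mz, ez = fp_add((0.9504, 3), (0.8200, 2))
print(round(mz, 5), ez)                  # -> 0.10324 4, as in the example
```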
Arithmetic Pipeline

Fig : Pipeline for Floating point addition and Subtraction


Arithmetic Pipeline

• The comparator, shifter, adder-subtractor, incrementer, and
decrementer in the floating-point pipeline are implemented with
combinational circuits. Suppose that the time delays of the four
segments are t1 = 60 ns, t2 = 70 ns, t3 = 100 ns, t4 = 80 ns, and
the interface registers have a delay of tr = 10 ns.
• The clock cycle is chosen to be tp = t3 + tr = 110 ns. An
equivalent non-pipelined floating-point adder-subtractor will
have a delay time tn = t1 + t2 + t3 + t4 + tr = 320 ns. In this case
the pipelined adder has a speedup of 320/110 = 2.9 over the
non-pipelined adder.
Instruction Pipeline
• Pipeline processing can occur not only in the data stream but in the
instruction stream as well.
• An instruction pipeline reads consecutive instructions from
memory while previous instructions are being executed in other
segments.
• This causes the instruction fetch and execute phases to overlap and
perform simultaneous operations.
• One disadvantage with this scheme is that an instruction may cause
a branch out of sequence. In this case the pipeline must be emptied
and all instructions that have been read from memory after the
branch instruction must be discarded.
• A computer with an instruction fetch unit and an instruction
execution unit is designed to provide a two-segment pipeline.
Instruction Pipeline
• In general, any computer system needs six steps to process
an instruction:
1. Fetch the instruction from memory
2. Decode the instruction
3. Calculate the effective address of the operand
4. Fetch the operands from memory
5. Execute the operation
6. Store the result in the proper place
• Some instructions skip some phases:
- Effective address calculation can be done as part of the decoding
phase.
- Storage of the operation result into a register is done automatically in
the execution phase.
Instruction Pipeline

Fig: Four-segment CPU pipeline


Instruction Pipeline
• Combining the decode and effective-address steps reduces the
instruction pipeline to four segments. Figure shows how the
instruction cycle in the CPU can be processed with a four-segment
pipeline. While an instruction is being executed in segment 4, the
next instruction in sequence is busy fetching an operand from
memory in segment 3.
• The effective address may be calculated in a separate arithmetic
circuit for the third instruction, and whenever the memory is
available, the fourth and all subsequent instructions can be fetched
and placed in an instruction FIFO.
• Thus up to four sub operations in the instruction cycle can overlap
and up to four different instructions can be in progress of being
processed at the same time.
• Figure shows the operation of the instruction pipeline. The time in
the horizontal axis is divided into steps of equal duration. The four
segments are represented in the diagram with an abbreviated symbol.
Instruction Pipeline
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the
effective address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.
• It is assumed that the processor has separate instruction and data memories so
that the operation in FI and FO can proceed at the same time. In the absence
of a branch instruction, each segment operates on different instructions. Thus,
in step 4, instruction 1 is being executed in segment EX; the operand for
instruction 2 is being fetched in segment FO; instruction 3 is being decoded
in segment DA; and instruction 4 is being fetched from memory in segment
FI.
• Assume now that instruction 3 is a branch instruction. As soon as this
instruction is decoded in segment DA in step 4, the transfer from FI to DA of the
other instructions is halted until the branch instruction is executed in step 6. If
the branch is taken, a new instruction is fetched in step 7. If the branch is not
taken, the instruction fetched previously in step 4 can be used. The pipeline then
continues until a new branch instruction is encountered.
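A rough sketch (an illustration, not the book's method) prints the staggered space-time pattern of the four segments; a taken branch decoded at step 4 would simply stop new FI entries until EX of the branch completes.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]

def timing(n_instructions):
    for i in range(n_instructions):      # instruction i+1 enters at step i+1
        row = ["  "] * i + SEGMENTS
        print("instr %d: %s" % (i + 1, " ".join(row)))

timing(4)
# instr 1: FI DA FO EX
# instr 2:    FI DA FO EX
# instr 3:       FI DA FO EX
# instr 4:          FI DA FO EX
```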
Instruction Pipeline

Figure: Timing of instruction pipeline.


Pipeline conflicts:
In general, there are three major difficulties that cause the instruction
pipeline to deviate from its normal operation.
1. Resource conflicts caused by access to memory by two segments at
the same time. Most of these conflicts can be resolved by using separate
instruction and data memories.
2. Data dependency conflicts arise when an instruction depends on the
result of a previous instruction, but this result is not yet available.
3. Branch difficulties arise from branch and other instructions that
change the value of PC.
