Pipelining is a technique where multiple instructions are overlapped in execution across multiple stages. It improves instruction throughput but not individual instruction time. The document discusses the five stages of a RISC pipeline and how hazards like structural, data, and branch hazards can occur due to dependencies between instructions. It also covers techniques like forwarding, stalling, and branch prediction that help address these hazards. Finally, it provides an overview of cache memory and how the locality of reference property improves average memory access times.


UNIT-V

PERFORMANCE ENHANCEMENT
TECHNIQUES

Prepared by Geetha.G and Safa.M


What Is Pipelining?
• Pipelining is an implementation technique whereby
multiple instructions are overlapped in execution
• Pipelining exploits parallelism among the instructions in a
sequential instruction stream.
• Pipelining is the key implementation technique used to make fast CPUs.
• Each step in the pipeline completes a part of an instruction.



What Is Pipelining?
• Each of these steps is called a pipe stage or a pipe
segment
• The stages are connected one to the next to form a pipe—
instructions enter at one end, progress through the stages,
and exit at the other end
• The time required to move an instruction one step down the pipeline is a processor cycle. The length of a processor cycle is determined by the time required for the slowest pipe stage.



What Is Pipelining?
• If the stages are perfectly balanced, then the time per instruction on the pipelined processor—assuming ideal conditions—is equal to:

Time per instruction (pipelined) = Time per instruction on unpipelined machine / Number of pipe stages
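As a quick illustration (a minimal Python sketch, not part of the original slides; the 10 ns figure is an assumed example value), the relation above gives the ideal pipelined time per instruction and a speedup equal to the number of stages:

def pipelined_time_per_instruction(unpipelined_time_ns, num_stages):
    # Ideal time per instruction when the pipeline stages are perfectly balanced.
    return unpipelined_time_ns / num_stages

unpipelined = 10.0   # assumed: 10 ns per instruction on the unpipelined machine
stages = 5           # the five-stage RISC pipeline described in these slides
pipelined = pipelined_time_per_instruction(unpipelined, stages)
print(f"pipelined time per instruction: {pipelined} ns")   # 2.0 ns
print(f"ideal speedup: {unpipelined / pipelined:.1f}x")    # 5.0x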



Five stages of Pipeline
• 1. Instruction fetch cycle (IF):
• Send the program counter (PC) to memory and fetch the
current instruction from memory.
• Update the PC to the next sequential PC by adding 4
(since each instruction is 4 bytes) to the PC.



Five stages of Pipeline
• 2. Instruction decode/register fetch cycle (ID):
• Decode the instruction and read the registers.
• Decoding is done in parallel with reading registers, which
is possible because the register specifiers are at a fixed
location in a RISC architecture. This technique is known
as fixed-field decoding



Five stages of Pipeline
• 3. Execution/effective address cycle (EX):
• Perform one of three functions depending on the instruction type.
• Memory reference: The ALU adds the base register and the offset to form the effective address.


Five stages of Pipeline
• Register-Register ALU instruction: The ALU performs the operation specified by the opcode on the values read from the register file.
• Register-Immediate ALU instruction: The ALU performs the operation specified by the opcode on the first value read from the register file and the sign-extended immediate.



Five stages of Pipeline
• 4. Memory access (MEM):
• If the instruction is a load, memory does a read using the
effective address computed in the previous cycle. If it is a
store, then the memory writes the data from the second
register read from the register file using the effective address.
• 5. Write-back cycle (WB):
• Register-Register ALU instruction or Load instruction: Write
the result into the register file, whether it comes from the
memory system (for a load) or from the ALU (for an ALU
instruction).
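To make the overlap concrete, the following minimal Python sketch (illustrative only, not from the slides) prints an idealized timing diagram in which each instruction advances one stage per clock cycle, so one instruction completes every cycle once the pipeline is full:

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timing_diagram(num_instructions):
    # Instruction i (0-based) occupies stage s during clock cycle i + s + 1.
    total_cycles = num_instructions + len(STAGES) - 1
    print("cycle    " + " ".join(f"{c:>4}" for c in range(1, total_cycles + 1)))
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        print(f"instr {i + 1}  " + " ".join(f"{cell:>4}" for cell in row))

timing_diagram(4)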
Pipeline
• A new instruction is started on each clock cycle.
• We must not try to perform two different operations with the same data path resource on the same clock cycle.
• For example, a single ALU cannot be asked to compute an effective address and perform a subtract operation at the same time.
• The register file is used in two stages: for reading in ID and for writing in WB.
• To handle a read and a write to the same register in the same cycle, we perform the register write in the first half of the clock cycle and the read in the second half.
Pipeline
• we must also ensure that instructions in different stages of
the pipeline do not interfere with one another.
• This separation is done by introducing pipeline registers
between successive stages of the pipeline, so that at the
end of a clock cycle all the results from a given stage are
stored into a register that is used as the input to the next
stage on the next clock cycle

Throughput and speedup
• Pipelining increases the CPU instruction throughput—the
number of instructions completed per unit of time—but it
does not reduce the execution time of an individual
instruction
• Pipeline overhead arises from the combination of pipeline
register delay and clock skew.

The Major Hurdle of Pipelining—
Pipeline Hazards
• There are situations, called hazards, that prevent the next
instruction in the instruction stream from executing during
its designated clock cycle.
• Hazards reduce the performance from the ideal speedup
gained by pipelining.
• There are three classes of hazards:
• 1. Structural hazards
• arise from resource conflicts when the hardware cannot
support all possible combinations of instructions
simultaneously in overlapped execution.
• 2. Data hazards
• arise when an instruction depends on the results of a
previous instruction
• 3. Control hazards
• arise from the pipelining of branches and other
instructions that change the PC.
• Hazards in pipelines can make it necessary to stall the
pipeline



Structural Hazards
• If some combination of instructions cannot be
accommodated because of resource conflicts, the
processor is said to have a structural hazard.
• When a sequence of instructions encounters this hazard,
the pipeline will stall one of the instructions until the
required unit is available. Such stalls will increase the CPI
from its usual ideal value of 1.



Structural Hazards
• Some pipelined processors share a single memory for data and instructions. As a result, when an instruction contains a data memory reference, it will conflict with the instruction fetch of a later instruction.
• To resolve this hazard, we stall the pipeline for 1 clock cycle when the data memory access occurs. A stall is commonly called a pipeline bubble or just bubble.





Data Hazards
• Data hazards arise when an instruction depends on the results
of a previous instruction in a way that is exposed by the
overlapping of instructions in the pipeline.
• Consider the pipelined execution of these instructions:
• DADD R1,R2,R3
• DSUB R4,R1,R5
• AND R6,R1,R7
• OR R8,R1,R9
• XOR R10,R1,R11
• The DADD instruction writes the value of R1 in the WB pipe stage, but the DSUB instruction reads the value during its ID stage. This problem is called a data hazard.
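The dependences in this sequence can be found mechanically: each later instruction reads R1, which DADD writes only in its WB stage. Below is a small Python sketch (illustrative, not from the slides) that scans such an instruction list for read-after-write dependences within the two-instruction window that actually causes trouble, given that the register file is written in the first half of a cycle and read in the second half:

def parse(instr):
    # Split "DADD R1,R2,R3" into (opcode, destination, [source registers]).
    opcode, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return opcode, regs[0], regs[1:]

def find_raw_hazards(program, window=2):
    # Only the two instructions immediately after the producer can read a
    # stale value in this five-stage pipeline, hence a window of 2.
    hazards = []
    for i, producer in enumerate(program):
        _, dest, _ = parse(producer)
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, _, sources = parse(program[j])
            if dest in sources:
                hazards.append((producer, program[j], dest))
    return hazards

program = ["DADD R1,R2,R3", "DSUB R4,R1,R5", "AND R6,R1,R7",
           "OR R8,R1,R9", "XOR R10,R1,R11"]
for prod, cons, reg in find_raw_hazards(program):
    print(f"RAW hazard on {reg}: '{cons}' needs the result of '{prod}'")

Run on the sequence above, it flags DSUB and AND; OR and XOR read R1 late enough to obtain the correct value without forwarding.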

Minimizing Data Hazard Stalls by
Forwarding
• Forwarding (also called bypassing and sometimes short-circuiting).
• With forwarding, the ALU result is fed back from the EX/MEM and MEM/WB pipeline registers directly to the ALU inputs, so a dependent instruction does not have to wait for the result to be written into the register file.

Data Hazards Requiring Stalls
• Consider the following sequence of instructions:
• LD R1,0(R2)
• DSUB R4,R1,R5
• AND R6,R1,R7
• OR R8,R1,R9
• The LD instruction produces the value of R1 only at the end of its MEM stage, so even with forwarding the dependent DSUB must be stalled for one clock cycle.

Control Hazards
• Control hazards arise from the pipelining of branches and other instructions that change the PC.
• Taken branch: the branch changes the PC to its target address.
• Not taken: the branch falls through to the next sequential instruction.



Branch Hazards
• The simplest method of dealing with branches is to redo the fetch of the instruction following a branch, once we detect the branch during ID.
• One stall cycle for every branch will yield a performance loss of 10% to 30%, depending on the branch frequency.
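The range quoted above follows directly from the effective CPI: with an ideal CPI of 1, one stall cycle per branch gives CPI = 1 + branch frequency × penalty. A minimal Python sketch (the branch frequencies are assumed example values):

def branch_performance_loss(branch_frequency, branch_penalty_cycles=1):
    # Effective CPI = ideal CPI (1) + stall cycles contributed by branches.
    ideal_cpi = 1.0
    effective_cpi = ideal_cpi + branch_frequency * branch_penalty_cycles
    return (effective_cpi - ideal_cpi) / ideal_cpi

for freq in (0.10, 0.20, 0.30):   # assumed branch frequencies
    print(f"branch frequency {freq:.0%} -> performance loss {branch_performance_loss(freq):.0%}")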



Cache memory
• The effectiveness of the cache mechanism is based
on the property of programs called locality of
reference
• Many instructions in localized areas of the program are executed repeatedly during some time period; this is referred to as locality of reference.
• Two types:
• Temporal: recently executed instructions are likely to be executed again soon.
• Spatial: instructions in close proximity to a recently executed instruction are likely to be executed soon.



CACHE MEMORY

CPU ↔ Cache ↔ Main memory

Figure: cache memory is placed between the CPU and the main memory.


Cache memory
• Cache is the fastest component in the memory hierarchy; its speed approaches that of the CPU, and it is placed between the CPU and main memory.
• Cache stores the most frequently accessed instructions and
data.
• If the active portions of the program and data are placed in a fast and small memory, the average memory access time can be reduced, thereby reducing the total execution time of the program.
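The usual way to quantify this (the formula is not spelled out in the slides, but it is the standard average memory access time relation) is AMAT = hit time + miss rate × miss penalty. A minimal sketch with assumed example timings:

def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    # Average memory access time: every access pays the hit time, and a
    # fraction miss_rate of accesses additionally pays the miss penalty.
    return hit_time_ns + miss_rate * miss_penalty_ns

cache_hit_time = 1.0   # ns, assumed cache access time
miss_penalty = 100.0   # ns, assumed main memory access time
for miss_rate in (0.02, 0.05, 0.10):
    print(f"miss rate {miss_rate:.0%}: AMAT = {amat(cache_hit_time, miss_rate, miss_penalty):.1f} ns")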



Cache memory
• The temporal aspect suggests that a recently executed instruction is likely to be needed again soon, so it should remain in the cache.
• The spatial aspect suggests that instead of fetching just one item from memory, a whole block of contiguous address locations should be fetched.
• When a read request is received from the processor, the block of memory locations containing the requested word is transferred into the cache, one word at a time.



Cache memory
• If any word in the same block is requested later, the desired contents are read directly from the cache.
• The correspondence between the main memory and the cache is specified by a mapping function.
• The processor does not know about the existence of the cache.
• The cache control circuitry determines whether the requested word is present in the cache or not.


Cache memory
• Cache hit
• If the requested word for read/write operation is present in the
cache then it is a cache hit
• Cache Miss
• If the requested word for read/write operation is not present in
the cache then it is a cache miss



Cache memory

• When a read miss occurs, the entire block is loaded into the cache, and then the requested word is forwarded to the processor.
• Alternatively, the requested word may be sent directly to the processor as soon as it is read, reducing the waiting period of the processor.
• This approach is called load-through or early restart.



Cache memory: Write Policy
• For a read hit, the main memory is not involved.
• For a write hit, the system can follow one of two approaches:
• Write through protocol
• - The cache location and the main memory are
updated simultaneously
• Disadvantage:
• - A given cache word may be updated several times during its cache residency, and each time it must also be updated in main memory, which is unnecessary.
• Write-back protocol
• - Update only the cache location and mark it as updated with an associated flag bit, called the dirty or modified bit.
Cache memory: Write Policy
• Disadvantage:
• - The entire block is written back even if only a single word in the block has changed, when the block containing the marked word is removed from the cache to make room for a new block.
• When a write miss occurs during a write operation:
• Write-through protocol – the information is written directly to the main memory.
• Write-back protocol – the block containing the word is brought into the cache, and then the desired word in the cache is overwritten with the new information.
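As a rough illustration of the trade-off (a simplified sketch, not the slides' own example; the number of stores before eviction is assumed), the snippet below counts main-memory write operations for one cached block under the two policies:

def memory_writes(num_stores_to_block):
    # Write-through: every store to the cached block also writes main memory.
    write_through = num_stores_to_block
    # Write-back: stores only set the dirty bit; the block is written back to
    # main memory once, at eviction time, if it is dirty.
    write_back = 1 if num_stores_to_block > 0 else 0
    return write_through, write_back

wt, wb = memory_writes(num_stores_to_block=8)   # assumed: 8 stores before eviction
print(f"write-through: {wt} memory writes; write-back: {wb} write (of the whole block) at eviction")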



Mapping function
• Determines the following:
• Placement strategies: where to place an incoming block in the cache.
• Consider a cache consisting of 128 blocks, of 16 words each, for a total of 2048 words.
• The main memory has 64K words, which is viewed as 4K blocks of 16 words each.
• The different types of mapping functions are:
• Direct mapping
• Associative mapping
• Set-associative mapping



DIRECT MAPPING
• In this technique, block j of the main memory maps onto block j modulo 128 of the cache.
• So, each of the memory blocks 0, 128, 256, ... is loaded into cache block 0.
• Contention may arise for a given cache position even when the cache is not full.
• Contention is resolved by allowing the new block to overwrite the resident block.
• The memory address is divided into three fields.

• The lower-order 4 bits select one of the 16 words in the block.
• The 7-bit cache block field determines the cache position in which the block must be stored.
• The higher-order 5 bits are stored in the 5 tag bits associated with that cache position.
• As execution proceeds, the 7-bit cache block field of an address points to a particular block in the cache, and the high-order 5 bits are compared with the tag bits of that block.
• If there is a match, the required word is present in the cache; otherwise it is not.
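The 16-bit word address of this example therefore splits into a 5-bit tag, a 7-bit cache block field, and a 4-bit word field. A minimal Python sketch of that split (the address value is an assumed example):

def split_direct_mapped(address):
    word = address & 0xF             # low 4 bits: word within the 16-word block
    block = (address >> 4) & 0x7F    # next 7 bits: cache block position (0..127)
    tag = (address >> 11) & 0x1F     # high 5 bits: tag stored with the cache block
    return tag, block, word

addr = 0b10110_0000101_1010          # assumed example 16-bit address
tag, block, word = split_direct_mapped(addr)
print(f"tag={tag}, cache block={block}, word={word}")   # tag=22, cache block=5, word=10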



Associative mapping
• A main memory block can be placed in any cache block position.
• 12 tag bits are required to identify a memory block resident in the cache.
• 4 bits identify the word within the given block.
• The cost of an associative cache is higher than the cost of a direct-mapped cache because all 128 tags must be searched to determine whether the requested word is present in the cache.

Set-Associative mapping
• A combination of the direct and associative mapping techniques.
• Blocks of the cache are grouped into sets.
• The mapping allows a block of the main memory to reside in any block within one particular set of the cache.
• With 64 sets, the 6-bit set field of the address determines which set of the cache may contain the desired block.
• The tag field of the address is then compared associatively with the tags of the blocks within that set.
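For the same 128-block cache organized as 64 sets of two blocks, the 16-bit address splits into a 6-bit tag, a 6-bit set field, and a 4-bit word field. A minimal sketch (the address value is an assumed example):

def split_set_associative(address):
    word = address & 0xF                 # low 4 bits: word within the block
    set_index = (address >> 4) & 0x3F    # next 6 bits: one of the 64 sets
    tag = (address >> 10) & 0x3F         # high 6 bits: compared with both tags in the set
    return tag, set_index, word

addr = 0b101101_000010_1010              # assumed example 16-bit address
tag, set_index, word = split_set_associative(addr)
print(f"tag={tag}, set={set_index}, word={word}")   # tag=45, set=2, word=10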

Replacement algorithm
• Replacement strategies:
• Which block to replace when a cache miss occurs.
• When a new block is to be brought into the cache
and all the positions that it may occupy are full, the
cache controller must decide which of the old blocks
to overwrite.
LRU Replacement
When a block is to be overwritten, it is sensible to overwrite the one that has gone the longest time without being referenced. This block is called the least recently used (LRU) block, and the technique is called the LRU replacement algorithm.
LRU
 Suppose it is required to track the LRU block of a four-block set in a set-associative cache.
 A 2-bit counter can be used for each block.
 When a hit occurs, the counter of the referenced block is set to 0; counters with values originally lower than the referenced one are incremented by 1, and all others remain unchanged.
 When a miss occurs and the set is not full, the counter
associated with the new block loaded from the main
memory is set to 0, and the values of all other counters
are increased by 1.
LRU
 When a miss occurs and the set is full, the block with counter value 3 is removed, the new block is put in its place, and its counter is set to 0.
 The counters of the other three blocks are incremented by 1.
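A small Python sketch of this 2-bit counter scheme for one four-block set (illustrative only, not from the slides) is given below; counter 0 marks the most recently used block and counter 3 the least recently used one:

class LRUSet:
    def __init__(self, num_blocks=4):
        self.blocks = []           # list of (tag, counter) pairs
        self.capacity = num_blocks

    def access(self, tag):
        for i, (t, c) in enumerate(self.blocks):
            if t == tag:                       # hit
                # Increment only counters lower than the referenced block's,
                # then reset the referenced block's counter to 0.
                self.blocks = [(b, cnt + 1 if cnt < c else cnt)
                               for b, cnt in self.blocks]
                self.blocks[i] = (tag, 0)
                return "hit"
        # Miss: if the set is full, evict the block whose counter is 3.
        if len(self.blocks) == self.capacity:
            self.blocks = [(b, cnt) for b, cnt in self.blocks if cnt != 3]
        # Increment the remaining counters and load the new block with counter 0.
        self.blocks = [(b, cnt + 1) for b, cnt in self.blocks]
        self.blocks.append((tag, 0))
        return "miss"

lru = LRUSet()
for tag in ["A", "B", "C", "D", "B", "E"]:     # the final access evicts "A", the LRU block
    print(tag, lru.access(tag), lru.blocks)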



Memory Interleaving
• An alternative technique to reduce memory access time.
• The main memory is divided into a number of modules.
• Successive words in the address space are placed in different modules.
• Main memory is structured as a collection of physically separate modules.
• Each module has its own:
• ABR – Address Buffer Register
• DBR – Data Buffer Register





• Two methods of address layout:
• First method
- The higher-order k bits name one of the n modules.
- The lower-order m bits name a particular word within that module.
- For a block transfer, only one module gets involved.
- So only one module is kept busy at a time.

• Second method
- A more effective method, called memory interleaving.
- The low-order k bits of the memory address select a module.
- The high-order m bits name a location within that module.
- In a block transfer of consecutive memory words, several modules are kept busy at any one time.
- This gives faster access to a block of data and higher average utilization of the memory system.
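A minimal Python sketch of this low-order (interleaved) layout, assuming 4 modules (so k = 2), shows how consecutive addresses are spread across the modules:

NUM_MODULES = 4   # assumed number of modules, a power of two

def interleaved_mapping(address):
    module = address % NUM_MODULES            # low-order k bits select the module
    word_in_module = address // NUM_MODULES   # high-order m bits select the word
    return module, word_in_module

for addr in range(8):   # a block of 8 consecutive words
    module, offset = interleaved_mapping(addr)
    print(f"address {addr} -> module {module}, word {offset}")

Because successive words fall in different modules, a block transfer can keep several modules busy at once.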
Memory Hierarchy
• A "memory hierarchy" in computer storage distinguishes the levels of the hierarchy by their speed, cost, and size:
• Speed and cost per bit increase toward the processor.
• Size increases away from the processor, toward secondary storage.



Memory Hierarchy

Processor registers → Primary (L1) cache → Secondary (L2) cache → Main memory → Secondary memory

(Speed and cost per bit increase toward the processor; size increases toward secondary memory.)
