Pipelining and Parallel Processors
Stages of a Pipeline
1. Fetch: The next instruction is read from memory.
2. Decode: The instruction is decoded and its operands are identified.
3. Execute: The operation is executed using the ALU (Arithmetic Logic Unit) or other resources.
Each stage works in parallel with others, processing different instructions at the same time. For
example, while one instruction is being executed, another can be decoded, and yet another fetched.
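To make the overlap concrete, here is a minimal sketch in Python (illustrative only; the three stage names follow the list above, and the instruction labels I1 to I4 are made up) that prints which instruction occupies each stage in every clock cycle:

# Minimal paper simulation of a 3-stage pipeline (Fetch, Decode, Execute).
STAGES = ["Fetch", "Decode", "Execute"]

def simulate(instructions):
    """Print which instruction occupies each stage in every clock cycle."""
    n, k = len(instructions), len(STAGES)
    for cycle in range(n + k - 1):
        row = []
        for s in range(k):
            i = cycle - s                      # instruction index currently in stage s
            row.append(instructions[i] if 0 <= i < n else "-")
        print(f"cycle {cycle + 1}: " + "  ".join(f"{STAGES[s]}:{row[s]}" for s in range(k)))

simulate(["I1", "I2", "I3", "I4"])

In this sketch, four instructions finish in 6 cycles instead of the 12 cycles a purely sequential, one-instruction-at-a-time execution of all three stages would need.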
Throughput refers to the number of instructions completed per unit of time. By overlapping
tasks, pipelining increases throughput significantly.
Speedup is the ratio of the time taken to execute instructions without pipelining to the time
taken with pipelining.
For an ideal pipeline with k stages and no delays, the speedup approaches k, meaning the pipeline is k times faster than a single-stage, non-pipelined execution process.
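As a sketch of the standard calculation (with k denoting the number of stages, n the number of instructions, and T the time per stage; the symbols are introduced here for illustration and are not named in the notes above):

S = \frac{n \cdot k \cdot T}{(k + n - 1) \cdot T} = \frac{nk}{k + n - 1} \approx k \quad \text{for large } n

A non-pipelined processor needs n·k·T to finish n instructions, while a k-stage pipeline finishes them in (k + n - 1)·T, so the ratio approaches k as n grows.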
Pipeline Hazards
Pipeline hazards are situations that prevent the next instruction in the pipeline from executing during
its designated clock cycle. These can disrupt the smooth operation of the pipeline.
1. Structural Hazards:
o Occur when two or more instructions require the same hardware resource at the
same time.
2. Data Hazards:
o Arise when instructions depend on the results of previous instructions that have not
yet completed.
o Example: An instruction tries to use a value that is still being calculated by a previous instruction; this read-after-write case is sketched in the example after this list.
3. Control Hazards:
o Occur due to the change in instruction flow, such as after a branch or jump
instruction.
o Example: A pipeline might fetch the wrong instruction following a branch until the
branch outcome is known.
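A minimal sketch (Python) of the read-after-write check behind data hazards, as referenced in the list above; the tuple-based instruction format, the register names, and the pipeline_depth parameter are assumptions made for illustration, not part of any real instruction set:

# Each instruction is (destination_register, source_registers).
# A read-after-write (RAW) hazard occurs when an instruction reads a register
# that an earlier, still-unfinished instruction writes.

def raw_hazards(instructions, pipeline_depth=3):
    hazards = []
    for i, (_, sources) in enumerate(instructions):
        # Only earlier instructions still in flight (within pipeline_depth) matter.
        for j in range(max(0, i - pipeline_depth + 1), i):
            dest_j, _ = instructions[j]
            if dest_j in sources:
                hazards.append((j, i, dest_j))
    return hazards

program = [
    ("r1", ("r2", "r3")),   # I1: r1 <- r2 + r3
    ("r4", ("r1", "r5")),   # I2: r4 <- r1 + r5  (reads r1 before I1 writes it back)
]
print(raw_hazards(program))  # [(0, 1, 'r1')]

Here I2 reads r1 before I1 has written it back, which is exactly the data hazard described above; a real pipeline would stall or forward the value to resolve it.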
Parallel Processors
Parallel processing involves the use of multiple processors to perform computations simultaneously,
thereby increasing computational power and reducing execution time. It is essential in modern
computing to handle complex and large-scale problems.
Parallel processors are systems that use two or more processors to execute tasks concurrently. Their memory is commonly organized in one of two ways:
o Shared Memory Systems: All processors access a common memory through a shared address space.
o Distributed Memory Systems: Each processor has its own private memory, and communication occurs over a network.
Advantages of parallel processing include:
1. Increased Performance: Dividing work across processors reduces overall execution time.
2. Scalability: Additional processors can be added to handle larger, more complex problems.
3. Fault Tolerance: Some systems can continue operation even if one processor fails.
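As a loose illustration of executing tasks concurrently (not a model of any specific parallel machine), the sketch below uses Python's multiprocessing module, with each worker process standing in for a processor; the four-way split and the sum-of-squares workload are arbitrary choices for the example:

from multiprocessing import Pool

def partial_sum(chunk):
    """Work done independently by one worker (one 'processor')."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]         # divide the work among 4 workers
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))  # workers run concurrently
    print(total)

Each worker computes its partial result independently and the results are combined at the end, which mirrors how a large problem is divided across processors.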
Parallel processors often face challenges when accessing shared memory and maintaining cache
coherency.
1. Shared Memory Access:
o Multiple processors may need to read or write to the same memory location simultaneously, so accesses must be coordinated.
2. Cache Coherency:
o In systems with multiple processors, each processor may have its private cache.
o Cache Coherency ensures that all caches have the most recent value of shared data.
o Example Problem: If Processor A updates a variable in its cache, Processor B must
see this updated value in its cache.
Common protocols for maintaining cache coherency include:
MESI Protocol: Maintains one of four states for each cache block (Modified, Exclusive, Shared, or Invalid).
Directory-Based Protocols: Use a central directory to track the state of each cache block.
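The sketch below (Python) illustrates only the write-invalidate idea behind such protocols, in a deliberately simplified form: a write by one cache marks the copies held by other caches Invalid, so their next read fetches the updated value. It keeps just three states (Modified, Shared, Invalid) and writes through to memory, so it is not a full MESI implementation, and all class and variable names are made up:

MODIFIED, SHARED, INVALID = "M", "S", "I"

class Cache:
    def __init__(self, name, memory, all_caches):
        self.name, self.memory, self.all_caches = name, memory, all_caches
        self.lines = {}                          # address -> (state, value)

    def read(self, addr):
        state, value = self.lines.get(addr, (INVALID, None))
        if state == INVALID:                     # miss: fetch from memory, mark Shared
            value = self.memory[addr]
            self.lines[addr] = (SHARED, value)
        return self.lines[addr][1]

    def write(self, addr, value):
        for other in self.all_caches:            # invalidate every other copy
            if other is not self and addr in other.lines:
                other.lines[addr] = (INVALID, None)
        self.memory[addr] = value                # write-through, for simplicity
        self.lines[addr] = (MODIFIED, value)

memory = {0x10: 1}
caches = []
a = Cache("A", memory, caches)
b = Cache("B", memory, caches)
caches.extend([a, b])

print(b.read(0x10))   # 1  (B caches the value in the Shared state)
a.write(0x10, 42)     # A's write invalidates B's copy
print(b.read(0x10))   # 42 (B misses and re-reads the updated value)

This reproduces the example problem above: after Processor A writes the variable, Processor B's stale copy is invalidated, so its next read returns the new value.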
Conclusion
Pipelining and parallel processing are foundational concepts in modern computing, enhancing the
speed and efficiency of processors. While pipelining increases instruction throughput by overlapping
stages, parallel processing harnesses the power of multiple processors to handle larger workloads.
However, challenges like pipeline hazards and cache coherency must be addressed to fully realize the
potential of these techniques.
Handling Branch Instructions
Branch instructions pose a significant challenge in instruction pipelines because they can disrupt the sequential flow of program execution. Branches can be either unconditional or conditional, and each type requires specific handling to maintain pipeline efficiency.
1. Unconditional Branch:
o Always alters the program flow by loading the Program Counter (PC) with the target
address.
2. Conditional Branch:
o If the condition is not satisfied, the execution continues with the next sequential
instruction.
Several techniques are employed to mitigate the disruption caused by branch instructions:
1. Prefetch Target Instruction
Process: Prefetch both the target instruction and the next sequential instruction after the branch.
Advantage: Reduces branch penalties by ensuring the correct instruction stream is already
fetched based on the branch outcome.
Challenge: Wastes fetch resources on whichever instruction stream turns out not to be taken.
2. Branch Target Buffer (BTB)
Process:
o Stores addresses of previously executed branch instructions along with their target
addresses.
o When a branch instruction is decoded, the pipeline searches the BTB for the target
address.
Advantage: Faster execution of repetitive branch patterns as target instructions are readily
available.
Fallback: If the target address is not in the BTB, the pipeline fetches it and updates the BTB
for future use.
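A toy sketch (Python) of the lookup-and-update behaviour described for the BTB; the dictionary-based storage and the example addresses are assumptions made for illustration:

# Branch Target Buffer as a simple mapping: branch address -> target address.
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}

    def lookup(self, branch_addr):
        """Return the predicted target address, or None on a BTB miss."""
        return self.entries.get(branch_addr)

    def update(self, branch_addr, target_addr):
        """Record the resolved target so future fetches can use it."""
        self.entries[branch_addr] = target_addr

btb = BranchTargetBuffer()
print(btb.lookup(0x400))        # None: first time, the pipeline must compute the target
btb.update(0x400, 0x7f0)        # store it once the branch resolves
print(hex(btb.lookup(0x400)))   # 0x7f0: the next occurrence is found immediately

The first lookup misses, so the pipeline computes the target and records it; later occurrences of the same branch find the target address right away.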
3. Loop Buffer
What is it?: A small, high-speed register file maintained by the fetch stage of the pipeline.
Process:
o Stores the instructions of a small program loop in the buffer as they are first fetched.
o On subsequent iterations, executes the loop directly from the buffer without accessing memory.
Condition: Loop mode is removed when execution finally branches out of the loop.
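A rough sketch (Python) of the loop-buffer idea: the body of a small loop is captured once, then fetches that fall inside the loop are served from the buffer instead of memory until execution branches out. The buffer capacity, addresses, and string "instructions" are illustrative assumptions:

class LoopBuffer:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.start = None           # address of the first instruction in the loop
        self.instructions = []      # captured loop body

    def capture(self, start_addr, body):
        """Load a small loop into the buffer if it fits."""
        if len(body) <= self.capacity:
            self.start, self.instructions = start_addr, list(body)

    def fetch(self, addr, memory):
        """Serve the fetch from the buffer while execution stays inside the loop."""
        if self.start is not None and self.start <= addr < self.start + len(self.instructions):
            return self.instructions[addr - self.start]   # no memory access
        self.start, self.instructions = None, []           # left the loop: loop mode ends
        return memory[addr]

memory = {0: "cmp", 1: "add", 2: "branch 0", 3: "next"}
buf = LoopBuffer()
buf.capture(0, ["cmp", "add", "branch 0"])
print(buf.fetch(1, memory))   # "add": served from the buffer
print(buf.fetch(3, memory))   # "next": loop mode removed, fetched from memory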
4. Branch Prediction
What is it?: Uses logic to predict the outcome of a conditional branch before it is executed.
Process:
o The outcome of the branch is predicted, and instructions are fetched along the predicted path before the branch is resolved.
o If the prediction is incorrect, the pipeline must flush the wrongly fetched instructions and fetch the correct path.
o Static Prediction: Based on simple rules (e.g., "always predict not taken").
o Dynamic Prediction: Based on the run-time history of the branch; one common form is sketched below.
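As an illustrative sketch (Python) of the dynamic form, here is the classic 2-bit saturating counter kept per branch: each actual outcome nudges the counter toward "taken" or "not taken", so one unusual outcome does not immediately flip a well-established prediction. The per-address dictionary and the example address are assumptions:

# 2-bit saturating counter: values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
class TwoBitPredictor:
    def __init__(self):
        self.counters = {}                                 # branch address -> counter (0..3)

    def predict(self, branch_addr):
        return self.counters.get(branch_addr, 0) >= 2      # True means "predict taken"

    def update(self, branch_addr, taken):
        c = self.counters.get(branch_addr, 0)
        self.counters[branch_addr] = min(3, c + 1) if taken else max(0, c - 1)

p = TwoBitPredictor()
for outcome in [True, True, True, False, True]:            # a loop branch that is usually taken
    print(p.predict(0x200), outcome)                       # prediction vs. actual outcome
    p.update(0x200, outcome)

Once the counter saturates at "strongly taken", the single not-taken outcome weakens it but does not flip the prediction, which is the behaviour that makes this scheme effective for loop branches.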
5. Delayed Branch
What is it?: A compiler-level optimization that rearranges the code to minimize branch
penalties.
Process:
o Inserts no-op (no operation) instructions or useful instructions after a branch to keep
the pipeline busy while fetching the target instruction.
Example: The compiler can move an instruction that appears before the branch and does not affect the branch condition into the slot immediately after the branch; it executes regardless of the branch outcome (a sketch follows this subsection).
Advantage: Keeps the pipeline active, reducing idle cycles caused by branch instructions.
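A toy sketch (Python) of the compiler-level rearrangement: each instruction carries the registers it writes and reads, the branch has a single delay slot, and the instruction just before the branch is moved into that slot if the branch does not depend on it, otherwise a no-op is inserted. The instruction encoding and register names are illustrative assumptions:

# Each instruction: (text, registers_written, registers_read).
NOP = ("nop", set(), set())

def fill_delay_slot(block, branch):
    """Move the instruction just before the branch into the delay slot if the
    branch does not read anything that instruction writes; else insert a no-op."""
    if block:
        text, writes, reads = block[-1]
        if not (writes & branch[2]):
            return block[:-1] + [branch, block[-1]]
    return block + [branch, NOP]

block = [
    ("add r1, r2, r3", {"r1"}, {"r2", "r3"}),
    ("sub r4, r5, r6", {"r4"}, {"r5", "r6"}),
]
branch = ("beq r1, r0, L1", set(), {"r1", "r0"})
for ins in fill_delay_slot(block, branch):
    print(ins[0])
# add r1, r2, r3   (stays before the branch, which reads r1)
# beq r1, r0, L1
# sub r4, r5, r6   (independent instruction fills the delay slot)

The subtraction is independent of the branch condition, so it fills the delay slot and executes while the target is being fetched; the addition must stay before the branch because the branch reads r1.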
Technique | Advantage | Challenge
Prefetch Target Instruction | The needed instruction stream is already fetched | Wastes fetch resources on the path not taken
Branch Target Buffer | Fast handling of repetitive branch patterns | A miss requires fetching the target and updating the BTB
Loop Buffer | Small loops execute without memory accesses | Applies only while execution stays inside the loop
Branch Prediction | Avoids stalls when the prediction is correct | A misprediction forces the pipeline to flush
Delayed Branch | Keeps the pipeline active after a branch | Relies on the compiler to fill the delay slot usefully
These techniques aim to minimize the disruption caused by branch instructions, ensuring smoother
execution of the instruction pipeline and improving overall performance.