CA 5: Pipelining
Pipelining
Pipelining arranges the hardware so that more than one
operation can be performed at the same time.
In this way, the number of operations performed per
second is increased, even though the elapsed time
needed to perform any one operation is unchanged.
Basic Idea of Pipelining
Execution of a program consists of a sequence of
fetch and execute steps.
Let Fi refer to the fetch step and Ei to the execute
step for instruction Ii.
The instruction fetched by the fetch unit is deposited in
an intermediate storage buffer, B1.
This buffer is needed so that the execution unit can
execute one instruction while the fetch unit is fetching
the next.
Assume that both the source and the destination of the
data operated on by the instructions are inside the
execution unit.
In the first clock cycle, the fetch unit fetches instruction I1 (step
F1) and stores it in buffer B1 at the end of the clock cycle.
In the second clock cycle, the execution unit performs E1 on the
instruction in B1 while the fetch unit fetches I2 (step F2).
In this manner, both the fetch and execute units are kept busy all
the time, and the completion rate of instruction execution is twice
that achievable by sequential operation.
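As a quick check of that claim, here is a minimal Python sketch (an illustration, not from the slides) comparing the time to complete n instructions sequentially and with the two-stage fetch/execute overlap:

def sequential_time(n):
    # Each instruction needs a fetch cycle and an execute cycle,
    # performed back to back: F1 E1 F2 E2 ...
    return 2 * n

def pipelined_time(n):
    # The fetch of each instruction overlaps the execution of the
    # previous one: one cycle to fill the pipeline, then one
    # instruction completes every cycle.
    return n + 1

for n in (3, 1000):
    print(n, sequential_time(n), pipelined_time(n))

# For large n the pipelined completion rate approaches one instruction
# per cycle, twice the sequential rate, while the latency of each
# individual instruction is still two cycles.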
Figure: Basic idea of instruction pipelining.

Sequential execution (one instruction at a time):

  I1        I2        I3
  F1  E1    F2  E2    F3  E3

Hardware organization: Instruction fetch unit -> Interstage buffer B1 -> Execution unit

Pipelined execution:

  Clock cycle   1    2    3    4
  I1            F1   E1
  I2                 F2   E2
  I3                      F3   E3
4-stage Pipeline
A pipelined processor may process each instruction in four
steps:
F (Fetch): read the instruction from the memory.
D (Decode): decode the instruction and fetch the source operand(s).
E (Execute): perform the operation specified by the instruction.
W (Write): store the result in the destination location.
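The resulting ideal schedule is easy to generate: instruction i simply enters stage s in clock cycle i + s. A small Python sketch (stage names as above; no hazards assumed) that prints the timing table shown after the buffer discussion:

STAGES = ["F", "D", "E", "W"]

def print_schedule(n):
    # In an ideal 4-stage pipeline, instruction i (0-based) occupies
    # stage s (0-based) during clock cycle i + s + 1.
    width = n + len(STAGES) - 1
    for i in range(n):
        row = ["  "] * width
        for s, stage in enumerate(STAGES):
            row[i + s] = stage + str(i + 1)
        print("I" + str(i + 1) + ": " + " ".join(row))

print_schedule(4)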
4-stage Pipeline: Buffer
Information is passed from one unit to the next through
a storage buffer.
During clock cycle 4, the buffers hold the following:
Buffer B1 holds instruction I3.
Buffer B2 holds the source operand(s) for I2, the
specification of the operation to be performed, and the
information needed for the write step of I2.
Buffer B3 holds the result produced by the execution unit and
the destination information for I1.
Figure: A 4-stage pipeline and its instruction flow.

  Clock cycle   1    2    3    4    5    6    7
  I1            F1   D1   E1   W1
  I2                 F2   D2   E2   W2
  I3                      F3   D3   E3   W3
  I4                           F4   D4   E4   W4

Hardware organization: F (fetch instruction) -> B1 -> D (decode
instruction and fetch operands) -> B2 -> E (execute operation) -> B3 ->
W (write results), where B1, B2, and B3 are the interstage buffers.
Role of Cache Memory
Each stage in a pipeline is expected to complete its operation in
one clock cycle.
The clock period must therefore be long enough for the slowest stage
to finish its task: if different units require different amounts of
time, the clock period must accommodate the longest of them.
Consider the fetch step:
The access time of the main memory may be much greater than the time
needed to perform basic pipeline stage operations, so pipelining would
be of little value.
Using a cache memory, which can typically be accessed in about one
pipeline cycle, keeps the fetch step in line with the other stages.
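The clock-period point can be made concrete with made-up stage delays (the numbers below are purely illustrative): the cycle time is the maximum stage delay, so a slow main-memory fetch drags the whole pipeline down.

# Hypothetical stage delays in nanoseconds, for illustration only.
with_cache    = {"F": 2,  "D": 2, "E": 2, "W": 2}
without_cache = {"F": 20, "D": 2, "E": 2, "W": 2}  # fetch from main memory

for label, delays in (("with cache", with_cache),
                      ("without cache", without_cache)):
    period = max(delays.values())   # the clock must fit the slowest stage
    print(label, "-> clock period:", period, "ns,",
          "steady-state rate:", round(1000 / period), "MIPS")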
Pipeline Performance
For a variety of reasons, one of the pipeline stages
may not be able to complete its processing task for a
given instruction in the time allotted.
Whenever one of the stages in the pipeline cannot
complete its operation in one clock cycle, the pipeline
stalls.
Any condition that causes the pipeline to stall is called
a hazard.
Three types of hazard:
Data hazard
Instruction or control hazard
Structural hazard
Data Hazard
This is a situation in which the pipeline is stalled
because the data to be operated on are delayed for
some reason.
Example:
Stage E is responsible for arithmetic and logic operations, and
one cycle is assigned for this task.
But some operations, such as divide, may require more time, and
the pipeline stalls.
  Clock cycle   1    2    3    4    5    6    7    8    9
  I1            F1   D1   E1   W1
  I2                 F2   D2   E2   E2   E2   W2
  I3                      F3   D3             E3   W3
  I4                           F4   D4   D4   D4   E4   W4
  I5                                F5   F5   F5   D5   E5

Figure 8.3. Effect of an execution operation taking more than one clock
cycle: E2 occupies cycles 4 through 6, so steps D4 and F5 are held up
until the interstage buffers become free again.
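The completion times in this figure can be reproduced with a small model. The sketch below is simplified (a stage is free again one cycle after it accepts an instruction, and a finished stage may hold its result while waiting), so the stall shows up as a gap before E rather than as the held-up D and F steps of the figure, but the completion cycle of each instruction matches.

def pipeline_times(exec_cycles):
    # exec_cycles[i] = number of Execute cycles instruction i needs.
    free = {"F": 1, "D": 1, "E": 1, "W": 1}  # earliest free cycle per stage
    for i, e in enumerate(exec_cycles):
        f = free["F"]
        d = max(f + 1, free["D"])
        x = max(d + 1, free["E"])            # start of the Execute step
        w = max(x + e, free["W"])
        free.update(F=f + 1, D=d + 1, E=x + e, W=w + 1)
        print("I%d: F=%d D=%d E=%d..%d W=%d" % (i + 1, f, d, x, x + e - 1, w))

pipeline_times([1, 3, 1, 1, 1])   # I2's Execute takes three cycles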
Instruction Hazard
The pipeline may be stalled because of a delay in the
availability of an instruction.
This may be the result of a miss in the cache, in which
case the instruction must be fetched from the main memory.
  Clock cycle   1    2    3    4    5    6    7    8    9
  I1            F1   D1   E1   W1
  I2                 F2   F2   F2   F2   D2   E2   W2
  I3                                     F3   D3   E3   W3

The same stall as seen by the fetch stage:

  Clock cycle   1    2    3    4    5    6
  Stage F       F1   F2   F2   F2   F2   F3

Fetching I2 takes four cycles because of a cache miss; the instruction
must be fetched from the main memory.
Structural Hazard
This is the situation in which two instructions require the use of a
given hardware resource at the same time.
For example, one instruction may need to access memory as part of its
execute or write stage while another instruction is being fetched.
Only one instruction can proceed; the other is delayed.
Solution: separate instruction and data caches.
  Clock cycle   1    2    3    4    5    6    7
  I1            F1   D1   E1   W1
  I2 (Load)          F2   D2   E2   M2   W2
  I3                      F3   D3   E3        W3
  I4                           F4   D4        E4
  I5                                F5        D5

The Load needs an extra memory-access step M2, so I2 and I3 both
require the register file in cycle 6; W3 and the steps behind it are
delayed by one cycle.
Data Hazard: Another Example
We must ensure that the results obtained when instructions are
executed in a pipelined processor are identical to those obtained
when the same instructions are executed sequentially.
Consider the following example (destination register first):
Mul R4, R2, R3
Add R6, R5, R4
The Add reads R4, which is produced by the Mul.
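With the destination written first, as above, detecting this read-after-write dependency is a one-line check. A Python sketch (the tuple encoding of instructions is an assumption of this example):

def raw_hazard(producer, consumer):
    # producer/consumer: (opcode, destination, source1, source2)
    _, dest, *_ = producer
    _, _, *sources = consumer
    return dest in sources

mul = ("Mul", "R4", "R2", "R3")   # R4 <- R2 * R3
add = ("Add", "R6", "R5", "R4")   # R6 <- R5 + R4: reads R4
print(raw_hazard(mul, add))       # True: the Add must wait for the Mul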
Data Hazard Example
  Clock cycle   1    2    3    4    5    6    7    8    9
  I1 (Mul)      F1   D1   E1   W1
  I2 (Add)           F2   D2   D2   D2A  E2   W2
  I3                      F3             D3   E3   W3
  I4                                     F4   D4   E4   W4

Step D2A, in which the Add reads its delayed operand R4, can take place
only after W1 has written the Mul result into the register file.
Operand Forwarding
The data hazard arises because I2 is waiting for the result to be
written into the register file, even though that result is already
available at the output of the execution unit after step E1.
What happens?
After decoding instruction I2 and detecting the data dependency, a
decision is made to use operand forwarding.
The operand not involved in the dependency, register R5, is read and
loaded into register SRC1 in clock cycle 3.
In the next clock cycle, the product produced by instruction I1 is
available in register RSLT, and because of the forwarding connection
it can be used in step E2.
Hence execution of I2 proceeds without interruption.
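A sketch of the forwarding idea in Python (the in-flight bookkeeping is a modeling device; the SRC1/SRC2/RSLT registers of the datapath are abstracted into a dictionary): an operand is taken from a not-yet-written-back result when its register matches, and from the register file otherwise.

class ForwardingPipeline:
    def __init__(self):
        self.regs = {"R%d" % i: 0 for i in range(8)}
        self.in_flight = {}   # results computed but not yet written back

    def read_operand(self, reg):
        # The forwarding path: bypass the register file when the
        # producing instruction has executed but not yet written back.
        return self.in_flight.get(reg, self.regs[reg])

    def execute(self, op, dest, src1, src2):
        a, b = self.read_operand(src1), self.read_operand(src2)
        self.in_flight[dest] = a * b if op == "Mul" else a + b

    def write_back(self, dest):
        self.regs[dest] = self.in_flight.pop(dest)

p = ForwardingPipeline()
p.regs.update(R2=3, R3=5, R5=10)
p.execute("Mul", "R4", "R2", "R3")  # E1: the product exists only in flight
p.execute("Add", "R6", "R5", "R4")  # E2: R4 is forwarded, no stall
p.write_back("R4")                  # W1
p.write_back("R6")                  # W2
print(p.regs["R4"], p.regs["R6"])   # 15 25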
Handling Data Hazards in Software
An alternative approach is to leave the task of detecting data
dependencies, and dealing with them, to the software.
In this case, the compiler can introduce the two-cycle delay needed
between instructions I1 and I2 by inserting NOP (no-operation)
instructions, as follows:
I1: Mul R4, R2, R3
NOP
NOP
I2: Add R6, R5, R4
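A sketch of what such a compiler pass could look like in Python (toy assumptions: destination-first tuples as before, a fixed two-instruction delay, and no reordering):

def insert_nops(program, delay=2):
    # program: list of (opcode, destination, source...) tuples.
    out = []
    for instr in program:
        _, _, *sources = instr
        # If a recent instruction produces one of our sources, pad with
        # NOPs until 'delay' instructions separate producer and consumer.
        for back, prev in enumerate(reversed(out), start=1):
            if back > delay:
                break
            if prev[0] != "NOP" and prev[1] in sources:
                out.extend([("NOP",)] * (delay - back + 1))
                break
        out.append(instr)
    return out

prog = [("Mul", "R4", "R2", "R3"), ("Add", "R6", "R5", "R4")]
for instr in insert_nops(prog):
    print(instr)   # Mul, NOP, NOP, Add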
Such dependencies can thus be detected in two ways:
by software (the compiler) or by hardware.
Leaving tasks such as inserting NOP instructions to the compiler
leads to simpler hardware.
Being aware of the need for a delay, the compiler can also attempt to
reorder instructions to perform useful tasks in the NOP slots and thus
achieve better performance.
On the other hand, inserting NOP instructions leads to larger
code size.
Instruction Hazard: Unconditional Branch
The time lost as a result of a branch instruction is
often referred to as the branch penalty.
For a longer pipeline, the branch penalty may be
higher.
Reducing the branch penalty requires the branch
address to be computed earlier in the pipeline.
The instruction fetch unit has dedicated hardware to
identify a branch instruction and to compute the branch
target address as early as possible after an instruction
is fetched.
  Clock cycle   1    2    3    4    5    6
  I1            F1   E1
  I2 (Branch)        F2   E2
  I3                      F3   X
  Ik                           Fk   Ek

X: instruction I3, fetched while the branch was executing, is
discarded. Fetching the branch target Ik is delayed by one cycle,
giving a branch penalty of one cycle in this two-stage pipeline.
(a) Branch target address computed in the Execute stage (penalty: two cycles):

  Clock cycle   1    2    3    4    5    6    7    8
  I1            F1   D1   E1   W1
  I2 (Branch)        F2   D2   E2
  I3                      F3   D3   X
  I4                           F4   X
  Ik                                Fk   Dk   Ek   Wk

(b) Branch target address computed in the Decode stage (penalty: one cycle):

  Clock cycle   1    2    3    4    5    6    7
  I1            F1   D1   E1   W1
  I2 (Branch)        F2   D2
  I3                      F3   X
  Ik                           Fk   Dk   Ek   Wk
Instruction Queue and Prefetching
Many processors employ sophisticated fetch units that
can fetch instructions before they are needed and put
them in an instruction queue.
The instruction queue can store several instructions.
A separate unit, called the dispatch unit, takes
instructions from the front of the queue, sends them to
the execution unit, and also performs the decoding
function.
To support this, the fetch unit must have sufficient decoding
and processing capability to recognize and execute branch
instructions.
When the pipeline stalls because of a data hazard, the dispatch
unit is not able to issue instructions from the instruction
queue; the fetch unit, however, continues to fetch instructions
and add them to the queue.
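A toy model of this interaction (one fetch per cycle, at most one dispatch per cycle, and a made-up stall pattern standing in for the data hazard):

from collections import deque

def run(n_cycles, stalled_cycles):
    # stalled_cycles: cycles in which the dispatch unit cannot issue.
    queue, next_i = deque(), 1
    for cycle in range(1, n_cycles + 1):
        queue.append("I%d" % next_i)     # the fetch unit keeps fetching
        next_i += 1
        issued = None
        if cycle not in stalled_cycles and queue:
            issued = queue.popleft()     # dispatch from the queue front
        print("cycle %d: issued %-4s queue %s" % (cycle, issued, list(queue)))

run(6, stalled_cycles={3, 4})            # the queue grows during the stall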
Instruction Queue

Figure: hardware organization with an instruction queue. The
instruction fetch unit (F: fetch instruction) places fetched
instructions in the instruction queue; the dispatch/decode unit (D)
takes them from the front of the queue and passes them on to the
execute (E: execute instruction) and write (W: write results) stages.
Instruction Queue

  Clock cycle   1    2    3    4    5    6    7    8    9    10
  Queue length  1    1    1    1    2    3    2    1    1    1
  I1            F1   D1   E1   E1   E1   W1
  I2                 F2   D2             E2   W2
  I3                      F3   D3             E3   W3
  I4                           F4   D4             E4   W4
  I5 (Branch)                       F5   D5
  I6                                     F6   X
  Ik                                          Fk   Dk   Ek   Wk

While I1's three-cycle Execute stalls the dispatch unit, the fetch
unit keeps filling the queue (length 2, then 3). The branch I5 is
resolved in its D step, I6 is discarded (X), and the target Ik is
fetched in cycle 7, so no overall time is lost.
Instruction Queue and Prefetching
The branch instruction does not increase the overall
execution time: the instruction fetch unit has executed the
branch instruction (by computing the branch target address)
concurrently with the execution of other instructions.
This technique is called branch folding.
Conditional Branch
A conditional branch instruction introduces an added
hazard caused by the dependency of the branch
condition on the result of a preceding instruction.
Branch Prediction
A technique for reducing the branch penalty associated
with conditional branches is to attempt to predict whether
or not a particular branch will be taken.
Speculative execution means that instructions are executed
before the processor is certain that they are in the correct
execution sequence.
If branch outcomes were random, then half the
branches would be taken.
Simple approach:
Assume that branches will not be taken.
This saves the time lost to conditional branches 50 percent of the time.
Static Branch Prediction
The branch prediction decision is always the same
every time a given instruction is executed
A useful static prediction can be made by observing whether the
target address of the branch is lower than or higher than the
address of the branch instruction: a backward branch (lower target)
usually closes a loop and is predicted taken, while a forward
branch is predicted not taken.
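This backward-taken/forward-not-taken heuristic fits in a couple of lines; a Python sketch with hypothetical addresses:

def predict_taken(branch_addr, target_addr):
    # A backward branch (target below the branch) usually closes a loop,
    # so predict it taken; predict forward branches not taken.
    return target_addr < branch_addr

print(predict_taken(0x1040, 0x1000))   # loop-closing branch -> True
print(predict_taken(0x1040, 0x1080))   # forward skip -> False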
Dynamic Branch Prediction
The objective of branch prediction algorithms is to
reduce the probability of making a wrong decision and
to avoid fetching instructions that eventually have to be
discarded.
In its simplest form, the execution history used in
predicting the outcome of a given branch instruction is the
result of the most recent execution of that instruction.
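A sketch of this simplest scheme in Python: a single bit per branch that just remembers the last outcome. (Tracking one branch in isolation is a simplification; a real processor would index a table by the branch address.)

def mispredictions_1bit(outcomes, prediction=False):
    # outcomes: successive actual results (True = taken) of one branch.
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken          # remember only the most recent outcome
    return misses

# A loop branch taken 9 times then falling through, executed 3 times:
history = ([True] * 9 + [False]) * 3
print(mispredictions_1bit(history))   # 6: two misses per pass through the loop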
Better performance can be achieved by keeping more
information about the execution history, for example two
state bits per branch, as sketched below.
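The widely used two-bit saturating-counter scheme keeps four states (strongly/weakly not taken, weakly/strongly taken), so a single anomalous outcome does not flip the prediction. A sketch under the same single-branch simplification as above (the 0-3 encoding is an assumption):

def mispredictions_2bit(outcomes, state=0):
    # state 0,1 -> predict not taken; state 2,3 -> predict taken.
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        # Step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

history = ([True] * 9 + [False]) * 3
print(mispredictions_2bit(history))   # 5, versus 6 for the one-bit scheme

After warm-up, the two-bit scheme mispredicts only once per execution of the loop (at the exit), whereas the one-bit scheme also mispredicts on the first iteration of every execution.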
THANK YOU