Questions That I Encountered
The BTB is a cache that stores branch instruction addresses and their predicted
targets.
The fetch unit checks the PC (Program Counter) in the BTB:
o If a match is found, the instruction is likely a branch.
o If no match, it is treated as a non-branch (until decode confirms).
BTB lookup happens in parallel with instruction fetching, enabling early prediction.
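As a rough sketch of this lookup (not modeled on any particular processor; the sizes, bit slicing, and names below are assumptions), a direct-mapped BTB can be thought of as a small tag/target table indexed by low PC bits:

```c
#include <stdint.h>
#include <stdbool.h>

#define BTB_ENTRIES 256   /* hypothetical size, power of two */

typedef struct {
    bool     valid;
    uint64_t tag;      /* upper PC bits identifying the branch */
    uint64_t target;   /* predicted branch target address */
} btb_entry_t;

static btb_entry_t btb[BTB_ENTRIES];

/* Lookup done in the fetch stage, in parallel with the I-cache access.
 * A hit means this PC is likely a branch and supplies a predicted target;
 * a miss means fetch continues sequentially until decode says otherwise. */
static bool btb_lookup(uint64_t pc, uint64_t *predicted_target)
{
    uint64_t index = (pc >> 2) & (BTB_ENTRIES - 1);   /* low PC bits */
    uint64_t tag   = pc >> 10;                         /* remaining bits */

    if (btb[index].valid && btb[index].tag == tag) {
        *predicted_target = btb[index].target;
        return true;
    }
    return false;
}

/* Called when a branch actually resolves, to install or refresh its entry. */
static void btb_update(uint64_t pc, uint64_t actual_target)
{
    uint64_t index = (pc >> 2) & (BTB_ENTRIES - 1);
    btb[index].valid  = true;
    btb[index].tag    = pc >> 10;
    btb[index].target = actual_target;
}
```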
Some processors use pre-decode bits to classify instructions in the fetch unit.
As an instruction is fetched from memory, a pre-decoder examines the opcode to
check if it belongs to a branch category.
This avoids waiting for full decoding in later pipeline stages.
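A minimal sketch of such a pre-decode check, loosely patterned after RISC-V-style 7-bit major opcodes; the function name and the idea of doing this at cache-fill time are illustrative assumptions, not a specific design:

```c
#include <stdint.h>
#include <stdbool.h>

/* Pre-decode check for a fixed-width 32-bit encoding. A real pre-decoder
 * would match the actual ISA and typically store the resulting "is_branch"
 * bit alongside the instruction in the I-cache or fetch buffer. */
static bool predecode_is_branch(uint32_t insn)
{
    uint32_t opcode = insn & 0x7F;   /* low 7 bits = major opcode (assumed) */

    switch (opcode) {
    case 0x63:   /* conditional branch */
    case 0x6F:   /* direct jump        */
    case 0x67:   /* indirect jump      */
        return true;
    default:
        return false;
    }
}
```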
Some architectures hash past branch history together with the PC to make an early
prediction for the fetched instruction.
Example: the gshare predictor XORs the PC with global branch-history bits to index its
prediction table; combined with a BTB hit, this gives an early guess that the current
instruction is a (taken) branch.
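A minimal gshare sketch, assuming a 4096-entry table of 2-bit counters and a 12-bit global history register (all sizes and names invented for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

#define PHT_ENTRIES 4096   /* 2^12 two-bit counters (size assumed) */

static uint8_t  pht[PHT_ENTRIES];   /* 2-bit saturating counters, 0..3 */
static uint16_t ghr;                /* global history register, 12 bits used */

/* gshare index: XOR low PC bits with the global branch history. */
static uint32_t gshare_index(uint64_t pc)
{
    return ((uint32_t)(pc >> 2) ^ ghr) & (PHT_ENTRIES - 1);
}

/* Predict taken when the counter is in one of its two "taken" states. */
static bool gshare_predict(uint64_t pc)
{
    return pht[gshare_index(pc)] >= 2;
}

/* After the branch resolves: train the counter and shift the outcome
 * into the global history register. */
static void gshare_update(uint64_t pc, bool taken)
{
    uint32_t i = gshare_index(pc);
    if (taken  && pht[i] < 3) pht[i]++;
    if (!taken && pht[i] > 0) pht[i]--;
    ghr = (uint16_t)(((ghr << 1) | (taken ? 1 : 0)) & (PHT_ENTRIES - 1));
}
```

XORing the history into the index (rather than concatenating it) is what reduces aliasing between different branches that share the same low PC bits.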
Some architectures mark branch instructions in the instruction buffer (e.g., ARM
Thumb branch instructions are recognizable from their bit patterns).
When fetching, the instruction buffer signals a branch hint to the predictor before
decode.
Some ISAs use compiler-generated hints embedded in the instruction itself (e.g.,
static branch prediction in MIPS).
The fetch unit can read these hints and prepare for a branch even before decoding.
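At the source-code level, the closest analogue is a compiler hint such as GCC/Clang's __builtin_expect, which tells the compiler which outcome is expected so it can lay out code (and, on ISAs with hint bits, encode them) accordingly. The function below is a made-up example, not MIPS-specific:

```c
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* The error path is marked unlikely, so the compiler arranges the code
 * assuming the branch falls through on the common path. */
int process(const char *buf, size_t len)
{
    if (unlikely(buf == NULL || len == 0))
        return -1;          /* rare error path */

    /* ... common-case processing ... */
    return 0;
}
```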
========================================
2. Where Are the BTB and Branch Predictor Located? & Why?
BTB & Branch Predictor:
In most modern high-performance processors, the BTB (Branch Target Buffer) and the
branch predictor sit in the Instruction Fetch stage to allow early prediction. However, in
some architectures, the BTB is located in the Decode stage, for example:
1. MIPS R4000
o Early MIPS processors had no BTB in fetch; instead, branch targets were
computed in Decode/Execute.
o The architecture relied more on delayed branches than aggressive branch
prediction.
2. ARM Cortex-M Series (Microcontrollers)
o These processors prioritize low power over aggressive branch prediction.
o The BTB is integrated with decode logic to minimize fetch-stage complexity.
3. Older x86 Processors (Before Pentium Pro)
o Some early x86 CPUs placed branch handling logic in decode because
instruction lengths were variable, making it harder to predict branches in fetch.
Different processors also use hybrid BTB designs where branch prediction is distributed
across multiple pipeline stages (a small two-level lookup sketch follows the table below).
Reason                                                      | Impact
Reduces unnecessary BTB lookups for non-branch instructions | Lower power consumption
Simplifies pipeline design                                  | Works well in processors with shorter pipelines
More efficient for low-power CPUs                           | Used in embedded architectures like ARM Cortex-M
Useful in architectures with delayed branches               | Example: MIPS, where branches resolve predictably
Fetch-stage complexity reduction                            | Helps CPUs with variable-length instructions (e.g., old x86)
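As a concrete sketch of such a distributed, multi-level arrangement (hypothetical, not any specific CPU): a tiny L0 BTB answers immediately in fetch, while a larger backing BTB is consulted on an L0 miss at the cost of a short bubble. The sizes, names, and the 1-cycle bubble are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define L0_ENTRIES 16     /* tiny, zero-bubble BTB in fetch (size assumed) */
#define L1_ENTRIES 1024   /* larger, slower backing BTB (size assumed) */

typedef struct { bool valid; uint64_t tag, target; } btb_entry_t;

static btb_entry_t l0[L0_ENTRIES];
static btb_entry_t l1[L1_ENTRIES];

/* Returns the predicted target if either level hits; *bubble_cycles reports
 * the extra fetch delay the slower level would cost (illustrative numbers). */
static bool hybrid_btb_lookup(uint64_t pc, uint64_t *target, int *bubble_cycles)
{
    uint64_t i0 = (pc >> 2) % L0_ENTRIES;
    if (l0[i0].valid && l0[i0].tag == pc) {
        *target = l0[i0].target;
        *bubble_cycles = 0;            /* hit in fetch, no penalty */
        return true;
    }
    uint64_t i1 = (pc >> 2) % L1_ENTRIES;
    if (l1[i1].valid && l1[i1].tag == pc) {
        *target = l1[i1].target;
        *bubble_cycles = 1;            /* assumed 1-cycle redirect bubble */
        return true;
    }
    return false;                      /* fall back to sequential fetch */
}
```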
1. Static Prediction:
o Simple rule-based, e.g., assume backward branches are taken (they usually close
loops) and forward branches are not taken (see the sketch after this list).
o Used in early architectures but ineffective for complex branches.
2. Dynamic Prediction:
o Uses runtime history to predict branch behavior.
o More accurate than static prediction.
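A minimal sketch of the static "backward taken, forward not taken" rule from item 1 above; the function name is made up and no real ISA encoding is implied.

```c
#include <stdint.h>
#include <stdbool.h>

/* Static BTFN heuristic: backward branches (negative displacement, typically
 * loop back-edges) are predicted taken; forward branches are predicted not
 * taken. No runtime state is needed, which is why it is cheap but inflexible. */
static bool static_predict_taken(uint64_t branch_pc, uint64_t target_pc)
{
    return target_pc < branch_pc;   /* backward branch => predict taken */
}
```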
1. 1-bit Predictor:
o Stores a single bit for each branch (0 = not taken, 1 = taken).
o High misprediction in cases where a branch changes frequently.
2. 2-bit Saturating Counter Predictor:
o Uses a 2-bit counter per branch address to track history (a small state-machine
sketch follows this list).
o More resistant to fluctuations.
o States: Strongly Taken, Weakly Taken, Weakly Not Taken, Strongly Not Taken.
3. Gshare Predictor:
o XORs the branch history register (BHR) with the program counter (PC)
before indexing into a table.
o Reduces aliasing in branch prediction.
4. TAGE (Tagged Geometric Predictor):
o Uses multiple branch history lengths to adapt to short and long-term branch
behaviors.
o Best-in-class for high-performance CPUs.
5. Neural Branch Prediction:
o Uses perceptrons (machine learning models) to improve prediction accuracy
for hard-to-predict branches.
o IBM’s POWER10 and Intel’s newer chips have experimented with this.
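A small sketch of the 2-bit saturating counter from item 2 as an explicit four-state machine; the same counters serve as the table entries in predictors like gshare. State and function names are illustrative.

```c
#include <stdbool.h>

/* The four states of a 2-bit saturating counter. */
typedef enum {
    STRONG_NOT_TAKEN = 0,
    WEAK_NOT_TAKEN   = 1,
    WEAK_TAKEN       = 2,
    STRONG_TAKEN     = 3
} counter_state_t;

static bool predict_taken(counter_state_t s)
{
    return s >= WEAK_TAKEN;
}

/* A single surprise outcome only moves a "strong" state to the neighboring
 * "weak" state, so one-off fluctuations (e.g. a loop exit) do not
 * immediately flip the prediction. */
static counter_state_t train(counter_state_t s, bool taken)
{
    if (taken  && s < STRONG_TAKEN)     return s + 1;
    if (!taken && s > STRONG_NOT_TAKEN) return s - 1;
    return s;
}
```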
When a branch misprediction occurs, the processor must discard all speculative
instructions that were executed under the incorrect assumption. This process is known as a
pipeline flush. Let’s go step by step into how this works.
🔹 Example: what gets cleaned up during a flush
Pipeline Registers – Every stage of the pipeline has registers that hold instructions as
they progress. On a misprediction, these registers are marked invalid, preventing the
incorrect instructions from completing execution.
1. Reorder Buffer (ROB) – Discards Speculative Results
The ROB stores speculative results until instructions reach the retirement (commit)
stage. If a misprediction occurs, all speculative instructions in the ROB are discarded
before they commit to the register file or memory.
2. Store Buffer – Prevents Wrong Memory Writes
Speculative stores are held in the store buffer until commit; on a flush they are simply
dropped, so wrong-path stores never reach memory.
Once the wrong-path work is discarded, execution recovers as follows (a small recovery
sketch follows these steps):
1. Fetch from the Correct Target Address – The instruction fetch unit starts fetching
the correct instructions.
2. Resume Normal Execution – The pipeline is filled with instructions from the correct
path.
3. Minimize Stalls – The CPU may use checkpointing techniques to quickly recover
from mispredictions.
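Tying the pieces together, the toy model below (not any real microarchitecture; all structure names and sizes are invented) shows the ROB squash and PC redirect described above:

```c
#include <stdint.h>
#include <stdbool.h>

#define ROB_SIZE 64   /* assumed */

typedef struct {
    bool     valid;
    bool     speculative;   /* not yet allowed to commit */
    uint64_t pc;
} rob_entry_t;

typedef struct {
    rob_entry_t rob[ROB_SIZE];
    unsigned    head, tail;   /* head = oldest (next to commit), tail = youngest */
    uint64_t    fetch_pc;     /* where the front end is currently fetching */
} core_t;

/* Called when a branch resolves in execute and its prediction was wrong.
 * 'branch_idx' is the ROB slot of the mispredicted branch. */
static void handle_mispredict(core_t *c, unsigned branch_idx, uint64_t correct_target)
{
    /* 1. Squash every entry younger than the branch (the wrong path).
     *    Speculative stores waiting in the store buffer are dropped the
     *    same way, so they never reach memory. */
    for (unsigned i = (branch_idx + 1) % ROB_SIZE; i != c->tail; i = (i + 1) % ROB_SIZE)
        c->rob[i].valid = false;
    c->tail = (branch_idx + 1) % ROB_SIZE;

    /* 2. Redirect the front end: reset the fetch PC to the correct target
     *    and restart fetching (pipeline registers / fetch queue are cleared). */
    c->fetch_pc = correct_target;

    /* 3. Older instructions (up to and including the branch) are unaffected
     *    and still commit in order from the head of the ROB. */
}
```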
Key Takeaways
✅ Misprediction Detection happens in the Execution Stage, when the actual branch
outcome is computed.
✅ Pipeline Flush removes all speculative instructions from the pipeline.
✅ Reorder Buffer (ROB) ensures that speculative results do not commit to
registers/memory.
✅ PC Reset & Instruction Queue Flush restart execution from the correct branch target.
✅ Modern CPUs reduce flush penalties with better branch predictors, checkpointing,
and early resolution.
Penalty Factors (a rough cost calculation follows this list):
Pipeline Depth:
o Deeper pipelines (15+ stages) suffer higher penalties (e.g., Intel P4 NetBurst
had ~20 cycle penalty).
Fetch-to-Execute Latency:
o The longer it takes to detect a misprediction, the worse the penalty.
Branch Resolution Stage:
o If the branch is resolved later in the pipeline (e.g., in execution), the penalty is
higher.
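As a back-of-the-envelope model of these factors: cycles lost per instruction is roughly branch frequency x misprediction rate x flush penalty. The numbers below are purely illustrative, not measurements of any particular CPU.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative parameters, not measurements. */
    double branch_fraction  = 0.20;   /* ~1 in 5 instructions is a branch */
    double mispredict_rate  = 0.05;   /* 95% prediction accuracy */
    double penalty_cycles   = 15.0;   /* deep pipeline, branch resolves late */
    double base_cpi         = 1.0;    /* ideal CPI without branch penalties */

    double penalty_per_insn = branch_fraction * mispredict_rate * penalty_cycles;
    double effective_cpi    = base_cpi + penalty_per_insn;

    printf("penalty per instruction: %.3f cycles\n", penalty_per_insn);  /* 0.150 */
    printf("effective CPI:           %.3f\n", effective_cpi);            /* 1.150 */
    return 0;
}
```

With these assumed numbers, mispredictions alone add about 15% to the CPI, which is why deeper pipelines and later branch resolution hurt so much.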
Typical Penalty Values: