
COAL Assignment (Y86 Processor Architecture)

The document discusses the Y86 processor architecture, including its instruction set, programmer visible state such as registers and memory, and sequential and pipelined implementations. It provides details on the register set, condition codes, program counter, memory, status registers, and types of instructions in the Y86 instruction set such as simple, move, stack, arithmetic, jump, and function call instructions. Pipeline hazards and methods for avoiding them like stalling and forwarding are also covered.

Uploaded by

Rafia Khuram

Name: Rafia Khuram

Roll No: 0024-BSCS-19

Section: C

Course: Computer Architecture and Assembly Language

Topic: “Y86 Processor Architecture”


Contents
Processor Architecture:
General Diagram of a Processor:
Introduction to Y86 processor architecture:
The Y86 instruction set Architecture:
Programmer visible state:
Program registers
Condition code:
Program counter (PC):
Memory:
Status registers:
Y86 instruction set:
1) Simple instruction:
2) Move instruction:
3) Stack operations:
4) Arithmetic and logical Operation:
5) Jump instructions:
6) Conditional move instructions:
7) Function call and return instructions:
RISC and CISC instruction set:
Comparison of CISC and RISC:
Y86 instruction Set CISC/RISC:
Sequential Y86 Implementations:
SEQ Hardware Structure:
Diagram:
SEQ stage implementation:
Y86 implementation Observation:
General principles of pipelining:
Introduction to pipelined registers:
Advantages and Disadvantages of using pipeline implementation:
Pipelining implementation:
Fetch stage:
Decode stage:
Next PC prediction:
Execution stage:
Diagram:
Pipeline hazards:
Avoiding Data Hazards by Stalling:
Diagram:
Detailed Mechanism:
Load/use hazards:
Avoiding Data Hazards by Forwarding:
Explanation:

Processor Architecture:
The word “architecture” typically refers to building design and construction. In computing, it
refers to the design of a computer system, and the processor's architecture is among its most
important parts. The design of the processor determines what software can run on the computer
and what other hardware components are supported. For example, Intel's x86 processor
architecture is the standard architecture used by most PCs.

General Diagram of a Processor:


Introduction to Y86 processor architecture:
Y86 was inspired by the IA32 instruction set, which is colloquially referred to as “x86.” Compared with
IA32, the Y86 instruction set has fewer data types, instructions, and addressing modes. It also has a
simpler byte-level encoding. Still, it is sufficiently complete to allow us to write simple programs
manipulating integer data.

The Y86 instruction set Architecture:


The Y86 instruction set architecture includes:

1) The definitions of the different state elements

2) The set of instructions and their encodings

3) A set of programming conventions

4) The handling of exceptional events


Programmer visible state:
Each instruction in a Y86 program can read and modify some part of the processor state. This is
referred to as the programmer-visible state, where the “programmer” in this case is either
someone writing programs in assembly code or a compiler generating machine-level code.

Registers:

Registers are high-performance storage used to manipulate data. Y86 has eight program
registers.

Program registers
Y86 programs can access and modify the eight program registers %eax, %ecx, %edx, %ebx, %esi,
%edi, %esp, and %ebp, each of which stores a word. Register %esp is used as the stack pointer
by the push, pop, call and return instructions. Otherwise, the registers have no fixed meaning or
values.

Condition code:
There are three single-bit condition codes, ZF, SF, and OF, storing information about the effect
of the most recent arithmetic or logical instruction.

ZF: The last ALU operation produced a 0

SF: The last ALU operation produced a negative number

OF: The last ALU operation produced an overflow
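
To make the three flags concrete, here is a minimal sketch of how they could be derived for a 32-bit addition. The function name add_with_cc and the use of Python integers to model 32-bit words are our own assumptions for illustration; they are not part of Y86 itself.

```python
MASK32 = 0xFFFFFFFF  # Y86 words are 32 bits wide


def add_with_cc(a, b):
    """Add two 32-bit words and return (result, ZF, SF, OF).

    Hypothetical helper: models the flags an addl would set.
    """
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    sign_a = (a >> 31) & 1
    sign_b = (b >> 31) & 1
    sign_r = (result >> 31) & 1
    zf = result == 0                 # result is zero
    sf = sign_r == 1                 # result is negative in two's complement
    # Signed overflow: both operands share a sign and the result's sign differs.
    of = (sign_a == sign_b) and (sign_r != sign_a)
    return result, zf, sf, of


if __name__ == "__main__":
    # Largest positive word plus one wraps to the most negative word.
    print(add_with_cc(0x7FFFFFFF, 1))
```

Note that an unsigned wrap-around such as 0xFFFFFFFF + 1 sets ZF but not OF, because as signed values this is simply -1 + 1.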

Program counter (PC):


The program counter (PC) holds the address of the instruction currently being executed.
Memory:
The memory is conceptually a large array of bytes, holding both program and data. Y86
programs reference memory locations using virtual addresses.

Status registers:
The status register indicates whether the program is running normally or some special event has
occurred. It holds one of four codes:

AOK: Normal operation

HLT: halt instruction encountered

ADR: Bad address encountered

INS: Invalid instruction encountered

Y86 instruction set:


We use this instruction set as a target for our processor implementations. The set of Y86
instructions is largely a subset of the IA32 instruction set. It includes only 4-byte integer
operations, has fewer addressing modes, and includes a smaller set of operations. Since we only
use 4-byte data, we can refer to these as “words” without any ambiguity.

• Some instructions are just 1 byte long, but those that require operands have longer
encodings. First, there can be an additional register specifier byte, specifying either one
or two registers. These register fields are called rA and rB.
• Instructions that have no register operands, such as branches and call, do not have a
register specifier byte.
• Those that require just one register operand (irmovl, pushl, and popl) have the other
register specifier set to value 0xF.
• Some instructions require an additional 4-byte constant word. This word can serve as the
immediate data for irmovl, the displacement for rmmovl and mrmovl address specifiers,
and the destination of branches and calls.
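
The rules above fix the length of every instruction once its first byte is known. The following is a small sketch of that calculation; the table name LAYOUT and the function instruction_length are hypothetical names chosen for this example.

```python
# icode (high nibble of byte 0) -> (has register specifier byte, has 4-byte constant)
LAYOUT = {
    0x0: (False, False),  # halt
    0x1: (False, False),  # nop
    0x2: (True, False),   # rrmovl / cmovXX
    0x3: (True, True),    # irmovl
    0x4: (True, True),    # rmmovl
    0x5: (True, True),    # mrmovl
    0x6: (True, False),   # OPl
    0x7: (False, True),   # jXX
    0x8: (False, True),   # call
    0x9: (False, False),  # ret
    0xA: (True, False),   # pushl
    0xB: (True, False),   # popl
}


def instruction_length(first_byte):
    """Return the total length in bytes of the instruction starting with first_byte."""
    icode = first_byte >> 4
    has_regs, has_const = LAYOUT[icode]  # KeyError for an invalid icode
    return 1 + (1 if has_regs else 0) + (4 if has_const else 0)


if __name__ == "__main__":
    for byte in (0x00, 0x20, 0x30, 0x70, 0xA0):
        print(hex(byte), instruction_length(byte))
```

So an instruction is 1, 2, 5 or 6 bytes long, which is what makes the encoding variable-length.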

1) Simple instructions:
• halt
The halt instruction stops instruction execution. For Y86, executing the halt
instruction causes the processor to stop, with the status code set to HLT.

Byte   0
halt   0 0

• nop
This instruction does nothing.

Byte   0
nop    1 0

2) Move instructions:
The movl instruction is split into four different instructions. The source is either immediate (i),
register (r) or memory (m) and is designated by the first character in the instruction name. The
destination is either register (r) or memory (m) and is designated by the second character in the
instruction name.

• irmovl: (immediate (i) to register (r) move instruction)

This instruction moves an immediate value into a register; the l in the instruction name
indicates that we are moving a double word.

Byte           0     1      2 3 4 5
irmovl V, rB   3 0   F rB   V

Immediate refers to the constant value that we want to encode in the instruction.
The instruction performs R[%rB] <- V, that is, it moves a constant value into the register.
Suppose the constant value is 0xC0DE and we want to move it into register %rB, whose
register ID we take to be 9 purely for illustration (the IDs of the eight Y86 program
registers actually run from 0 to 7).

irmovl $0xC0DE, %rB

Byte        0     1     2    3    4    5
Encoding:   3 0   F 9   DE   C0   00   00

The constant is stored in little-endian byte order, so the six bytes of the instruction appear
in memory as 30 F9 DE C0 00 00. Read as a single little-endian number, this is 0x0000C0DEF930.

• rrmovl: (register (r) to register (r) move instruction)

This instruction is a register-to-register move instruction; the l in the instruction
name indicates that we are moving a double word.

Byte            0     1
rrmovl rA, rB   2 0   rA rB

The instruction performs R[%rB] <- R[%rA]: register %rA is the source and %rB is the
destination. Suppose, for illustration, that the register IDs of %rA and %rB are 8 and 9.

rrmovl %rA, %rB

Byte        0     1
Encoding:   2 0   8 9

In memory, the low-order byte 20 comes first, followed by the byte 89. Read as a
little-endian number, the instruction is 0x8920.
• mrmovl: (memory (m) to register (r) move instruction)
This instruction is a memory-to-register move instruction; the l in the instruction
name indicates that we are moving a double word.

Byte               0     1       2 3 4 5
mrmovl D(rB), rA   5 0   rA rB   D

The destination of the value is a register in the register file. To locate the source in
memory, Y86 uses an addressing mode called register + offset: we read a value out of
register rB and add the offset D to it, which gives the memory address used to access
the data.
Take the instruction mrmovl 4(%rB), %rA and assume R[%rB] = 0x4000, so the address
accessed is 0x4004. With illustrative register IDs 9 for %rA and 4 for %rB:

mrmovl 4(%rB), %rA

Byte        0     1     2    3    4    5
Encoding:   5 0   9 4   04   00   00   00

The six bytes appear in memory as 50 94 04 00 00 00. Read as a little-endian number,
this is 0x000000049450.

• rmmovl: (register (r) to memory (m) move instruction)

This instruction is a register-to-memory move instruction; the l in the instruction name
indicates that we are moving a double word.

Byte               0     1       2 3 4 5
rmmovl rA, D(rB)   4 0   rA rB   D

In this move instruction, we take a value in a register and move it to memory, using the
same addressing mode as in the memory-to-register move instruction.
Take the instruction rmmovl %rA, 4(%rB) and assume R[%rB] = 0x4000, so the address written
is 0x4004. With illustrative register IDs 9 for %rA and 4 for %rB:

rmmovl %rA, 4(%rB)

Byte        0     1     2    3    4    5
Encoding:   4 0   9 4   04   00   00   00

The six bytes appear in memory as 40 94 04 00 00 00. Read as a little-endian number,
this is 0x000000049440.
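
The four move encodings can be reproduced with a few lines of code. This is only a sketch: the function names and the REG table are our own, and the register IDs used are the standard Y86 ones (%eax = 0 through %edi = 7, with 0xF meaning no register) rather than the illustrative IDs 8 and 9 used in the text.

```python
import struct

# Standard Y86 register IDs; 0xF marks an unused register field.
REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}
RNONE = 0xF


def _word(value):
    """Encode a 4-byte constant in little-endian byte order."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def irmovl(value, rb):
    return bytes([0x30, (RNONE << 4) | REG[rb]]) + _word(value)


def rrmovl(ra, rb):
    return bytes([0x20, (REG[ra] << 4) | REG[rb]])


def rmmovl(ra, disp, rb):
    return bytes([0x40, (REG[ra] << 4) | REG[rb]]) + _word(disp)


def mrmovl(disp, rb, ra):
    return bytes([0x50, (REG[ra] << 4) | REG[rb]]) + _word(disp)


if __name__ == "__main__":
    print(irmovl(0xC0DE, "%ebx").hex())      # irmovl $0xC0DE, %ebx
    print(rrmovl("%eax", "%ecx").hex())      # rrmovl %eax, %ecx
    print(mrmovl(4, "%esp", "%ecx").hex())   # mrmovl 4(%esp), %ecx
    print(rmmovl("%ecx", 4, "%esp").hex())   # rmmovl %ecx, 4(%esp)
```

The hex strings list the bytes in memory order, lowest address first, so the constant always shows up with its least significant byte leading.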

3) Stack operations:
• pushl
The pushl and popl instructions implement push and pop, just as they do in IA32. The
pushl rA instruction first decrements the stack pointer by 4 (the size of a double word)
to create space for the new value, and then writes the value of rA to that memory
location.

Byte       0     1
pushl rA   A 0   rA F

Encoding (with register ID 0 for rA):
pushl rA   A 0   0 F

The two bytes appear in memory as A0 0F. Read as a little-endian number, this is 0x0FA0.

• popl
The popl instruction reverses the steps of pushl. It first reads the word in memory
referenced by the stack pointer and puts it into rA, and then increments the stack
pointer by 4.

Byte      0     1
popl rA   B 0   rA F

Encoding (with register ID 0 for rA):
popl rA   B 0   0 F

The two bytes appear in memory as B0 0F. Read as a little-endian number, this is 0x0FB0.
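
The order of the two steps in pushl and popl is easy to get wrong, so here is a minimal sketch that models them on a tiny memory. The Machine class, its 64-byte memory and its two-register dictionary are assumptions made only for this example.

```python
import struct


class Machine:
    """Hypothetical toy model: a byte-addressed memory and a few registers."""

    def __init__(self, mem_size=64):
        self.mem = bytearray(mem_size)
        # The stack grows toward lower addresses, starting at the top of memory.
        self.reg = {"%eax": 0, "%esp": mem_size}

    def pushl(self, ra):
        value = self.reg[ra]              # read rA before %esp changes
        self.reg["%esp"] -= 4             # make room for one word
        struct.pack_into("<I", self.mem, self.reg["%esp"], value)

    def popl(self, ra):
        (value,) = struct.unpack_from("<I", self.mem, self.reg["%esp"])
        self.reg["%esp"] += 4             # release the word
        self.reg[ra] = value


if __name__ == "__main__":
    m = Machine()
    m.reg["%eax"] = 0x1234
    m.pushl("%eax")
    print(m.reg["%esp"], m.mem[60:64].hex())
    m.reg["%eax"] = 0
    m.popl("%eax")
    print(hex(m.reg["%eax"]), m.reg["%esp"])
```

After a push followed by a pop, both the register and the stack pointer are back to their starting values.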

4) Arithmetic and logical Operation:


There are four integer operation instructions. These are addl, subl, andl, and xorl. They
operate only on register data, whereas IA32 also allows operations on memory data. These
instructions set the three condition codes ZF, SF, and OF (zero, sign, and overflow).

Byte         0      1
OPl rA, rB   6 fn   rA rB

The specific encodings of the integer operations are: addl 6 0, subl 6 1, andl 6 2, xorl 6 3.

5) Jump instructions:
The seven jump instructions are jmp, jle, jl, je, jne, jge, and jg. Branches are taken according
to the type of branch and the settings of the condition codes. The branch conditions are the same
as with IA32.

Byte       0      1 2 3 4
jXX Dest   7 fn   Dest

The specific encodings of the jump instructions are: jmp 7 0, jle 7 1, jl 7 2, je 7 3, jne 7 4,
jge 7 5, jg 7 6.

6) Conditional move instructions:


There are six conditional move instructions cmovle, cmovl, cmove, cmovne, cmovge, and
cmovg. These have the same format as the register-register move instruction rrmovl, but the
destination register is updated only if the condition codes satisfy the required constraints.

Byte            0      1
cmovXX rA, rB   2 fn   rA rB

The specific encodings for the conditional move instructions are: cmovle 2 1, cmovl 2 2,
cmove 2 3, cmovne 2 4, cmovge 2 5, cmovg 2 6. The unconditional rrmovl is 2 0.
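
Jumps and conditional moves share the same function codes and the same tests on the condition codes. A sketch of that test, assuming the IA32 signed-comparison conditions, could look like this; the function name cond is our own.

```python
def cond(ifun, zf, sf, of):
    """Return True if the jump or conditional move with function code ifun fires.

    ifun: 0 = always, 1 = le, 2 = l, 3 = e, 4 = ne, 5 = ge, 6 = g.
    """
    less = sf != of  # SF xor OF: the signed result was "less than"
    table = {
        0: True,                  # jmp / rrmovl
        1: less or zf,            # jle / cmovle
        2: less,                  # jl  / cmovl
        3: zf,                    # je  / cmove
        4: not zf,                # jne / cmovne
        5: not less,              # jge / cmovge
        6: not less and not zf,   # jg  / cmovg
    }
    return table[ifun]


if __name__ == "__main__":
    # After a subl whose result was zero: ZF=1, SF=0, OF=0.
    print([cond(fn, True, False, False) for fn in range(7)])
```

Using SF xor OF rather than SF alone keeps the comparison correct even when the subtraction overflowed.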

7) Function call and return instructions:

• call
The call instruction pushes the return address on the stack and jumps to the destination
address.

Byte        0     1 2 3 4
call Dest   8 0   Dest

• ret
The ret instruction returns from such a call: it pops the return address off the stack and
jumps to it.

Byte   0
ret    9 0
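
As a rough sketch of how call and ret cooperate through the stack, assuming a call instruction is 5 bytes long so that the return address is the address of the call plus 5 (the function names and the bytearray memory are hypothetical):

```python
import struct

CALL_LENGTH = 5  # 1 instruction byte + 4-byte destination


def call(mem, esp, pc, dest):
    """Execute a call located at address pc. Returns (new %esp, new PC)."""
    return_address = pc + CALL_LENGTH
    esp -= 4
    struct.pack_into("<I", mem, esp, return_address)  # push the return address
    return esp, dest


def ret(mem, esp):
    """Execute a ret. Returns (new %esp, new PC)."""
    (return_address,) = struct.unpack_from("<I", mem, esp)  # pop it again
    return esp + 4, return_address


if __name__ == "__main__":
    mem = bytearray(32)
    esp, pc = call(mem, 32, 0x10, 0x100)
    print(esp, hex(pc))
    esp, pc = ret(mem, esp)
    print(esp, hex(pc))
```

After the ret, execution resumes at the instruction immediately after the call and the stack pointer is restored.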

RISC and CISC instruction sets:


CISC stands for Complex Instruction Set Computing, and is a type of microprocessor architecture
in which single instructions can execute several low-level operations (such as a load from
memory, a memory store and an arithmetic operation) or are capable of multi-step operations or
addressing modes within single instructions.

RISC stands for Reduced Instruction Set Computer, and is a type of microprocessor
architecture that utilizes a small, highly optimized set of instructions rather than the more
specialized set of instructions often found in other types of architectures.

Comparison of CISC and RISC:

1) CISC: Emphasizes efficiency in instructions per program.
   RISC: Emphasizes efficiency in cycles per instruction.
2) CISC: Emphasizes smaller code size and uses less RAM overall than RISC.
   RISC: Needs more RAM.
3) CISC: Variable-length encodings. IA32 instructions can range from 1 to 15 bytes.
   RISC: Fixed-length encodings. Typically all instructions are encoded as 4 bytes.
4) CISC: Arithmetic and logical operations can be applied to both memory and register operands.
   RISC: Arithmetic and logical operations only use register operands. Memory referencing is
   only allowed by load and store instructions. This convention is referred to as a load/store
   architecture.
5) CISC: The original microprocessor ISA.
   RISC: A redesigned ISA that emerged in the early 1980s.
6) CISC: Instructions can take several clock cycles.
   RISC: Single-cycle instructions.
7) CISC: Hardware-centric design. The ISA does as much as possible using hardware circuitry.
   RISC: Software-centric design. High-level compilers take most of the burden of coding many
   software steps from the programmer.
8) CISC: Condition codes. Special flags are set as a side effect of instructions and then used
   for conditional branch testing.
   RISC: No condition codes. Instead, explicit test instructions store the test results in normal
   registers for use in conditional evaluation.
9) CISC: Stack-intensive procedure linkage. The stack is used for procedure arguments and
   return addresses.
   RISC: Register-intensive procedure linkage. Registers are used for procedure arguments and
   return addresses.
10) CISC examples: VAX, Motorola 68000 family, System/360, and the AMD and Intel x86 CPUs.
    RISC examples: ARM, PA-RISC, Power Architecture, Alpha, AVR, ARC and SPARC.

Y86 instruction Set CISC/RISC:


The Y86 instruction set uses attributes of both CISC and RISC instruction sets. On the CISC
side, it has condition codes, variable-length instructions, and stack-intensive procedure linkages.
On the RISC side, it uses a load/store architecture and a regular encoding. It can be viewed as
taking a CISC instruction set (IA32) and simplifying it by applying some of the principles of
RISC.

Sequential Y86 Implementations:

We describe a processor called SEQ (for “sequential” processor). On each clock cycle, SEQ
performs all the steps required to process a complete instruction. This would require a very long
cycle time, however, and so the clock rate would be unacceptably low. Our purpose in
developing SEQ is to provide a first step toward our ultimate goal of implementing an efficient,
pipelined processor.
SEQ Hardware Structure:
The computations required to implement all of the Y86 instructions can be organized as a
series of six basic stages: fetch, decode, execute, memory, write back, and PC update.
The hardware units are associated with the different processing stages:
• Fetch
Using the program counter register as an address, the instruction memory reads the bytes
of an instruction. The PC incrementer computes valP, the incremented program counter.
• Decode
The register file has two read ports, A and B, via which register values valA and valB are
read simultaneously.
• Execute
The execute stage uses the arithmetic/logic unit (ALU) for different purposes according
to the instruction type. For integer operations, it performs the specified operation. For
other instructions, it serves as an adder to compute an incremented or decremented stack
pointer, to compute an effective address, or simply to pass one of its inputs to its outputs
by adding zero. The condition code register (CC) holds the three condition-code bits.
New values for the condition codes are computed by the ALU. When executing a jump
instruction, the branch signal Cnd is computed based on the condition codes and the jump
type.
• Memory
The data memory reads or writes a word of memory when executing a memory
instruction. The instruction and data memories access the same memory locations, but for
different purposes.
• Write back
The register file has two write ports. Port E is used to write values computed by the ALU,
while port M is used to write values read from the data memory.

Diagram:

We use the following drawing conventions:


• Hardware units are shown as light boxes. These include the memories, the ALU, and so
forth. We will use the same basic set of units for all of our processor implementations.
We will treat these units as “black boxes” and not go into their detailed designs.
• Control logic blocks are drawn as gray rounded rectangles. These blocks serve to select
from among a set of signal sources, or to compute some Boolean function. We will
examine these blocks in complete detail, including developing HCL descriptions.
• Wire names are indicated in white round boxes. These are simply labels on the wires, not
any kind of hardware element.
• Word-wide data connections are shown as medium lines. Each of these lines actually
represents a bundle of 32 wires, connected in parallel, for transferring a word from one
part of the hardware to another.
• Byte and narrower data connections are shown as thin lines. Each of these lines actually
represents a bundle of four or eight wires, depending on what type of values must be
carried on the wires.
• Single-bit connections are shown as dotted lines. These represent control values passed
between the units and blocks on the chip.
SEQ stage implementation:

• Fetch
Using the program counter register as an address, the instruction memory reads the bytes
of an instruction. The PC incrementer computes valP, the incremented program counter.
For the fetch stage, our goal is to read the instruction from memory.
The parts of the processor involved in the fetch stage are:
(1) The program counter
(2) Memory
(3) PC increment
(4) Logic to identify an invalid instruction

Explanation:

We have memory and a program counter. We use the value in the program counter
to read from memory, and load what we read into an internal register called
instruction.

Diagram:
Rounded boxes represent logic and oval shapes represent internal registers.

Instead of reading the instruction into one big register, we divide it into
smaller registers, because the opcode (icode) is crucial in determining which
instruction we are executing. Many instructions have register fields rA and rB,
so when we read from memory we populate these registers as well. The opcode
also tells us how many bytes the instruction occupies, and from that we work out
how to increment the PC and produce a potential next PC value, called valP.
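
The splitting described above can be sketched as a single function. The name fetch, the two lookup sets and the convention of returning None for a missing valC are assumptions for this illustration; the field names icode, ifun, rA, rB, valC and valP follow the text.

```python
import struct

# icodes whose encoding includes a register byte / a 4-byte constant
NEED_REGIDS = {0x2, 0x3, 0x4, 0x5, 0x6, 0xA, 0xB}
NEED_VALC = {0x3, 0x4, 0x5, 0x7, 0x8}
RNONE = 0xF


def fetch(mem, pc):
    """Split the instruction at mem[pc] into its fields and compute valP."""
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    valP = pc + 1
    rA = rB = RNONE
    valC = None
    if icode in NEED_REGIDS:
        rA, rB = mem[valP] >> 4, mem[valP] & 0xF
        valP += 1
    if icode in NEED_VALC:
        (valC,) = struct.unpack_from("<I", mem, valP)  # little-endian word
        valP += 4
    return icode, ifun, rA, rB, valC, valP


if __name__ == "__main__":
    program = bytes.fromhex("30f3dec00000")  # irmovl $0xC0DE, %ebx
    print(fetch(program, 0))
```

Here valP is simply the address of the byte after the instruction, so it already reflects the instruction's length.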

• Decode
This phase determines what to read from the register file and then reads those values. The
register file has two read ports, A and B, via which register values valA and valB are read
simultaneously. The parts of the processor involved in the decode stage are:
(1) rA, rB and icode from the instruction
(2) Register file
(3) Logic to determine which registers are used to produce valA and valB

Explanation:
Normally, we take rA and rB and pass them into some logic that tells the
register file which registers to read. Reading those registers produces the
values that we call valA and valB, and we also name some of the input and
output signals. This simple implementation works for move instructions, jump
instructions and ALU instructions, but not for push/pop and call/ret
instructions.
Push, pop, call and return use an implied register, %esp, which is the stack
pointer and lives in the register file. These instructions need to read it
even though it is not named in rA or rB. So in addition to sending rA and rB
into the logic blocks, we also send the opcode.
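
One possible shape for this selection logic is sketched below. It follows the arrangement described in this report, where the stack pointer is read through port B for all four stack instructions; the signal names srcA and srcB, the constant names and the list-based register file are our assumptions.

```python
RESP, RNONE = 4, 0xF  # register ID of %esp, and "no register"
IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL, IOPL = 0x2, 0x3, 0x4, 0x5, 0x6
IJXX, ICALL, IRET, IPUSHL, IPOPL = 0x7, 0x8, 0x9, 0xA, 0xB


def decode(icode, rA, rB, regfile):
    """Pick the registers to read and return (valA, valB)."""
    # Port A: the register named by rA, when the instruction uses it as a source.
    srcA = rA if icode in (IRRMOVL, IRMMOVL, IOPL, IPUSHL) else RNONE
    # Port B: rB for ALU and memory instructions, %esp for stack instructions.
    if icode in (IOPL, IRMMOVL, IMRMOVL):
        srcB = rB
    elif icode in (IPUSHL, IPOPL, ICALL, IRET):
        srcB = RESP
    else:
        srcB = RNONE
    valA = regfile[srcA] if srcA != RNONE else 0
    valB = regfile[srcB] if srcB != RNONE else 0
    return valA, valB


if __name__ == "__main__":
    regs = [10, 11, 12, 13, 100, 15, 16, 17]  # regs[4] models %esp
    print(decode(IOPL, 0, 1, regs))
    print(decode(IPUSHL, 2, RNONE, regs))
```

The icode input is what lets the logic override rB with %esp, which is exactly why the opcode has to be wired into the decode logic.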
Diagrams:
• Execute
This stage uses the ALU and sets the condition codes. The parts of the processor
involved in this stage are:
(1) valA, valB from the register file
(2) valC from the instruction
(3) ALU
(4) Condition codes
(5) Logic to a) select the inputs to the ALU, b) set the condition codes
Explanation:
The execute stage uses the arithmetic/logic unit (ALU) for different purposes
according to the instruction type. For integer operations, it performs the
specified operation. For other instructions, it serves as an adder to compute an
incremented or decremented stack pointer, to compute an effective address, or
simply to pass one of its inputs to its outputs by adding zero. The condition
code register (CC) holds the three condition-code bits. New values for the
condition codes are computed by the ALU. When executing a jump
instruction, the branch signal Cnd is computed based on the condition codes
and the jump type.

Diagram:
Push and Pop Instructions:
Push, pop, call and ret instructions also need to increment or decrement the
stack pointer. For some other instructions, in particular rrmovl and irmovl,
we use the ALU to add 0. We can actually reduce the wiring and allow valA to
link directly to the register file for a larger class of instructions. We have
to think about where to feed the plus or minus 4 when we increment or
decrement the stack pointer. The stack pointer comes out of the register file
as valB and goes into ALU input B, so the plus or minus 4 goes into ALU
input A. For a register-to-register move, on the other hand, the register we
read is valA, which goes into ALU input A (for an immediate move, the
constant valC takes its place), while ALU input B is 0. Since ALU input B
normally receives valB, we need an input that decides which one to pick, so
we extend the icode signal to feed the ALU input logic as well. Finally, we
set the condition codes only when an arithmetic or logical instruction has
used the ALU, so we add a logic block for that.
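
The choice of ALU inputs can be summarised in code. This is a sketch under the assumptions stated in the text (plus or minus 4 on input A for stack instructions, 0 on input B for moves); the names alu_inputs and execute and the constant names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL, IOPL = 0x2, 0x3, 0x4, 0x5, 0x6
ICALL, IRET, IPUSHL, IPOPL = 0x8, 0x9, 0xA, 0xB


def alu_inputs(icode, valA, valB, valC):
    """Select (aluA, aluB) from the icode."""
    if icode in (IRRMOVL, IOPL):
        aluA = valA
    elif icode in (IIRMOVL, IRMMOVL, IMRMOVL):
        aluA = valC                      # constant or displacement
    elif icode in (ICALL, IPUSHL):
        aluA = -4                        # stack grows downward
    elif icode in (IRET, IPOPL):
        aluA = 4
    else:
        aluA = 0
    uses_valB = (IRMMOVL, IMRMOVL, IOPL, ICALL, IPUSHL, IRET, IPOPL)
    aluB = valB if icode in uses_valB else 0
    return aluA, aluB


def execute(icode, ifun, valA, valB, valC):
    """Return (valE, set_cc). Only OPl selects a non-add function and sets CC."""
    aluA, aluB = alu_inputs(icode, valA, valB, valC)
    ops = {
        0: lambda b, a: b + a,   # addl
        1: lambda b, a: b - a,   # subl computes rB - rA
        2: lambda b, a: b & a,   # andl
        3: lambda b, a: b ^ a,   # xorl
    }
    fn = ifun if icode == IOPL else 0
    valE = ops[fn](aluB, aluA) & MASK32
    return valE, icode == IOPL


if __name__ == "__main__":
    print(execute(IOPL, 1, 3, 10, None))      # subl: 10 - 3
    print(execute(IPUSHL, 0, 5, 100, None))   # %esp 100 -> 96
```

For every instruction other than OPl the ALU acts as a plain adder, which is why a single unit can serve address calculation, stack adjustment and pass-through.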
Diagram:
• Memory
The data memory reads or writes a word of memory when executing a memory
instruction. The instruction and data memories access the same memory locations, but for
different purposes. The parts of the processor involved in this stage are:
(1) Memory
(2) valE (address)
(3) valA (data)
(4) valP (from the PC increment)

Explanation:
We use the ALU for address calculation, so valE contains our address. It goes
into a logic block that tells the memory where we are reading or writing
data. When we want to write something into memory, register rB and valB are
used for the address calculation and the contents of valA are what we want to
write, so we take that signal and send it into the data logic. We also
introduce new logic that selects the register that will hold data read from
memory: for a memory-to-register move, rA is the destination register, and
we feed it into dstM.
Diagram:

Push and pop instruction:


When we push, we decrement the stack pointer first and then move the data
in. When we pop, we read from the location the stack pointer refers to and
then change the stack pointer. So for push, the stack pointer goes through
the ALU to be decremented, and we can use the same address line that we used
for register/memory operations. For pop, we need the value of the stack
pointer before it goes through the ALU, so we tap the signal from valB and
use that as the address for the pop operation. To decide which of these
addresses to use, valE or valB, we need information from the icode.
The data for a push comes from valA. When we pop, we read data from memory
and it comes out of the memory port as valM.
Call and ret instruction:
When we do a call, we push the address of the next instruction, so we direct
valP to the data input; the data logic block uses the opcode to select it.
When we do a return, we pop a value off the stack, and instead of writing it
into the register file it goes into the logic that calculates the address of
the next instruction.
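
The address and data choices for this stage can be sketched as follows. The sketch follows the arrangement described above, in which pop and ret take their address from valB; the function name memory_stage and the bytearray memory are assumptions for illustration.

```python
import struct

IRMMOVL, IMRMOVL = 0x4, 0x5
ICALL, IRET, IPUSHL, IPOPL = 0x8, 0x9, 0xA, 0xB


def memory_stage(icode, valE, valA, valB, valP, mem):
    """Perform the memory access for one instruction. Returns valM or None."""
    # Address: the ALU result, except for pop/ret, which use the old %esp.
    if icode in (IRMMOVL, IMRMOVL, IPUSHL, ICALL):
        addr = valE
    elif icode in (IPOPL, IRET):
        addr = valB
    else:
        return None                       # no memory access at all
    if icode in (IRMMOVL, IPUSHL, ICALL):
        # Data: the return address for call, otherwise the register value.
        data = valP if icode == ICALL else valA
        struct.pack_into("<I", mem, addr, data & 0xFFFFFFFF)
        return None
    (valM,) = struct.unpack_from("<I", mem, addr)
    return valM


if __name__ == "__main__":
    mem = bytearray(16)
    memory_stage(IPUSHL, 12, 0xAB, 16, 0, mem)       # push at new %esp = 12
    print(memory_stage(IPOPL, 16, 0, 12, 0, mem))    # pop from old %esp = 12
```

In the example, the push writes at the decremented stack pointer (valE) while the pop reads at the not-yet-incremented one (valB), so both touch address 12.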

• Write back
In this phase we write values into the register file. The parts of the processor involved in
this phase are:
(1) Register file
(2) valM (from memory)
(3) valE (from the ALU)
(4) Cnd (conditional move)
Explanation:
First, we introduce an E port, whose value comes from valE. The destination
register for this port, dstE, is normally rB, but depending on the opcode we
may or may not write it. In the normal case, when we want to write into
register rB, dstE names the register to be written and the opcode decides
when to do that. Next, we wire the condition signal into dstE, so that a
conditional move writes rB only when its condition holds.
Diagram:
Push and Pop instruction:
For push, pop, call and ret, the instruction might name rB, but the register
that must be updated is the stack pointer, so the dstE logic is responsible
for selecting %esp in those cases.
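
A minimal sketch of the destination logic for the two write ports is given below. The names write_back, dstE and dstM follow the text, while the constants and the list-based register file are our own assumptions.

```python
RESP, RNONE = 4, 0xF
IRRMOVL, IIRMOVL, IMRMOVL, IOPL = 0x2, 0x3, 0x5, 0x6
ICALL, IRET, IPUSHL, IPOPL = 0x8, 0x9, 0xA, 0xB


def write_back(icode, rA, rB, cnd, valE, valM, regfile):
    """Write valE through port E and valM through port M, as the icode requires."""
    if icode == IRRMOVL:
        dstE = rB if cnd else RNONE       # conditional move: only if Cnd holds
    elif icode in (IIRMOVL, IOPL):
        dstE = rB
    elif icode in (IPUSHL, IPOPL, ICALL, IRET):
        dstE = RESP                       # updated stack pointer
    else:
        dstE = RNONE
    dstM = rA if icode in (IMRMOVL, IPOPL) else RNONE
    if dstE != RNONE:
        regfile[dstE] = valE
    if dstM != RNONE:
        regfile[dstM] = valM              # written last, so popl %esp keeps valM


if __name__ == "__main__":
    regs = [0] * 8
    write_back(IPOPL, 0, RNONE, True, 104, 0xAB, regs)
    print(regs)
```

A pop is the one case here that uses both ports at once: port E updates %esp and port M delivers the popped value to rA.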
• Update the PC:
The last stage updates the program counter register; it works out the
address of the next instruction to be executed. The inputs are valM (the
return address read from memory by ret), valP (the address of the following
instruction) and valC (the destination of a jump or call). For a conditional
jump we also need the condition signal, which determines which way to go.
In short, valP, valM, valC and the condition codes decide the address of the
next instruction to be executed.
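
The selection among the three candidate addresses can be written as a short function. The name new_pc and the constant names are assumptions for this sketch.

```python
IJXX, ICALL, IRET = 0x7, 0x8, 0x9


def new_pc(icode, cnd, valC, valM, valP):
    """Choose the address of the next instruction."""
    if icode == ICALL:
        return valC            # call always jumps to its destination
    if icode == IJXX and cnd:
        return valC            # taken jump
    if icode == IRET:
        return valM            # return address popped from the stack
    return valP                # default: fall through to the next instruction


if __name__ == "__main__":
    print(hex(new_pc(IJXX, True, 0x100, None, 5)))
    print(hex(new_pc(IJXX, False, 0x100, None, 5)))
```

A jump that is not taken falls through to valP like any ordinary instruction.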
Diagram:
Y86 implementation Observation:
• We only read the instruction in the fetch stage
• We only read from the register file in the decode stage
• We only use the ALU in the execute stage
• We only read/write memory in the memory stage
• We only write to the register file in the write back stage
• For any given instruction, we have to wait for the signals to propagate through the entire
circuit, but at any given instant most of the hardware is unused

General principles of pipelining:


Let us consider some general properties and principles of pipelined systems. Such systems are
familiar to anyone who has been through the serving line at a cafeteria or run a car through an
automated car wash. In a pipelined system, the task to be performed is divided into a series of
discrete stages. In the case of the car wash, a new car is allowed to enter the spraying stage as the
preceding car moves from the spraying stage to the scrubbing stage. In general, the cars must
move through the system at the same rate to avoid having one car crash into the next.

A key feature of pipelining is that it increases the throughput of the system, that is, the number of
customers served per unit time, but it may also slightly increase the latency, that is, the time
required to service an individual customer. For example, a customer in a cafeteria who only
wants a salad could pass through a nonpipelined system very quickly, stopping only at the salad
stage. A customer in a pipelined system who attempts to go directly to the salad stage risks
incurring the wrath of other customers.
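
To make the trade-off between throughput and latency concrete, here is a small worked
calculation in Python. The delays used (300 ps of logic and 20 ps to load a pipeline register)
and the function name pipeline_timing are assumed for illustration and do not describe a real
processor. The sketch also assumes the logic can be split into equal stages.

```python
def pipeline_timing(logic_ps, register_ps, stages):
    """Return (latency in ps, throughput in instructions per ns)."""
    # Clock period: the logic of one stage plus one register load.
    cycle_ps = logic_ps / stages + register_ps
    # One instruction has to pass through every stage.
    latency_ps = cycle_ps * stages
    # Once the pipeline is full, one instruction finishes per cycle.
    throughput = 1000.0 / cycle_ps
    return latency_ps, throughput

print(pipeline_timing(300, 20, 1))   # unpipelined
print(pipeline_timing(300, 20, 3))   # three stages
```

With these assumed numbers the unpipelined system has a latency of 320 ps and a throughput of
3.125 instructions per nanosecond, while the three-stage version has a latency of 360 ps and a
throughput of about 8.33 instructions per nanosecond: throughput goes up and latency goes up
slightly.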

Introduction to pipelined registers:


Because we need to hold the signals between the stages of the implementation, we introduce
registers between the stages; these registers are called pipeline registers. They enable us to
execute multiple instructions in parallel.

In our sequential implementation, we take a single instruction through all the stages and only
then move on to the next instruction. In the pipelined implementation, while the first instruction
is reading the register file, the instruction behind it is being fetched from memory. In the next
cycle, the first instruction has finished reading the register file and enters the execute phase.
Meanwhile the instruction right behind it can decode and the one behind that can be fetched, so
in theory we can have five different instructions being executed at the same time on the
hardware.
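
The overlap described above can be traced with a short Python sketch. It assumes an ideal
pipeline with no stalls, and the names STAGES and pipeline_diagram are made up for this
illustration.

```python
STAGES = ["F", "D", "E", "M", "W"]   # fetch, decode, execute, memory, write back

def pipeline_diagram(n_instructions):
    """Return, for each cycle, a dict mapping stage name -> index of the instruction in it."""
    total_cycles = n_instructions + len(STAGES) - 1
    timeline = []
    for cycle in range(total_cycles):
        occupancy = {}
        for s, stage in enumerate(STAGES):
            instr = cycle - s            # instruction i is in stage s during cycle i + s
            if 0 <= instr < n_instructions:
                occupancy[stage] = instr
        timeline.append(occupancy)
    return timeline

for cycle, occupancy in enumerate(pipeline_diagram(5), start=1):
    print(cycle, occupancy)
```

In the fifth cycle all five stages are busy, each with a different instruction, which is the
situation the paragraph above describes.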

Advantages and Disadvantages of using pipeline implementation:


 Advantages:
(1) Uses hardware more efficiently (units work in parallel)
(2) A collection of instructions completes more quickly
 Disadvantages:
(1) An individual instruction takes a bit longer
(2) If some phases take more time than others, the shorter phases have to wait
(3) Sometimes we do not have the right information at the right time

Pipelining implementation:
The pipeline registers are labeled as follows:

F holds a predicted value of the program counter, as will be discussed shortly.

D sits between the fetch and decode stages. It holds information about the most recently
fetched instruction for processing by the decode stage.
E sits between the decode and execute stages. It holds information about the most recently
decoded instruction and the values read from the register file for processing by the execute stage.

M sits between the execute and memory stages. It holds the results of the most recently
executed instruction for processing by the memory stage. It also holds information about branch
conditions and branch targets for processing conditional jumps.

W sits between the memory stage and the feedback paths that supply the computed results
to the register file for writing and the return address to the PC selection logic when completing a
ret instruction.

Fetch stage:
In the fetch stage we have an address from which we want to fetch. We read memory at that
address, get an instruction, and send it to logic that collects its fields and latches them into the
decode register. To latch means that the register captures the signals flowing into it and holds
them even if the signals at its input change afterwards. The register holds those values until the
next clock cycle, when we ask it to load new values. Now we will call it calcPC.

Decode stage:
The decode stages of SEQ+ and PIPE– both generate signals dstE and dstM indicating the
destination register for values valE and valM. In SEQ+, we could connect these signals directly
to the address inputs of the register file write ports. With PIPE–, these signals are carried along
in the pipeline through the execute and memory stages, and are directed to the register file only
once they reach the writeback stage (shown in the more detailed views of the stages). We do this
to make sure the write port address and data inputs hold values from the same instruction.
Otherwise, the write back would be writing the values for the instruction in the write-back stage,
but with register IDs from the instruction in the decode stage. As a general principle, we want to
keep all of the information about a particular instruction contained within a single pipeline stage.

One block of PIPE– that is not present in SEQ+ in the exact same form is the block labeled
“Select A” in the decode stage. We can see that this block generates the value valA for the
pipeline register E by choosing either valP from pipeline register D or the value read from the A
port of the register file. This block is included to reduce the amount of state that must be carried
forward to pipeline registers E and M. Of all the different instructions, only the call requires valP
in the memory stage. Only the jump instructions require the value of valP in the execute stage (in
the event the jump is not taken). None of these instructions requires a value read from the
register file. Therefore, we can reduce the amount of pipeline register state by merging these two
signals and carrying them through the pipeline as a single signal valA. This eliminates the need
for the block labeled “Data” in SEQ (Figure 4.23) and SEQ+ (Figure 4.40), which served a
similar purpose. In hardware design, it is common to carefully identify how signals get used and
then reduce the amount of register state and wiring by merging signals such as these.
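
The behaviour of the "Select A" block described above can be sketched in Python. The function
name select_a, the argument names and the instruction-name strings are assumptions for
illustration; the sketch covers only the merging of valP and the register value, nothing else.

```python
def select_a(icode, D_valP, rvalA):
    """Choose the value carried forward as valA in pipeline register E (sketch)."""
    if icode in ("call", "jXX"):
        # These instructions need valP later and never need the register value.
        return D_valP
    # All other instructions carry the value read from port A of the register file.
    return rvalA
```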
Next PC prediction:
Our goal in the pipelined design is to issue a new instruction on every clock cycle, meaning that
on each clock cycle, a new instruction proceeds into the execute stage and will ultimately be
completed. Achieving this goal would yield a throughput of one instruction per cycle. To do this,
we must determine the location of the next instruction right after fetching the current instruction.
Unfortunately, if the fetched instruction is a conditional branch, we will not know whether or not
the branch should be taken until several cycles later, after the instruction has passed through the
execute stage. Similarly, if the fetched instruction is a ret, we cannot determine the return
location until the instruction has passed through the memory stage. With the exception of
conditional jump instructions and ret, we can determine the address of the next instruction based
on information computed during the fetch stage. For call and jmp (unconditional jump), it will be
valC, the constant word in the instruction, while for all others it will be valP, the address of the
next instruction. We can therefore achieve our goal of issuing a new instruction every clock
cycle in most cases by predicting the next value of the PC. For most instruction types, our
prediction will be completely reliable. For conditional jumps, we can predict either that a jump
will be taken, so that the new PC value would be valC, or we can predict that it will not be taken,
so that the new PC value would be valP. In either case, we must somehow deal with the case
where our prediction was incorrect and therefore we have fetched and partially executed the
wrong instructions.

This technique of guessing the branch direction and then initiating the fetching of instructions
according to our guess is known as branch prediction. It is used in some form by virtually all
processors.
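
A minimal Python sketch of this prediction is given below. The function name predict_pc, the
predict_taken switch and the instruction-name strings are assumptions for illustration.
Returning None for ret is also just a convention of this sketch, meaning that no useful
prediction exists and the return address has to be awaited.

```python
def predict_pc(icode, valC, valP, predict_taken=True):
    """Guess the address of the next instruction during fetch (sketch)."""
    if icode in ("call", "jmp"):
        return valC                  # always reliable: the target is the constant word
    if icode == "jXX":
        # Conditional jump: a guess that may later turn out to be wrong.
        return valC if predict_taken else valP
    if icode == "ret":
        return None                  # return address is only known after the memory stage
    return valP                      # always reliable: the next sequential instruction
```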

Execution stage:
We add pipeline registers at every stage boundary in the pipeline. A stage is basically the logic
between two pipeline registers, and each stage has a pipeline register that holds the signals it is
going to compute on. So what happens in a stage depends on the values held in that stage's
pipeline register.

To control the ALU we need to know which function the ALU is supposed to perform, and we
work this out using a piece of logic. We need icode in the execute stage, and we also need ifun
because it tells the ALU which function to execute.

rA and rB are register numbers. We use them in the decode stage to read the register file. After
that stage we no longer need them, but we do need the contents of those registers in the execute
stage, so we carry valA (the value read from port A of the register file) and valB (the value read
from port B). We need valC as well. valP is the address of the next instruction and is used to
work out the PC, so we do not need it in the execute stage. From the instruction's point of view,
rA and rB sometimes have two meanings: for an ALU instruction, rB is used first to read a
source value and afterwards as the place where the result is written.
Diagram:

Pipeline hazards:
Introducing pipelining into a system with feedback can lead to problems when there are
dependencies between successive instructions. We must resolve this issue before we can
complete our design. These dependencies can take two forms:

(1) Data dependencies, where the results computed by one instruction are used as the data for
a following instruction.
(2) Control dependencies, where one instruction determines the location of the following
instruction, such as when executing a jump, call, or return.

When such dependencies have the potential to cause an erroneous computation by the pipeline,
they are called hazards. Like dependencies, hazards can be classified as either:
 data hazards
 control hazards
Avoiding Data Hazards by Stalling:
One very general technique for avoiding hazards involves stalling, where the processor holds
back one or more instructions in the pipeline until the hazard condition no longer holds. Our
processor can avoid data hazards by holding back an instruction in the decode stage until the
instructions generating its source operands have passed through the write-back stage.

Diagram:

Pipelined execution of prog2 using stalls. After decoding the addl instruction in cycle 6, the stall
control logic detects a data hazard due to the pending write to register %eax in the write-back
stage. It injects a bubble into the execute stage and repeats the decoding of the addl instruction in
cycle 7. In effect, the machine has dynamically inserted a nop instruction.
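
The stall condition can be sketched as a simple check in Python. The function name must_stall
and the way pending writes are represented (a dictionary from stage letter to a set of register
names) are assumptions made for this illustration. In the example, %eax comes from the caption
above and %edx is assumed as a second source register.

```python
def must_stall(decode_srcs, pending_writes):
    """Return True if the instruction in decode has to be held back (sketch).

    decode_srcs:    set of registers the decoding instruction reads
    pending_writes: dict mapping a stage ("E", "M" or "W") to the set of
                    registers that the instruction in that stage will write
    """
    for stage in ("E", "M", "W"):
        for reg in pending_writes.get(stage, ()):
            if reg in decode_srcs:
                return True      # a source operand has not been written back yet
    return False

# addl is in decode while a write to %eax is still pending in write back.
print(must_stall({"%edx", "%eax"}, {"W": {"%eax"}}))
```

When the check returns True, the decode stage repeats the same instruction in the next cycle
and a bubble is injected into the execute stage.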

Detailed Mechanism:

The pipeline control logic must handle the following four control cases for which other
mechanisms, such as data forwarding and branch prediction, do not suffice:

Processing ret:

The pipeline must stall until the ret instruction reaches the write-back stage.

Load/use hazards:
The pipeline must stall for one cycle between an instruction that reads a value from memory and
an instruction that uses this value.

Mispredicted branches:

By the time the branch logic detects that a jump should not have been taken, several instructions
at the branch target will have started down the pipeline. These instructions must be removed
from the pipeline.
Exceptions:

When an instruction causes an exception, we want to disable the updating of the programmer-
visible state by later instructions and halt execution once the excepting instruction reaches the
write-back stage.
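
The first three cases can be detected from the instruction codes held in the pipeline registers.
The following Python sketch shows one way to write these checks. It assumes that conditional
jumps are predicted as taken, it leaves out exceptions, and the function name, the argument
names and the instruction-name strings are chosen here for illustration.

```python
def detect_control_cases(D_icode, E_icode, M_icode, E_dstM, d_srcs, e_cond):
    """Return the list of special control cases present in this cycle (sketch).

    D_icode, E_icode, M_icode: instruction codes in the decode, execute and memory stages
    E_dstM: register that the instruction in execute will load from memory
    d_srcs: set of source registers of the instruction in decode
    e_cond: True if the jump condition evaluated in execute holds
    """
    cases = []
    if "ret" in (D_icode, E_icode, M_icode):
        cases.append("ret")                      # keep stalling while ret is in flight
    if E_icode in ("mrmovl", "popl") and E_dstM in d_srcs:
        cases.append("load/use")                 # value is not read from memory yet
    if E_icode == "jXX" and not e_cond:
        cases.append("mispredicted branch")      # predicted taken, but it is not taken
    return cases
```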

Avoiding Data Hazards by Forwarding:


Rather than stalling until the write has completed, the decode-stage logic can simply pass the
value that is about to be written to the register file into pipeline register E as the source operand.

Explanation:
The decode-stage logic detects that register %eax is the source register for operand valB, and that
there is also a pending write to %eax on write port E. It can therefore avoid stalling by simply
using the data word supplied to port E (signal W_valE) as the value for operand valB. This
technique of passing a result value directly from one pipeline stage to an earlier one is commonly
known as data forwarding (or simply forwarding, and sometimes bypassing). It allows the
instructions of the given program to proceed through the pipeline without any stalling. Data
forwarding requires adding additional data connections and control logic to the basic hardware
structure.

Pipelined execution using forwarding. In cycle 6, the decode-stage logic detects the presence of a
pending write to register %eax in the write-back stage. It uses this value for source operand valB
rather than the value read from the register file.

Pipelined execution using forwarding. In cycle 5, the decode-stage logic detects a pending write to
register %edx in the write-back stage and to register %eax in the memory stage. It uses these as the
values for valA and valB rather than the values read from the register file.

Data forwarding can also be used when there is a pending write to a register in the memory stage,
avoiding the need to stall for the given program. In cycle 5, the decode-stage logic detects a pending
write to register %edx on port E in the write-back stage, as well as a pending write to register
%eax that is on its way to port E but is still in the memory stage. Rather than stalling until the
writes have occurred, it can use the value in the write-back stage (signal W_valE) for operand
valA and the value in the memory stage (signal M_valE) for operand valB.
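
The selection between the forwarded values and the register file can be sketched in Python as
follows. Only the two valE paths mentioned above (M_valE and W_valE) are modelled. The
function name forward_operand and the numeric values in the example are assumptions for
illustration.

```python
def forward_operand(src_reg, regfile_value, M_dstE, M_valE, W_dstE, W_valE):
    """Choose the value of one decode-stage operand (sketch)."""
    if src_reg is None:
        return regfile_value     # the instruction does not use this operand
    if src_reg == M_dstE:
        return M_valE            # newest pending write: still in the memory stage
    if src_reg == W_dstE:
        return W_valE            # older pending write: already in write back
    return regfile_value         # no pending write: the register file is up to date

# Cycle 5 of the example: %edx is pending in write back, %eax in the memory stage.
valA = forward_operand("%edx", 0, "%eax", 3, "%edx", 10)
valB = forward_operand("%eax", 0, "%eax", 3, "%edx", 10)
print(valA, valB)
```

The memory stage is checked first because it holds the more recent instruction, so if both stages
are about to write the same register, the newer value is the one that must be used.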
