Instruction Pipelining
PIPELINING IN C5X
PIPELINE STRUCTURE
• The C5X permits four operations (fetch, decode, read and execute) to be
performed simultaneously using a 4-phase clock.
• This allows four instructions to be processed simultaneously in the CPU. When the first
instruction is in the execute phase, the second instruction can be in the read phase, the
third instruction can be in the decode phase, and the fourth instruction can be in the fetch
phase.
• The functions performed in the four phases of the C5X pipeline are as follows:
• Fetch (F): This phase fetches the instruction words from memory and updates the
program counter (PC).
• Decode (D): This phase decodes the instruction word and performs address generation
and ARAU updates of auxiliary registers.
• Read (R): This phase reads operands from memory, if required. If the instruction uses
indirect addressing, it reads the memory location addressed by the auxiliary register that
the ARP selects, using the value that register held before the update made in the
preceding decode phase.
• Execute (E): This phase performs the specified operation and, if required, writes the
results of a previous operation to memory.
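The overlap described above can be sketched as a small occupancy table. This is only an illustrative model, assuming single-word, single-cycle instructions with no wait states; the function name pipeline_table and the labels I1 to I4 are hypothetical and not part of any C5X tool.

```python
# Illustrative model of a 4-stage pipeline (F, D, R, E).
# Assumption: every instruction is one word and takes one cycle per stage.
STAGES = ["F", "D", "R", "E"]

def pipeline_table(instructions):
    """Return one row per clock cycle; each row maps stage -> instruction or None."""
    n_cycles = len(instructions) + len(STAGES) - 1
    table = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - depth  # the instruction fetched 'depth' cycles earlier
            row[stage] = instructions[idx] if 0 <= idx < len(instructions) else None
        table.append(row)
    return table

# Print the occupancy of each stage, cycle by cycle.
for cycle, row in enumerate(pipeline_table(["I1", "I2", "I3", "I4"]), start=1):
    print(cycle, " ".join(f"{s}:{row[s] or '--'}" for s in STAGES))
```

In the fourth cycle of this model, I1 is in execute, I2 in read, I3 in decode and I4 in fetch, which is the situation described in the text.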
PIPELINE OPERATION
• The pipeline is essentially invisible to the user except in some cases,
such as AR updates, memory mapped accesses of the CPU registers, the
NORM instruction and memory configuration commands.
• Furthermore, the pipeline operation is not protected. The user must
understand how the pipeline operates and arrange the code so that
pipeline conflicts are avoided.
• The following sections show how the pipeline operation and the pipeline
conflict affect the result.
NORMAL PIPELINE OPERATION
Instructions with Single Word and Two Words
• When a program segment involves single-word, single-cycle instructions
executing with no wait states, there is perfect overlapping in the pipeline,
and all four phases operate in parallel.
• In programs involving two-word instructions, it is not possible to keep all
the functional units busy all the time, and some of the units perform
dummy operations that do no useful work.
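The effect of two-word instructions on throughput can be estimated with a simple count. The sketch below assumes that every instruction word occupies one fetch slot, that there are no wait states, and that the second word of a two-word instruction passes through the pipeline as a dummy operation. The function names are hypothetical.

```python
# Simplified cycle-count model (an assumption, not a cycle-accurate C5X model):
# each instruction word uses one fetch cycle, and the pipeline is 4 stages deep.
PIPELINE_DEPTH = 4

def total_cycles(word_counts):
    """Cycles needed to push all instructions through the pipeline."""
    if not word_counts:
        return 0
    return sum(word_counts) + PIPELINE_DEPTH - 1

def execute_slot_utilization(word_counts):
    """Fraction of execute slots doing useful work in steady state."""
    return len(word_counts) / sum(word_counts)

print(total_cycles([1, 1, 1, 1]))            # four single-word instructions
print(total_cycles([2, 1, 1]))               # one two-word, two single-word
print(execute_slot_utilization([2, 2]))      # only two-word instructions
```

Under these assumptions, three instructions containing one two-word instruction need as many cycles as four single-word instructions, because one slot in each stage carries a dummy operation.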
Pipeline Operation with Branch and Call Instructions
• The branch instruction requires two cycles when the branch is not taken
and four cycles when it is taken: one for the B instruction to enter the
execute phase, one for fetching the branch address, and two more for
flushing out the one two-word or two one-word instructions that entered
the instruction pipeline after the branch instruction.
• The same is true of the call instructions. They require two cycles when
the subroutine is not called and four cycles when it is called, of which
the last two cycles are required for flushing out the pipeline.
• However, the delayed branch and call instructions permit the call and
branching to be carried out in two clock cycles.
• The one 2-word instruction or two 1-word instructions following the
delayed branch/call instruction are fetched from program memory and
executed before the branch/call is carried out.
• Hence instruction pipeline need not be flushed out after the call/branch
instruction and the execution can resume from the branch address.
• When the instruction B enters the execute phase, the ADD and SACL instructions enter
the decode and fetch phases, respectively.
• However, since these two instructions are not to be executed, their phases are dummy
operations and the instructions have to be flushed out of the instruction pipeline.
• This requires two cycles.
• Hence the branch requires four cycles (one for the B instruction to enter the execute
phase, one for fetching the branch address and two more for flushing out the unwanted
instructions).
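The cycle counts quoted above can be collected into a small cost model. This is a sketch under the figures given in the text (two cycles when not taken, four when taken, two for the delayed forms); the function names and the loop example are hypothetical.

```python
# Branch/call cost model using the cycle counts stated in the text.
# Assumption: a delayed branch always costs two cycles, and the two
# instruction words that follow it do useful work instead of being flushed.

def branch_cycles(taken, delayed=False):
    """Cycles consumed by one branch or call instruction."""
    if delayed:
        return 2
    return 4 if taken else 2

def loop_branch_cycles(iterations, delayed=False):
    """Total branch cycles for a loop whose closing branch is taken
    on every pass except the last one."""
    if iterations <= 0:
        return 0
    taken_passes = iterations - 1
    return (taken_passes * branch_cycles(True, delayed)
            + branch_cycles(False, delayed))

print(loop_branch_cycles(10))                 # ordinary branch
print(loop_branch_cycles(10, delayed=True))   # delayed branch
```

The comparison counts only the cycles charged to the branch itself. It does not check whether two useful instruction words are actually available to place after the delayed branch.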
Pipeline Operation on ARAU Memory-Mapped Registers
• Auxiliary register arithmetic unit (ARAU) updates of the ARs occur during the decode (D) phase of the
pipeline.
• Hence the ARs are updated in the decode phase itself when the indirect addressing mode is used with
arguments such as *+, *–, *0+, etc.
• This enables any instruction which follows this instruction to have the correct address when it enters the
data read phase.
• This is desirable because when one instruction enters the execute phase the next instruction enters the data
read phase.
• However, when the ARs are modified by load or store instructions or through memory-mapped
addressing (e.g., LAR, SAMM, LMMR, SACL, SPLK), they are modified only in the execute (E) phase of the
pipeline.
• Therefore, the two instructions that follow a memory-mapped load of an AR must not use that AR.
• Modifications to the index register (INDX) and auxiliary register compare register (ARCR) also occur in
the E phase of the pipeline.
• Therefore, any AR updates using the INDX or the ARCR must take place at least two cycles after a load of
these registers.
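The two-instruction rule above lends itself to a simple check. The sketch below is a hypothetical checker, not a C5X tool: it assumes single-word instructions, and the Instr record with its fields is invented for illustration. Each instruction is annotated by hand with the registers it writes in the execute phase and the registers it uses in the decode phase.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    """Hypothetical, hand-annotated instruction record."""
    text: str
    written_in_execute: frozenset = frozenset()  # e.g. AR loaded by SAMM or LAR
    used_in_decode: frozenset = frozenset()      # e.g. AR used for *+ addressing

def find_ar_hazards(program, window=2):
    """Return (writer_index, user_index, register) for every use of a register
    within 'window' instructions after an execute-phase write to it."""
    hazards = []
    for i, ins in enumerate(program):
        for reg in sorted(ins.written_in_execute):
            for j in range(i + 1, min(i + 1 + window, len(program))):
                if reg in program[j].used_in_decode:
                    hazards.append((i, j, reg))
    return hazards

AR1 = frozenset({"AR1"})
unsafe = [Instr("SAMM AR1", written_in_execute=AR1),
          Instr("LACC *+", used_in_decode=AR1),
          Instr("ADD *+", used_in_decode=AR1)]
safe = [unsafe[0], Instr("NOP"), Instr("NOP"), unsafe[1], unsafe[2]]

print(find_ar_hazards(unsafe))  # both uses fall inside the two-instruction window
print(find_ar_hazards(safe))    # no hazards once two NOPs are inserted
```

The same check could cover INDX and ARCR by listing them in written_in_execute for the load and in used_in_decode for any AR update that depends on them.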
• In Example 5.7, the intention is to add the content of location 164h, i.e., 90h, to that of
165h, i.e., 80h, and store the result in ACC. This is what actually happens, because the SAMM
instruction is followed immediately by two NOP instructions which do not make use of AR1. The
expected results are obtained for the following reasons:
• When the instruction LACC *+ enters the decode phase in cycle 8, the value of AR1 is 164h and
this is used for accessing the data in its data read phase in cycle 9. The SAMM instruction
modifies AR1 to 164h in its execute phase in clock cycle 7 itself. In this case since the decode
phase of LACC *+ occurs after the execute phase of SAMM, no problem arises.
• In the decode phase of LACC *+, the value of AR1 is modified to 165h. In the 9th clock cycle,
ADD *+ enters the decode phase. At that time the value of AR1 is 165h, and hence it uses this
address when it enters the data read phase in cycle 10. It also modifies AR1 to 166h in cycle 9.
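The timing described in these bullets can be replayed with a short sketch. It assumes single-word instructions and that SAMM is fetched in cycle 4, which is inferred from the statement that it executes in cycle 7. The instruction sequence and memory contents are taken from the text; the modelling itself is a simplification, with the operand read folded into the execute step.

```python
# Replay of the Example 5.7 timing as described in the text.
FIRST_FETCH_CYCLE = 4  # assumption: inferred from SAMM executing in cycle 7
PROGRAM = ["SAMM AR1", "NOP", "NOP", "LACC *+", "ADD *+"]
MEMORY = {0x164: 0x90, 0x165: 0x80}

def stage_cycles(index):
    """Cycle in which instruction 'index' occupies each pipeline stage."""
    fetch = FIRST_FETCH_CYCLE + index
    return {"F": fetch, "D": fetch + 1, "R": fetch + 2, "E": fetch + 3}

def simulate():
    events = []
    for i, text in enumerate(PROGRAM):
        c = stage_cycles(i)
        if text.startswith("SAMM"):
            events.append((c["E"], "load_ar1", text))  # AR1 written in E phase
        elif text.endswith("*+"):
            events.append((c["D"], "decode", text))    # address taken, AR1 += 1
            events.append((c["E"], "execute", text))   # ACC updated
    ar1, acc, address = None, 0, {}
    for cycle, kind, text in sorted(events):
        if kind == "load_ar1":
            ar1 = 0x164
        elif kind == "decode":
            address[text] = ar1  # value of AR1 before the post-increment
            ar1 += 1
        else:
            operand = MEMORY[address[text]]
            acc = operand if text.startswith("LACC") else acc + operand
    return acc, ar1, address

acc, ar1, address = simulate()
print(hex(acc), hex(ar1), {k: hex(v) for k, v in address.items()})
```

In this model LACC *+ decodes in cycle 8 and reads in cycle 9, ADD *+ decodes in cycle 9 and reads in cycle 10, and AR1 ends at 166h, which agrees with the cycle numbers quoted above.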
Pipeline Conflicts
• When more than one pipeline stage requires processing on the same resource, such
as memory and CPU registers, a pipeline conflict occurs.
• There is no priority between these four phases and unexpected results are obtained
when pipeline conflict occurs.
• Therefore, conflict between these four phases should be avoided in order to get the
correct results.
• Since the C5X has only one set of external address and data buses, a bus conflict
occurs between the instruction fetch (F), operand read (R), and execute (E) write
phases if both program and data memory are external.
• When a bus conflict occurs, a dummy operation is inserted to
eliminate it. Example 5.8 shows pipeline operation with a bus
conflict and a dummy operation.
• In the operand read (R) phase of LACC, a bus conflict occurs with the fetch
of SACL.
• Therefore, a dummy fetch operation is inserted. In the next fetch (F) phase,
the SACL has a bus conflict with the ADD operand read (R) phase.
• Therefore, the fetch of SACL is delayed again by one cycle. Two dummy
instruction fetches are inserted between ADD and SACL because of this
delay.
• A similar situation occurs in the execute (E) phase of SACL. Since
external memory writes take three cycles, any instruction fetch or
operand read access on the external bus during the execution of SACL is
delayed for three cycles.
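The dummy fetches described above can be reproduced with a simplified bus model. This is a sketch under stated assumptions, not a cycle-accurate description of the C5X: program and data are both external, a conflict is always resolved by delaying the fetch, an operand read uses the bus two cycles after the fetch, and a write occupies the bus for three cycles starting three cycles after the fetch. The BusInstr record and the function name are hypothetical.

```python
from typing import NamedTuple

class BusInstr(NamedTuple):
    """Hypothetical record of an instruction's external bus needs."""
    text: str
    reads_external: bool = False
    writes_external: bool = False

def schedule_fetches(program, write_cycles=3):
    """Return the fetch cycle of each instruction when only one access
    per cycle is allowed on the external bus."""
    busy = set()   # cycles in which the bus is already claimed
    fetches = []
    cycle = 1
    for ins in program:
        while True:
            needed = {cycle}                          # the fetch itself
            if ins.reads_external:
                needed.add(cycle + 2)                 # operand read (R phase)
            if ins.writes_external:
                needed.update(range(cycle + 3, cycle + 3 + write_cycles))
            if not needed & busy:
                break
            cycle += 1                                # dummy fetch inserted
        busy |= needed
        fetches.append(cycle)
        cycle += 1
    return fetches

program = [BusInstr("LACC", reads_external=True),
           BusInstr("ADD", reads_external=True),
           BusInstr("SACL", writes_external=True)]
print(schedule_fetches(program))
```

With these assumptions the fetch of SACL is pushed back by two cycles, once by the operand read of LACC and once by the operand read of ADD, which corresponds to the two dummy fetches mentioned for Example 5.8.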