2.6 PIPELINING
One of the approaches adopted for increasing the efficiency of advanced microprocessors as well
as P-DSPs is instruction pipelining. An instruction cycle, starting with the fetching of an instruction
and ending with the execution of the instruction, including the time for storage of the results, can be
split into a number of microinstructions. Execution of each of the microinstructions is also referred to
as a phase of the instruction cycle.
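[Figure: instruction execution in a machine without pipelining. Instruction I1 occupies T states 1 to 4,
I2 occupies T states 5 to 8, and I3 occupies T states 9 to 12.]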
One clock cycle of the processor corresponds to T. In a period of 12T only three instructions
can be executed in a machine without pipelining. In the same period nine instructions can be
executed as shown in Fig. 2.8. Hence the throughput is increased by a factor of 3 in this case.
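Fig. 2.8 Instruction pipeline with four phases (I1 to I9 are instructions; "-" marks an idle phase)

T state   Phase 1   Phase 2   Phase 3   Phase 4
1         I1        -         -         -
2         I2        I1        -         -
3         I3        I2        I1        -
4         I4        I3        I2        I1
5         I5        I4        I3        I2
6         I6        I5        I4        I3
7         I7        I6        I5        I4
8         I8        I7        I6        I5
9         I9        I8        I7        I6
10        -         I9        I8        I7
11        -         -         I9        I8
12        -         -         -         I9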
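It may be noted that the initial latency of a machine with four phases is 4T: the first instruction
completes after 4T, and each subsequent instruction completes one clock cycle later. Hence, for
executing a program with N instructions, the time required is (N + 3)T, which agrees with the nine
instructions completed in 12T in Fig. 2.8. With a non-pipelined machine, the time required for
executing N instructions is 4NT.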
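The timing arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming every phase takes exactly one clock cycle T, that the first instruction completes after 4T, and that one further instruction completes in every following cycle (which reproduces the nine instructions in 12T of Fig. 2.8). The function names are illustrative and do not come from the text.

```python
def non_pipelined_time(n_instructions, phases=4):
    """Execution time in units of T when each instruction runs all phases alone."""
    return n_instructions * phases


def pipelined_time(n_instructions, phases=4):
    """Execution time in units of T for an ideal pipeline.

    Assumption: the first instruction needs `phases` cycles to complete,
    and every later instruction completes one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return n_instructions + phases - 1


def speedup(n_instructions, phases=4):
    """Ratio of non-pipelined to pipelined execution time."""
    return non_pipelined_time(n_instructions, phases) / pipelined_time(n_instructions, phases)


if __name__ == "__main__":
    print(non_pipelined_time(3))   # three instructions fill a 12T window
    print(pipelined_time(9))       # nine instructions fit in the same 12T window
    print(speedup(1000))           # tends towards the number of phases for long programs
```

For long programs the speedup approaches the number of phases, whereas the factor of 3 quoted for the 12T window reflects the pipeline filling time.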
The instruction pipeline shown in Fig. 2.8 corresponds to a highly optimistic case. This throughput
can be achieved in processors that require a single clock cycle for the execution of every instruction
in the program, which is normally the case with reduced instruction set computers (RISC). However,
in complex instruction set computers (CISC), there are also multiple-word instructions requiring
multiple clock cycles for execution. In this case all the functional units cannot be kept busy all the
time. For example, in the case of call and branch instructions of a P-DSP, four phases or T states are
required for the call/branch instruction to exit the execution phase. By that time two more single-word
instructions or one double-word instruction have entered the instruction pipeline. These instructions
should not be executed. Hence two words have to be flushed out of the instruction pipeline before
instructions are fetched starting from the new program address.
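The cost of flushing can be estimated with a small model. The sketch below assumes a four-phase pipeline, counts the program in instruction words, treats every branch or call as taken, and charges two wasted fetch slots for each one, following the two flushed words described above. The names `PHASES`, `FLUSHED_WORDS` and `cycles_with_flushes` are hypothetical and chosen only for this illustration.

```python
PHASES = 4          # pipeline phases, as in the four-phase example of Fig. 2.8
FLUSHED_WORDS = 2   # words discarded after each taken branch/call, per the text


def cycles_with_flushes(n_words, n_branches):
    """Clock cycles (in units of T) to execute n_words instruction words.

    Assumptions: one word enters the pipeline per cycle, every branch or
    call is taken, and each one wastes FLUSHED_WORDS fetch slots.
    """
    ideal = n_words + PHASES - 1          # ideal pipelined time
    penalty = FLUSHED_WORDS * n_branches  # cycles lost to flushed words
    return ideal + penalty


if __name__ == "__main__":
    print(cycles_with_flushes(100, 0))    # no branches: ideal pipeline
    print(cycles_with_flushes(100, 10))   # ten taken branches add 20 cycles
```

Under these assumptions the penalty grows linearly with the number of taken branches, which is why branch-heavy code benefits most from avoiding the flush.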
To overcome this problem, some of the P-DSPs have special branch/call and return instructions
called delayed branch/call/return instructions. When the delayed branch instruction is executed, the
program branches to the new program address only after the two 1-word instructions or the single
2-word instruction following the branch instruction are completely executed. Similarly, when the
delayed call instruction is executed, the program calls the subroutine only after the two 1-word
instructions or the single 2-word instruction following the call instruction are completely executed.
When the delayed call/branch/return instructions are executed, there is no need for flushing the
pipeline and maximum throughput is obtained. Examples of pipeline operation of delayed as well as
undelayed branch/call instructions are given in Chapter 4.
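The difference in execution order between delayed and undelayed branches can be traced with a toy model. This is only a sketch: it assumes all instructions are 1-word, every branch is taken, and the two slots after a delayed branch hold ordinary instructions. The tuple layout and the names `execution_order`, `"branch"` and `"delayed_branch"` are invented for the example and are not mnemonics of any particular P-DSP.

```python
def execution_order(program, max_steps=100):
    """Return instruction names in the order in which they are executed.

    program: list of (name, kind, target_index) tuples, where kind is
    "op", "branch" (undelayed) or "delayed_branch".
    Assumptions: branches are always taken, instructions are 1-word, and
    the two slots after a delayed branch contain plain "op" instructions.
    """
    order = []
    pc = 0
    while pc < len(program) and len(order) < max_steps:
        name, kind, target = program[pc]
        order.append(name)
        if kind == "branch":
            # The two words fetched after the branch are flushed, not executed.
            pc = target
        elif kind == "delayed_branch":
            # The two following 1-word instructions complete before the jump.
            order.extend(slot[0] for slot in program[pc + 1:pc + 3])
            pc = target
        else:
            pc += 1
    return order


def make_program(branch_kind):
    """Build a six-instruction program whose second instruction jumps to T1."""
    return [
        ("I1", "op", None),
        ("BR", branch_kind, 5),
        ("I2", "op", None),
        ("I3", "op", None),
        ("I4", "op", None),
        ("T1", "op", None),
    ]


if __name__ == "__main__":
    print(execution_order(make_program("branch")))
    print(execution_order(make_program("delayed_branch")))
```

With the undelayed branch, I2 and I3 are fetched but discarded; with the delayed branch they do useful work before control reaches T1, so no pipeline slots are wasted.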
The throughput efficiency of the pipeline may also be reduced because of conflicts between the
instructions in the instruction pipeline in different phases. This happens if the same memory is used to
store the data and program and there is only a single address bus for addressing both the program and
data memory. This is true in the case of off-chip memory. For example, an instruction in the fetch phase
may try to fetch the instruction code from a memory chip that is also accessed by another instruction
that is in the operand read phase. To avoid the conflict, the operand read phase will be done first and
the code fetch phase will be repeated until there is no conflict.
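The arbitration rule just described (operand read first, code fetch repeated) can be sketched as follows. The model is deliberately simplified: it assumes a single shared bus, one access per clock cycle, and a fixed set of cycles in which operand reads occur, ignoring the fact that a stalled fetch would in practice also shift later operand reads. The function name `schedule_fetches` is hypothetical.

```python
def schedule_fetches(n_fetches, operand_read_cycles):
    """Return the clock cycle in which each code fetch gets the shared bus.

    Assumptions: cycles are numbered from 1, the bus serves one access per
    cycle, and operand reads always win over code fetches.
    """
    busy = set(operand_read_cycles)
    granted = []
    cycle = 1
    for _ in range(n_fetches):
        while cycle in busy:
            # Conflict: the operand read proceeds and the fetch is repeated.
            cycle += 1
        granted.append(cycle)
        cycle += 1
    return granted


if __name__ == "__main__":
    print(schedule_fetches(4, []))      # no conflicts: one fetch per cycle
    print(schedule_fetches(4, [3]))     # the third fetch is pushed to cycle 4
```

Each conflicting operand read delays all subsequent fetches by one cycle in this model, which is the throughput loss the paragraph above refers to.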
The number of instructions that are processed simultaneously in the CPU, also referred to as the depth
of the instruction pipeline, differs in different families of P-DSPs. The pipeline depths of some of the
P-DSPs are given in Table 2.1.