

1. Instruction pipelining increases processor efficiency by dividing the instruction cycle into multiple phases (fetch, decode, memory read, execute) so that several instructions can be processed simultaneously.
2. In a pipelined processor, the functional units are kept busy almost all the time because new instructions flow continuously through the pipeline. This increases throughput compared with non-pipelined processors.
3. Pipelining works best for RISC processors with single-cycle instructions; multi-word CISC instructions that need several cycles, such as branches and calls, may require the pipeline to be flushed, reducing efficiency.

Uploaded by Yogabharath
Copyright © All Rights Reserved

Introduction to Programmable DSPs 45

2.6 PIPELINING
One of the approaches adopted for increasing the efficiency of advanced microprocessors as well as P-DSPs is instruction pipelining. An instruction cycle, starting with the fetching of an instruction and ending with its execution, including the time for storing the results, can be divided into a number of microinstructions. Execution of each of the microinstructions is also referred to as one phase of an instruction. For example, an instruction cycle requiring four microinstructions is said to have four phases, as follows:
1. Fetch phase, in which the instruction is fetched from the program memory
2. Decode phase, in which the instruction is decoded
3. Memory read phase, in which the operand required for the execution of the instruction may be read from the data memory
4. Execution phase, in which the instruction is executed and the results are stored in either one of the registers or memory
Each of the above microinstructions may be carried out separately by one of four functional units. Let us assume that each of the four phases takes equal time to complete. In this case, in a conventional microprocessor with no pipelining, each of the functional units is busy only 25% of the time, because only one instruction is processed in the CPU at a time. Figure 2.7 shows when each of the functional units is busy while a program containing three instructions, I1, I2 and I3, is executed.

Value of T   Fetch   Decode   Read   Execute
    1        I1
    2                I1
    3                         I1
    4                                I1
    5        I2
    6                I2
    7                         I2
    8                                I2
    9        I3
   10                I3
   11                         I3
   12                                I3

Fig. 2.7 Instruction cycles of a processor with no pipelining
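The schedule in Fig. 2.7 can be reproduced with a short Python sketch. It assumes, as the text does, four phases of equal length and exactly one instruction in the CPU at a time; the function names are hypothetical. Counting the slots in which each unit holds an instruction confirms the 25% utilization.

```python
PHASES = ("Fetch", "Decode", "Read", "Execute")


def non_pipelined_schedule(n_instructions: int) -> list[dict[str, str]]:
    """One row per time slot T, mapping the single busy phase to its instruction."""
    rows = []
    for i in range(n_instructions):
        for phase in PHASES:
            # Only one instruction is in the CPU at a time, so one unit is busy per slot.
            rows.append({phase: f"I{i + 1}"})
    return rows


def utilization(rows: list[dict[str, str]]) -> dict[str, float]:
    """Fraction of time slots in which each functional unit is busy."""
    if not rows:
        return {p: 0.0 for p in PHASES}
    return {p: sum(p in row for row in rows) / len(rows) for p in PHASES}


def print_schedule(rows: list[dict[str, str]]) -> None:
    """Print the schedule as a table with one column per functional unit."""
    print(f"{'T':>3}  " + "".join(f"{p:<9}" for p in PHASES))
    for t, row in enumerate(rows, start=1):
        print(f"{t:>3}  " + "".join(f"{row.get(p, ''):<9}" for p in PHASES))


if __name__ == "__main__":
    schedule = non_pipelined_schedule(3)
    print_schedule(schedule)
    print(utilization(schedule))  # {'Fetch': 0.25, 'Decode': 0.25, 'Read': 0.25, 'Execute': 0.25}
```

Each unit appears in 3 of the 12 slots, which is where the 25% figure comes from.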


The functional units can be kept busy almost all the time by processing a number of instructions simultaneously in the CPU. For example, in a machine with four functional units, four instructions I1, I2, I3 and I4 can be processed simultaneously, as shown in Fig. 2.8. When I1 enters the decode phase, I2 can enter the opcode fetch phase. When I1 enters the operand read phase, I2 enters the decode phase and I3 enters the opcode fetch phase. When I1 enters the execute phase, I2 enters the operand read phase, I3 enters the decode phase and I4 enters the opcode fetch phase. The pipeline is now fully loaded, and all the functional units have useful work to do. The instructions that follow keep the functional units busy until the program is exited.

46 Digital Signal Processors

Let T denote the time required for each phase of the instruction; one clock cycle of the processor corresponds to T. In a period of 12T, only three instructions can be executed in a machine without pipelining. In the same period, nine instructions can be executed with pipelining, as shown in Fig. 2.8. Hence the throughput is increased by a factor of 3 in this case.

Value of T   Fetch   Decode   Read   Execute
    1        I1
    2        I2      I1
    3        I3      I2       I1
    4        I4      I3       I2     I1
    5        I5      I4       I3     I2
    6        I6      I5       I4     I3
    7        I7      I6       I5     I4
    8        I8      I7       I6     I5
    9        I9      I8       I7     I6
   10                I9       I8     I7
   11                         I9     I8
   12                                I9

Fig. 2.8 Instruction cycles of a processor with pipelining
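A companion sketch generates the ideal pipelined schedule of Fig. 2.8. It assumes no stalls or conflicts: each instruction enters the fetch phase one slot after its predecessor and advances by one phase per slot. The function names are hypothetical. Counting the slots that have an instruction in the execute phase gives the nine completions in 12T quoted in the text.

```python
PHASES = ("Fetch", "Decode", "Read", "Execute")


def pipelined_schedule(n_instructions: int, n_slots: int) -> list[dict[str, str]]:
    """Ideal pipeline with no stalls: instruction k (0-based) is in phase p during slot k + p + 1."""
    rows = []
    for t in range(1, n_slots + 1):
        row = {}
        for p, phase in enumerate(PHASES):
            k = t - 1 - p  # the instruction that entered Fetch p slots before slot t
            if 0 <= k < n_instructions:
                row[phase] = f"I{k + 1}"
        rows.append(row)
    return rows


def completed(rows: list[dict[str, str]]) -> int:
    """Instructions that pass through the Execute phase within the schedule."""
    return sum("Execute" in row for row in rows)


if __name__ == "__main__":
    schedule = pipelined_schedule(9, 12)
    print(f"{'T':>3}  " + "".join(f"{p:<9}" for p in PHASES))
    for t, row in enumerate(schedule, start=1):
        print(f"{t:>3}  " + "".join(f"{row.get(p, ''):<9}" for p in PHASES))
    print("completed in 12T:", completed(schedule))  # completed in 12T: 9
```

Because each instruction spends exactly one slot in the execute phase, counting execute entries is the same as counting completed instructions.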

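It may be noted that the initial latency of a machine with four phases is 4T: the first instruction completes only after 4T, and thereafter one instruction completes every T. Hence, for a program with N instructions, the time required for execution is (4 + N - 1)T = (N + 3)T, which agrees with the nine instructions completed in 12T in Fig. 2.8. With a non-pipelined machine, the time required for executing N instructions is 4NT.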
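The two execution-time expressions can be compared numerically with a minimal sketch. It counts time in units of T and treats the number of phases as a parameter (defaulting to the four phases used here); the function names are hypothetical.

```python
def pipelined_time(n: int, phases: int = 4) -> int:
    """Execution time in units of T for n instructions in an ideal pipeline."""
    if n == 0:
        return 0
    # The first instruction needs `phases` slots; each later one completes one slot after the previous.
    return phases + (n - 1)


def non_pipelined_time(n: int, phases: int = 4) -> int:
    """Every instruction occupies the CPU for all of its phases before the next one starts."""
    return phases * n


def speedup(n: int, phases: int = 4) -> float:
    """Ratio of non-pipelined to pipelined execution time."""
    return non_pipelined_time(n, phases) / pipelined_time(n, phases)


if __name__ == "__main__":
    for n in (1, 3, 9, 100, 10_000):
        print(f"N={n:>6}: pipelined {pipelined_time(n):>6}T, "
              f"non-pipelined {non_pipelined_time(n):>6}T, speedup {speedup(n):.3f}")
```

For N = 9 the sketch gives 12T against 36T, the factor of 3 seen in Fig. 2.8. As N grows, the ratio 4N/(N + 3) approaches 4, the number of phases, because the fill time of the pipeline is paid only once.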
The instruction pipeline shown in Fig. 2.8 corresponds to a highly optimistic case. The throughput shown in Fig. 2.8 can be achieved by processors in which every instruction in the program requires a single clock cycle for execution. This is normally achieved with reduced instruction set computers (RISC). In complex instruction set computers (CISC), however, there are also multiple-word instructions requiring multiple clock cycles for execution, and in this case not all the functional units can be kept busy all the time. For example, a call or branch instruction of a P-DSP requires four phases, or T states, to exit the execution phase. By that time, two more single-word instructions or one double-word instruction have entered the instruction pipeline. These instructions should not be executed, so two words have to be flushed out of the instruction pipeline before instructions are fetched starting from the new program address.
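The cost of flushing can be estimated with a deliberately simple model. It assumes, hypothetically, single-word instructions issued one per slot, and charges one wasted slot T for every word flushed after a taken undelayed branch (two words per branch, as in the example above). This is a back-of-the-envelope sketch, not a cycle-accurate model of any particular P-DSP.

```python
def time_with_undelayed_branches(n: int, taken_branches: int,
                                 flushed_words: int = 2, phases: int = 4) -> int:
    """Execution time in units of T for n instructions (hypothetical model).

    Assumes one instruction issued per slot and one wasted slot for every word
    flushed from the pipeline after a taken branch.
    """
    if n == 0:
        return 0
    ideal = phases + (n - 1)                  # fill time plus one completion per slot
    penalty = taken_branches * flushed_words  # slots lost to flushed words
    return ideal + penalty


if __name__ == "__main__":
    n = 100
    for b in (0, 5, 10, 20):
        t = time_with_undelayed_branches(n, b)
        print(f"{b:>2} taken branches: {t}T, {n / t:.2f} instructions per T")
```

Under these assumptions, 10 taken branches in a 100-instruction program stretch the run from 103T to 123T.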
To overcome this problem, some P-DSPs have special branch, call and return instructions called delayed branch/call/return instructions. When a delayed branch instruction is executed, the program branches to the new program address only after the two 1-word instructions or the single 2-word instruction following the branch instruction are completely executed. Similarly, when a delayed call instruction is executed, the program calls the subroutine only after the two 1-word instructions or the single 2-word instruction following the call instruction are completely executed. When delayed call/branch/return instructions are executed, there is no need to flush the pipeline, and maximum throughput is obtained. Examples of pipeline operation of delayed as well as undelayed branch/call instructions are given in Chapter 4.
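The difference between undelayed and delayed branches can be illustrated by tracing which instructions actually execute. In this sketch the `Instr` record, the mnemonics `BR`/`BRD` and the function `execution_trace` are all hypothetical; it assumes that the instructions filling the two words after a branch are already in the pipeline when the branch completes, and that those delay slots do not themselves contain branches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    name: str
    words: int = 1                # program-memory words the instruction occupies
    target: Optional[int] = None  # index of the branch target; None for ordinary instructions
    delayed: bool = False         # True for a delayed branch


def execution_trace(program: list[Instr], max_steps: int = 50) -> tuple[list[str], int]:
    """Return the names of executed instructions and the number of flushed words.

    An undelayed branch flushes the instructions in the two words after it;
    a delayed branch lets them complete before jumping.
    """
    executed: list[str] = []
    flushed = 0
    pc = 0
    while pc < len(program) and len(executed) < max_steps:
        ins = program[pc]
        executed.append(ins.name)
        if ins.target is None:
            pc += 1
            continue
        # Gather the instructions that fill the two words following the branch.
        shadow, words, j = [], 0, pc + 1
        while words < 2 and j < len(program):
            shadow.append(program[j])
            words += program[j].words
            j += 1
        if ins.delayed:
            executed.extend(s.name for s in shadow)  # delay slots complete before the jump
        else:
            flushed += words                          # fetched, then discarded
        pc = ins.target
    return executed, flushed


if __name__ == "__main__":
    def program(delayed: bool) -> list[Instr]:
        return [Instr("A"), Instr("BRD" if delayed else "BR", words=2, target=5, delayed=delayed),
                Instr("C"), Instr("D"), Instr("X"), Instr("E")]

    print(execution_trace(program(False)))  # (['A', 'BR', 'E'], 2)
    print(execution_trace(program(True)))   # (['A', 'BRD', 'C', 'D', 'E'], 0)
```

With the undelayed branch, C and D are fetched but wasted; with the delayed branch, the same two words do useful work, which is why the programmer (or compiler) must place instructions there that should run before the jump.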
The throughput efficiency of the pipeline may also be reduced by conflicts between instructions that are in different phases of the instruction pipeline. This happens if the same memory is used to store both data and program and there is only a single address bus for addressing both the program and data memory. This is the case with off-chip memory. For example, an instruction in the fetch phase may try to fetch its instruction code from a memory chip that is also being accessed by another instruction in the operand read phase. To avoid the conflict, the operand read is done first, and the opcode fetch phase is repeated until there is no longer a conflict.
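The effect of such bus conflicts on the four-phase pipeline can be simulated slot by slot. The sketch below is hypothetical: it assumes a straight-line program, a flag per instruction saying whether its operand read uses the shared program/data memory, and the policy described above, in which the operand read goes first and the opcode fetch is repeated in the next slot, leaving an empty slot (a bubble) in the decode phase.

```python
def cycles_with_bus_conflicts(reads_shared_memory: list[bool]) -> int:
    """Slots (units of T) needed to run a straight-line program through a 4-phase pipeline.

    reads_shared_memory[i] is True if instruction i reads its operand from the same
    memory (and address bus) that holds the program. When such an operand read
    coincides with an opcode fetch, the read goes first and the fetch is repeated.
    """
    n = len(reads_shared_memory)
    # Index of the instruction in each phase during the current slot (None = empty).
    fetch = 0 if n else None
    decode = read = execute = None
    completed = slots = 0
    while completed < n:
        slots += 1
        conflict = (fetch is not None and read is not None
                    and reads_shared_memory[read])
        if execute is not None:
            completed += 1
        # Advance every phase by one slot for the next iteration.
        execute, read = read, decode
        if conflict:
            decode = None  # bubble: the opcode fetch is repeated in the next slot
        else:
            decode = fetch
            if fetch is not None:
                fetch = fetch + 1 if fetch + 1 < n else None
    return slots


if __name__ == "__main__":
    print(cycles_with_bus_conflicts([False] * 9))  # 12, the conflict-free case of Fig. 2.8
    print(cycles_with_bus_conflicts([True] * 9))   # more slots: every shared read stalls a fetch
```

When no instruction reads the shared memory, the result matches the (N + 3)T timing of the ideal pipeline; each conflict adds a bubble that ripples through the later phases.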
The number of instructions that are processed simultaneously in the CPU, also referred to as the depth of the instruction pipeline, differs among families of P-DSPs. The pipeline depths of some P-DSPs are given in Table 2.1.
