Chapter 4
The Processor
+
+ Processor Organization
Processor Requirements:
■ Fetch instruction
■ The processor reads an instruction from memory (register, cache, main memory)
■ Interpret instruction
■ The instruction is decoded to determine what action is required
■ Fetch data
■ The execution of an instruction may require reading data from memory or an I/O
module
■ Process data
■ The execution of an instruction may require performing some arithmetic or logical
operation on data
■ Write data
■ The results of an execution may require writing data to memory or an I/O module
+
User-Visible Registers
Referenced by means of the machine language that the processor executes
Categories:
• General purpose
  • Can be assigned to a variety of functions by the programmer
• Data
  • May be used only to hold data and cannot be employed in the calculation of an operand address
• Address
  • May be somewhat general purpose or may be devoted to a particular addressing mode
  • Examples: segment pointers, index registers, stack pointer
• Condition codes
  • Also referred to as flags
  • Bits set by the processor hardware as the result of operations
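Condition codes are set by processor hardware, but a small software model makes the idea concrete. The following C sketch is purely illustrative; the 8-bit width and the particular zero/negative/carry flags are assumptions, not taken from these slides.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative condition-code flags set as a side effect of an ALU add. */
struct flags { bool zero, negative, carry; };

static uint8_t alu_add(uint8_t a, uint8_t b, struct flags *f) {
    uint16_t wide = (uint16_t)a + b;     /* compute with extra width        */
    uint8_t  r    = (uint8_t)wide;       /* 8-bit result the program sees   */
    f->zero     = (r == 0);              /* result is all zeros             */
    f->negative = (r & 0x80) != 0;       /* sign bit of the result          */
    f->carry    = wide > 0xFF;           /* carry out of the top bit        */
    return r;
}

int main(void) {
    struct flags f;
    alu_add(200, 100, &f);               /* 300 does not fit in 8 bits      */
    printf("Z=%d N=%d C=%d\n", f.zero, f.negative, f.carry);
    return 0;
}

After adding 200 and 100 the 8-bit result wraps to 44, so only the carry flag ends up set.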
Condition Codes
+
Control and Status Registers
Four registers are essential to instruction execution:
■ Program counter (PC)
  ■ Contains the address of an instruction to be fetched
■ Instruction register (IR)
  ■ Contains the instruction most recently fetched
■ Memory address register (MAR)
  ■ Contains the address of a location in memory
■ Memory buffer register (MBR)
  ■ Contains a word of data to be written to memory or the word most recently read
• The main line of activity consists of alternating instruction fetch and instruction
execution activities.
• After an instruction is fetched, it is examined to determine if any indirect addressing is
involved. If so, the required operands are fetched using indirect addressing.
• Following execution, an interrupt may be processed before the next instruction fetch.
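Read as software, the fetch-execute cycle described above is just a loop: fetch, optionally resolve an indirect operand, execute, then check for interrupts. The C sketch below is a minimal illustration for a made-up 16-bit machine; its word format, opcodes, and interrupt-handler address are assumptions, not part of these slides.

#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 2048

static uint16_t mem[MEM_WORDS];       /* main memory                          */
static uint16_t pc, ir, acc;          /* program counter, IR, accumulator     */
static bool     interrupt_pending;    /* set by I/O hardware                  */

static void service_interrupt(void) {
    mem[0] = pc;                      /* save PC (assumed convention)         */
    pc = 1;                           /* branch to handler at word 1          */
    interrupt_pending = false;
}

void run(void) {
    for (;;) {
        /* Instruction fetch: read the word addressed by the PC. */
        ir = mem[pc++ % MEM_WORDS];

        /* Decode: 4-bit opcode, 1 indirect bit, 11-bit address (assumed format). */
        uint16_t opcode   = ir >> 12;
        bool     indirect = (ir >> 11) & 1u;
        uint16_t addr     = ir & 0x07FF;

        /* Indirect cycle: fetch the effective address from memory first. */
        if (indirect)
            addr = mem[addr] & 0x07FF;

        /* Execute: operand fetch, data processing, operand store. */
        switch (opcode) {
        case 0x1: acc += mem[addr];  break;   /* ADD   */
        case 0x2: acc  = mem[addr];  break;   /* LOAD  */
        case 0x3: mem[addr] = acc;   break;   /* STORE */
        case 0xF: return;                     /* HALT  */
        default:  break;                      /* treat others as NOP */
        }

        /* Interrupt cycle: check for pending interrupts before the next fetch. */
        if (interrupt_pending)
            service_interrupt();
    }
}

The two points mirrored from the description above: indirection adds an extra memory access before execution, and interrupts are recognized only at instruction boundaries.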
+
Instruction Cycle State Diagram
+
Data Flow, Fetch Cycle
+
Data Flow, Indirect Cycle
+
Data Flow, Interrupt Cycle
Pipelining Strategy
■ Laundry Example
■ Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
■ Washer takes 30 minutes
■ Dryer takes 40 minutes
■ Folding takes 20 minutes
■ Sequential laundry takes 6 hours for 4 loads
■ If they learned pipelining, how long would laundry take?
[Figure: sequential laundry timeline for loads A-D, 6 PM to midnight]
+
Traditional Pipeline Concept
■ Pipelined laundry takes 3.5 hours for 4 loads
[Figure: pipelined laundry timeline for loads A-D with the wash, dry, and fold stages overlapped]
+
Traditional Pipeline Concept
■ Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
■ Pipeline rate is limited by the slowest pipeline stage
■ Multiple tasks operate simultaneously using different resources
■ Potential speedup = number of pipe stages
■ Unbalanced lengths of pipe stages reduce speedup
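Using the laundry numbers above (30/40/20 minutes per stage), a tiny C calculation shows why pipelining raises throughput without shortening a single load, and why the slowest stage sets the rate. The formulas are the usual idealized ones, assuming stages can always overlap.

#include <stdio.h>

int main(void) {
    int stage[] = {30, 40, 20};        /* wash, dry, fold (minutes) */
    int n_stages = 3, n_loads = 4;

    int latency = 0, slowest = 0;
    for (int i = 0; i < n_stages; i++) {
        latency += stage[i];
        if (stage[i] > slowest) slowest = stage[i];
    }

    /* Sequential: every load waits for the previous one to finish completely. */
    int sequential = n_loads * latency;

    /* Pipelined: the first load takes the full latency, then one load drains
       every 'slowest' minutes (the slowest stage limits the rate). */
    int pipelined = latency + (n_loads - 1) * slowest;

    printf("sequential: %d min, pipelined: %d min\n", sequential, pipelined);
    /* Prints: sequential: 360 min, pipelined: 210 min (6 h vs 3.5 h) */
    return 0;
}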
+
Use the Idea of Pipelining in a Computer
Fetch + Execution
[Figure: (a) sequential execution of instructions I1-I3 as alternating fetch (Fi) and execute (Ei) steps; (b) hardware organization with an instruction fetch unit, interstage buffer B1, and execution unit; (c) pipelined execution overlapping the fetch of one instruction with the execution of the previous one]
Basic idea of instruction pipelining.
+
Use the Idea of Pipelining in a Computer - Two-Stage Instruction Pipeline
+
Fetch + Decode + Execution + Write
+
Additional Stages
■ Fetch instruction (FI)
  ■ Read the next expected instruction into a buffer
■ Decode instruction (DI)
  ■ Determine the opcode and the operand specifiers
■ Calculate operands (CO)
  ■ Calculate the effective address of each source operand
  ■ This may involve displacement, register indirect, indirect, or other forms of address calculation
■ Fetch operands (FO)
  ■ Fetch each operand from memory
  ■ Operands in registers need not be fetched
■ Execute instruction (EI)
  ■ Perform the indicated operation and store the result, if any, in the specified destination operand location
■ Write operand (WO)
  ■ Store the result in memory
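Assuming one clock cycle per stage and no memory conflicts, data hazards, or branches, n instructions flowing through these six stages complete in 6 + (n - 1) cycles instead of 6n. A short sketch of that count:

#include <stdio.h>

/* Cycles to complete n instructions on a k-stage pipeline,
   assuming one cycle per stage and no stalls of any kind. */
static int pipeline_cycles(int k, int n) {
    return k + (n - 1);
}

int main(void) {
    int k = 6;                               /* FI DI CO FO EI WO */
    for (int n = 1; n <= 9; n++)
        printf("%d instructions: %2d cycles (vs %2d unpipelined)\n",
               n, pipeline_cycles(k, n), k * n);
    return 0;
}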
+
Timing Diagram for Instruction
Pipeline Operation
+
The Effect of a Conditional Branch on Instruction Pipeline Operation
+
An Alternative Pipeline Depiction
+
Clock cycle: 1 2 3 4 5 6 7 8 9 10
ADD EAX, EBX FI DI FO EI WO
SUB ECX, EAX FI DI Idle FO EI WO
I3 FI DI FO EI WO
I4 FI DI FO EI WO
Drawbacks:
• With multiple pipelines there are contention delays for access
to the registers and to memory
• Additional branch instructions may enter the pipeline before
the original branch decision is resolved
Prefetch Branch Target
■Benefits:
■ Instructions fetched in sequence will be available without the
usual memory access time
■ If a branch occurs to a target just a few locations ahead of the
address of the branch instruction, the target will already be in
the buffer
■ This strategy is particularly well suited to dealing with loops
[Figure: 256-byte loop buffer; on a hit, the instruction to be decoded is supplied from the buffer instead of memory]
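A rough sketch of the hit check a 256-byte loop buffer implies. For illustration it assumes the buffer always holds one aligned 256-byte region tagged by the high-order address bits; a real design would also manage filling and invalidation as instructions are fetched.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-byte loop buffer: the 8 low-order address bits index
   the buffer, the remaining high-order bits are kept as a single tag. */
struct loop_buffer {
    uint8_t  bytes[256];
    uint32_t tag;        /* high-order bits of the buffered region */
    bool     valid;
};

/* Hit if the branch target falls inside the buffered 256-byte region. */
static bool loop_buffer_hit(const struct loop_buffer *lb, uint32_t target) {
    return lb->valid && (target >> 8) == lb->tag;
}

/* On a hit, the instruction byte is delivered to the decoder without
   the usual memory access; on a miss, it must be fetched from memory. */
static uint8_t loop_buffer_read(const struct loop_buffer *lb, uint32_t target) {
    return lb->bytes[target & 0xFFu];
}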
Decode stage 1 (D1)
• All opcode and addressing-mode information is decoded in the D1 stage
• 3 bytes of instruction are passed to the D1 stage from the prefetch buffers
• The D1 decoder can then direct the D2 stage to capture the rest of the instruction
Decode stage 2 (D2)
• Expands each opcode into control signals for the ALU
• Also controls the computation of the more complex addressing modes
Execute
Write back
• Updates registers and status flags modified during the preceding execute stage
+ 80486 Instruction Pipeline
Examples
Instruction-Level Parallelism and Superscalar Processors
+
Scalar Processor
■Limited by
■ Data dependency
■ Procedural dependency
■ Resource conflicts
+ True Data (Write-Read) Dependency
■ ADD r1, r2 (r1 ← r1 + r2)
■ MOV r3, r1 (r3 ← r1)
■ Can fetch and decode the second instruction in parallel with the first, but cannot execute the second instruction until the first is finished
■ Also called flow dependency or RAW (read-after-write) dependency
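A minimal sketch of how issue logic could detect this write-read (RAW) dependency before letting the second instruction execute; the decoded-instruction structure and register numbering are purely illustrative.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical decoded form of a two-operand register instruction. */
struct instr {
    int dest;   /* register written by the instruction */
    int src1;   /* registers read by the instruction   */
    int src2;   /* -1 means the operand is unused      */
};

/* True (write-read / RAW) dependency: the second instruction reads a
   register that the first has not yet written back. */
static bool raw_hazard(struct instr first, struct instr second) {
    return second.src1 == first.dest || second.src2 == first.dest;
}

int main(void) {
    struct instr add = { .dest = 1, .src1 = 1, .src2 = 2 };  /* ADD r1, r2 */
    struct instr mov = { .dest = 3, .src1 = 1, .src2 = -1 }; /* MOV r3, r1 */

    if (raw_hazard(add, mov))
        printf("stall: second instruction must wait for r1\n");
    return 0;
}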
+ Procedural Dependency
■Cannot execute instructions after a (conditional)
branch in parallel with instructions before a
branch
■Also, if instruction length is not fixed, instructions
have to be decoded to find out how many fetches
are needed (not an issue with fixed-length RISC instructions)
■This prevents simultaneous fetches
+Resource Conflict
■Two or more instructions requiring access to the same
resource at the same time
■ e.g. functional units, registers, bus
■Machine Parallelism
■ Ability to take advantage of instruction level parallelism
■ Governed by number of parallel pipelines
+
Example of the concept of
instruction-level parallelism
■As an example of the concept of instruction-level
parallelism, consider the following two code fragments