Lecture 06 - (New) Pipelining and Parallelism
Week - 06
22-29 October 2018
Topics to Cover
• ‘Organizational techniques’ to improve processor speed
  • Pipelining
  • Superscalar
  • Super-pipeline
• Parallelism
  • Instruction-level parallelism (ILP): pipelining, superscalar
  • Machine-level parallelism: multicore systems, cluster computers
• Flynn’s Taxonomy of Computers
Paper Pattern – Mid-Term (Total Marks = 30)
• Short Questions – 5 (2 marks each)
Instruction Format
[Figure: a basic instruction format with an Opcode field and an Address field.]

1. Multi-Stage Pipeline
6-Stage Non-Pipelined Instruction Execution
[Figure: without pipelining, the 2nd instruction starts only after the 1st has completed all six stages.]
[Figure: with pipelining, a new instruction completes execution per cycle.]
• With a single pipeline in which the execution stage takes two clock cycles (the case developed on the next slides), n instructions need k + (2n - 1) cycles => 11 cycles for k = 6 stages and n = 3 instructions.
With Super-Scalar Pipelining (Fig. Next Slide)
• When a superscalar processor design is used, multiple instructions can
be in the execution stage at the same time.
• With n pipelines, n instructions can execute during the same clock
cycle.
• Let us introduce a second pipeline (superscalar) into our 6-stage
pipeline and assume that execution stage S4 requires two clock cycles.
• In the figure, odd-numbered instructions enter the u-pipeline and
even-numbered instructions enter the v-pipeline.
• This removes the wasted cycles, and it is now possible to process n
instructions in (k + n) cycles.
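
A quick numeric check of the two cases above (a sketch, not from the slides; k = number of pipeline stages, n = number of instructions):

def single_pipeline_cycles(k, n):
    # One pipeline whose execution stage S4 takes two clock cycles:
    # a new instruction can enter execution only every 2nd cycle,
    # so n instructions need k + (2n - 1) cycles.
    return k + (2 * n - 1)

def dual_pipeline_cycles(k, n):
    # Two pipelines (superscalar): the wasted cycles are removed,
    # so n instructions need k + n cycles.
    return k + n

k, n = 6, 3
print(single_pipeline_cycles(k, n))   # 11 cycles
print(dual_pipeline_cycles(k, n))     # 9 cycles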
Two Pipelined Stages (Superscalar)
[Figure: with the second pipeline, every next instruction completes execution per clock cycle.]
Super-pipeline Performance
‘Super-Scalar’ VS ‘Super-Pipeline’
• A simple pipelined system performs only one pipeline stage per clock cycle.
• A super-pipeline splits each stage into shorter sub-stages driven by a faster clock, so several sub-stages complete within one external clock cycle; a super-scalar machine instead duplicates whole pipelines so that several instructions occupy the same stage at once.
Preparatory Questions (Pipelining)
Q1. What is ‘instruction pipelining’? How can we pipeline instructions?
Q2. In a six-stage pipelined processor, how many instructions can be
executed in 18 clock cycles? (A quick check appears after this list.)
Q3. What is a ‘superscalar’ pipeline? How does it improve processor
performance?
Q4. What is a ‘superpipeline’? How does it differ from a normal
pipeline?
Q5. What are the ‘hazards’ to pipelining?
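
Quick check for Q2, assuming the standard timing of a single k-stage pipeline with no stalls: n instructions complete in k + (n - 1) cycles, so 18 = 6 + (n - 1) gives n = 13 instructions.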
Parallelism
• Executing two or more operations at the same time is known as
Parallelism. It is used in ‘high-performance computing’.
• In ‘parallel processing’ the computer carries out multiple data-processing tasks simultaneously.
• Goal of Parallelism
• Parallelism is done to increase the ‘computational speed’ of a
computer system.
• It increases the computer’s processing capability and its throughput,
‘the amount of processing done during a given interval of time’.
Types of ‘Parallelism’
• Parallelism can be of two types:
1) Instruction-Level Parallelism (ILP) (parallelism in software)
   i. Pipelining
   ii. Superscalar    (for a uni-processor)
2) Machine-Level Parallelism (parallelism in hardware)
   i. Multi-core processors
   ii. Multi-computers (clusters)
1) Instruction Level Parallelism (ILP)
• Instruction-level parallelism exists when instructions in a sequence
are independent and thus can be executed in parallel by overlapping.
• As an example of the concept of ILP, consider the following two code fragments:
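
(The slide’s figure is not reproduced here; the two fragments below are a representative reconstruction in the spirit of the description, with illustrative register names.)

Independent (left):          Dependent (right):
Load  R1 <- R2               Add   R3 <- R3, "1"
Add   R3 <- R3, "1"          Add   R4 <- R3, R2
Add   R4 <- R4, R2           Store [R4] <- R0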
• The three instructions on the left are independent, and in theory all
three can be executed in parallel.
• In contrast, the three instructions on the right cannot be executed in
parallel because the second instruction uses the result of the first, and
the third instruction uses the result of the second.
Instruction Level Parallelism (ILP)
• Instruction-level parallelism (ILP): is a measure of how many of the
operations in a computer program can be performed simultaneously.
• Micro-architectural techniques that are used to exploit ILP include:
i. Instruction pipelining: where the execution of multiple instructions
can be partially overlapped. (You have already studied this)
• In Pipelining, while an instruction is being executed in the ALU, the
next instruction can be read from memory. (overlap fetch & execute)
ii. Superscalar: execution in which multiple execution units are used
to execute multiple instructions in parallel.
ii. Superscalar Approach
• A parallel processing system is able to perform concurrent data
processing to achieve faster execution time.
• In a Superscalar computer, the system has redundant functional units.
• For example, the system may have two or more ALUs and be able to
execute two or more instructions at the same time.
• ‘Parallel processing’ is established by distributing the data among the
multiple functional units.
• The amount of hardware increases with parallel processing, and with it
the cost of the system.
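
As a rough illustration of why extra functional units pay off only when instructions are independent, here is a toy Python model (an illustrative assumption, not a real processor): up to two instructions issue per clock cycle, but an instruction cannot issue in the same cycle as one whose result it reads.

def count_cycles(program, width=2):
    # program: list of (dest, src1, src2) register triples, in order.
    # In-order issue of at most `width` instructions per cycle; a later
    # instruction may not pair with one whose destination it reads.
    cycles, i = 0, 0
    while i < len(program):
        issued_dests = [program[i][0]]
        i += 1
        while i < len(program) and len(issued_dests) < width:
            dest, src1, src2 = program[i]
            if src1 in issued_dests or src2 in issued_dests:
                break  # read-after-write dependence: wait a cycle
            issued_dests.append(dest)
            i += 1
        cycles += 1
    return cycles

independent = [("R1", "R2", "R2"), ("R3", "R3", "R0"), ("R4", "R4", "R2")]
dependent   = [("R3", "R3", "R0"), ("R4", "R3", "R2"), ("R5", "R4", "R0")]
print(count_cycles(independent))   # 2 cycles: two ALUs work together
print(count_cycles(dependent))     # 3 cycles: each waits on the previous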
Superscalar Processor with Multiple Functional Units (Figure Next Slide)
Multiple Functional Units
• Figure shows one possible way
of separating the execution
unit into eight functional units
operating in parallel.
2. Machine Parallelism
• Machine parallelism is a measure of the ability of the processor to
take advantage of instruction-level parallelism (ILP).
i. Multi-core processors: the system may have two or more
processors operating concurrently. (You have already studied this)
• Such a multi-core system supports ‘multi-threaded’ programs and
‘multi-tasking’.
ii. Multi-computers (Clusters): consist of multiple independent
computers organized in a cooperative fashion. (e.g. networks)
• Clusters are ‘interconnected computers’ that can support workloads
that are beyond the capacity of a single multiprocessor computer.
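
A minimal sketch of machine-level parallelism using Python’s standard multiprocessing module (the four-process pool and the prime-counting task are illustrative assumptions): independent tasks are distributed across the cores of a multi-core system.

from multiprocessing import Pool

def count_primes(limit):
    # CPU-bound work: count primes below `limit` by trial division.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [50_000, 60_000, 70_000, 80_000]
    with Pool(processes=4) as pool:       # one worker process per core
        results = pool.map(count_primes, limits)
    print(results)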
Final Note
• Both instruction-level and machine parallelism are important factors
in enhancing performance.
• A program may not have enough instruction-level parallelism to take
full advantage of machine parallelism (e.g. not enough independent
instructions, or no support for multiple threads).
• The use of a fixed-length instruction set architecture, as in a RISC,
enhances instruction-level parallelism. (e.g. pipelining)
• On the other hand, limited machine parallelism will limit performance
no matter what the nature of the program.
Types of Parallel Processor Systems
• The normal operation of a computer is to fetch instructions from
memory and to execute them in the processor. (instruction cycle)
• The sequence of instructions read from memory constitutes an
instruction stream.
• The operations performed on the data in the processor constitute a
data stream.
• Parallel processing may occur in the ‘instruction stream’, in the ‘data
stream’ or both.
• Flynn classified systems on the basis of these two streams.
Flynn’s Taxonomy of Parallel Processor Systems
• On the basis of ‘instruction’ & ‘data streams’, Flynn classifies the
organization of a computer system by the number of instruction and
data items that are manipulated simultaneously.
• Flynn’s classification: divides computers into four major groups as:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
1. Single Instruction, Single Data (SISD)
• In SISD, a single processor executes instructions sequentially from a
single instruction stream; each instruction processes one data item
stored in a single memory.
2. Single Instruction, Multiple Data (SIMD)
• In SIMD, the same instruction is executed by all processors (cores),
each on different data.
• SIMD represents an organization that includes many processing units
under the supervision of a common control unit.
• All processors receive the same instruction from the control unit but
operate on different items of data (e.g. a GPU, Graphics Processing Unit).
• The shared memory unit must contain different modules so that it can
communicate with all processors simultaneously.
• Vector processors & array processors fall into this category.
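
The SISD/SIMD distinction can be mimicked in software. The sketch below uses NumPy (vectorization here is only an analogy for hardware SIMD lanes, not actual SIMD instructions) to contrast a scalar loop with one operation applied to all data items at once.

import numpy as np

data = np.arange(8, dtype=np.float32)

# SISD style: one instruction handles one data item at a time.
scalar = [float(x) * 2.0 + 1.0 for x in data]

# SIMD style: one operation is applied to all eight items at once.
vector = data * 2.0 + 1.0

print(scalar)   # [1.0, 3.0, ..., 15.0]
print(vector)   # [ 1.  3.  5.  7.  9. 11. 13. 15.]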
3. Multiple Instructions, Single Data (MISD)
• In MISD, different instructions (processors) operate on the same data.
4. Multiple Instruction, Multiple Data (MIMD)
• In MIMD, a multiprocessor system is capable of processing several
programs (instruction streams), each operating on its own data (data
stream), at the same time.
• Each processor uses its own data and executes its own program.
Figure. A taxonomy of Parallel Processor Architecture
Final Note
• Flynn’s classification is based on the distinction between the
‘control unit’ (instruction streams) and the ‘data-processing unit’
(data streams).
Preparatory Questions (Parallelism)
1. What is ‘parallelism’? Describe its goal.
2. What are the two types of parallelism? Describe.
3. Describe ‘instruction-level parallelism’ and its types.
4. Describe ‘machine-level parallelism’ and its types.
5. Describe ‘Flynn’s taxonomy of parallel processor systems’ and its
types.