
Introduction to parallel processing

Unit-2
Types and levels of parallelism

• Architectures, compilers and operating systems have been striving for more than two decades to extract and utilize as much parallelism as possible in order to speed up computation.
• Problem solutions may contain two different kinds of available parallelism, called functional parallelism and data parallelism.
Levels of available functional parallelism

• Programs written in imperative languages (such as C++) may embody functional parallelism at different levels, as illustrated in the sketch after this list:
• parallelism at the instruction level (fine-grained parallelism),
• parallelism at the loop level (middle-grained parallelism),
• parallelism at the procedure level (middle-grained parallelism),
• parallelism at the program level (coarse-grained parallelism).
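As an illustrative C++ sketch (the code and names are ours, not from the source), the fragment below marks places where the first three levels could be exploited; program-level parallelism would correspond to running independent programs side by side.

```cpp
#include <cstddef>
#include <vector>

// Each comment marks the level of functional parallelism named above.
void vadd(const std::vector<double>& x, const std::vector<double>& y,
          std::vector<double>& z) {
    // Loop level: every iteration is independent of the others,
    // so the iterations could run in parallel.
    for (std::size_t i = 0; i < x.size(); ++i)
        z[i] = x[i] + y[i];
}

int main() {
    std::vector<double> a(4, 1.0), b(4, 2.0), c(4), d(4);

    // Instruction level: these two statements have no data dependence
    // and could be issued in the same cycle.
    double p = a[0] * b[0];
    double q = a[1] + b[1];

    // Procedure level: the two calls touch disjoint output data
    // and could execute concurrently.
    vadd(a, b, c);
    vadd(b, b, d);

    return (p + q + c[0] + d[0]) > 0 ? 0 : 1;
}
```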
Available and utilized levels of functional parallelism

• Available parallelism can be utilized by architectures, compilers and operating systems for speeding up computation.
• In general, functional parallelism can be utilized at four different levels of granularity, that is, at instruction, thread, process and user level.
Utilization of data parallelism

• Data parallelism may be utilized in two different ways. One possibility is to exploit data parallelism directly by dedicated architectures that permit parallel or pipelined operations on data elements, called data-parallel architectures.
• The other possibility is to convert data parallelism into functional parallelism by expressing parallel executable operations on data elements in a sequential manner, using a loop, as in the sketch below.
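A minimal C++ sketch of the contrast (our illustration; std::valarray merely stands in for a data-parallel notation): the whole-array expression states the operation on all elements at once, while the loop serializes the same data parallelism into functional parallelism.

```cpp
#include <cstddef>
#include <valarray>
#include <vector>

int main() {
    // Data-parallel style: one whole-array operation; a data-parallel
    // architecture could apply the addition to all elements at once.
    std::valarray<double> a(1.0, 8), b(2.0, 8);
    std::valarray<double> c = a + b;

    // The same data parallelism converted into functional parallelism:
    // the element-wise operations are spelled out sequentially in a loop.
    std::vector<double> x(8, 1.0), y(8, 2.0), z(8);
    for (std::size_t i = 0; i < x.size(); ++i)
        z[i] = x[i] + y[i];

    return (c[0] + z[0]) > 0 ? 0 : 1;
}
```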
Classification of parallel architectures
• Flynn’s classic taxonomy (Flynn, 1966) is based on the number of control units as well as the number of processors available in a computer.
• Although this is a lucid and straightforward scheme, it does not reveal or cover key aspects such as what kind of parallelism is utilized, at what level, or how parallel execution is implemented.
Proposed classification
Basic parallel techniques
• There are two basic ways of exploiting parallelism in parallel computer architectures:
• Pipelining - In pipelining a number of functional units are employed in sequence to perform a single computation. These functional units form an assembly line or pipeline. Pipelining is a very powerful technique to speed up a long series of similar computations and hence is used in many parallel architectures.
• Replication - A natural way of introducing parallelism to a computer is the replication of functional units. Replicated functional units can execute the same operation simultaneously on as many data elements as there are replicated computational resources available; the sketch below illustrates this.
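A minimal sketch of replication in C++ (the worker count and names are our choice, not from the source): four threads stand in for four replicated functional units, each applying the same operation to its own slice of the data at the same time.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;
    const unsigned workers = 4;          // number of "replicated functional units"
    std::vector<double> x(n, 1.0), y(n, 2.0), z(n);

    // Replication: each thread applies the same operation to its own
    // slice of the data, all slices being processed simultaneously.
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            std::size_t begin = w * n / workers, end = (w + 1) * n / workers;
            for (std::size_t i = begin; i < end; ++i)
                z[i] = x[i] + y[i];
        });
    }
    for (auto& t : pool) t.join();
    return 0;
}
```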
Relationships between languages and parallel architectures

• Although languages and parallel architectures could be considered as independent layers of a computer system, in reality, for efficiency reasons the parallel architecture has a strong influence on the language constructs applied to exploit parallelism.
• Vector processors often do not impose special language constructs; rather, they require special compiler support to exploit loop parallelism related to identical operations on elements of vectors or matrices, as the sketch below shows.
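An illustrative C++ sketch (ours; the OpenMP simd directive is one real-world example of such compiler support, not something the source names): the loop uses no special language construct, yet a vectorizing compiler, optionally guided by the pragma, can map it onto vector instructions.

```cpp
#include <cstddef>

// An ordinary loop over vector elements: no special language construct,
// but a vectorizing compiler can turn it into vector instructions.
void scale(double* x, const double* y, std::size_t n, double alpha) {
    #pragma omp simd          // optional compiler hint (OpenMP), not required
    for (std::size_t i = 0; i < n; ++i)
        x[i] = alpha * y[i];
}
```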
Relationships between languages and parallel architectures

• Data-parallel languages contain language constructs to specify the allocation of processors for the elements of the parallel data structures. While the application of parallel data structures and masks simplifies and shortens the program text, the allocation constructs lead to the expansion of programs.
• Languages for distributed memory architectures use message-passing operations like send and receive to specify communication and synchronization between processors, as in the sketch below.
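A minimal message-passing sketch (our illustration, written against the MPI interface as one concrete realization of send and receive; the source names only the generic operations): process 0 sends an integer to process 1, and the blocking receive also synchronizes the two processes.

```cpp
// Build with an MPI wrapper compiler, e.g. mpicxx, and run with mpirun -np 2
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;
        // send: ship one int to process 1 (message tag 0)
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value = 0;
        // receive: blocks until the message arrives, synchronizing the processes
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```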
Computer execution
Principle of pipelining

• The term ‘pipelining’ refers to the temporal overlapping of processing.
• Pipelines are nothing more than assembly lines in computing that can be used either for instruction processing or, in a more general sense, for performing any complex operations.
• Note that pipelining can be utilized effectively only for a sequence of the same or similar tasks, much the same as assembly lines.
• A basic pipeline processes a sequence of tasks, such as instructions, according to the following principle of operation:
Basic principle of pipelining

• Each task is subdivided into a number of successive subtasks.
• A pipeline stage is associated with each subtask and performs the required operations.
• The same amount of time is available in each stage for performing the required subtask.
• All pipeline stages operate like an assembly line, that is, receiving their input typically from the previous stage and delivering their output to the next stage.
Basic principle of pipelining

• The basic pipeline operates clocked, in other words synchronously. This means that each stage accepts a new input at the start of the clock cycle, each stage has a single clock cycle available for performing the required operations, and each stage delivers the result to the next stage by the beginning of the subsequent clock cycle. The sketch below simulates this behaviour.
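A toy C++ simulation of this clocked behaviour (entirely our construction, for illustration): three stages hand results forward on every clock tick, and once the pipeline has filled, one task completes per cycle.

```cpp
#include <iostream>
#include <optional>
#include <vector>

int main() {
    // Tasks to process; each flows through three stages: +1, *2, -3.
    std::vector<int> tasks = {10, 20, 30, 40};
    std::optional<int> s1, s2, s3;      // latches between stages
    std::size_t next = 0;

    for (int cycle = 1; next < tasks.size() || s1 || s2 || s3; ++cycle) {
        // At each clock tick, every stage hands its result to the next stage
        // (updated back-to-front so each stage reads the previous cycle's value).
        std::optional<int> done = s3 ? std::optional<int>(*s3 - 3) : std::nullopt;
        s3 = s2 ? std::optional<int>(*s2 * 2) : std::nullopt;
        s2 = s1 ? std::optional<int>(*s1 + 1) : std::nullopt;
        s1 = (next < tasks.size()) ? std::optional<int>(tasks[next++]) : std::nullopt;

        if (done)
            std::cout << "cycle " << cycle << ": task completed, result "
                      << *done << '\n';
    }
    return 0;
}
```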
Design space of pipelines

• The design space comprises the following two salient aspects: the basic layout of the pipeline and the method of dependency resolution.
Basic layout of a pipeline

• We identify and discuss the following aspects, fundamental to the layout of a pipeline:
• The number of pipeline stages used to perform a given task.
• Specification of the subtasks to be performed in each of the pipeline stages.
• Layout of the stage sequence.
• Use of bypassing.
• Timing of pipeline operations.
Basic layout of a pipeline
• The number of pipeline stages is one of the fundamental decisions. Evidently, when more pipeline stages are used, more parallel execution and thus a higher performance can be expected, as the back-of-the-envelope formula below suggests.
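A standard justification (our addition, not worked out in the source): if one task takes time $T$ sequentially and is split into $k$ equal stages of length $T/k$, then $n$ tasks need $k + n - 1$ clock cycles, giving a speedup of

$$S(n, k) = \frac{nT}{(k + n - 1)\,T/k} = \frac{nk}{k + n - 1},$$

which approaches $k$ as $n \to \infty$; in practice, stage overheads and hazards cap the useful number of stages.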

Figure: (a) two-stage pipeline; (b) four-stage pipeline; (c) eight-stage pipeline.
Subtasks performed in pipeline stages
• After a maximum is reached, however, the performance would certainly fall.
• The second aspect is the specification of the subtasks to be performed in each of the pipeline stages; this specification can be done at a number of levels.
Layout of the stage sequence
• While processing an instruction, the pipeline stages are usually operated successively one after the other, but a certain stage may be recycled, that is, used repeatedly, to accomplish the result while performing a multiplication or division.
• Recycling allows an effective use of hardware resources, but reduces the pipeline repetition rate.
Dependency resolution
• The other major aspect of pipeline design is dependency
resolution.
• Some early pipelined computers followed the MIPS approach (Microprocessor without Interlocked Pipeline Stages).
• MIPS employed a static dependency resolution, also termed
static scheduling or software interlock resolution.
• Here, the compiler is responsible for the detection and proper
resolution of dependencies.
• A more advanced resolution scheme is a combined
static/dynamic dependency resolution, which has been
employed in the MIPS R processors (R2000, R3000, R4000,
R4200, R6000).
Dependency resolution

Figure: Possibilities for resolving pipeline hazards.

• In recent processors dependencies are resolved dynamically, by extra hardware. Compilers for these processors are assumed to perform a parallel optimization by code reordering, in order to increase performance; the sketch below illustrates such reordering.
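An illustrative C++ sketch (our example; the variable names are hypothetical) of resolving a read-after-write dependence by code reordering, in the spirit of static scheduling: the independent multiply is moved into the gap so the dependent add no longer follows its producer immediately.

```cpp
// As written by the programmer: t2 reads t1 right after it is produced,
// a read-after-write dependence that could stall a pipeline without interlocks.
double source_order(double a, double b, double c, double d, double e, double& t3) {
    double t1 = a * b;
    double t2 = t1 + c;   // must wait for t1
    t3 = d * e;           // independent work
    return t2;
}

// After static scheduling: the compiler moves the independent multiply
// between producer and consumer, hiding the latency of t1.
double scheduled(double a, double b, double c, double d, double e, double& t3) {
    double t1 = a * b;
    t3 = d * e;           // fills the slot while t1 completes
    double t2 = t1 + c;
    return t2;
}
```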
Array processing
• A synchronous array of parallel processors is called an array processor, which consists of multiple processing elements (PEs) under the supervision of one control unit (CU).
• An array processor can handle single instruction and multiple data (SIMD) streams.
Array processing
• SIMD machines are specifically designed to perform vector computations over matrices and arrays of data.
• SIMD computers appear in two basic architectural organizations:
• array processors - using random access memory
• associative processors - using content addressable (or associative) memory.
SIMD processor organization
Scalar and vector pipeline
• The scalar pipeline processor processes a sequence of scalar instructions and operands under the control of a DO loop.
• The instructions in a small DO loop are often pre-fetched into an instruction buffer, and the operands required by the repeated scalar instructions are loaded into the data cache in order to continuously supply the pipeline with operands.
Scalar and vector pipeline
• Vector pipeline processors handle vector instructions. These processors are specially designed to execute vector instructions using vector operands.
• To handle vector instructions and operands, vector processors are supported with specialized firmware and hardware. This specialized firmware and hardware controls the vector pipeline, rather than software; scalar pipelines, by contrast, are controlled by software.
Evolution of parallel processing
• Parallel processing involves multiple processes which are active simultaneously and jointly solving a given problem, generally on multiple processors.
• For parallel processing, there are different types of architectural models employing the same kind of parallelism:
• Shared memory multiprocessing
• Distributed memory multiprocessing
Shared memory multiprocessing

• The memory and the interconnection mechanism are common to all CPUs (see the sketch below).
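A minimal shared-memory sketch in C++ (our illustration; threads stand in for CPUs): all threads run in one address space and update the same counter, with std::atomic providing the coherent access that shared-memory hardware would.

```cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    // One memory location shared by all "CPUs" (threads).
    std::atomic<long> counter{0};

    std::vector<std::thread> cpus;
    for (int c = 0; c < 4; ++c)
        cpus.emplace_back([&counter] {
            for (int i = 0; i < 100000; ++i)
                counter.fetch_add(1);   // every thread updates the common memory
        });
    for (auto& t : cpus) t.join();

    std::cout << counter << '\n';       // prints 400000
    return 0;
}
```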
Distributed memory multiprocessing

• In the DM model, the memory is strictly local to each processor, and a processor such as C1 can access a remote memory such as M2 via the interconnection network only.
Future trends towards parallel processing
• The interest in high performance computing is not new; users have always been hungry for more and more computing power.
• Many hardware vendors are today venturing into the domain of high performance computing.
• Immature compilers and insufficient bandwidth between processors and main memory, as well as among the processors, remain hard terrains to cross.
Future trends towards parallel processing
• The recent trend is clusters, where a workstation in the cluster acts as a compute server for other workstations (HP and IBM have proposed various models).
• Gigabit networks, scaling up to 10 gigabit, ease interprocess communication so that IPC requirements are far less of a restriction on parallel processing.
• For the OS and programming, IBM, DEC, Cray and others are standardizing languages and compilers across a range of machines and distributed computing environments.
