
Introduction to parallel processing

Unit-2
Types and levels of parallelism

• Architectures, compilers and operating systems have been striving for more than two decades to extract and utilize as much parallelism as possible in order to speed up computation.
• Problem solutions may contain two different kinds of available parallelism, called functional parallelism and data parallelism.
Levels of available functional parallelism

• Programs written in imperative languages (such as C++) may embody functional parallelism at different levels, as illustrated in the sketch after this list:
• parallelism at the instruction level (fine-grained parallelism),
• parallelism at the loop level (middle-grained parallelism),
• parallelism at the procedure level (middle-grained parallelism),
• parallelism at the program level (coarse-grained parallelism).
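As an illustrative C++ sketch (the code and names are ours, not from the source), the fragment below marks places where the first three levels could be exploited; program-level parallelism would correspond to running independent programs side by side.

```cpp
#include <cstddef>
#include <vector>

// Each comment marks the level of functional parallelism named above.
void vadd(const std::vector<double>& x, const std::vector<double>& y,
          std::vector<double>& z) {
    // Loop level: every iteration is independent of the others,
    // so the iterations could run in parallel.
    for (std::size_t i = 0; i < x.size(); ++i)
        z[i] = x[i] + y[i];
}

int main() {
    std::vector<double> a(4, 1.0), b(4, 2.0), c(4), d(4);

    // Instruction level: these two statements have no data dependence
    // and could be issued in the same cycle.
    double p = a[0] * b[0];
    double q = a[1] + b[1];

    // Procedure level: the two calls touch disjoint output data
    // and could execute concurrently.
    vadd(a, b, c);
    vadd(b, b, d);

    return (p + q + c[0] + d[0]) > 0 ? 0 : 1;
}
```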
Available and utilized levels of functional parallelism

• Available parallelism can be utilized by architectures, compilers and operating systems for speeding up computation.
• In general, functional parallelism can be utilized at four different levels of granularity, that is, at instruction, thread, process and user level.
Utilization of data parallelism

• Data parallelism may be utilized in two different ways. One possibility is to exploit data parallelism directly by dedicated architectures that permit parallel or pipelined operations on data elements, called data-parallel architectures.
• The other possibility is to convert data parallelism into functional parallelism by expressing parallel executable operations on data elements in a sequential manner, using a loop, as in the sketch below.
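A minimal C++ sketch of the contrast (our illustration; std::valarray merely stands in for a data-parallel notation): the whole-array expression states the operation on all elements at once, while the loop serializes the same data parallelism into functional parallelism.

```cpp
#include <cstddef>
#include <valarray>
#include <vector>

int main() {
    // Data-parallel style: one whole-array operation; a data-parallel
    // architecture could apply the addition to all elements at once.
    std::valarray<double> a(1.0, 8), b(2.0, 8);
    std::valarray<double> c = a + b;

    // The same data parallelism converted into functional parallelism:
    // the element-wise operations are spelled out sequentially in a loop.
    std::vector<double> x(8, 1.0), y(8, 2.0), z(8);
    for (std::size_t i = 0; i < x.size(); ++i)
        z[i] = x[i] + y[i];

    return (c[0] + z[0]) > 0 ? 0 : 1;
}
```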
Classification of parallel architectures
• Flynn’s classic taxonomy (Flynn, 1966) is based on the number of control units as well as the number of processors available in a computer.
• Although this is a lucid and straightforward scheme, it does not reveal or cover key aspects such as what kind of parallelism is utilized, at what level, or how parallel execution is implemented.
Proposed classification
Basic parallel techniques
• There are two basic ways of exploiting parallelism in parallel computer architectures:
• Pipelining - In pipelining a number of functional units are employed in sequence to perform a single computation. These functional units form an assembly line or pipeline. Pipelining is a very powerful technique to speed up a long series of similar computations and hence is used in many parallel architectures.
• Replication - A natural way of introducing parallelism to a computer is the replication of functional units. Replicated functional units can execute the same operation simultaneously on as many data elements as there are replicated computational resources available; the sketch below illustrates this.
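A minimal sketch of replication in C++ (the worker count and names are our choice, not from the source): four threads stand in for four replicated functional units, each applying the same operation to its own slice of the data at the same time.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;
    const unsigned workers = 4;          // number of "replicated functional units"
    std::vector<double> x(n, 1.0), y(n, 2.0), z(n);

    // Replication: each thread applies the same operation to its own
    // slice of the data, all slices being processed simultaneously.
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            std::size_t begin = w * n / workers, end = (w + 1) * n / workers;
            for (std::size_t i = begin; i < end; ++i)
                z[i] = x[i] + y[i];
        });
    }
    for (auto& t : pool) t.join();
    return 0;
}
```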
Relationships between languages and parallel architectures

• Although languages and parallel architectures could be considered as independent layers of a computer system, in reality, for efficiency reasons the parallel architecture has a strong influence on the language constructs applied to exploit parallelism.
• Vector processors often do not impose special language constructs; rather, they require special compiler support to exploit loop parallelism related to identical operations on elements of vectors or matrices, as the sketch below shows.
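An illustrative C++ sketch (ours; the OpenMP simd directive is one real-world example of such compiler support, not something the source names): the loop uses no special language construct, yet a vectorizing compiler, optionally guided by the pragma, can map it onto vector instructions.

```cpp
#include <cstddef>

// An ordinary loop over vector elements: no special language construct,
// but a vectorizing compiler can turn it into vector instructions.
void scale(double* x, const double* y, std::size_t n, double alpha) {
    #pragma omp simd          // optional compiler hint (OpenMP), not required
    for (std::size_t i = 0; i < n; ++i)
        x[i] = alpha * y[i];
}
```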
Relationships between languages and parallel architectures

• Data-parallel languages contain language constructs to specify the allocation of processors for the elements of the parallel data structures. While the application of parallel data structures and masks simplifies and shortens the program text, the allocation constructs lead to the expansion of programs.
• Languages for distributed memory architectures use message-passing operations like send and receive to specify communication and synchronization between processors, as in the sketch below.
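A minimal message-passing sketch (our illustration, written against the MPI interface as one concrete realization of send and receive; the source names only the generic operations): process 0 sends an integer to process 1, and the blocking receive also synchronizes the two processes.

```cpp
// Build with an MPI wrapper compiler, e.g. mpicxx, and run with mpirun -np 2
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;
        // send: ship one int to process 1 (message tag 0)
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value = 0;
        // receive: blocks until the message arrives, synchronizing the processes
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```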
Computer execution
Principle of pipelining

• The term ‘pipelining’ refers to the temporal overlapping of processing.
• Pipelines are nothing more than assembly lines in computing that can be used either for instruction processing or, in a more general sense, for performing any complex operations.
• Note that pipelining can be utilized effectively only for a sequence of the same or similar tasks, much the same as assembly lines.
• A basic pipeline processes a sequence of tasks, such as instructions, according to the following principle of operation:
Basic principle of pipelining

• Each task is subdivided into a number of successive subtasks.
• A pipeline stage is associated with each subtask and performs the required operations.
• The same amount of time is available in each stage for performing the required subtask.
• All pipeline stages operate like an assembly line, that is, receiving their input typically from the previous stage and delivering their output to the next stage.
Basic principle of pipelining

• The basic pipeline operates clocked, in other words synchronously. This means that each stage accepts a new input at the start of the clock cycle, each stage has a single clock cycle available for performing the required operations, and each stage delivers the result to the next stage by the beginning of the subsequent clock cycle. The sketch below simulates this behaviour.
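A toy C++ simulation of this clocked behaviour (entirely our construction, for illustration): three stages hand results forward on every clock tick, and once the pipeline has filled, one task completes per cycle.

```cpp
#include <iostream>
#include <optional>
#include <vector>

int main() {
    // Tasks to process; each flows through three stages: +1, *2, -3.
    std::vector<int> tasks = {10, 20, 30, 40};
    std::optional<int> s1, s2, s3;      // latches between stages
    std::size_t next = 0;

    for (int cycle = 1; next < tasks.size() || s1 || s2 || s3; ++cycle) {
        // At each clock tick, every stage hands its result to the next stage
        // (updated back-to-front so each stage reads the previous cycle's value).
        std::optional<int> done = s3 ? std::optional<int>(*s3 - 3) : std::nullopt;
        s3 = s2 ? std::optional<int>(*s2 * 2) : std::nullopt;
        s2 = s1 ? std::optional<int>(*s1 + 1) : std::nullopt;
        s1 = (next < tasks.size()) ? std::optional<int>(tasks[next++]) : std::nullopt;

        if (done)
            std::cout << "cycle " << cycle << ": task completed, result "
                      << *done << '\n';
    }
    return 0;
}
```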
Design space of pipelines

• The design space comprises the following two salient aspects: the basic layout of the pipeline and the method of dependency resolution.
Basic layout of a pipeline

• We identify and discuss the following aspects, fundamental to the layout of a pipeline:
• The number of pipeline stages used to perform a given task.
• Specification of the subtasks to be performed in each of the pipeline stages.
• Layout of the stage sequence.
• Use of bypassing.
• Timing of pipeline operations.
Basic layout of a pipeline
• The number of pipeline stages is one of the fundamental decisions. Evidently, when more pipeline stages are used, more parallel execution and thus a higher performance can be expected, as the back-of-the-envelope formula below suggests.
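A standard justification (our addition, not worked out in the source): if one task takes time $T$ sequentially and is split into $k$ equal stages of length $T/k$, then $n$ tasks need $k + n - 1$ clock cycles, giving a speedup of

$$S(n, k) = \frac{nT}{(k + n - 1)\,T/k} = \frac{nk}{k + n - 1},$$

which approaches $k$ as $n \to \infty$; in practice, stage overheads and hazards cap the useful number of stages.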

Figure: (a) two-stage pipeline; (b) four-stage pipeline; (c) eight-stage pipeline.
Subtasks performed in pipeline stages
• After a maximum is reached, however, the performance would certainly fall.
• The second aspect is the specification of the subtasks to be performed in each of the pipeline stages; this specification can be done at a number of levels.
Layout of the stage sequence
• While processing an instruction, the pipeline stages are usually operated successively one after the other, but a certain stage may be recycled, that is, used repeatedly, to accomplish the result while performing a multiplication or division.
• Recycling allows an effective use of hardware resources, but reduces the pipeline repetition rate.
Dependency resolution
• The other major aspect of pipeline design is dependency
resolution.
• Some early pipelined computers followed the MIPS approach (Microprocessor without Interlocked Pipeline Stages).
• MIPS employed a static dependency resolution, also termed
static scheduling or software interlock resolution.
• Here, the compiler is responsible for the detection and proper
resolution of dependencies.
• A more advanced resolution scheme is a combined
static/dynamic dependency resolution, which has been
employed in the MIPS R processors (R2000, R3000, R4000,
R4200, R6000).
Dependency resolution

Figure: Possibilities for resolving pipeline hazards.

• In recent processors dependencies are resolved dynamically, by extra hardware. Compilers for these processors are assumed to perform a parallel optimization by code reordering, in order to increase performance; the sketch below illustrates such reordering.
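An illustrative C++ sketch (our example; the variable names are hypothetical) of resolving a read-after-write dependence by code reordering, in the spirit of static scheduling: the independent multiply is moved into the gap so the dependent add no longer follows its producer immediately.

```cpp
// As written by the programmer: t2 reads t1 right after it is produced,
// a read-after-write dependence that could stall a pipeline without interlocks.
double source_order(double a, double b, double c, double d, double e, double& t3) {
    double t1 = a * b;
    double t2 = t1 + c;   // must wait for t1
    t3 = d * e;           // independent work
    return t2;
}

// After static scheduling: the compiler moves the independent multiply
// between producer and consumer, hiding the latency of t1.
double scheduled(double a, double b, double c, double d, double e, double& t3) {
    double t1 = a * b;
    t3 = d * e;           // fills the slot while t1 completes
    double t2 = t1 + c;
    return t2;
}
```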
Array processing
• A synchronous array of parallel processors is called an array processor, which consists of multiple processing elements (PEs) under the supervision of one control unit (CU).
• An array processor can handle single instruction and multiple data (SIMD) streams.
Array processing
• SIMD machines are specifically designed to perform vector computations over matrices and arrays of data.
• SIMD computers appear in two basic architectural organizations:
• array processors - using random access memory
• associative processors - using content addressable (or associative) memory.
SIMD processor organization
Scalar and vector pipeline
• The scalar pipeline processor processes a sequence of scalar instructions and operands under the control of a DO loop.
• The instructions in a small DO loop are often pre-fetched into an instruction buffer, and the operands required by the repeated scalar instructions are loaded into the data cache in order to continuously supply the pipeline with operands.
Scalar and vector pipeline
• Vector pipeline processors handle vector instructions. These processors are specially designed to execute vector instructions using vector operands.
• To handle vector instructions and operands, vector processors are supported with specialized firmware and hardware. This specialized firmware and hardware controls the vector pipeline, rather than software; scalar pipelines, by contrast, are controlled by software.
Evolution of parallel processing
• Parallel processing involves multiple processes which are active simultaneously and jointly solving a given problem, generally on multiple processors.
• For parallel processing, there are different types of architectural models employing the same kind of parallelism:
• Shared memory multiprocessing
• Distributed memory multiprocessing
Shared memory multiprocessing

• The memory and the interconnection mechanism are common to all CPUs (see the sketch below).
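A minimal shared-memory sketch in C++ (our illustration; threads stand in for CPUs): all threads run in one address space and update the same counter, with std::atomic providing the coherent access that shared-memory hardware would.

```cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    // One memory location shared by all "CPUs" (threads).
    std::atomic<long> counter{0};

    std::vector<std::thread> cpus;
    for (int c = 0; c < 4; ++c)
        cpus.emplace_back([&counter] {
            for (int i = 0; i < 100000; ++i)
                counter.fetch_add(1);   // every thread updates the common memory
        });
    for (auto& t : cpus) t.join();

    std::cout << counter << '\n';       // prints 400000
    return 0;
}
```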
Distributed memory multiprocessing

• In the DM model, the memory is strictly local to each processor, and a processor such as C1 can access a remote memory such as M2 via the interconnection network only.
Future trends towards parallel processing
• The interest in high performance computing is not new; users have always been hungry for more and more computing power.
• Many hardware vendors are today venturing into the domain of high performance computing.
• Immature compilers and insufficient bandwidth between processors and main memory, as well as among the processors, remain hard terrains to cross.
Future trends towards parallel processing
• The recent trend is clusters, where a workstation in the cluster acts as a compute server for other workstations (HP and IBM have proposed various models).
• Gigabit networks, scaling up to 10 gigabit, ease interprocess communication so that IPC requirements are far less of a restriction on parallel processing.
• For the OS and programming, IBM, DEC, Cray and others are standardizing languages and compilers across a range of machines and distributed computing environments.
