
IT105 – Parallel Processing Midterm Lecture Part 1

Parallel Programming Model

In computer science, a parallel programming model is a model for writing parallel programs that can be compiled and
executed (en.wikipedia.org).
A parallel programming model is a set of software technologies used to express parallel algorithms and to match
applications with the underlying parallel systems.
Classifications of parallel programming models can be divided broadly into two areas: process interaction and problem
decomposition.

A. Process interaction relates to the mechanisms by which parallel processes are able to communicate with each
other. The most common forms of interaction are shared memory and message passing, but the interaction can also
be implicit (invisible to the programmer).

1. Shared memory is an efficient means of passing data between programs. Depending on context, programs
may run on a single processor or on multiple separate processors.
 In this model, parallel tasks share a global address space which they read and write to
asynchronously.
 This requires protection mechanisms such as locks, semaphores and monitors to control concurrent
access.
 An advantage of this model from the programmer's point of view is that the notion of data
"ownership" is lacking, so there is no need to specify explicitly the communication of data between
tasks. Program development can often be simplified.
 An important disadvantage in terms of performance is that it becomes more difficult to understand
and manage data locality:
o Keeping data local to the processor that works on it conserves memory accesses, cache
refreshes and bus traffic that occurs when multiple processors use the same data.
o Unfortunately, controlling data locality is hard to understand and may be beyond the
control of the average user.

 The Threads Model is a type of shared-memory programming.


o In the threads model of parallel programming, a single "heavy weight" process can have multiple
"light weight", concurrent execution paths.
o For example: the main program a.out is scheduled to run by the native operating system. a.out loads
and acquires all of the necessary system and user resources to run. This is the "heavy weight" process.
o a.out performs some serial work, and then creates a number of tasks (threads) that can be scheduled
and run by the operating system concurrently.
o Each thread has local data, but also shares the entire resources of a.out. This saves the overhead
associated with replicating a program's resources for each thread ("light weight"). Each thread also
benefits from a global memory view because it shares the memory space of a.out.
o A thread's work may best be described as a subroutine within the main program. Any thread can
execute any subroutine at the same time as other threads.

o Threads communicate with each other through global memory (updating address locations).
This requires synchronization constructs to ensure that no two threads update the same global
address at the same time.
o Threads can come and go, but a.out remains present to provide the necessary shared
resources until the application has completed.
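
A minimal sketch of the threads model in C using POSIX threads (an assumed choice; the lecture does not prescribe a
particular threads library). The main program plays the role of a.out: it creates several threads that all read and write one
global counter, and a mutex acts as the lock that protects concurrent access to that shared address. Compile with a
pthreads-aware toolchain, e.g. cc -pthread.

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Global data shared by all threads (the common address space). */
static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread's work: update the shared counter under the lock. */
static void *worker(void *arg)
{
    long id = (long)arg;                      /* thread-local data */
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&counter_lock);    /* protect the shared address */
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    printf("thread %ld finished\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    /* The "heavy weight" process creates "light weight" threads. */
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);

    /* The process stays alive to provide shared resources until all threads finish. */
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);

    printf("shared_counter = %ld\n", shared_counter);   /* expect 4000 */
    return 0;
}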

2. Message passing is a concept from computer science that is used extensively in the design and
implementation of modern software applications; it is key to some models of concurrency and object-
oriented programming.

 In a message passing model, parallel tasks exchange data through passing messages to one another.
These communications can be asynchronous or synchronous.
 This model demonstrates the following characteristics:
o A set of tasks that use their own local memory during computation. Multiple tasks can reside on the
same physical machine and/or across an arbitrary number of machines.
o Tasks exchange data through communications by sending and receiving messages.
o Data transfer usually requires cooperative operations to be performed by each process. For example,
a send operation must have a matching receive operation.
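
A minimal sketch of the message passing model using MPI (an assumed library choice). Each rank keeps a value in its own
local memory; rank 0 issues a send and rank 1 posts the matching receive, the cooperative pair of operations described
above. Run with at least two processes, e.g. mpirun -np 2 ./a.out.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* each task learns its identity */

    if (rank == 0) {
        value = 42;                          /* data in rank 0's local memory */
        /* A send must be matched by a receive on the other side. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}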

3. Implicit Model - In an implicit model, no process interaction is visible to the programmer, instead the
compiler and/or runtime is responsible for performing it. This is most common with domain-specific
languages where the concurrency within a problem can be more prescribed.
 Advantages
o A programmer who writes implicitly parallel code does not need to worry about task
division or process communication, focusing instead on the problem that his or her program
is intended to solve.
o Implicit parallelism generally facilitates the design of parallel programs and therefore results
in a substantial improvement of programmer productivity.
o Many of the constructs necessary to support this also add simplicity or clarity even in the
absence of actual parallelism. For example, a list comprehension that applies sin() to every
element of a list is a useful feature in and of itself.
o To offer implicit parallelism, languages effectively have to provide such useful constructs to
users simply to support required functionality (a language without a decent for() loop, for
example, is one few programmers will use).
 Disadvantages
o Languages with implicit parallelism reduce the control that the programmer has over the
parallel execution of the program, resulting sometimes in less-than-optimal parallel
efficiency.
o A larger issue is that every program has some parallel and some serial logic. Binary I/O, for
example, requires support for such serial operations as Write() and Seek(). If implicit
parallelism is desired, this creates a new requirement for constructs and keywords to
support code that cannot be threaded or distributed.
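
C has no truly implicit parallel construct, but a rough, hedged analogue is OpenMP's directive-based loop parallelism:
thread creation, work division and synchronization are left to the compiler and runtime, and the programmer only marks
the loop. The sketch below (assuming an OpenMP-capable compiler, e.g. with the -fopenmp flag) echoes the sin() example
mentioned above; without the flag the pragma is simply ignored and the loop runs serially.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double data[1000];

    /* The directive asks the compiler/runtime to parallelize the loop;
       task division and communication are not written by hand. */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
        data[i] = sin((double)i);

    printf("data[999] = %f\n", data[999]);
    return 0;
}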

B. Problem Decomposition relates to the way in which these processes are formulated. This classification may also
be referred to as algorithmic skeletons or parallel programming paradigms.
1. Task-Parallel Model focuses on processes, or threads of execution. These processes will often be
behaviorally distinct, which emphasizes the need for communication. Task parallelism is a natural way to
express message-passing communication. It is usually classified as MIMD/MPMD or MISD.

2. Data-Parallel Model focuses on performing operations on a data set, which is usually regularly
structured in an array. A set of tasks will operate on this data, but independently on separate partitions.
In a shared-memory system the data will be accessible to all, but in a distributed-memory system it will
be divided between memories and worked on locally.

 Data parallelism is usually classified as SIMD/SPMD.


 May also be referred to as the Partitioned Global Address Space (PGAS) model.
 The data parallel model demonstrates the following characteristics:
o Address space is treated globally.
o Most of the parallel work focuses on performing operations on a data set. The data set is typically
organized into a common structure, such as an array or cube.
o A set of tasks work collectively on the same data structure; however, each task works on a different
partition of the same data structure.
o Tasks perform the same operation on their partition of work, for example, "add 4 to every array element".
 On shared memory architectures, all tasks may have access to the data structure through global
memory
 On distributed memory architectures the data structure is split up and resides as "chunks" in the
local memory of each task.
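
A minimal sketch of the data-parallel model in C with POSIX threads (an assumed choice; the model itself is language-
neutral). The array is the common data structure, each task is handed a different contiguous partition of it, and every task
performs the same operation, "add 4 to every array element", on its own chunk, so no locking is needed.

#include <pthread.h>
#include <stdio.h>

#define N 1000
#define NUM_TASKS 4

static int array[N];                        /* the shared data structure */

struct partition { int start, end; };       /* one task's chunk [start, end) */

/* Every task runs the same operation on its own partition. */
static void *add_four(void *arg)
{
    struct partition *p = arg;
    for (int i = p->start; i < p->end; i++)
        array[i] += 4;
    return NULL;
}

int main(void)
{
    pthread_t tasks[NUM_TASKS];
    struct partition parts[NUM_TASKS];
    int chunk = N / NUM_TASKS;

    for (int t = 0; t < NUM_TASKS; t++) {
        parts[t].start = t * chunk;
        parts[t].end   = (t == NUM_TASKS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tasks[t], NULL, add_four, &parts[t]);
    }
    for (int t = 0; t < NUM_TASKS; t++)
        pthread_join(tasks[t], NULL);

    printf("array[0] = %d, array[N-1] = %d\n", array[0], array[N - 1]);   /* both 4 */
    return 0;
}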

Other Programming Models

Hybrid Model

 A hybrid model combines more than one of the previously described programming models.
 Currently, a common example of a hybrid model is the combination of the message passing model (MPI) with the
threads model (OpenMP).
o Threads perform computationally intensive kernels using local, on-node data.
o Communications between processes on different nodes occur over the network using MPI.
 This hybrid model lends itself well to the increasingly common hardware environment of clustered multi/many-core
machines.
 Another similar and increasingly popular example of a hybrid model is using MPI with GPU (Graphics Processing
Unit) programming.
o GPUs perform computationally intensive kernels using local, on-node data.
o Communications between processes on different nodes occur over the network using MPI.
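
A minimal sketch of the MPI + OpenMP hybrid described above (assuming both an MPI library and an OpenMP-capable
compiler are available). OpenMP threads do the computationally intensive work on local, on-node data, and MPI carries the
per-process results between nodes.

#include <mpi.h>
#include <stdio.h>

#define LOCAL_N 1000

int main(int argc, char **argv)
{
    int rank, size;
    double local[LOCAL_N], local_sum = 0.0, global_sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* OpenMP threads work on local, on-node data. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < LOCAL_N; i++) {
        local[i] = rank + i * 0.001;   /* stand-in for a compute-intensive kernel */
        local_sum += local[i];
    }

    /* Communication between processes on different nodes goes over MPI. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (from %d processes)\n", global_sum, size);

    MPI_Finalize();
    return 0;
}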

Tasks and Channels

A simple parallel programming model. The figure shows both the instantaneous state of a computation and a detailed
picture of a single task. A computation consists of a set of tasks (represented by circles) connected by channels (arrows).
A task encapsulates a program and local memory and defines a set of ports that define its interface to its environment. A
channel is a message queue into which a sender can place messages and from which a receiver can remove messages,
"blocking" if messages are not available.

We consider next the question of which abstractions are appropriate and useful in a parallel programming model. Clearly,
mechanisms are needed that allow explicit discussion about concurrency and locality and that facilitate development of
scalable and modular programs. Also needed are abstractions that are simple to work with and that match the
architectural model, the multicomputer. While numerous possible abstractions could be considered for this purpose, two
fit these requirements particularly well: the task and channel. These are illustrated below and can be summarized as
follows:

The four basic task actions. In addition to reading and writing local memory, a task can send a message, receive a
message, create new tasks (suspending until they terminate), and terminate.

1. A parallel computation consists of one or more tasks. Tasks execute concurrently. The number of tasks can vary
during program execution.
2. A task encapsulates a sequential program and local memory. (In effect, it is a virtual von Neumann machine.) In
addition, a set of inports and outports define its interface to its environment.
3. A task can perform four basic actions in addition to reading and writing its local memory: send messages on its
outports, receive messages on its inports, create new tasks, and terminate.
4. A send operation is asynchronous: it completes immediately. A receive operation is synchronous: it causes
execution of the task to block until a message is available.
5. Outport/inport pairs can be connected by message queues called channels. Channels can be created and
deleted, and references to channels (ports) can be included in messages, so connectivity can vary dynamically.

6. Tasks can be mapped to physical processors in various ways; the mapping employed does not affect the
semantics of a program. In particular, multiple tasks can be mapped to a single processor.
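
A minimal sketch of a task/channel pair in C using POSIX threads (an assumed implementation; the model itself does not
prescribe one). The channel is a message queue protected by a mutex: send is asynchronous and returns immediately, while
receive blocks on a condition variable until a message is available, matching points 4 and 5 above. The two tasks anticipate
the foundry/bridge example that follows.

#include <pthread.h>
#include <stdio.h>

#define QUEUE_CAP 64

/* A channel: a message queue plus the synchronization that makes
   receive block while no messages are available. */
struct channel {
    int msgs[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

static void channel_init(struct channel *c)
{
    c->head = c->tail = c->count = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->not_empty, NULL);
}

/* Asynchronous send: place the message and return immediately.
   (For simplicity, the fixed-size queue is assumed never to fill.) */
static void channel_send(struct channel *c, int msg)
{
    pthread_mutex_lock(&c->lock);
    c->msgs[c->tail] = msg;
    c->tail = (c->tail + 1) % QUEUE_CAP;
    c->count++;
    pthread_cond_signal(&c->not_empty);
    pthread_mutex_unlock(&c->lock);
}

/* Synchronous receive: block until a message is available. */
static int channel_recv(struct channel *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->count == 0)
        pthread_cond_wait(&c->not_empty, &c->lock);
    int msg = c->msgs[c->head];
    c->head = (c->head + 1) % QUEUE_CAP;
    c->count--;
    pthread_mutex_unlock(&c->lock);
    return msg;
}

static struct channel girders;          /* the channel between the two tasks */

static void *foundry(void *arg)         /* producer task */
{
    (void)arg;
    for (int g = 1; g <= 5; g++)
        channel_send(&girders, g);      /* put girders "on trucks" */
    return NULL;
}

static void *bridge(void *arg)          /* consumer task */
{
    (void)arg;
    for (int g = 0; g < 5; g++)
        printf("assembled girder %d\n", channel_recv(&girders));
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    channel_init(&girders);
    pthread_create(&t1, NULL, foundry, NULL);
    pthread_create(&t2, NULL, bridge, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}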

Example: Bridge Construction

Consider the following real-world problem. A bridge is to be assembled from girders being constructed at a foundry.
These two activities are organized by providing trucks to transport girders from the foundry to the bridge site. This
situation is illustrated below (a) with the foundry and bridge represented as tasks and the stream of trucks as a channel.
Notice that this approach allows assembly of the bridge and construction of girders to proceed in parallel without any
explicit coordination: the foundry crew puts girders on trucks as they are produced, and the assembly crew adds girders
to the bridge as and when they arrive.

Two solutions to the bridge construction problem. Both represent the foundry and the bridge assembly site as separate
tasks, foundry and bridge. The first uses a single channel on which girders generated by foundry are transported as fast as
they are generated. If foundry generates girders faster than they are consumed by bridge, then girders accumulate at the
construction site. The second solution uses a second channel to pass flow control messages from bridge to foundry so as
to avoid overflow.

A disadvantage of this scheme is that the foundry may produce girders much faster than the assembly crew can use them.
To prevent the bridge site from overflowing with girders, the assembly crew instead can explicitly request more girders
when stocks run low.
