Unit 2 PRAM Algorithms

Structure

2.0 Introduction
2.1 Objectives
2.2 Message Passing Programming
    2.2.1 Shared Memory
    2.2.2 Message Passing Libraries
    2.2.3 Data Parallel Programming
2.3 Data Structures for Parallel Algorithms
    2.3.1 Linked List
    2.3.2 Arrays Pointers
    2.3.3 Hypercube Network
2.4 Summary
2.5 Solutions/Answers
2.6 References
2.0 INTRODUCTION
The PRAM (Parallel Random Access Machine) model is one of the most popular models for designing parallel algorithms. A PRAM consists of an unbounded number of processors interacting with each other through a shared memory and a common communication network. There are many ways to implement the PRAM model. We shall discuss three of them in this unit: message passing, shared memory and data parallel programming, together with their associated data structures.
A number of languages and routine libraries have been invented to support these models.
Some of them are architecture independent and some are specific to particular platforms.
We shall introduce two of the widely accepted routine libraries in this unit. These are
Message Passing Interface (MPI) and Parallel Virtual Machine (PVM).
2.1 OBJECTIVES
After going through this unit, you should be able to:

• explain the message passing, shared memory and data parallel programming models;
• write simple parallel programs using the MPI and PVM libraries, and
• describe data structures such as linked lists, arrays and hypercube networks used in parallel algorithms.

2.2 MESSAGE PASSING PROGRAMMING

Message passing is probably the most widely used parallel programming paradigm today. It is the most natural, portable and efficient approach for distributed memory systems. It provides natural synchronisation among the processes, so that explicit synchronisation of memory access is redundant. The programmer is responsible for determining all parallelism. In this programming model, multiple processes across an arbitrary number
of machines, each with its own local memory, exchange data through send and receive
communication between processes. This model can be best understood through the
diagram shown in Figure 1:
As the diagram indicates, each processor has its own local memory. Processors perform computations on the data in their own memories and interact with the other processors, as and when required, through the communication network using message-passing libraries. The message contains the data being sent, but data is not its only constituent. The other components of a message typically include the identity of the sending process, the identity of the receiving process, the type of the data and the length of the message.
Once the message has been created, it is sent through the communication network. The communication may take one of the following two forms:
i) Point-to-point communication
ii) Collective communication
Barrier: In this mode, no actual transfer of data takes place until all the processors involved in the communication execute a particular block, called the barrier block, in their message-passing program.
Reduction: In a reduction, one member of the group collects data from the other members and reduces them to a single data item, which is usually made available to all of the participating processors.
Drawbacks
• The programmer must explicitly distribute the data and manage all communication, which makes program development and debugging harder.
• Communication can become a significant overhead for fine-grained computations.

2.2.1 Shared Memory

In the shared memory approach, the focus is on control parallelism rather than data parallelism. In this model, multiple processes run independently on different processors, but they share a common address space accessible to all, as shown in Figure 2.
Figure 2: Shared memory
The processors communicate with one another by one processor writing data into a location in memory and another processor reading the data. Any change in a memory location made by one processor is visible to all other processors. As shared data can be accessed by more than one process at the same time, some control mechanism such as locks or semaphores should be used to ensure mutual exclusion. This model is often referred to as the SMP (Symmetric Multi-Processor) model, so named because a common symmetric implementation allows several processors of the same type to access the same shared memory. A number of multi-processor systems implement a shared-memory programming model; examples include the NEC SX-5, SGI Power Onyx/Origin 2000, Hewlett-Packard V2600/HyperPlex, SUN HPC 10000 (400 MHz) and DELL PowerEdge 8450.
Thread libraries

The most typical representatives of shared memory programming models are the thread libraries present in most modern operating systems. Examples of thread libraries are POSIX threads as implemented in Linux, Solaris threads for Solaris, Win32 threads available in Windows NT and Windows 2000, and Java threads as part of the standard Java Development Kit (JDK).
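As a minimal sketch of this thread-library style (using POSIX threads; the shared counter, the number of threads and the iteration count are illustrative choices, not taken from the unit), several threads update a shared variable under a mutex:

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static long counter = 0;                         /* shared data in the common address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);               /* mutual exclusion on the shared location */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, work, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);          /* expected: 4 * 100000 */
    return 0;
}

Without the lock, the increments of the shared counter would race and the final value would be unpredictable.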
A well-known approach in this area is OpenMP, an industry standard for shared memory programming on architectures with uniform memory access characteristics. OpenMP is based on functional parallelism and focuses mostly on the parallelisation of loops. OpenMP implementations use a special compiler to evaluate the annotations in the application's source code and to transform the code into explicitly parallel code, which can then be executed. We shall have a detailed discussion of OpenMP in the next unit.
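As a small sketch of the loop-oriented style OpenMP encourages (the array size, its contents and the dot-product computation are illustrative choices), a single directive asks the compiler to distribute the loop iterations among threads:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    const int n = 1000;
    double a[1000], b[1000], sum = 0.0;

    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* the annotation asks the compiler to split the iterations among threads
       and to combine the per-thread partial sums */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];

    printf("dot product = %f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}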
Thread libraries provide low-level control of the shared memory system, but they tend to be tedious and error prone. They are more suitable for system programming than for application programming.
Drawbacks
• Not portable.
• Difficult to manage data locality.
• Scalability is limited by the number of access pathways to memory.
• User is responsible for specifying synchronization, e.g., locks.
2.2.2 Message Passing Libraries

In this section, we shall discuss message passing libraries. Historically, a variety of message passing libraries have been available since the 1980s. These implementations differed substantially from each other, making it difficult for programmers to develop portable applications. We shall discuss only two widely accepted message passing libraries, namely MPI and PVM.
Message Passing Interface (MPI)

The Message Passing Interface (MPI) is a standard for message passing developed jointly by a group of researchers from industry and academia. MPI has been implemented as a library of routines that can be called from Fortran, C, C++ and Ada programs. MPI was developed in two stages, MPI-1 and MPI-2. MPI-1 was published in 1994.
Features of MPI-1
• Point-to-point communication,
• Collective communication,
• Process groups and communication domains,
• Virtual process topologies, and
• Binding for Fortran and C.
MPI’s advantage over older message passing libraries is that it is both portable (because
MPI has been implemented for almost every distributed memory architecture) and fast
(because each implementation is optimized for the hardware it runs on).
MPI parallel programs are written using conventional languages like Fortran and C. One or more header files such as "mpi.h" may be required to provide the necessary definitions and declarations. Like any other serial program, programs using MPI need to be compiled before being run. The command to compile the program may vary according to the compiler being used. If we are using the mpcc compiler, then we can compile a C program named "program.c" using the following command:

mpcc program.c -o program
Most implementations provide a command, typically named mpirun, for spawning MPI processes. It provides facilities for the user to select the number of processes and the processors they will run on. For example, to run the object file "program" as n processes on n processors we use the following command:

mpirun -np n program
MPI functions
MPI includes hundreds of functions, a small subset of which is sufficient for most
practical purposes. We shall discuss some of them in this unit.
int MPI_Init(int *argc, char ***argv)

It initializes the MPI environment. It must be called before any other MPI function.

int MPI_Finalize(void)

It terminates the MPI environment. No MPI function can be called after MPI_Finalize.
Every MPI process belongs to one or more groups (also called communicators). Each process is identified by its rank (0 to group size – 1) within the given group. Initially, all processes belong to a default group called MPI_COMM_WORLD. Additional groups can be created by the user as and when required. Two commonly used functions related to communicators are MPI_Comm_size(comm, &size), which returns the number of processes in the communicator comm, and MPI_Comm_rank(comm, &rank), which returns the rank of the calling process within comm.
MPI processes do not share memory space, and one process cannot directly access another process's variables. Hence, they need some form of communication among themselves.
In MPI environment this communication is in the form of message passing. A message in
MPI contains the following fields:
msgaddr: Any address in the sender's address space; it refers to the location in memory where the message data begins.

count: Number of items of the given datatype in the message.

datatype: Type of data in the message. This field is important because MPI supports heterogeneous computing, and different nodes may interpret the count field differently. For example, if the message contains a string of 2n characters (count = 2n), some machines may interpret it as having 2n characters and some as having n characters, depending upon the storage allocated per character (1 or 2 bytes). The basic datatypes in MPI include all basic types in Fortran and C, with two additional types, namely MPI_BYTE and MPI_PACKED. MPI_BYTE indicates a byte of 8 bits.
tag: Identifier for specific message or type of message. It allows the programmer to deal
with the arrival of message in an orderly way, even if the arrival of the message is not
orderly.
int MPI_Send(void *msgaddr, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

On return, the message buffer msgaddr can be reused immediately.

int MPI_Recv(void *msgaddr, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

On return, msgaddr contains the requested message.
MPI supports four modes of send operation:

i) Buffered mode: Send can be initiated whether or not a matching receive has been initiated, and the send may complete before the matching receive is initiated.
ii) Synchronous mode: Send can be initiated whether or not a matching receive has been initiated, but the send will complete only after the matching receive has been initiated.
iii) Ready mode: Send can be initiated only if a matching receive has already been initiated.
iv) Standard mode: May behave like either buffered mode or synchronous mode, depending on the specific implementation of MPI and the availability of memory for buffer space.
MPI provides both blocking and non-blocking send and receive operations for all modes.
MPI_Recv and MPI_Irecv are blocking and nonblocking functions for receiving
messages, regardless of mode.
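The following sketch (assuming an even number of processes; the pairing of ranks and the tag value 0 are illustrative) shows how non-blocking MPI_Isend/MPI_Irecv calls are posted and later completed with MPI_Wait, so that computation can overlap the communication:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, partner, val, incoming;
    MPI_Request rreq, sreq;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = rank ^ 1;                   /* pair processes 0-1, 2-3, ... (assumes an even count) */
    val = rank;

    /* post the receive and the send without blocking */
    MPI_Irecv(&incoming, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &rreq);
    MPI_Isend(&val, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &sreq);
    /* ... other computation could overlap with the communication here ... */
    MPI_Wait(&sreq, MPI_STATUS_IGNORE);   /* the send buffer may be reused after this */
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);   /* incoming now holds the partner's value */

    printf("process %d received %d\n", rank, incoming);
    MPI_Finalize();
    return 0;
}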
Besides send and receive functions, MPI provides some more useful functions for
communications. Some of them are being introduced here.
MPI_Probe and MPI_Iprobe probe for incoming message without actually receiving it.
Information about message determined by probing can be used to decide how to receive
it.
MPI_Cancel cancels an outstanding message request; this is useful for cleanup at the end of a program or after a major phase of computation.
MPI_Scatter() and MPI_Gather(): Using these functions, the process with rank rank in group comm sends a personalized message to all the processes (including itself), or collects the messages, sorted according to the ranks of the sending processes, into its own buffer. The first three parameters define the buffer of the sending process and the next three define the buffer of the receiving process.

MPI_Allreduce(): Combines the values contributed by all the processes in the group using a reduction operation and makes the result available to every process.
MPI_Alltoall()
Each process sends a personalized message to every other process in the group.
MPI_Reduce(): This function reduces the partial values stored in Sendaddr of each process into a final result and stores it in Receiveaddr of the process with rank rank. op specifies the reduction operator.
MPI_Barrier(comm): This function synchronises all processes in the group comm.
MPI_Wtime() returns the elapsed wall-clock time in seconds since some arbitrary point in the past. The elapsed time for a program segment is given by the difference between the MPI_Wtime values at the beginning and end of the segment. Process clocks are not necessarily synchronised, so clock values are not necessarily comparable across processes, and care must be taken in determining the overall running time of a parallel program. Even if clocks are explicitly synchronised, variation across clocks still cannot be expected to be significantly less than the round-trip time for a zero-length message between the processes.
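A brief sketch of timing a program segment with MPI_Wtime (the sleep call merely stands in for the segment being timed):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    double t0, t1;
    MPI_Init(&argc, &argv);
    MPI_Barrier(MPI_COMM_WORLD);   /* line the processes up before timing */
    t0 = MPI_Wtime();
    sleep(1);                      /* stands in for the program segment being timed */
    t1 = MPI_Wtime();
    /* each process reports its own clock; values are not comparable across processes */
    printf("elapsed time: %f seconds\n", t1 - t0);
    MPI_Finalize();
    return 0;
}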
Example 1: The following program distributes a value N from process 0 to all the other processes; each process then computes a partial sum over every s-th element of an array, and the partial sums are combined at process 0.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int i, sum = 0, total = 0, s, r, N = 0, x[100];   /* assumes N <= 100 */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &s);
    MPI_Comm_rank(MPI_COMM_WORLD, &r);
    if (r == 0) {
        printf("Enter N: ");
        scanf("%d", &N);
        for (i = 1; i < s; i++)        /* send N to every other process */
            MPI_Send(&N, 1, MPI_INT, i, i, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&N, 1, MPI_INT, 0, r, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    for (i = 0; i < N; i++)            /* sample data: x[i] = i */
        x[i] = i;
    for (i = r; i < N; i = i + s)      /* each process adds every s-th element */
        sum += x[i];
    MPI_Reduce(&sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (r == 0)
        printf("Sum = %d\n", total);
    MPI_Finalize();
    return 0;
}
Merits of MPI
• It runs on both distributed-memory and shared-memory architectures.
• It is highly portable, and optimised implementations exist for most platforms.

Drawbacks of MPI
• The programmer must manage all data distribution and communication explicitly.
• Dynamic process management was not available until MPI-2.

Parallel Virtual Machine (PVM)

PVM was developed by the University of Tennessee, The Oak Ridge National
Laboratory and Emory University. The first version was released in 1989, version 2 was
released in 1991 and finally version 3 was released in 1993. The PVM software enables a
collection of heterogeneous computer systems to be viewed as a single parallel virtual
machine. It transparently handles all message routing, data conversion, and task
scheduling across a network of incompatible computer architectures. The programming
interface of PVM is very simple. The user writes his application as a collection of
cooperating tasks. Tasks access PVM resources through a library of standard interface
routines. These routines allow the initiation and termination of tasks across the network
as well as communication and synchronisation between the tasks. Communication
constructs include those for sending and receiving data structures as well as high-level
primitives such as broadcast and barrier synchronization.
Features of PVM:
• Easy to install;
31
Parallel Algorithms &
Parallel Programming
• Easy to configure;
• Multiple users can each use PVM simultaneously;
• Multiple applications from one user can execute;
• C, C++, and Fortran supported;
• Package is small;
• Users can select the set of machines for a given run of a PVM program;
• Process-based computation;
• Explicit message-passing model, and
• Heterogeneity support.
When PVM starts, it examines the virtual machine in which it is to operate and creates a process called the PVM daemon, or simply pvmd, on each machine. (An example of a daemon program is the mail program that runs in the background and handles all the incoming and outgoing electronic mail on a computer.) pvmd provides the inter-host point of contact, authenticates tasks and executes processes on its machine. It also provides fault detection, routes messages not from or intended for its host, transmits messages from its applications to their destinations, receives messages from other pvmds, and buffers them until the destination application can handle them.
PVM provides a library of functions, libpvm3.a, that the application programmer calls.
Each function has some particular effect in the PVM. However, all this library really
provides is a convenient way of asking the local pvmd to perform some work. The pvmd
then acts as the virtual machine. Both pvmd and PVM library constitute the PVM system.
The PVM system supports functional as well as data decomposition models of parallel programming. It binds with C, C++ and Fortran. The C and C++ language bindings for the PVM user interface library are implemented as functions (subroutines in the case of FORTRAN). User programs written in C and C++ can access the PVM library by linking against libpvm3.a (libfpvm3.a in the case of FORTRAN).
All PVM tasks are uniquely identified by an integer called task identifier (TID) assigned
by local pvmd. Messages are sent to and received from tids. PVM contains several
routines that return TID values so that the user application can identify other tasks in the
system. PVM also supports grouping of tasks. A task may belong to more than one group, and a task in one group can communicate with tasks in other groups. To use any of the group functions, a program must be linked with libgpvm3.a.
PVM uses two environment variables when starting and running. Each PVM user needs to set these two variables to use PVM. The first variable is PVM_ROOT, which is set to the location of the installed pvm3 directory. The second variable is PVM_ARCH, which tells PVM the architecture of this host. The easiest method is to set these two variables in your .cshrc file. Here is an example of setting PVM_ROOT:

setenv PVM_ROOT $HOME/pvm3

The user can set PVM_ARCH by concatenating to the file .cshrc the content of the file $PVM_ROOT/lib/cshrc.stub.
Starting PVM
To start PVM, on any host on which PVM has been installed we can type
% pvm
The PVM console, called pvm, is a stand-alone PVM task that allows the user to interactively start, query, and modify the virtual machine. We can then add hosts to the virtual machine by typing at the console prompt (obtained after the last command)

pvm> add hostname

To delete hosts (except the one we are using) from the virtual machine, we can type

pvm> delete hostname

To see the configuration of the present virtual machine, we can type
pvm> conf
To see what PVM tasks are running on the virtual machine, we should type
pvm> ps -a
Multiple hosts can be added simultaneously by listing the hostnames in a file, one per line, and then typing
% pvm hostfile
PVM will then add all the listed hosts simultaneously before the console prompt appears.
Now, we shall learn how to compile and run PVM programs. To compile the program, change to the directory pvm/lib/archname, where archname is the architecture name of your computer. Then the following command:

cc program.c -lpvm3 -o program

will compile a program called program.c. After compiling, we must put the executable file in the directory pvm3/bin/ARCH. Also, we need to compile the program separately
for every architecture in virtual machine. In case we use dynamic groups, we should also
add -lgpvm3 to the compile command. The executable file can then be run. To do this,
first run PVM. After PVM is running, executable file may be run from the unix command
line, like any other program.
For example, to build the example program master.c with the PVM makefile wrapper aimk, we can type

% aimk master.c
Now, from one window, start PVM and configure some hosts. In another window change
directory to $HOME/pvm3/bin/PVM_ARCH and type
% master
It will ask for a number of tasks to be executed. Then type the number of tasks.
In this section we shall give a brief description of the routines in the PVM 3 user library.
Every PVM program should include the PVM header file “pvm3.h” (in a C program) or
“fpvm3.h” (in a Fortran program).
In PVM 3, all PVM tasks are identified by an integer supplied by the local pvmd. In the following descriptions this task identifier is called TID. We now introduce some commonly used functions in PVM programming (as in C; for Fortran, the routines use the prefix pvmf instead of pvm).
Process Management

• int tid = pvm_mytid(void)

Returns the tid of the calling process. tid values less than zero indicate an error.

• int info = pvm_exit(void)

Tells the local pvmd that this process is leaving PVM. info is the integer status code returned by the routine. Values less than zero indicate an error.
• int numt = pvm_spawn(char *task, char **argv, int flag, char *where, int ntask, int *tids)

Starts new PVM processes. task, a character string, is the executable file name of the PVM process to be started. The executable must already reside on the host on which it is to be started. argv is a pointer to an array of arguments to task; if the executable needs no arguments, then the second argument to pvm_spawn is NULL. flag is an integer specifying spawn options. where is a character string specifying where to start the PVM process; if flag is 0, then where is ignored and PVM will select the most appropriate host. ntask, an integer, specifies the number of copies of the executable to start. tids is an integer array of length ntask that returns the tids of the PVM processes started by this pvm_spawn call. The function returns the actual number of processes started. Negative values indicate an error.
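For illustration (the executable name "worker" and the request for four copies are hypothetical), a task might spawn workers and check how many were actually started:

#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int tids[4];
    /* flag 0 lets PVM choose the hosts, so the where argument is ignored */
    int started = pvm_spawn("worker", (char **)0, 0, "", 4, tids);
    if (started < 4)
        printf("only %d of 4 worker tasks could be started\n", started);
    pvm_exit();
    return 0;
}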
• int info = pvm_kill(int tid)

Terminates a specified PVM process. tid is the integer task identifier of the PVM process to be killed (not the caller itself). Return values less than zero indicate an error.
• int bufid = pvm_catchout(FILE *ff)

Catches output from child tasks. ff is the file descriptor on which the collected output is written. The default is to have PVM write the stderr and stdout of spawned tasks.
Information

• int info = pvm_config(int *nhost, int *narch, struct pvmhostinfo **hostp)

Returns information about the present virtual machine configuration. nhost is the number of hosts (pvmds) in the virtual machine. narch is the number of different data formats being used. hostp is a pointer to an array of structures which contains information about each host, including its pvmd task ID, name, architecture, and relative speed (the default is 1000).
• int info = pvm_tasks( int where, int *ntask, struct pvmtaskinfo **taskp )
struct pvmtaskinfo {
    int ti_tid;
    int ti_ptid;
    int ti_host;
    int ti_flag;
    char *ti_a_out;
} *taskp;
Returns information about the tasks running on the virtual machine. where specifies which tasks to return information about. The options are: 0 for all the tasks on the virtual machine; a pvmd tid for all tasks on a given host; and a tid for a specific task. ntask returns the number of tasks being reported on.
taskp is a pointer to an array of structures which contains the information about each
task including its task ID, parent tid, pvmd task ID, status flag, and the name of this task's
executable file. The status flag values are: waiting for a message, waiting for the pvmd,
and running.
Dynamic Configuration

Hosts can be added to or deleted from the virtual machine at run time with pvm_addhosts() and pvm_delhosts().

Signaling

• int info = pvm_sendsig(int tid, int signum)

Sends a signal to another PVM process. tid is the task identifier of the PVM process to receive the signal. signum is the signal number.
• int info = pvm_notify( int what, int msgtag, int cnt, int *tids )
Requests notification of a PVM event such as a host failure. what specifies the type of event to trigger the notification. Some of the options are: PvmTaskExit (a task exits or is killed), PvmHostDelete (a host is deleted or crashes) and PvmHostAdd (a new host is added). msgtag is the message tag to be used in the notification. For PvmTaskExit and PvmHostDelete, cnt specifies the length of the tids array; for PvmHostAdd it specifies the number of times to notify. tids, for PvmTaskExit and PvmHostDelete, is an array of length cnt of task or pvmd TIDs to be notified about; the array is not used with the PvmHostAdd option.
Message Passing
A nonblocking receive immediately returns with either the data or a flag that the data has
not arrived, while a blocking receive returns only when the data is in the receive buffer.
In addition to these point-to-point communication functions, the model supports the
multicast to a set of tasks and the broadcast to a user-defined group of tasks. There are
also functions to perform global max, global sum, etc. across a user-defined group of
tasks. Wildcards can be specified in the receive for the source and the label, allowing
either or both of these contexts to be ignored. A routine can be called to return the
information about the received messages.
The PVM model guarantees that the message order is preserved. If task 1 sends message
A to task 2, then task 1 sends message B to task 2, message A will arrive at task 2 before
message B. Moreover, if both the messages arrive before task 2 does a receive, then a
wildcard receive will always return message A.
PVM uses SUN's XDR library to create a machine-independent data format if you request it. Settings for the encoding option are:

PvmDataDefault: Use XDR by default, as the local library cannot know in advance where you are going to send the data.

PvmDataRaw: No encoding is used; messages are sent in their original format.

PvmDataInPlace: Not only is there no encoding, but the data is not even physically copied into the buffer.

• int bufid = pvm_initsend(int encoding)

Clears the default send buffer and specifies the message encoding. encoding specifies the next message's encoding scheme.
• pvm_pk*() and pvm_packf() - Pack the active message buffer with arrays of a prescribed data type; a separate routine exists for each data type:
fmt Printf-like format expression specifying what to pack. nitem is the total number of
items to be packed (not the number of bytes). stride is the stride to be used when packing
the items.
• pvm_unpack - Unpacks the active message buffer into arrays of prescribed data
type. It has been implemented for different data types:
Each of the pvm_upk* routines unpacks an array of the given data type from the active
receive buffer. The arguments for each of the routines are a pointer to the array to be
unpacked into, nitem which is the total number of items to unpack, and stride which is
the stride to use when unpacking. An exception is pvm_upkstr() which by definition
unpacks a NULL terminated character string and thus does not need nitem or stride
arguments.
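A short sketch of the pack-and-send sequence on a spawned task (the data values and the message tag 5 are arbitrary); the parent would mirror it with pvm_recv followed by pvm_upkint:

#include "pvm3.h"

int main(void)
{
    int data[3] = { 10, 20, 30 };
    int ptid = pvm_parent();        /* tid of the task that spawned this one */

    pvm_initsend(PvmDataDefault);   /* clear the send buffer, use XDR encoding */
    pvm_pkint(data, 3, 1);          /* pack three ints with stride 1 */
    pvm_send(ptid, 5);              /* send to the parent with message tag 5 */

    /* the parent would unpack with: pvm_recv(tid, 5); pvm_upkint(data, 3, 1); */
    pvm_exit();
    return 0;
}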
To create and manage dynamic groups, a separate library libgpvm3.a must be linked
with the user programs that make use of any of the group functions. Group management
work is handled by a group server that is automatically started when the first group
function is invoked. Any PVM task can join or leave any group dynamically at any time
without having to inform any other task in the affected groups. Tasks can broadcast
messages to groups of which they are not members. Some routines that handle dynamic groups are given below:
• int info = pvm_barrier(char *group, int count)

Blocks the calling process until all the processes in a group have called it. count specifies the number of group members that must call pvm_barrier before they are all released.

• int info = pvm_bcast(char *group, int msgtag)

Broadcasts the data in the active message buffer to a group of processes. msgtag is a message tag supplied by the user; it allows the user's program to distinguish between different kinds of messages and should be a nonnegative integer.
• int info = pvm_reduce(void (*func)(), void *data, int count, int datatype, int msgtag, char *group, int rootginst)

Performs a reduce operation over the members of the specified group. func is the function defining the operation performed on the global data; predefined operations are PvmMax, PvmMin, PvmSum and PvmProduct, and users can define their own functions. data is a pointer to the starting address of an array of local values. count specifies the number of elements of datatype in the data array, and datatype is the type of the entries in the data array. msgtag is the message tag supplied by the user; it should be greater than zero and allows the user's program to distinguish between different kinds of messages. group is the group name of an existing group. rootginst is the instance number of the group member who gets the result.
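As a quick sketch of the group routines (the group name "workers", the expected membership of four tasks and the message tag 7 are all hypothetical), each member contributes a local value and the member with instance number 0 receives the sum:

#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int local = 1, ginst;

    ginst = pvm_joingroup("workers");     /* instance number within the group */
    pvm_barrier("workers", 4);            /* wait until four members have joined */

    /* sum the local values; the result overwrites local on the member with instance 0 */
    pvm_reduce(PvmSum, &local, 1, PVM_INT, 7, "workers", 0);
    if (ginst == 0)
        printf("group sum = %d\n", local);

    pvm_lvgroup("workers");
    pvm_exit();
    return 0;
}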
The following program illustrates the use of some of these functions in parallel programming:
Example 2: Hello.c

#include <stdio.h>
#include "pvm3.h"

main()
{
    int cc, tid, msgtag;
    char buf[100];

    printf("%x\n", pvm_mytid());
    cc = pvm_spawn("hello_other", (char **)0, 0, "", 1, &tid);
    if (cc == 1) {
        msgtag = 1;
        pvm_recv(tid, msgtag);
        pvm_upkstr(buf);
        printf("from t%x: %s\n", tid, buf);
    } else
        printf("can't start hello_other\n");
    pvm_exit();
}
In this program, pvm_mytid() returns the TID of the running program (in this case, the task id of hello.c). This program is intended to be invoked manually; after printing its task id (obtained with pvm_mytid()), it initiates a copy of another program called hello_other using the pvm_spawn() function. A successful spawn causes the program to execute a blocking receive using pvm_recv. After receiving the message, the program prints the message sent by its counterpart, as well as its task id; the buffer is extracted from the message using pvm_upkstr. The final pvm_exit call dissociates the program from the PVM system.
hello_other.c

#include <string.h>
#include <unistd.h>
#include "pvm3.h"

main()
{
    int ptid, msgtag;
    char buf[100];
    ptid = pvm_parent();
    strcpy(buf, "hello, world from ");
    gethostname(buf + strlen(buf), 64);
    msgtag = 1;
    pvm_initsend(PvmDataDefault);
    pvm_pkstr(buf);
    pvm_send(ptid, msgtag);
    pvm_exit();
}
This program is a listing of the "slave" or spawned program; its first PVM action is to obtain the task id of the "master" using the pvm_parent call. The program then obtains its hostname and transmits it to the master using the three-call sequence: pvm_initsend to initialize the send buffer; pvm_pkstr to place a string, in a strongly typed and architecture-independent manner, into the send buffer; and pvm_send to transmit it to the destination process specified by ptid, "tagging" the message with the number 1.
2.2.3 Data Parallel Programming

Programming with the data parallel model is usually accomplished by writing a program with data parallel constructs. The constructs can be calls to a data parallel subroutine library or compiler directives. Data parallel languages provide facilities to specify the
data decomposition and mapping to the processors. The languages include data
distribution statements, which allow the programmer to control which data goes on what
processor to minimize the amount of communication between the processors. Directives
indicate how arrays are to be aligned and distributed over the processors and hence
specify agglomeration and mapping. Communication operations are not specified
explicitly by the programmer, but are instead inferred by the compiler from the program.
Data parallel languages are more suitable for SIMD architectures, though some languages for MIMD structures have also been implemented. The data parallel approach is more effective for highly regular problems, but is not very effective for irregular problems.
The main languages used for this are Fortran 90, High Performance Fortran (HPF) and
HPC++. We shall discuss HPF in detail in the next unit. Now, we shall give a brief
overview of some of the early data parallel languages:
An early example was Fortran on the Connection Machine, which allowed Fortran arrays either to be distributed across the processing nodes (called CM arrays, or distributed arrays) or to be allocated in the memory of the front-end computer (called front-end arrays, or sequential arrays). Unlike the control unit of the ILLIAC, the Connection Machine front-end was a conventional, general-purpose computer, typically a VAX or Sun. But there were still significant restrictions on how arrays could be manipulated, reflecting the two possible homes.

Glypnir, IVTRAN and *LISP are some of the other early data parallel languages.
Let us conclude this unit with the introduction of a typical data parallel programming
style called SPMD.
A common style of writing data parallel programs for MIMD computers is SPMD (single
program, multiple data): all the processors execute the same program, but each operates
on a different portion of problem data. It is easier to program than true MIMD, but more
flexible than SIMD. Although most parallel computers today are MIMD architecturally,
they are usually programmed in SPMD style. In this style, although there is no central
controller, the worker nodes carry on doing essentially the same thing at essentially the
same time. Instead of central copies of control variables stored on the control processor
of a SIMD computer, control variables (iteration counts and so on) are usually stored in a
replicated fashion across MIMD nodes. Each node has its own local copy of these global
control variables, but every node updates them in an identical way. There are no centrally
issued parallel instructions, but communications usually happen in the well-defined
collective phases. These data exchanges occur in a prefixed manner that explicitly or
implicitly synchronize the peer nodes. The situation is something like an orchestra
without a conductor. There is no central control, but each individual plays from the same
script. The group as a whole stays in lockstep. This loosely synchronous style has some
similarities to the Bulk Synchronous Parallel (BSP) model of computing introduced by
the theorist Les Valiant in the early 1990s. The restricted pattern of the collective
synchronisation is easier to deal with than the complex synchronisation problems of general concurrent programming.
A natural assumption was that it should be possible and not too difficult to capture the
SPMD model for programming MIMD computers in data-parallel languages, along lines
similar to the successful SIMD languages. Various research prototype languages
attempted to do this, with some success. By the 1990s the value of portable, standardised
programming languages was universally recognized, and there seemed to be some
consensus about what a standard language for SPMD programming ought to look like.
Then the High Performance Fortran (HPF) standard was introduced.
2.3 DATA STRUCTURES FOR PARALLEL ALGORITHMS

2.3.1 Linked List

A linked list is a data structure composed of zero or more nodes linked by pointers. Each node consists of two parts, as shown in Figure 3: an info field containing specific information and a next field containing the address of the next node. The first node is pointed to by an external pointer called head. The last node, called the tail node, does not contain the address of any node; hence, its next field points to null. A linked list with zero nodes is called a null linked list.
Figure 3: A linked list with external pointer head
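In C, such a node might be declared as follows (the field names simply follow the description above):

struct node {
    int          info;   /* the information stored in the node */
    struct node *next;   /* address of the next node; NULL in the tail node */
};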
A large number of operations can be performed using a linked list. For some operations, such as insertion or deletion of new data, a linked list takes constant time, but other operations, such as searching for a data item, are time consuming. Here is an example where a linked list is used:
Example 3:
Given a linear linked list, rank the list elements in terms of the distance from each to the
last element.
A parallel algorithm for this problem is given here. The algorithm assumes that there are p processors, one per list element; first denotes the index of the head node. (A sequential C sketch of the same pointer-jumping idea is given after the algorithm.)

Algorithm:

for each processor j, 0 ≤ j < p, do in parallel
    if next[j] = j then
        rank[j] = 0
    else
        rank[j] = 1
    endif
endfor
while rank[next[first]] ≠ 0 do
    for each processor j, 0 ≤ j < p, do in parallel
        rank[j] = rank[j] + rank[next[j]]
        next[j] = next[next[j]]
    endfor
endwhile
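The sketch below writes each "for every processor j in parallel" step as an ordinary loop over j (the list layout and its size are illustrative; on a PRAM or a shared-memory machine the inner loop would run concurrently):

#include <stdio.h>
#include <string.h>

#define P 6

int main(void)
{
    /* next[j] is the index of the successor; the last element points to itself */
    int next[P] = { 1, 2, 3, 4, 5, 5 };
    int rank[P], new_rank[P], new_next[P];
    int first = 0;                              /* index of the head of the list */

    for (int j = 0; j < P; j++)                 /* initial ranks, as in the algorithm */
        rank[j] = (next[j] == j) ? 0 : 1;

    while (rank[next[first]] != 0) {
        for (int j = 0; j < P; j++) {           /* one parallel pointer-jumping step */
            new_rank[j] = rank[j] + rank[next[j]];
            new_next[j] = next[next[j]];
        }
        memcpy(rank, new_rank, sizeof rank);    /* apply the whole step "simultaneously" */
        memcpy(next, new_next, sizeof next);
    }

    for (int j = 0; j < P; j++)
        printf("rank[%d] = %d\n", j, rank[j]);  /* distance from element j to the tail */
    return 0;
}

For the list 0 -> 1 -> 2 -> 3 -> 4 -> 5 above, the program prints ranks 5, 4, 3, 2, 1, 0 after three pointer-jumping steps, i.e. O(log p) iterations.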
2.3.2 Arrays Pointers

An array is a collection of elements of the same type. Arrays are very popular data structures in parallel programming due to their ease of declaration and use. On the one hand, arrays can be used as a common memory resource for shared memory programming; on the other hand, they can be easily partitioned into sub-arrays for data parallel programming. It is this flexibility that makes arrays the most frequently used data structure in parallel programming. We shall study arrays in the context of two languages, Fortran 90 and C.
Consider the array shown below. The size of the array is 10.
5 10 15 20 25 30 35 40 45 50
The index of the first element in Fortran 90 is 1, but in C it is 0; consequently, the index of the last element in Fortran 90 is 10 and in C it is 9. If we name the array A, then the ith element in Fortran 90 is A(i) but in C it is A[i-1]. Arrays may be one-dimensional or multi-dimensional. In C, the declaration

int A[10];

declares an array of size 10.
In Fortran 90, array operations can be written in a compact form that often makes programs more readable.
Consider the loop:
s=0
do i=1,n
a(i)=b(i)+c(i)
s=s+a(i)
end do
It can be written (in Fortran 90 notation) as follows:
a(1:n) = b(1:n) +c(1:n)
s=sum(a(1:n))
In addition to Fortran 90, there are many languages that provide succinct operations on arrays. Some of the most popular are APL and MATLAB. Although these languages were developed for expressiveness rather than for parallel computing, they can be used to express parallelism, since array operations can be easily executed in parallel. Thus, all the arithmetic operations (+, -, *, /, **) involved in a vector expression can be performed in parallel. Intrinsic reduction functions, such as the sum above, can also be performed in parallel.
2.3.3 Hypercube Network

The hypercube architecture has played an important role in the development of parallel processing and is still quite popular and influential. The highly symmetric recursive structure of the hypercube supports a variety of elegant and efficient parallel algorithms. Hypercubes are also called n-cubes, where n indicates the number of dimensions. An n-cube can be defined recursively, as depicted below:
Properties of Hypercube:

• A node p in an n-cube has a unique label, its binary ID, which is an n-bit binary number.
• The labels of any two neighbouring nodes differ in exactly 1 bit.
• Two nodes whose labels differ in k bits are connected by a shortest path of length k.
• The hypercube is both node- and edge-symmetric.
The hypercube structure can be used to implement many parallel algorithms requiring all-to-all communication, that is, algorithms in which each task must communicate with every other task. This structure allows a computation requiring all-to-all communication among P tasks to be performed in just log P steps, compared to the polynomial number of steps needed with other data structures such as arrays and linked lists. An example of generating a node's neighbours from its label is sketched below.
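As a tiny illustration of the labelling property (the dimension n = 3 and the node label 5 are arbitrary choices), the neighbours of a node in an n-cube can be generated by flipping one bit of its label at a time:

#include <stdio.h>

int main(void)
{
    int n = 3;          /* a 3-cube has 8 nodes, each with a 3-bit label */
    int node = 5;       /* binary 101 */

    printf("neighbours of node %d in the %d-cube:\n", node, n);
    for (int d = 0; d < n; d++) {
        int neighbour = node ^ (1 << d);   /* the two labels differ in exactly one bit */
        printf("  across dimension %d: node %d\n", d, neighbour);
    }
    return 0;
}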
2.4 SUMMARY

In this unit, we discussed three approaches to implementing the PRAM model: message passing, shared memory and data parallel programming. We introduced the two widely used message passing libraries, MPI and PVM, along with their commonly used routines, and examined data structures, namely linked lists, arrays and hypercube networks, that are frequently used in parallel algorithms.
2.5 SOLUTIONS/ANSWERS
1) hello_other.c

#include <string.h>
#include <unistd.h>
#include "pvm3.h"

main()
{
    int ptid, msgtag;
    char buf[100];
    ptid = pvm_parent();
    strcpy(buf, "hello, world from ");
    gethostname(buf + strlen(buf), 64);
    msgtag = 1;
    pvm_initsend(PvmDataDefault);
    pvm_pkstr(buf);
    pvm_send(ptid, msgtag);
    pvm_exit();
}

The program is a listing of the "slave" or spawned program; its first PVM action is to obtain the task id of the "master" using the pvm_parent call. The program then obtains its hostname and transmits it to the master using the three-call sequence: pvm_initsend to initialize the send buffer; pvm_pkstr to place a string, in a strongly typed and architecture-independent manner, into the send buffer; and pvm_send to transmit it to the destination process specified by ptid.
2.6 REFERENCES

4) Selim G. Akl, Parallel Computation: Models and Methods, Prentice Hall of India.