Pdc Digital Notes 6 17
Types of parallel computing
Across open-source and proprietary parallel computing offerings, there are generally three types of parallelism, which are discussed below:
1. Bit-level parallelism: The form of parallel computing that depends on the processor word size. When performing an operation on data larger than the word size, a wider word reduces the number of instructions the processor must execute, because the operation no longer has to be split into a series of smaller instructions. For example, if an 8-bit processor must operate on 16-bit numbers, it first operates on the 8 lower-order bits and then on the 8 higher-order bits, so two instructions are needed to execute the operation; a 16-bit processor could perform it in a single instruction.
2. Instruction-level parallelism: The form of parallelism in which the processor decides, within a single CPU clock cycle, how many instructions to execute at the same time, so that more than one instruction can be in progress per cycle. The hardware approach (dynamic parallelism) has the processor decide at run time which independent instructions to issue together, whereas the software approach (static parallelism) relies on the compiler to decide, before execution, which instructions can be executed in parallel.
3. Task parallelism: The form of parallelism in which a task is decomposed into subtasks, each subtask is allocated to a processor for execution, and the subtasks are executed concurrently by the processors (a minimal sketch follows below).
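As a rough illustration of task parallelism, the sketch below splits one large task (summing an array) into two subtasks that run concurrently on separate threads and then combines the partial results. The task and names are invented for the example; it is a minimal sketch, not a prescribed implementation.

```cpp
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    // One large task: summing a vector of one million elements.
    std::vector<long long> data(1000000, 1);

    long long left_sum = 0, right_sum = 0;
    auto mid = data.begin() + data.size() / 2;

    // Decompose the task into two subtasks and execute them concurrently.
    std::thread left([&] { left_sum = std::accumulate(data.begin(), mid, 0LL); });
    std::thread right([&] { right_sum = std::accumulate(mid, data.end(), 0LL); });

    left.join();
    right.join();

    // Combine the partial results produced by the subtasks.
    std::cout << "total = " << (left_sum + right_sum) << '\n';  // prints 1000000
    return 0;
}
```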
Applications of Parallel Computing
o Databases and data mining are among the primary applications of parallel computing.
o Real-time simulation of systems is another use of parallel computing.
o Technologies such as networked video and multimedia.
o Science and engineering.
o Collaborative work environments.
o Augmented reality, advanced graphics, and virtual reality also make use of parallel computing.
Advantages of Parallel computing
o In parallel computing, more resources are used to complete a task, which shortens the time to completion and can cut costs. Parallel clusters can also be built from inexpensive components.
o Compared with serial computing, parallel computing can solve larger problems in less time.
o Parallel computing is far better suited than serial computing to simulating, modeling, and understanding complex, real-world phenomena.
o When local resources are limited, parallel computing lets you take advantage of non-local resources.
o Many problems are so large that it is impractical or impossible to solve them on a single computer; parallel computing helps overcome these limitations.
o One of the main advantages of parallel computing is that it allows several things to be done at the same time using multiple computing resources.
o Furthermore, parallel computing makes better use of the underlying hardware, whereas serial computing leaves potential computing power unused.
Future of Parallel Computing
The computational landscape has shifted completely from serial to parallel computing. Tech giants such as Intel have already begun shipping multicore processors with their systems, a major step toward parallel computing. Going forward, parallel computation will change the way computers work, and parallel computing plays an important role in connecting the world more closely than ever before. Moreover, the parallel computing approach becomes ever more necessary with multiprocessor computers, faster networks, and distributed systems.
Parallel and distributed computing occurs across many different topic areas in computer science,
including algorithms, computer architecture, networks, operating systems, and software engineering.
Platform-based development
Platform-based development is concerned with the design and development of applications for specific types
of computers and operating systems (“platforms”). Platform-based development takes into account system-
specific characteristics, such as those found in Web programming, multimedia development, mobile application
development, and robotics.
Security and information assurance
Security and information assurance refers to policy and technical elements that protect information systems by ensuring their availability, integrity, authentication, and appropriate levels of confidentiality. Information security concepts occur in many areas of computer science, including operating systems, computer networks, databases, and software.
Software engineering
Software engineering is the discipline concerned with the application of theory, knowledge, and practice to
building reliable software systems that satisfy the computing requirements of customers and users. It is
applicable to small-, medium-, and large-scale computing systems and organizations. Software engineering
uses engineering methods, processes, techniques, and measurements. Software development, whether done by
an individual or a team, requires choosing the most appropriate tools, methods, and approaches for a
given environment.
Computer scientists must understand the relevant social, ethical, and professional issues that surround their
activities. The ACM Code of Ethics and Professional Conduct provides a basis for personal responsibility and
professional conduct for computer scientists who are engaged in system development that directly affects the
general public.
Issues in Distributed Systems
The lack of global knowledge.
Naming.
Scalability.
Compatibility.
Process synchronization (requires global knowledge)
Resource management (requires global knowledge)
Security.
Fault tolerance, error recovery.
Parallel computers can be classified according to the level at which the architecture supports parallelism, with multi-core and multi-processor computers being common examples. Key operating system design issues for such systems include process synchronization, memory management, communication, and concurrency control.
Notable applications of parallel processing (also known as parallel computing) and of efficient distributed systems include social networks, mobile systems, online banking, and online gaming (e.g., multiplayer systems). Additional areas of application for distributed computing include e-learning platforms, artificial intelligence, and e-commerce.
Challenges of Parallel and Distributed Systems:
Scope of Parallelism
• Conventional architectures broadly comprise a processor, a memory system, and the datapath.
• Each of these components presents significant performance bottlenecks.
• Parallelism addresses each of these components in significant ways.
• Different applications utilize different aspects of parallelism – e.g., data-intensive applications utilize high aggregate throughput, server applications utilize high aggregate network bandwidth, and scientific applications typically utilize high processing and memory system performance.
• It is important to understand each of these performance bottlenecks.
• Microprocessor clock speeds have posted impressive gains over the past two decades (two to three orders of magnitude).
• Higher levels of device integration have made a large number of transistors available.
• The question of how best to utilize these resources is an important one.
• Current processors use these resources in multiple functional units and execute multiple instructions in the same cycle.
• The precise manner in which these instructions are selected and executed provides impressive diversity in architectures.
• Pipelining overlaps various stages of instruction execution to achieve performance.
• At a high level of abstraction, an instruction can be executed while the next one is being decoded and the
next one is being fetched.
• For this reason, conventional processors rely on very deep pipelines (20-stage pipelines in state-of-the-art Pentium processors).
• However, in typical program traces, every fifth or sixth instruction is a conditional jump! This requires very accurate branch prediction.
• The penalty of a misprediction grows with the depth of the pipeline, since a larger number of instructions will have to be flushed. A rough estimate of this cost appears in the sketch below.
• One simple way of alleviating these bottlenecks is to use multiple pipelines (superscalar execution).
• The question then becomes one of selecting which instructions to issue together.
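As a back-of-the-envelope illustration of why the misprediction penalty matters more in deep pipelines, the sketch below evaluates a deliberately simple cost model, effective CPI ≈ base CPI + (branch fraction × misprediction rate × flush penalty), with assumed, illustrative numbers; real processors are considerably more complicated.

```cpp
#include <cstdio>

int main() {
    // Illustrative, assumed numbers -- not measurements.
    const double base_cpi        = 1.0;   // ideal: one cycle per instruction
    const double branch_fraction = 0.20;  // roughly every 5th instruction is a branch
    const double mispredict_rate = 0.05;  // a 95%-accurate branch predictor
    const double flush_penalty   = 20.0;  // cycles lost, roughly the pipeline depth

    // Each mispredicted branch flushes wrong-path work from the whole pipeline.
    const double effective_cpi =
        base_cpi + branch_fraction * mispredict_rate * flush_penalty;

    std::printf("effective CPI ~ %.2f\n", effective_cpi);  // prints ~1.20
    return 0;
}
```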
• True Data Dependency: The result of one operation is an input to the next.
• Branch Dependency: Scheduling instructions across conditional branch statements cannot be done deterministically a priori.
• The scheduler, a piece of hardware, looks at a large number of instructions in an instruction queue and selects an appropriate number of instructions to execute concurrently based on these factors.
• The hardware cost and complexity of the superscalar scheduler is a major consideration in processor design.
• To address these issues, VLIW processors rely on compile-time analysis to identify and bundle together instructions that can be executed concurrently.
• These instructions are packed and dispatched together, and thus the name very long instruction word.
• This concept was used with some commercial success in the Multiflow Trace machine (circa 1984).
• Variants of this concept are employed in the Intel IA64 processors.
• Compilers, however, do not have runtime information such as cache misses. Scheduling is, therefore,
inherently conservative.
• VLIW performance is highly dependent on the compiler. A number of techniques, such as loop unrolling, speculative execution, and branch prediction, are critical (a small loop-unrolling sketch follows below).
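As a small, hypothetical illustration of the loop unrolling mentioned above, the sketch below unrolls a summation by a factor of four so that the four partial sums in each iteration are independent of one another and can, in principle, be issued together by a wide (superscalar or VLIW) processor. The function names are invented for the example.

```cpp
#include <cstddef>
#include <cstdio>

// Plain loop: each addition depends on the running sum from the previous iteration.
double sum_simple(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// Unrolled by 4: the four partial sums are independent of one another, so a
// wide (superscalar or VLIW) processor can, in principle, work on them together.
// (Reassociating the additions may change floating-point rounding slightly.)
double sum_unrolled(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];  // handle any leftover elements
    return s0 + s1 + s2 + s3;
}

int main() {
    double a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::printf("%.1f %.1f\n", sum_simple(a, 10), sum_unrolled(a, 10));  // 55.0 55.0
    return 0;
}
```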
An explicitly parallel program must specify concurrency and interaction between concurrent subtasks.
The former is sometimes also referred to as the control structure and the latter as the communication
model.
• Parallelism can be expressed at various levels of granularity – from instruction level to processes.
• Between these extremes exist a range of models, along with corresponding architectural support.
• Processing units in parallel computers either operate under the centralized control of a single control unit or
work independently.
• If there is a single control unit that dispatches the same instruction to various processors (that work on
different data), the model is referred to as single instruction stream, multiple data stream (SIMD).
• If each processor has its own control unit, each processor can execute different instructions on different
data items. This model is called multiple instruction stream, multiple data stream (MIMD).
SIMD and MIMD Processors
SIMD Processors
• Some of the earliest parallel computers such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1 belonged
to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel processors
and DSP chips such as the Sharc.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD
programming paradigms allow for an “activity mask”, which determines if a processor should participate in
a computation or not.
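The sketch below mimics the activity-mask idea in ordinary scalar C++: every data item goes through the same operation, but a per-item mask decides whether the result is kept. The array contents and names are invented for illustration; real SIMD hardware would use vector registers and masked or predicated instructions.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

int main() {
    // The same operation is applied to every data item, SIMD-style.
    std::array<int, 8> data = {1, -2, 3, -4, 5, -6, 7, -8};

    // Activity mask: only "active" lanes keep the result of the operation.
    std::array<bool, 8> active = {true, false, true, false,
                                  true, false, true, false};

    for (std::size_t i = 0; i < data.size(); ++i) {
        int doubled = data[i] * 2;         // every lane computes the result
        if (active[i]) data[i] = doubled;  // the mask decides whether to keep it
    }

    for (int x : data) std::printf("%d ", x);  // 2 -2 6 -4 10 -6 14 -8
    std::printf("\n");
    return 0;
}
```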
MIMD Processors
• In contrast to SIMD processors, MIMD processors can execute different programs on different processors.
• A variant of this, called single program multiple data streams (SPMD), executes the same program on different processors (see the sketch after this list).
• It is easy to see that SPMD and MIMD are closely related in terms of programming flexibility and
underlying architectural support.
• Examples of such platforms include current generation Sun Ultra Servers, SGI Origin Servers,
multiprocessor PCs, workstation clusters, and the IBM SP.
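A minimal SPMD-style sketch, using threads as stand-ins for processors: every "processor" runs the same program, but each uses its own rank to choose which slice of the data to work on, and rank 0 takes an extra branch. The function and variable names are assumptions made for the example, not part of any particular platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// The single program that every "processor" (thread) executes.
void program(int rank, int nprocs,
             const std::vector<int>& data, std::vector<int>& partial) {
    int sum = 0;
    // Each rank works on its own strided slice of the shared input.
    for (std::size_t i = rank; i < data.size(); i += nprocs) sum += data[i];
    partial[rank] = sum;

    if (rank == 0) std::printf("rank 0 also does some coordination work\n");
}

int main() {
    const int nprocs = 4;
    std::vector<int> data(100, 1);        // 100 ones to be summed
    std::vector<int> partial(nprocs, 0);  // one result slot per "processor"

    std::vector<std::thread> procs;
    for (int r = 0; r < nprocs; ++r)
        procs.emplace_back(program, r, nprocs, std::cref(data), std::ref(partial));
    for (auto& t : procs) t.join();

    int total = 0;
    for (int s : partial) total += s;
    std::printf("total = %d\n", total);   // prints 100
    return 0;
}
```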
SIMD-MIMD Comparison
• SIMD computers require less hardware than MIMD computers (single control unit).
• However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles.
• In contrast, platforms supporting the SPMD paradigm can be built from inexpensive off-the-shelf
components with relatively little effort in a short amount of time.
• There are two primary forms of data exchange between parallel tasks – accessing a shared data space and
exchanging messages.
• Platforms that provide a shared data space are called shared address-space machines or multiprocessors.
• Platforms that support messaging are also called message passing platforms or multicomputers.
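A tiny message-passing sketch, with threads standing in for the nodes of a multicomputer: instead of touching each other's data directly, a sender task deposits messages into a channel and a receiver task picks them up. The Channel type here is invented for illustration; real message-passing platforms would typically use a library such as MPI.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// A minimal one-directional "channel" between two tasks.
struct Channel {
    std::queue<int> messages;
    std::mutex m;
    std::condition_variable cv;

    void send(int value) {
        {
            std::lock_guard<std::mutex> lock(m);
            messages.push(value);
        }
        cv.notify_one();
    }

    int receive() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !messages.empty(); });
        int value = messages.front();
        messages.pop();
        return value;
    }
};

int main() {
    Channel ch;
    std::thread sender([&] { for (int i = 1; i <= 3; ++i) ch.send(i * 10); });
    std::thread receiver([&] {
        for (int i = 0; i < 3; ++i) std::printf("received %d\n", ch.receive());
    });
    sender.join();
    receiver.join();
    return 0;
}
```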
Shared-Address-Space Platforms
• If the time taken by a processor to access any memory word in the system (global or local) is identical, the platform is classified as a uniform memory access (UMA) machine; otherwise, it is a non-uniform memory access (NUMA) machine.
• The distinction between NUMA and UMA platforms is important from the point of view of algorithm design. NUMA machines require locality from underlying algorithms for performance.
• Programming these platforms is easier since reads and writes are implicitly visible to other processors.
• However, read-write access to shared data must be coordinated (this will be discussed in greater detail when we talk about threads programming; a minimal sketch appears at the end of this list).
• Caches in such machines require coordinated access to multiple copies. This leads to the cache coherence
problem.
• A weaker model of these machines provides an address map, but not coordinated access. These models are
called non cache coherent shared address space machines.
• It is important to note the difference between the terms shared address space and shared memory.
• We refer to the former as a programming abstraction and to the latter as a physical machine attribute.
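Following up the coordination point above, here is a minimal sketch in which two threads update the same counter in a shared address space; the mutex coordinates the read-modify-write so that no updates are lost. It is a generic illustration with made-up names and counts, not code from the notes.

```cpp
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    long long counter = 0;    // lives in the shared address space
    std::mutex counter_lock;  // coordinates the concurrent read-modify-write

    auto work = [&] {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> guard(counter_lock);
            ++counter;        // without the lock, some increments could be lost
        }
    };

    std::thread t1(work), t2(work);
    t1.join();
    t2.join();

    std::printf("counter = %lld\n", counter);  // always 200000 with the lock
    return 0;
}
```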
We begin this discussion with an ideal parallel machine called Parallel Random Access Machine, or PRAM.
• A natural extension of the Random Access Machine (RAM) serial architecture is the Parallel Random
Access Machine, or PRAM.
• PRAMs consist of p processors and a global memory of unbounded size that is uniformly accessible to all
processors.
• Processors share a common clock but may execute different instructions in each cycle.
Depending on how simultaneous memory accesses are handled, PRAMs can be divided into four subclasses.
• Exclusive-read, exclusive-write (EREW) PRAM: no two processors may read or write the same memory location concurrently.
• Concurrent-read, exclusive-write (CREW) PRAM: concurrent reads of a location are allowed, but concurrent writes are not.
• Exclusive-read, concurrent-write (ERCW) PRAM: concurrent writes are allowed, but concurrent reads are not.
• Concurrent-read, concurrent-write (CRCW) PRAM: both concurrent reads and concurrent writes of a memory location are allowed.
• Concurrent writes must be resolved by a protocol; in the Common protocol, for example, the write is carried out only if all processors attempt to write identical values.
• In an ideal PRAM, every processor is connected to every memory word through a switch. Since these switches must operate in O(1) time at the level of words, for a system of p processors and m words the switch complexity is O(mp); with, say, 1,024 processors and 2^30 memory words that is on the order of 2^40 switches, which is why the PRAM remains an idealized model rather than a practical design.