Unit 4

Parallel Processing

A parallel computer is a “collection of processing elements that communicate
and co-operate to solve large problems fast”.

“Processing of multiple tasks simultaneously on multiple processors is
called parallel processing”.

• Parallel computing is a type of computation in which many calculations or
the execution of processes are carried out simultaneously.

• Large problems can often be divided into smaller ones, which can then be solved at
the same time.
Serial Computation
Traditionally, software has been written for serial computation:
To be run on a single computer having a single Central Processing Unit (CPU).
What is Parallel Computing?
In the simplest sense, parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem.
Advantages of Parallel Computing

• It saves time and money, as many resources working together
reduce the time and cut potential costs.
• It can be impractical to solve larger problems with serial computing.
• It can take advantage of non-local resources when the local resources
are finite.
• Serial computing ‘wastes’ potential computing power, whereas
parallel computing makes better use of the hardware.
Types of Parallelism

▪ Parallelism in Hardware
▪ Parallelism in a Uniprocessor
– Pipelining
– Superscalar
▪ SIMD instructions, Vector processors, GPUs
▪ Multiprocessor
– Symmetric shared-memory multiprocessors
– Distributed-memory multiprocessors
– Chip-multiprocessors a.k.a. Multi-cores
▪ Multicomputers a.k.a. clusters

▪ Parallelism in Software
▪ Instruction level parallelism
▪ Thread or Task-level parallelism
▪ Data parallelism
▪ Bit level parallelism
Taxonomy of Parallel Computers
▪ According to instruction and data streams (Flynn):
– Single instruction, single data (SISD): this is the standard uniprocessor

– Single instruction, multiple data streams (SIMD):
▪ The same instruction is executed in all processors with different data
▪ E.g., vector processors, SIMD instructions, GPUs

– Multiple instruction, single data streams (MISD):
▪ Different instructions on the same data
▪ E.g., fault-tolerant computers, near-memory computing (Micron Automata Processor)

– Multiple instruction, multiple data streams (MIMD): the “common” multiprocessor
▪ Each processor uses its own data and executes its own program
▪ Most flexible approach
▪ Easier/cheaper to build by putting together “off-the-shelf” processors
Taxonomy of Parallel Computers
▪ According to memory communication model
– Shared address or shared memory: this model emphasizes control parallelism
more than data parallelism. Multiple processes execute on different processors
independently, but they share a common memory space. If any processor changes
a memory location, the change is visible to the rest of the processors.
▪ Processes on different processors can use the same virtual address space
▪ Any processor can directly access memory in another processor node
▪ Communication is done through shared-memory variables
▪ Explicit synchronization with locks and critical sections
▪ Arguably easier to program (see the sketch below)

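As an illustration (not part of the original notes), the following is a minimal sketch of the shared-memory model using POSIX threads in C: two threads increment a counter that lives in the common address space, and a mutex protects the critical section. The names (counter, worker) are illustrative.

#include <pthread.h>
#include <stdio.h>

/* Shared state: both threads see the same 'counter' in the common
 * address space; the mutex provides explicit synchronization. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* enter critical section */
        counter++;                   /* shared-memory variable */
        pthread_mutex_unlock(&lock); /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter); /* 200000 with correct locking */
    return 0;
}

Without the lock, the two increments could interleave and lose updates, which is why explicit synchronization is listed as part of this model.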
Taxonomy of Parallel Computers
– Physically centralized memory, uniform memory access (UMA)
▪ All memory is at the same distance from all processors
▪ Also called symmetric multiprocessors (SMP)
▪ Memory bandwidth is fixed and must accommodate all processors → does not
scale to a large number of processors
▪ Used in CMPs today (single-socket ones)

[Figure: four CPUs, each with a private cache, connected through an
interconnection network to a single shared main memory]
Taxonomy of Parallel Computers
– Physically distributed memory, non-uniform memory access (NUMA)
▪ A portion of memory is allocated with each processor (node)
▪ Accessing local memory is much faster than accessing remote memory
▪ If most accesses are to local memory, then overall memory bandwidth increases
linearly with the number of processors
▪ Used in multi-socket CMPs, e.g., Intel Nehalem

[Figure: two-socket Nehalem-EP system; each socket has four cores with private
L1/L2 caches, a shared inclusive L3 cache, and an integrated memory controller
(IMC) attached to local DDR3 memory, with QPI links connecting the sockets
and I/O to form the interconnection]
Taxonomy of Parallel Computers
▪ According to memory communication model

– Distributed address or message passing

• Like shared-memory systems, distributed-memory systems vary
widely but share a common characteristic: they require a
communication network to connect inter-processor memory.
• Processors have their own local memory. Memory addresses in one
processor do not map to another processor, so there is no concept of a
global address space across all processors.
• Because each processor has its own local memory, it operates
independently. Changes it makes to its local memory have no
effect on the memory of other processors. Hence, the concept of
cache coherency does not apply.
• When a processor needs access to data in another processor, it is
usually the task of the programmer to explicitly define how and when
data is communicated. Synchronization between tasks is likewise the
programmer's responsibility (see the message-passing sketch below).

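As an illustration (not part of the original notes), here is a minimal message-passing sketch in C using MPI: rank 1 sends an integer from its local memory to rank 0, which must receive it explicitly. It assumes an MPI installation (e.g., compiled with mpicc and launched with something like mpirun -np 2).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        value = 42;  /* data exists only in rank 1's local memory */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* Explicit receive: the programmer defines how and when data moves */
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0 received %d from rank 1\n", value);
    }

    MPI_Finalize();
    return 0;
}

There is no shared address space here: the only way rank 0 can see rank 1's data is through the explicit send/receive pair.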
Types of Parallelism in Applications
▪ Instruction-level parallelism (ILP)
– Multiple instructions from the same instruction stream can be executed
concurrently
– Generated and managed by hardware (superscalar) or by compiler (VLIW)
– Limited in practice by data and control dependences
– Ex. pipelining, superscalar architecture, very long instruction word
(VLIW), etc.

• Example
1: A = B + C // Instruction I1
2: D = E - F // Instruction I2
3: G = A * D // Instruction I3
I1 and I2 are independent and can execute concurrently, while I3 must wait
for both results, so the three instructions complete in 2 cycles
instead of 3.

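The same example rendered as a hypothetical C fragment (illustrative only), with comments marking the dependences a superscalar scheduler or VLIW compiler would see:

/* I1 and I2 have no data dependence on each other, so a 2-wide issue
 * machine can execute them in the same cycle; I3 reads A and D, so it
 * must wait for both before it can execute. */
int ilp_example(int B, int C, int E, int F) {
    int A = B + C;   /* I1: independent              */
    int D = E - F;   /* I2: independent of I1        */
    int G = A * D;   /* I3: depends on I1 and I2     */
    return G;
}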
Types of Parallelism in Applications
▪ Data-level parallelism (DLP)
– Instructions from a single stream operate concurrently on several data
– Limited by non-regular data manipulation patterns and by memory
bandwidth.
– Ex. SIMD: vector processing, array processing
– For example, consider summing the contents of an array of size N.
On a single-core system, one thread would simply sum the elements
[0] . . . [N − 1]. On a dual-core system, however, thread A, running
on core 0, could sum the elements [0] . . . [N/2 − 1], while thread
B, running on core 1, could sum the elements [N/2] . . . [N − 1]. The
two threads would run in parallel on separate computing cores
(a sketch follows below).
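A minimal sketch of this array-sum example in C with two POSIX threads (illustrative names, not from the original notes); each thread performs the same operation on its own half of the data:

#include <pthread.h>
#include <stdio.h>

#define N 1000
static int data[N];
static long partial[2];          /* one partial sum per thread */

static void *sum_half(void *arg) {
    int id = *(int *)arg;        /* 0 -> first half, 1 -> second half */
    int start = id * (N / 2);
    int end   = (id == 0) ? N / 2 : N;
    long s = 0;
    for (int i = start; i < end; i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1;      /* sample data */
    pthread_t a, b;
    int id0 = 0, id1 = 1;
    pthread_create(&a, NULL, sum_half, &id0);     /* thread A: [0 .. N/2-1] */
    pthread_create(&b, NULL, sum_half, &id1);     /* thread B: [N/2 .. N-1] */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("sum = %ld\n", partial[0] + partial[1]);
    return 0;
}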

Types of Parallelism in Applications
▪ Thread-level or task-level parallelism (TLP)
– Multiple threads or instruction sequences from the same application
can be executed concurrently
– Generated by compiler/user and managed by compiler and hardware
– Limited in practice by communication/synchronization overheads
and by algorithm characteristics.
– For example, task parallelism might involve two threads, each
performing a different statistical operation on the same array of
elements. Again, the threads operate in parallel on separate
computing cores, but each performs a different operation (see the
sketch below).

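A minimal sketch of this task-parallel example in C with POSIX threads (illustrative; one thread computes the sum and the other the maximum of the same array):

#include <pthread.h>
#include <stdio.h>

#define N 8
static int data[N] = {3, 7, 1, 9, 4, 6, 2, 8};
static long sum;
static int  max;

static void *compute_sum(void *arg) {          /* task 1: sum */
    long s = 0;
    for (int i = 0; i < N; i++) s += data[i];
    sum = s;
    return NULL;
}

static void *compute_max(void *arg) {          /* task 2: maximum */
    int m = data[0];
    for (int i = 1; i < N; i++) if (data[i] > m) m = data[i];
    max = m;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, compute_sum, NULL);  /* different operations...   */
    pthread_create(&t2, NULL, compute_max, NULL);  /* ...run on separate cores */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("sum = %ld, max = %d\n", sum, max);
    return 0;
}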
Types of Parallelism in Applications
▪ Bit-level parallelism
– Bit-level parallelism is a form of parallel computing based on increasing
the processor word size, enabled by very-large-scale integration (VLSI)
technology. Enhancements in computer design were achieved by
increasing bit-level parallelism.

– For example, consider a case where an 8-bit processor must add two 16-bit
integers. The processor must first add the 8 lower-order bits of each
integer, then add the 8 higher-order bits together with the carry,
requiring two instructions to complete a single operation. A 16-bit
processor would be able to complete the operation with a single
instruction (a sketch follows below).
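As an illustration (not part of the original notes), this C sketch emulates how an 8-bit processor would add two 16-bit integers in two steps, propagating the carry from the low byte into the high byte:

#include <stdint.h>
#include <stdio.h>

/* Emulate a 16-bit addition using only 8-bit operations:
 * step 1 adds the low bytes, step 2 adds the high bytes plus the carry. */
static uint16_t add16_on_8bit(uint16_t a, uint16_t b) {
    uint8_t lo    = (uint8_t)(a & 0xFF) + (uint8_t)(b & 0xFF);      /* low-byte add   */
    uint8_t carry = (lo < (uint8_t)(a & 0xFF)) ? 1 : 0;             /* detect overflow */
    uint8_t hi    = (uint8_t)(a >> 8) + (uint8_t)(b >> 8) + carry;  /* high-byte add  */
    return ((uint16_t)hi << 8) | lo;
}

int main(void) {
    uint16_t x = 0x12FF, y = 0x0101;
    printf("0x%04X\n", add16_on_8bit(x, y));  /* prints 0x1400, same as x + y */
    return 0;
}

A 16-bit (or wider) processor performs the same addition in one instruction, which is exactly the gain that bit-level parallelism provides.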
