Parallel Processing: sp2016 Lec#5
Dr M Shamim Baig
Parallel Platforms:
Memory (Physical vs Logical) Configurations
Physical vs Logical Memory Config
Physical Memory config (SM, DM, CSM)
Logical Address Space config (SAS, NSAS)
Combinations
CSM + SAS (SMP; UMA)
DM + SAS (DSM; NUMA)
DM + NSAS (Multicomputer/Clusters)
UMA vs NUMA
SM-multiprocessors are further categorized, based on memory access delay, as UMA (uniform memory access) & NUMA (non-uniform memory access).
A UMA system is based on the (CSM + SAS) config, where each processor has the same delay for accessing any memory location.
A NUMA system is based on the (DM + SAS = DSM) config, where a processor may have different delays for accessing different memory locations.
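As a hedged illustration (the numbers here are assumed, not from the lecture): if a local access costs 4 clock cycles and a remote access over the interconnect costs 40 cycles, then with a program locality factor f the mean access time is f × 4 + (1 − f) × 40 cycles; for f = 0.8 this gives 0.8 × 4 + 0.2 × 40 = 11.2 cycles.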
[Figure: bus-based shared-memory multiprocessor, with processors and shared memory attached to a common bus. Examples: dual-Pentium & quad-Pentium systems.]
[Figure: quad-processor bus-based SMP. Each processor has its own L1 & L2 cache and a bus interface onto the shared processor/memory bus; the memory controller (with the shared memory) and the I/O interface (to the I/O bus) also attach to this bus.]
[Figure: message-passing multicomputer, where each computer (node) has its own local memory and nodes communicate over an interconnection network.]
Data Exchange / Synch Platforms:
Shared-memory vs Message-Passing
Shared-memory platforms have low communication overhead and can support finer grain levels, while message-passing platforms have higher communication overhead & are therefore better suited to coarse grain levels.
SM multiprocessors are faster but have poor scalability.
Message-passing multicomputer platforms are slower but have higher scalability.
Beowulf Clusters*
A group of interconnected commodity computers achieving high performance at low cost, typically using commodity interconnects (e.g. high-speed Ethernet) & a commodity OS (e.g. Linux).
* The name Beowulf comes from the NASA Goddard Space Flight Center cluster project.
Interconnection Network:
o Interface level: memory bus (using MBEU) in SM-multiprocessors (UMA, NUMA) vs I/O bus (using NIU) in multicomputers / clusters
o Data exchange / synch: Shared-Data model vs Message-Passing model
Homework:
Self-assessed problems
Please mark your solution & note the marks you achieved.
Problems:
Explicit Parallel Architectures
Example Problem 1:
Bus-based SM-Multiprocessor:
Limit of Parallelism
Consider an SM-multiprocessor using 32-bit RISC processors running at 150 MHz, each carrying out one instruction per clock cycle. Assume 15% data-load & 10% data-store instructions using a shared bus having 2 GB/sec bandwidth.
Compute the maximum number of processors that can be connected to the above bus for the following parallel configurations:
Example Problem 1 (contd):
Bus-based SM-Multiprocessor:
Limit of Parallelism
(a) SMP (without cache memory)
(b) SMP with cache memory having a hit ratio of 95% & a memory write-through policy
(c) NUMA with program locality factor = 80%
(A hedged worked sketch follows this list.)
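A possible worked sketch in Python (an addition for self-checking, not the lecture's official solution; it assumes that only data loads & stores generate shared-bus traffic, that each access transfers one 4-byte word, and that 1 GB = 10^9 bytes):

# Hedged sketch for Example Problem 1 (assumptions noted above).
CLOCK_HZ   = 150e6    # 150 MHz, one instruction per cycle
WORD_BYTES = 4        # 32-bit data word
LOAD_FRAC  = 0.15
STORE_FRAC = 0.10
BUS_BW     = 2e9      # 2 GB/sec shared-bus bandwidth

def max_processors(bus_bytes_per_sec_per_cpu):
    # The bus saturates when total traffic reaches BUS_BW.
    return int(BUS_BW // bus_bytes_per_sec_per_cpu)

# (a) SMP, no cache: every load & store goes over the bus.
traffic_a = (LOAD_FRAC + STORE_FRAC) * CLOCK_HZ * WORD_BYTES   # 150 MB/s
print("(a)", max_processors(traffic_a))                        # -> 13

# (b) SMP with cache, 95% hit ratio, write-through:
#     only load misses use the bus, but every store does (write-through).
hit = 0.95
traffic_b = (LOAD_FRAC * (1 - hit) + STORE_FRAC) * CLOCK_HZ * WORD_BYTES
print("(b)", max_processors(traffic_b))                        # -> 31

# (c) NUMA with 80% locality: only the 20% remote accesses use the bus.
locality = 0.80
traffic_c = (LOAD_FRAC + STORE_FRAC) * (1 - locality) * CLOCK_HZ * WORD_BYTES
print("(c)", max_processors(traffic_c))                        # -> 66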
Figure: bus-based interconnects (a) with no local caches; (b) with local memory/caches.
Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines. Example? (One is sketched below.)
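As a hedged illustrative example (numbers assumed, consistent with the Problem 1 sketch above): with a 95% cache hit ratio and a write-through policy, only load misses and stores reach the shared bus, so per-processor bus traffic drops from about 150 MB/s to about 64.5 MB/s, raising the bus-saturation limit from roughly 13 processors to roughly 31.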
Homework:
Self-assessed problem
Please mark your solution & note the marks you achieved.
Example Problem 2:
Message-Passing Multicomputer:
Local vs Remote memory data access delays
Consider a 64-node multicomputer, where each node comprises a 32-bit RISC processor with a 250 MHz clock rate & 8 MB of local memory. A local memory access requires 4 clock cycles, the remote communication initiation (setup) overhead is 15 clock cycles, & the interconnection network bandwidth is 80 MB/sec. The total number of instructions executed is 200,000.
If memory data loads & stores are 15% & 10% of the instructions respectively, compute:
(a) Load/store time if all accesses are to local nodes
(b) Load/store time if 20% of accesses are to remote nodes
Note: assume packet lengths are variable (they depend on the address & data bytes) & the communication protocol is as given ???. The size of the packet fields is a multiple of bytes.
(A hedged worked sketch follows below.)
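A possible worked sketch in Python (an addition for self-checking, not the lecture's official solution; since the communication protocol/packet format is not reproduced here, the remote request packet is assumed to carry a 4-byte address plus a 4-byte data word, the remote node's own 4-cycle memory access is counted, and 1 MB = 10^6 bytes):

# Hedged sketch for Example Problem 2 (assumptions noted above).
CLOCK_HZ     = 250e6        # 250 MHz -> 4 ns per clock cycle
INSTRUCTIONS = 200_000
ACCESS_FRAC  = 0.15 + 0.10  # loads + stores
LOCAL_CYCLES = 4            # local memory access
SETUP_CYCLES = 15           # remote communication setup overhead
NET_BW       = 80e6         # 80 MB/sec interconnection network
PACKET_BYTES = 4 + 4        # assumed: 4-byte address + 4-byte data word

accesses = INSTRUCTIONS * ACCESS_FRAC           # 50,000 loads/stores

# (a) all accesses are local
cycles_a = accesses * LOCAL_CYCLES
print("(a) %.2f ms" % (cycles_a / CLOCK_HZ * 1e3))   # -> 0.80 ms

# (b) 20% of accesses are remote:
#     setup + packet transfer over the network + remote memory access
remote_frac     = 0.20
transfer_cycles = PACKET_BYTES / NET_BW * CLOCK_HZ   # 100 ns -> 25 cycles
remote_cycles   = SETUP_CYCLES + transfer_cycles + LOCAL_CYCLES
cycles_b = (accesses * (1 - remote_frac) * LOCAL_CYCLES
            + accesses * remote_frac * remote_cycles)
print("(b) %.2f ms" % (cycles_b / CLOCK_HZ * 1e3))   # -> 2.40 ms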