Module 1: PARALLEL AND DISTRIBUTED COMPUTING

The document discusses parallelism and its importance in improving performance. It covers key concepts such as what parallelism is, why it is needed given hardware and software limitations, and how parallel computing can help solve challenges in fields like healthcare and environmental science. It also discusses the evolution of parallel architectures from single-core to multi-core processors, trends such as distributed computing across multiple nodes, the goals of parallelism such as scalability and performance, and the challenges in utilizing parallelism effectively.

Module 1: Parallelism

Fundamentals
Motivation
• With respect to both hardware and software:
– Road blocks to improving performance with serial code / single uni-core systems
– Hardware has to support parallelism for improved performance
– Software has to be able to offer parallelism (in terms of code, compilers etc.)
Key Concepts
What is parallelism?
• Parallelism is the process of executing multiple instructions on the same data, a single instruction on multiple data, or a combination of both
• Are parallelism and concurrency the same?
– Not exactly: concurrency does not need redundant hardware units (tasks may simply interleave), while parallel computing involves redundant structures executing simultaneously
Why Parallelism?
• It is the obvious path for the continued evolution of high performance
• A lesson learned from applications:
– Concurrency and parallelism are key to future systems
Why Parallel, Concurrent and Distributed Computing?
• Parallel computing can transform science and engineering
• Example: Cosmology – the study of the universe, its evolution and structure – where one of the most striking paradigm shifts has occurred
– Tremendously detailed new observations deep into the universe are available from instruments such as the Hubble Space Telescope and the Digital Sky Survey
– However, until recently it has been difficult, except in relatively simple circumstances, to use mathematical theories of the early universe for comparison with observations
– Scalable parallel computers with large memories have changed all of that
To port or not to port?
• Simply porting serial code should not suffice
• Writing parallel applications for parallel architectures should involve reformulating the data structures, the basic code and the dynamics involved
Parallel supercomputing can answer challenges to society
• Parallel computing is used not only for particle studies but also for human-related data:
– Health care
– Weather / environmental applications
The burden is not on the ‘Hardware’ alone
• Even with existing architectures, there are many ways parallelism can be exploited to improve efficiency in terms of execution time, power and so on
Overview of Parallel Computing, Architectural Demands and Trends
Evolution of Parallel architecture
• Stored Program Concept
• ILP (Pipelining) (SISD)
• TLP
• OoO (Out-of-Order Processors)
• Vector Processors (SIMD)
• Simultaneous Multi-Threading (SMT)
• Multi-core Processors (MIMD)
• SIMT Architectures (GPU)
• Multi-node systems (with multiple cores, GPUs etc.) – Distributed

Stored Program Computers
CPU Fundamentals
• Primary function is to execute instructions
• Program Execution Steps
– CPU transfers instructions and input data from main memory to registers in the CPU
– CPU executes instructions in their stored sequence (unless altered explicitly)
– CPU transfers output data from CPU registers to main memory
Program Execution
Single Core Processors
Single-core computer
Un-pipelined Data Path
Single Processor Core
• Two parts: Control Unit and Data Path
• Data Path
– Performs the arithmetic/logic operations and data movement
• Control Unit
– Unit to control the Data Path
Pipelining
Pipelined Data-path
• Pipeline registers are included between stages to provide ILP (hardware support)
Pipeline Hazards
• Structural Hazards
• Data Hazards
• Control Hazards
Pipeline Hazards & Solutions
• Structural Hazards
– Redundancy
• Data Hazards
– Forwarding, Loop Unrolling
• Control Hazards
– Branch Prediction
Constraints in In-order Execution

Data Hazard:
DIV.D F0,F2,F4
ADD.D F6,F0,F8
SUB.D F1,F3,F5
MUL.D F6,F10,F8

Structural Hazard:
DIV.D F0,F2,F4
DIV.D F1,F3,F5
ADD.D F7,F9,F11
SUB.D F8,F10,F14
MUL.D F6,F10,F8
…Continued
Control Hazard
Beq F0,F2, S1
DIV.D F1,F3,F5
ADD.D F6,F0,F8
Jmp S2
S1: SUB.D F8,F10,F14
S2: MUL.D F7,F11,F9
Pipelined Processor with Out-of-Order Execution
• The fundamental problem we face when trying to keep four functional units busy is that it is difficult to find contiguous sets of instructions that can be executed in parallel
• The solution to these problems is out-of-order execution and speculative execution
• Support is needed from both hardware and software:
– Hardware that allows instructions to execute out of order
– Software that arranges totally independent instructions so they can be fetched
Drawback of In-Order Execution
• A major limitation of in-order execution is
– Stalling the pipeline until all previous instructions have issued, even for instructions that have no data dependency on the stalled ones
• Eg:
DIV.D F0,F2,F4
ADD.D F10,F0,F8
SUB.D F12,F8,F14
– Here ADD.D must wait for the long-latency DIV.D (it needs F0), and SUB.D, although independent, is also stalled behind it
The Idea of Out-of-Order Execution
• In the previous example, to issue SUB.D we must separate issue into two parts:
– Checking for any structural hazard
– Waiting for the absence of a data hazard
• We still use in-order instruction issue, but we want an instruction to begin execution as soon as its operands are available, which implies out-of-order completion
• Eg:
DIV F0, F1, F2
MUL F4, F2, F3
ADD F5, F0, F4
SUB F5, F2, F1
ShiftL F1, F6, F7
Continued…
• Out-of-order execution introduces the possibility of WAR and WAW hazards, which do not exist in the 5-stage pipeline
• Eg:
DIV.D F0,F2,F4
ADD.D F6,F0,F8
SUB.D F8,F10,F14
MUL.D F6,F10,F8
– WAR Hazard between ADD.D and SUB.D
– WAW Hazard between ADD.D and MUL.D
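As a rough illustration of how dynamic renaming removes these false dependences, here is a minimal C-like sketch (not from the slides; the _a/_b variables stand in for hypothetical fresh physical registers, and the sample values are arbitrary):

#include <stdio.h>

/* Sketch of register renaming: f0, f8, f10, f14 stand in for the
   original architectural registers of the example above. */
int main(void) {
    double f0 = 2.0, f8 = 3.0, f10 = 5.0, f14 = 1.0;   /* sample values */

    double f6_a = f0 + f8;     /* ADD.D F6,F0,F8  : first write to F6 */
    double f8_b = f10 - f14;   /* SUB.D F8,F10,F14: renamed to a fresh F8',
                                  removing the WAR with ADD.D's read of F8 */
    double f6_b = f10 * f8_b;  /* MUL.D F6,F10,F8 : renamed to a fresh F6',
                                  removing the WAW with the first write to F6 */

    printf("%f %f %f\n", f6_a, f8_b, f6_b);
    return 0;
}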
Out-of-Order Completion Complexities
• Exception behaviour must be preserved, and imprecise exceptions should not arise
• Imprecise exceptions can occur because of two possibilities:
1. The pipeline may have already completed instructions that are later in program order than the instruction causing the exception
2. The pipeline may not have completed some instructions that are earlier in program order than the instruction causing the exception
Out-of-Order Execution
• It requires multiple functional units, pipelined functional units, or both
• In a dynamically scheduled pipeline, all instructions pass through the issue stage in order
• However, they can be stalled or bypass each other in the read-operand stage
• Scoreboarding is a technique for allowing instructions to execute out of order when there are sufficient resources and no data dependences
• Tomasulo’s Algorithm handles anti-dependences and output dependences by effectively renaming the registers dynamically
Evolution of Multi-Core
Era before Multi-Core
– Improved hardware technologies resulted in:
• Increased clock frequency
• Increased transistor density
• Exploiting ILP
What is Instruction Level Parallelism?
• Overlapping the execution of multiple independent instructions from a single instruction stream (e.g. via pipelining and multiple functional units)
Power Wall
• We can put more transistors on a chip than we can afford to turn on
Frequency Wall
• Dynamic power in a chip is proportional to V²·f·C
• So increasing ‘f’ (and the supply voltage needed to sustain it) leads to the power wall, as sketched below
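A minimal numeric sketch of the V²·f·C relation (the constants and voltages are illustrative assumptions, not from the slides), showing why two slower cores at lower voltage can draw less dynamic power than one fast core:

#include <stdio.h>

/* Illustrative dynamic-power model P = C * V^2 * f (values are assumptions) */
int main(void) {
    double C = 1.0;                          /* normalized switched capacitance */
    double base = C * 1.0 * 1.0 * 4.0e9;     /* one core: 1.0 V at 4 GHz */

    /* two cores at 2 GHz; assume the lower frequency permits ~0.8 V */
    double two_core = 2.0 * (C * 0.8 * 0.8 * 2.0e9);

    printf("relative dynamic power: %.2f\n", two_core / base);   /* ~0.64 */
    return 0;
}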
Memory Wall
• Processor speed has grown much faster than main-memory speed, so memory access latency increasingly limits performance
What’s the Solution?
• Because of the walls discussed above, performance can no longer be improved just by exploiting ILP or by increasing frequency
• Improved thread-level parallelism will provide the solution
• Software-level threads were developed to increase the number of threads
Software Solution to Improve ILP
Concurrent Execution Model
• More complex computing systems allow a user to run multiple applications that execute at the same time
Sequential Execution
• Conventional programs are called sequential because the programmer imagines a computer executing the code statement by statement
• At any instant, the machine is executing exactly one statement
Thread Level Parallelism – Increasing Hardware Threads
How does TLP overcome the barriers?
• Overcoming the Power wall & Frequency wall
– A single processor operating at 4 GHz can be replaced by multiple processors each operating at 2 GHz
• Overcoming the Memory Wall
– Multithreading means cycle-by-cycle interleaving of instructions from different threads
– If one thread is busy waiting on memory, that latency can be masked by executing other threads (see the thread sketch below)
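A minimal sketch of thread-level parallelism in C using POSIX threads (the array contents, thread count and helper names are assumptions, not from the slides):

#include <pthread.h>
#include <stdio.h>

#define N 100
#define NTHREADS 2

static double A[N];
static double partial[NTHREADS];

/* Each thread sums its own contiguous chunk of A */
static void *sum_part(void *arg) {
    int t = *(int *)arg;
    int chunk = N / NTHREADS;
    double s = 0.0;
    for (int i = t * chunk; i < (t + 1) * chunk; i++)
        s += A[i];
    partial[t] = s;
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    for (int i = 0; i < N; i++) A[i] = 1.0;      /* sample data */

    for (int t = 0; t < NTHREADS; t++) {
        ids[t] = t;
        pthread_create(&tid[t], NULL, sum_part, &ids[t]);
    }
    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %f\n", total);
    return 0;
}

While one thread stalls on memory, the hardware can run instructions from the other thread, which is the latency-masking idea described above.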
Why not Multi-Processors?
• Communication latency between processors on different boards is high
• This becomes a bottleneck for performance
What is Multi-Core?
Moore’s law : Alive & Well
Gordon Moore (co-founder of Intel) predicted
in 1965 that the transistor density of
semiconductor chips would double roughly
every 18 months.
Comparison of different architectures
• Single core
• Multiprocessor
• Multi-core processor
• Multi-core processor with shared cache
• Multi-core processor with distributed cache
Goals of Parallelism
What are the goals of parallelism?
• To make the architectural design scalable
• To have improved performance
• To balance cost, performance and reliability
Have these goals been achieved with multi-core?
• To a certain extent
• Applications are often unable to exploit the parallelism available in multi-core processors
• This has to be addressed
Challenges
• To utilize the existing parallel architectures thoroughly
– Skilled people to write parallel programs
– Tools and libraries for parallel programming support
• To scale the existing systems to perform better in the future
• Architectural models to cater to fast and big-data computing
Future Architectures
• Future architectures will place emphasis not only on the processors but also on
– I/O devices
– Memory
– Interconnection among nodes
Communication and Co-ordination
INTERCONNECTION NETWORKS FOR PARALLEL COMPUTERS
• Physically Shared Memory
• Distributed Memory
Network Topologies
• Direct Networks
– Direct networks consist of physical interconnection links that connect the nodes (typically PEs) in a parallel computer
– Each node may need a router to make routing decisions
– Examples: Ring, Mesh
Continued…
• Mesh, Torus
Continued…
Hypercube network
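As an illustrative sketch (the node numbering is an assumption, not from the slides): in a d-dimensional hypercube there are 2^d nodes, and each node links to the d nodes whose binary IDs differ from its own in exactly one bit.

#include <stdio.h>

/* Print each node's neighbours in a 3-dimensional (8-node) hypercube */
int main(void) {
    int d = 3;
    for (int node = 0; node < (1 << d); node++) {
        printf("node %d ->", node);
        for (int k = 0; k < d; k++)
            printf(" %d", node ^ (1 << k));   /* flip bit k to get a neighbour */
        printf("\n");
    }
    return 0;
}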
Continued…
• Indirect Networks
– In indirect networks, each processing node is connected to a network of switches over one or more (often bidirectional) links
– Typically, this network consists of one or more stages of switch boxes
#include <stdio.h>
double A[100];                          /* input array, assumed initialized elsewhere */
int main(void) {
    double arraySum = 7;
    for (int i = 0; i < 100; i++) {
        arraySum += A[i];               /* accumulate A[i] into the running sum */
        printf("%f\n", arraySum);       /* print the partial sum */
    }
    return 0;
}
/* Same loop unrolled by a factor of 3: fewer loop-control instructions and
   more work exposed per iteration (each add still costs ~4 clk) */
arraySum = 7;
int i;
for (i = 0; i + 2 < 100; i += 3) {
    arraySum += A[i];
    arraySum += A[i + 1];
    arraySum += A[i + 2];
}
for (; i < 100; i++)          /* clean-up for the remaining iteration(s) */
    arraySum += A[i];
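A further hedged variant (not from the slides, reusing the same array A): separate accumulators break the single dependence chain on arraySum, so the pipeline can overlap the adds.

/* Hypothetical variant using independent accumulators s0, s1, s2 */
double s0 = 7.0, s1 = 0.0, s2 = 0.0;
int j;
for (j = 0; j + 2 < 100; j += 3) {
    s0 += A[j];        /* the three adds are independent of each other, */
    s1 += A[j + 1];    /* so they can be in flight at the same time     */
    s2 += A[j + 2];
}
for (; j < 100; j++)   /* clean-up */
    s0 += A[j];
double unrolledSum = s0 + s1 + s2;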
loop3:
l.d   $f10, 0($5)        ; $f10 ← A[i]
add.d $f8, $f8, $f10     ; $f8 ← $f8 + A[i]
addi  $5, $5, 8          ; advance pointer to the next A[i] (8-byte doubles)
addi  $7, $7, -1         ; decrement loop count
test:
bgtz  $7, loop3          ; continue if count > 0
