
14013204-3 - PARALLEL COMPUTING

Lecture 1 (1/6/25)
14013204-3 - Parallel Computing (3 credits)
n Course Description
n This course examines the theory and practice of parallel computing.
n Topics covered:
n Introduction to Parallel computing.
n Parallel architectures.
n Designing parallel algorithms and managing the different kinds of parallel programming overhead, e.g., synchronization, communication, etc.
n Measuring and tuning parallel performance.
n Programming for shared and distributed parallel architectures.
n Prerequisites
n 14012203-4 Operating Systems,
n 14012401-3 Data Structures



14013204-3 - Parallel Computing (3 credits)
n Course Weekly Hours
n (2 lec + 2 lab)/week
n Textbook/References
n An Introduction to Parallel Programming, 2nd ed., Peter Pacheco and Matthew Malensek, Morgan Kaufmann, 2022.
n Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms, Roman Trobec, Boštjan Slivnik, Patricio Bulić, and Borut Robič, Springer, 2018.



14013204-3 - Parallel Computing (3 credits)
n Assessment Methods
n Quizzes & Participation: 15 %
n Lab: 25 %
n Midterm: 25 %
n Final: 35 %

n These assessment methods are subject to change.



An Introduction to Parallel Programming
Peter Pacheco

Chapter 1

Why Parallel Computing?



Roadmap
n What is parallel computing?
n How is performance achieved?
n Why do we need ever-increasing performance?
n Why do we need to write parallel programs?
n How do we write parallel programs?
n Types of parallel systems.
n What we’ll be doing.
n Concurrent, parallel, distributed!



What is Parallel Programming?

§ In general, most software will eventually have to make use of parallelism, because performance matters.

How is performance achieved?
n All processors are made of transistors.
n Transistors are the fundamental components of a CPU and play a crucial role in its operation.
n Smaller transistors change state faster: they enable higher speeds.
n Manufacturers also added advanced hardware features that made code run faster automatically.

n Moore's Law: the number of transistors integrated into a single chip doubles roughly every two years, leading to an enormous increase in CPU performance and, consequently, in application performance.

How is performance achieved?
n Smaller transistors → more transistors on chips (an increase in transistor density).
n More transistors on chips → more computational power (faster processors) → higher application performance.
n Each new generation of processors provides more transistors and offers higher speed.
n From 1986 to 2003, microprocessor performance took off like a rocket, increasing by an average of 50% per year.
n This unprecedented increase meant that users and software developers could often simply wait for the next generation of microprocessors to obtain increased performance from their applications.
n BUT this free performance gain ended around 2003–2004!

Changing times
n Since 2003, however, single-processor performance improvement has
slowed to the point that in the period from 2015 to 2017, it increased at
less than 4% per year.
n Conventional processors have reached the point where their performance and speed can no longer be improved simply by adding more transistors.
n Why?



Changing times
n However, as the speed of transistors increases, their power consumption also increases.
n Most of this power is dissipated as heat, and when an integrated circuit gets too hot, it becomes unreliable.
n Faster processors → increased power consumption.
n Increased power consumption → increased heat.
n Increased heat → unreliable processors.
n Dissipating (removing) the heat requires more and more sophisticated equipment; ordinary heat sinks can no longer do the job.
n In the first decade of the twenty-first century, air-cooled integrated circuits reached the limits of their ability to dissipate heat. Therefore, it is becoming impossible to continue to increase the speed of integrated circuits, and in the last few years the increase in transistor density has slowed dramatically.
Changing times
n Let's look at some heatsinks:
n Intel 386 (25 MHz): the 386 had no heatsink!
n It did not generate much heat, because it ran at a very low clock speed.

Changing times
[Figures: heatsinks for the 486 (~50 MHz), Pentium 2/3 (233–733 MHz), Pentium 4 (2–3 GHz), and Core i7 (3–3.5 GHz).]

Why we need ever-increasing performance
n Computational power is increasing, but so are our computation
problems and needs.
n Problems we never dreamed of have been solved because of past increases: decoding the human genome, ever more accurate medical imaging, astonishingly fast and accurate Web searches, and ever more realistic and responsive computer games would all have been impossible without them.
n More complex problems are still waiting to be solved.



Climate modeling
§ To better understand climate change, we need far more accurate computer
models, models that include interactions between the atmosphere, the oceans,
solid land, and the ice caps at the poles. We also need to be able to make detailed
studies of how various interventions might affect the global climate.



Protein folding
§ It’s believed that misfolded proteins may be involved in diseases such as
Huntington’s, Parkinson’s, and Alzheimer’s, but our ability to study configurations of
complex molecules such as proteins is severely limited by our current
computational power.



Drug discovery
§ There are many ways in which increased computational power can be used in
research into new medical treatments. For example, there are many drugs that
are effective in treating a relatively small fraction of those suffering from some
disease. It’s possible that we can devise alternative treatments by careful
analysis of the genomes of the individuals for whom the known treatment is
ineffective. This, however, will involve extensive computational analysis of
genomes.



Energy research
§ Increased computational power will make it possible to program much more
detailed models of technologies, such as wind turbines, solar cells, and
batteries. These programs may provide the information needed to construct far
more efficient clean energy sources.



Data analysis
§ We generate tremendous amounts of data. By some estimates, the quantity of
data stored worldwide doubles every two years, but most of it is largely useless
unless it’s analyzed.
§ As an example, knowing the sequence of nucleotides in human DNA is, by
itself, of little use. Understanding how this sequence affects development and
how it can cause disease requires extensive analysis.
§ In addition to genomics, huge quantities of data are medical imaging,
astronomical research, and Web search engines—to name a few.



Solution
n This difference in performance increase has been associated with a
dramatic change in processor design.
n By 2005, most of the major manufacturers of microprocessors had
decided that the road to rapidly increasing performance lay in the
direction of parallelism.
n Instead of designing and building faster processors, put multiple
complete processors on a single integrated circuit.
n Move away from single-core systems to multicore processors.
n “core” = central processing unit (CPU)
n Desktop and laptop processors typically have 4 to 16 cores, sometimes more.
n Server processors can exceed 64 cores.



Why we need to write parallel programs
n This change has a very important consequence for software
developers: Adding more processors doesn’t help much if
programmers aren’t aware of them… or don’t know how to use them.

n Serial programs (programs that were written to run on a single processor) don't benefit from this approach (in most cases).
n Such programs are unaware of the existence of multiple processors, and the performance of such a program on a system with multiple processors will be effectively the same as its performance on a single processor of the multiprocessor system.



Approaches to the serial problem
n Rewrite serial programs so that they're parallel and can make use of multiple cores.
n Write translation programs that automatically convert
serial programs into parallel programs.
n This is very difficult to do.
n Success has been limited.
n Sometimes the best parallel solution is to step back and
devise an entirely new algorithm.



Example
n Compute n values and add them together.
n Serial solution:
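The serial code on this slide did not survive extraction, so here is a minimal, runnable C sketch in the same spirit (not the book's exact code). Compute_next_value is a hypothetical stand-in that just returns i % 10 so the program compiles and runs.

    #include <stdio.h>

    /* Hypothetical stand-in for the book's Compute_next_value(...). */
    static int Compute_next_value(int i) {
        return i % 10;
    }

    int main(void) {
        int n = 24;
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += Compute_next_value(i);   /* compute a value, add it in */
        printf("sum = %d\n", sum);
        return 0;
    }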



Example (cont.)
n We have p cores, p much smaller than n.
n Each core performs a partial sum of approximately n/p values.

n Each core uses its own private variables and executes the block of code sketched below independently of the other cores.
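A sketch of that block, assuming each core knows its rank my_rank (0 to p-1), the core count p, and n, and reusing the Compute_next_value stand-in from the serial sketch; the function name partial_sum and the block-partitioning arithmetic are illustrative choices, not the slide's exact code.

    /* Partial sum computed independently by the core with rank my_rank. */
    int partial_sum(int my_rank, int p, int n) {
        int q = n / p, r = n % p;
        /* Block partition: the first r cores get one extra value each. */
        int my_n       = q + (my_rank < r ? 1 : 0);
        int my_first_i = my_rank * q + (my_rank < r ? my_rank : r);
        int my_last_i  = my_first_i + my_n;

        int my_sum = 0;
        for (int my_i = my_first_i; my_i < my_last_i; my_i++)
            my_sum += Compute_next_value(my_i);
        return my_sum;
    }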



Example (cont.)
n After each core completes execution of the code, its private variable my_sum contains the sum of the values computed by its calls to Compute_next_value.
n E.g., with 8 cores and n = 24, suppose the calls to Compute_next_value return: 1,4,3, 9,2,8, 5,1,1, 6,2,7, 2,5,0, 4,1,8, 6,5,1, 2,3,9
n Then the values stored in my_sum will be: 8, 19, 7, 15, 7, 13, 12, 14



Example (cont.)
n Once all the cores are done computing their private my_sum, they form a global sum by sending their results to a designated "master" core, which adds them to produce the final result.
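A deliberately simplified sketch of this naive scheme (my illustration, not the slide's code): pretend the p partial sums have already arrived at the master in an array, so each addition stands in for one receive-plus-add.

    /* Master ("core 0") adds up the p partial sums: p-1 receives + p-1 additions. */
    int naive_global_sum(const int partial[], int p) {
        int total = partial[0];
        for (int core = 1; core < p; core++)
            total += partial[core];
        return total;
    }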



Example (cont.)

Global sum
8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95



But wait!
There’s a much better way
to compute the global sum.



Better parallel algorithm
n Don’t make the master core do all the work.
n Share it among the other cores.
n Pair the cores so that:
n Core 0 adds its result with core 1’s result.
n Core 2 adds its result with core 3’s result, etc.

n That is, work with odd- and even-numbered pairs of cores.


n Then we can repeat the process with only the even-ranked cores:
n 0 adds in the result of 2,
n 4 adds in the result of 6, and so on.

n Now cores divisible by 4 repeat the process, and so forth, until core 0 has the final result (a sketch follows below).
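A sketch of the tree-structured sum over the same array of partial sums (again, my illustration standing in for real receives): at each step the remaining cores pair up, so the number of partial sums still in play halves.

    /* Tree-structured global sum: after ceil(log2(p)) steps, partial[0]
     * (i.e., core 0) holds the total. Works for any p >= 1. */
    int tree_global_sum(int partial[], int p) {
        for (int gap = 1; gap < p; gap *= 2)
            for (int r = 0; r + gap < p; r += 2 * gap)
                partial[r] += partial[r + gap];  /* core r "receives" from core r+gap */
        return partial[0];
    }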



Multiple cores forming a global sum



Analysis
n In the first example, the master core performs 7 receives and
7 additions.

n In the second example, the master core performs 3 receives and 3 additions.

n The improvement is more than a factor of 2!



Analysis (cont.)
n The difference is more dramatic with a larger number of cores.
n If we have 1000 cores:
n The first example would require the master to perform 999 receives and 999 additions.
n The second example would only require 10 receives and 10 additions.

n That’s an improvement of almost a factor of 100!
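A quick check of these counts (an added note, not on the original slide):

    naive scheme: core 0 performs p - 1 = 1000 - 1 = 999 receives and 999 additions
    tree scheme:  the number of partial sums halves at each step, so core 0 needs
                  only ceil(log2(1000)) = 10 receives and 10 additions
    improvement:  999 / 10 ≈ 100x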


n The first global sum is a fairly obvious generalization of the serial global
sum: divide the work of adding among the cores, and after each core has
computed its part of the sum, the master core simply repeats the basic serial
addition—if there are p cores, then it needs to add p values.
n The second global sum bears little relation to the original serial addition.
n The point here is that it’s unlikely that a translation program would
“discover” the second global sum.



How do we write parallel programs?
n There are a number of possible answers to this question, but
most of them depend on the basic idea of partitioning the
work to be done among the cores.
1. Task parallelism
n Partition the various tasks carried out in solving the problem among the cores.

2. Data parallelism
n Partition the data used in solving the problem among the cores.
n Each core carries out similar operations on its part of the data (see the sketch below).
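To make the two decompositions concrete, here is a small OpenMP sketch (OpenMP is introduced later in the course; this is an added illustration, not the slide's code). The parallel for splits the iterations of the same work among the threads (data parallelism), while the sections give different threads different jobs (task parallelism). Compile with, e.g., gcc -fopenmp.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        enum { N = 24 };
        int x[N];
        int sum = 0;

        /* Data parallelism: every thread runs the same loop body on its own
         * share of the iterations; the reduction combines the partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            x[i] = i % 10;              /* stand-in for Compute_next_value */
            sum += x[i];
        }

        /* Task parallelism: different threads carry out different tasks. */
        int min = x[0], max = x[0];
        #pragma omp parallel sections
        {
            #pragma omp section         /* task 1: find the minimum */
            for (int i = 1; i < N; i++)
                if (x[i] < min) min = x[i];

            #pragma omp section         /* task 2: find the maximum */
            for (int i = 1; i < N; i++)
                if (x[i] > max) max = x[i];
        }

        printf("sum = %d, min = %d, max = %d\n", sum, min, max);
        return 0;
    }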



Professor P

15 questions
300 exams



Professor P’s grading assistants

TA#1, TA#2, TA#3



Division of work – data parallelism

TA#1: 100 exams    TA#2: 100 exams    TA#3: 100 exams



Division of work – task parallelism

TA#1: questions 1-5    TA#2: questions 6-10    TA#3: questions 11-15





Division of work – task parallelism

Tasks (in the global-sum example):
1) Receiving the partial sums
2) Adding them together



Coordination
n Cores usually need to coordinate their work.
n Communication – one or more cores send their current partial sums
to another core.
n Load balancing – share the work evenly among the cores so that no core is overloaded.
n Synchronization – because each core works at its own pace, make sure cores do not get too far ahead of the rest (see the sketch below).
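A minimal pthreads sketch (an added illustration, not from the slides) that touches all three issues: the even split of the index range is the load balancing, adding each thread's my_sum into the shared total is the communication, and the mutex provides the synchronization that keeps those updates from colliding. Compile with, e.g., gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    #define P 4    /* number of threads, standing in for cores */
    #define N 24

    static long total = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        long rank  = (long)arg;
        long first = rank * N / P;        /* load balancing: even block split */
        long last  = (rank + 1) * N / P;
        long my_sum = 0;
        for (long i = first; i < last; i++)
            my_sum += i % 10;             /* stand-in for Compute_next_value */
        pthread_mutex_lock(&lock);        /* synchronization */
        total += my_sum;                  /* communication of the partial sum */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[P];
        for (long r = 0; r < P; r++)
            pthread_create(&t[r], NULL, worker, (void *)r);
        for (long r = 0; r < P; r++)
            pthread_join(&t[r], NULL);
        printf("total = %ld\n", total);
        return 0;
    }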



Explicit Vs. Implicit Parallelism
n Explicit Parallelism: Currently, the most powerful parallel programs are written
using explicit parallel constructs.
n That is, they are written using extensions to languages such as C, C++, and Java.
n These programs include explicit instructions for parallelism:
n E.g., core 0 executes task 0, core 1 executes task 1, ..., all cores synchronize, ..., and so on.
n As a result, such programs are often extremely complex.

n Implicit Parallelism: There are other options for writing parallel programs—for
example, higher level languages.
n They tend to sacrifice performance to make program development somewhat
easier.



Types of parallel systems
n Shared-memory
n The cores can share access to the computer’s memory.
n Coordinate the cores by having them examine and update
shared memory locations.
n Distributed-memory
n Each core has its own private memory.
n The cores must communicate explicitly by sending messages across a network (see the MPI-style sketch below).
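As a preview of the MPI flavor used later in the course, a minimal distributed-memory sketch of the global sum (an added illustration; the cyclic work split and the i % 10 stand-in for Compute_next_value are my choices). Typically built with mpicc and run with mpiexec -n <number of processes>.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, p, n = 24;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &p);      /* number of processes */

        int my_sum = 0;
        for (int i = rank; i < n; i += p)       /* cyclic split of the work */
            my_sum += i % 10;                   /* stand-in for Compute_next_value */

        /* Explicit message passing: combine the private partial sums on rank 0. */
        int total = 0;
        MPI_Reduce(&my_sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("total = %d\n", total);

        MPI_Finalize();
        return 0;
    }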



Types of parallel systems

[Figures: a shared-memory system and a distributed-memory system.]



What we’ll be doing
n Learning Parallel HW and SW.
n Parallel architectures.
n Parallel algorithms and coordination details.
n Measuring the performance of parallel algorithms.
n Etc.
n Learn to write programs that are explicitly parallel.
n Will be using the C language.
n Using three different extensions to C.
n Message-Passing Interface (MPI) – distributed memory
n OpenMP – shared memory
n CUDA – GPU programming
Terminology
n Concurrent computing – a program is one in which multiple
tasks can be in progress at any instant.
n Parallel computing – a program is one in which multiple tasks cooperate closely to solve a problem.
n Distributed computing – a program may need to cooperate
with other programs to solve a problem.



Terminology
n So parallel and distributed programs are concurrent, but a program such as a
multitasking operating system is also concurrent, even when it is run on a machine
with only one core since multiple tasks can be in progress at any instant.
n There isn’t a clear-cut distinction between parallel and distributed programs:
n A parallel program usually runs multiple tasks simultaneously on cores that are physically close to each other and that either share the same memory or are connected by a very high-speed network.
n On the other hand, distributed programs tend to be more "loosely coupled": the tasks may be executed by multiple computers that are separated by relatively large distances, and the tasks themselves are often executed by programs that were created independently.
n As examples, our two concurrent addition programs would be considered
parallel by most authors, while a Web search program would be considered
distributed.



Concluding Remarks (1)
n The laws of physics have brought us to the doorstep of
multicore technology.
n Serial programs typically don’t benefit from multiple cores.
n Automatic parallel program generation from serial
program code isn’t the most efficient approach to get high
performance from multicore computers.



Concluding Remarks (2)
n How is performance achieved?
n Before
n Write a sequential (non-parallel) program.
n It becomes faster with newer processors.
n Newer processors offered higher speeds and more advanced hardware.
n Now
n A new processor has more cores, but each core may be slower.
n Sequential programs may even run slower on a new processor.
n They can only use one core.
n What will run faster → a parallel program that can use all the cores!



Concluding Remarks (3)
n Learning to write parallel programs involves learning how to
coordinate the cores.
n Parallel programs are usually very complex and therefore require sound programming techniques and development practices.
n Many factors affect performance.
n It is not easy to find the source of bad performance.
n Doing so usually requires a deeper understanding of processor architectures.
n This is why there is a whole course devoted to it.

