
Parallel Architecture

LECTURE 2

PAGE 1
First, we will show why parallel processing is used.
Then, we will discuss Amdahl's law.
Finally, we will solve some examples.

PAGE 2
• Climate modeling
• Protein folding
• Drug discovery
• Energy research
• Data analysis

PAGE 3
Power = C × V² × F        Performance = Cores × F

Let's have two cores:

Power = 2 × C × V² × F        Performance = 2 × Cores × F

But decrease the frequency by 50% (with the voltage scaling down proportionally):

Power = 2 × C × (V²/4) × (F/2)        Performance = 2 × Cores × (F/2)

Power = (C × V² × F) / 4        Performance = Cores × F
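As a sanity check, the trade-off above can be computed directly. This is a minimal sketch (the helper names `power` and `performance` are mine, not from the lecture; it assumes voltage scales down with frequency, which is what produces the V²/4 term):

```python
def power(cores, C, V, F):
    """Total dynamic power: each core dissipates C * V^2 * F."""
    return cores * C * V**2 * F

def performance(cores, F):
    """Simple throughput proxy: cores times clock frequency."""
    return cores * F

# Normalized baseline: one core, unit capacitance, voltage, and frequency.
C, V, F = 1.0, 1.0, 1.0
base_power = power(1, C, V, F)
base_perf = performance(1, F)

# Two cores, with both voltage and frequency halved.
scaled_power = power(2, C, V / 2, F / 2)   # 2 * C * (V/2)^2 * (F/2) = C*V^2*F / 4
scaled_perf = performance(2, F / 2)        # 2 * (F/2) = F

print(scaled_power / base_power)  # 0.25: a quarter of the power
print(scaled_perf / base_perf)    # 1.0: the same performance
```

This is the core argument for multicore designs: the same throughput at a fraction of the power.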

PAGE 4
Amdahl's law

• Also known as Amdahl's argument, it is used to find the maximum expected improvement to an overall system when only part of the system is improved.

• Amdahl's Law states that potential program speedup is defined by the fraction of code that can be parallelized.

• The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours using a single processor core, and a particular portion of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (95%) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence the speedup is limited to at most 20×.
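The 20-hour example can be checked numerically. A minimal sketch, with my own function name, applying Amdahl's law as stated:

```python
def speedup(serial, parallelizable, processors):
    """Overall speed-up when only the parallelizable part is divided
    among `processors`; times are in hours."""
    before = serial + parallelizable
    after = serial + parallelizable / processors
    return before / after

# 1 hour serial, 19 hours parallelizable (95%).
print(speedup(1, 19, 10))      # ~6.9
print(speedup(1, 19, 1000))    # ~19.63
# As the processor count grows, the speed-up approaches 20/1 = 20x
# but never reaches it: the serial hour always remains.
```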

PAGE 5
Execution time after improvement = (Execution time affected by improvement / Amount of improvement) + Execution time unaffected

Example:
A simple design problem illustrates it well. Suppose a
program runs in 100 seconds on a computer, with
multiply operations responsible for 80 seconds of this
time. How much do I have to improve the speed of
multiplication if I want my program to run five times
faster?
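A worked answer (the reasoning is mine, applying the formula above): the target time is 100/5 = 20 s, but the 20 s of non-multiply work is unaffected by the improvement, so no finite improvement factor can reach the target:

```python
def new_time(improvement):
    """Run time after speeding up the 80 s of multiplies by `improvement`;
    the other 20 s are unaffected."""
    return 80 / improvement + 20

for n in (2, 10, 100, 10**6):
    print(n, new_time(n))
# new_time(n) = 80/n + 20 stays strictly above the 20 s target,
# so a 5x overall speed-up is impossible no matter how fast multiplies get.
```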

PAGE 6
Suppose you want to achieve a speed-up of 90 times faster with 100 processors.
What percentage of the original computation can be sequential?

Speed-up = Execution time before / Execution time after

Speed-up = Execution time before / ((Execution time affected / Amount of improvement) + Execution time unaffected)

• This formula is usually rewritten assuming that the execution time before is 1 for some unit of time, and the execution time affected by improvement is expressed as a fraction of the original execution time.
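The 90× question can be solved with this rewritten form (variable names are mine): with sequential fraction s and p processors, speed-up = 1 / (s + (1 − s)/p). Setting this to 90 with p = 100 and solving gives s = 1/891:

```python
def speedup(s, p):
    """Amdahl speed-up for sequential fraction s on p processors,
    taking the original execution time as 1."""
    return 1 / (s + (1 - s) / p)

# Solving 1 / (s + (1 - s)/100) = 90 algebraically:
# multiply through by 900:  900*s + 9*(1 - s) = 10  =>  891*s = 1
s = 1 / 891
print(round(speedup(s, 100), 6))  # 90.0
print(round(100 * s, 3))          # 0.112: about 0.11% of the work may be sequential
```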

PAGE 7
• Suppose you want to perform two sums: one is a sum of 10 scalar variables, and one is a matrix sum of a pair of two-dimensional arrays, with dimensions 10 by 10. For now let's assume only the matrix sum is parallelizable. What speed-up do you get with 10 versus 40 processors?

• Next, calculate the speed-ups assuming the matrices grow to 20 by 20.
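The question can be sketched as follows (my own helper; times are in units of one addition time t, with the 10 scalar additions treated as the serial part):

```python
def speedup(n, processors):
    """Speed-up for the 10 serial scalar adds plus an n x n matrix sum
    spread across `processors`."""
    serial = 10
    parallel = n * n
    before = serial + parallel
    after = serial + parallel / processors
    return before / after

print(speedup(10, 10))   # 110/20   = 5.5
print(speedup(10, 40))   # 110/12.5 = 8.8
print(speedup(20, 10))   # 410/50   = 8.2
print(speedup(20, 40))   # 410/20   = 20.5
```

Note how the larger 20 by 20 problem scales much better: the parallel fraction grows while the serial part stays fixed.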

PAGE 8
To achieve the speed-up of 20.5 on the previous larger problem with 40 processors, we assumed the load was perfectly balanced. That is, each of the 40 processors had 2.5% of the work to do. Instead, show the impact on speed-up if one processor's load is higher than all the rest. Calculate at twice the load (5%) and five times the load (12.5%) for that hardest-working processor. How well utilized are the rest of the processors?

PAGE 9
For a 10×10 array and 40 processors:

t_total without improvement = t_parallelizable without improvement + t_serial

t_total without improvement = 100t + 10t = 110t

The load is distributed as follows. One processor's load is 5% of the load that can be improved (100 additions). We have 100 additions, so 5% of this load will be on one processor:

load for one processor = 5% × total load that can be improved
load for one processor = 5% × 100 = 0.05 × 100 = 5

The remaining 39 processors will share the remaining load:

remaining load on 39 processors = 95% × total load that can be improved
remaining load on 39 processors = 95% × 100 = 0.95 × 100 = 95

Or simply, it is the total load which can be improved minus the load on the one processor:

= 100 − 5 = 95
PAGE 10
Now let us calculate the total time after improvement.

According to Amdahl's law:

t_after improvement = t_improved + t_serial

However, t_improved is the longest time required to calculate the parallel array addition: it could be either the time for the single processor or the time for the remaining 39 processors, whichever is highest.

So:

t_after improvement = max(t_one processor, t_remaining 39 processors) + t_serial

where the max operation returns one value, which is the highest value among its arguments.

Now, t_one processor is equal to the load over the number of processors. We have 5 additions on one processor, so:

t_one processor = 5t / 1 = 5t

We multiply the number of additions by t (the time for one addition), so we get 5t.

PAGE 11
t_remaining 39 processors is equal to the load of the 39 processors over the number of processors. We have 95 additions over 39 processors, so:

t_remaining 39 processors = 95t / 39

For simplicity we can say 100t / 40; however, in exams the exact value should be used, which is 95t / 39.

Returning to our equation:

t_after improvement = max(t_one processor, t_remaining 39 processors) + t_serial

t_after improvement = max(5t, 95t/39) + 10t

t_after improvement = 5t + 10t = 15t

The speed-up will be:

speed-up = t_without improvement / t_with improvement

speed-up = 110t / 15t = 7.333
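The whole calculation can be packaged into one function and checked numerically (a sketch with my own names; loads are in units of one addition time t):

```python
def speedup(frac, parallel=100, serial=10, procs=40):
    """Speed-up when one processor carries `frac` of the parallel load
    and the remaining procs - 1 processors share the rest evenly."""
    t_one = frac * parallel                        # e.g. 5% of 100 = 5t
    t_rest = (1 - frac) * parallel / (procs - 1)   # e.g. 95t / 39
    t_after = max(t_one, t_rest) + serial          # slowest path plus serial part
    return (parallel + serial) / t_after

print(round(speedup(0.05), 3))    # 7.333  (twice the balanced 2.5% load)
print(round(speedup(0.125), 3))   # 4.889  (five times the balanced load)
```

In both cases the other 39 processors finish early and sit idle waiting for the overloaded one, which is exactly the utilization question the slide asks about.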
The student should solve for a 20×20 array size as well and find the speed-up.

PAGE 12
• Architecture/Systems Continuum
• Flynn’s classification scheme

PAGE 13
The term coupling refers to the act of joining things together, such as the links of a chain. In computing, coupling refers to the method of interconnecting the components in a system or network and how much those components, also called elements, depend on each other.

Tightly Coupled:
• Vector processors
• Instruction-level parallelism (ILP)
• Memory-level parallelism (MLP)
• Multi-core processors

Loosely Coupled:
• Clusters
• Grid
• Client/server model

PAGE 14
• The most popular taxonomy of computer architecture was defined by Flynn in 1966. Flynn's classification scheme is based on the notion of a stream of information. Two types of information flow into a processor: instructions and data. The instruction stream is defined as the sequence of instructions performed by the processing unit. The data stream is defined as the data traffic exchanged between the memory and the processing unit. According to Flynn's classification, either of the instruction or data streams can be single or multiple.

PAGE 15
Computer organizations are characterized by the multiplicity of the hardware provided to
service the instruction and data streams. Listed below are Flynn's four machine
organizations:

• Single instruction stream-single data stream (SISD)

• Single instruction stream-multiple data stream (SIMD)

• Multiple instruction stream-single data stream (MISD)

• Multiple instruction stream-multiple data stream (MIMD)

PAGE 16
