OpenACC 1

This document provides an overview of parallel programming and introduces OpenACC. It defines parallel programming as exposing an algorithm's ability to execute tasks or operations in parallel to improve performance on modern hardware. A real-world example is presented in which cancer detection was improved by simulating light propagation through tissue in parallel rather than sequentially. Amdahl's Law is discussed, noting that the speedup from parallelization is limited by the remaining serial part of a program. Finally, OpenACC is introduced as a directive-based approach to parallel programming that aims for performance, portability, and ease of use across CPUs and GPUs.

INTRODUCTION

MODULE OVERVIEW
Topics to be covered

▪ Introduction to parallel programming
▪ Common difficulties in parallel programming
▪ Introduction to OpenACC
▪ Parallel programming in OpenACC
INTRODUCTION TO PARALLEL PROGRAMMING

WHAT IS PARALLEL PROGRAMMING?
▪ "Performance Programming"
▪ Parallel programming involves exposing an algorithm's ability to execute in parallel
▪ This may involve breaking a large operation into smaller tasks (task parallelism)
▪ Or doing the same operation on multiple data elements (data parallelism)
▪ Parallel execution enables better performance on modern hardware

[Figure: summing A+B+C+D. The sequential version adds one element at a time and takes 3 steps; the parallel version combines pairs (A+B and C+D) simultaneously and takes 2 steps.]
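As a concrete illustration (a minimal sketch in C, not from the slides; the arrays and size N are hypothetical), data parallelism looks like a loop whose iterations are independent of one another:

#include <stdio.h>
#define N 8

int main(void) {
    double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {  /* set up some sample data */
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* Data parallelism: the same operation is applied to each element,
       and no iteration depends on another, so all could run at once. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %g\n", N - 1, c[N - 1]);
    return 0;
}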
A REAL WORLD CASE STUDY
Modern cancer research

▪ The Russian Academy of Science created a program to simulate light propagation through human tissue
▪ This program was used to detect cancerous cells more accurately by simulating billions of random paths that light could take through human tissue
▪ With parallel programming, they were able to run thousands of these paths simultaneously
▪ The sequential program took 2.5 hours to run
▪ The parallel version took less than 2 minutes

Parallel Computing Illuminating a Path to Early Cancer Detection


WHAT IS PARALLEL PROGRAMMING?
A real world example

▪ A professor and his 3 teaching assistants (TAs) are grading 1,000 student exams
▪ Each exam has 8 questions on it
▪ Let's assume it takes 1 minute to grade 1 question on 1 exam: 8 questions per exam, 8,000 questions in total, at 1 minute per question
▪ To maintain fairness, if someone grades a question (for example, question #1) then they must grade that question on all other exams
▪ The following is a sequential version of exam grading
SEQUENTIAL SOLUTION

One grader grades everything:

Grade Exams 1-1000 : Questions #1, 2, 3, 4, 5, 6, 7, 8 : 8,000m

Total: 8,000m
SEQUENTIAL SOLUTION

Grading one question at a time across the whole stack:

Exams 1-1000 : Q #1 : 1,000m        Exams 1-1000 : Q #5 : 1,000m
Exams 1-1000 : Q #2 : 1,000m        Exams 1-1000 : Q #6 : 1,000m
Exams 1-1000 : Q #3 : 1,000m        Exams 1-1000 : Q #7 : 1,000m
Exams 1-1000 : Q #4 : 1,000m        Exams 1-1000 : Q #8 : 1,000m

Total: 8,000+ m
SEQUENTIAL SOLUTION

Four graders, each owning two questions, but there is only one stack of exams, so each grader must wait for the previous one to finish:

Exams 1-1000 : Q #1, 2 : 2,000m
Exams 1-1000 : Q #3, 4 : 2,000m
Exams 1-1000 : Q #5, 6 : 2,000m
Exams 1-1000 : Q #7, 8 : 2,000m

Total: 8,000+ m
PARALLEL SOLUTION

Each grader owns one question pair, and the exams are split into four stacks of 250 that rotate among the graders every round (500m per round):

Grader 1 : Q #1, 2 on Exams 1-250, then 251-500, then 501-750, then 751-1000 : 500m per stack
Grader 2 : Q #3, 4 on a different stack each round : 500m per stack
Grader 3 : Q #5, 6 on a different stack each round : 500m per stack
Grader 4 : Q #7, 8 on a different stack each round : 500m per stack

Total: 2,000+ m
PIPELINE

Instead of rotating large stacks, exams flow one at a time through the graders: as soon as Grader 1 finishes Q #1, 2 on an exam (2m), it passes to Grader 2 for Q #3, 4, and so on. After a short ramp-up, all four graders work simultaneously on different exams.

[Figure: staggered schedule of 2m blocks; the Q #1, 2 row starts first, and each following row (Q #3, 4 / Q #5, 6 / Q #7, 8) starts one exam later.]

Total: 2,006+ m
PIPELINE STALL

If one grader falls behind, downstream graders run out of exams and sit idle until the pipeline refills: a stall.

[Figure: the same staggered schedule, but with fewer completed 2m blocks in the same amount of time because of idle gaps.]

Total: 2,006+ m (the "+" grows with every stall)
GRADING EXAMPLE SUMMARY
It’s critical to understand the problem before trying to parallelize it
▪ Can the work be done in an arbitrary order, or must it be done in sequential order?
▪ Does each task take the same amount of time to complete? If not, it may be
necessary to “load balance.”
In our example, the only restriction is that a single question be graded by a single
grader, so we could divide the work easily, but had to communicate periodically.
▪ This case study is an example of task-based parallelism. Each grader is assigned a
task like “Grade questions 1 & 2 on the first 500 tests”
▪ If instead each question could be graded by different graders, then we could have
data parallelism: all graders work on Q1 of different tests at the same time, then Q2, etc.
AMDAHL'S LAW

AMDAHL'S LAW
Serialization Limits Performance

▪ Amdahl's law is the observation that the speed-up gained from parallelizing code is limited by the remaining serial part.
▪ Any remaining serial code will reduce the possible speed-up
▪ This is why it's important to focus on parallelizing the most time consuming parts, not just the easiest.
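For reference (the formula is implied by the examples on the next slide but not written out): if a fraction $p$ of the runtime can be parallelized across $s$ workers, Amdahl's law gives the overall speed-up as

$$S(s) = \frac{1}{(1 - p) + p/s}, \qquad \max_s S(s) = \lim_{s \to \infty} S(s) = \frac{1}{1 - p}$$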
APPLYING AMDAHL'S LAW
Estimating Potential Speed-up

▪ What's the maximum speed-up that can be obtained by parallelizing 50% of the code?
  1 / (100% - 50%) = 1 / (1.0 - 0.50) = 2.0X

▪ What's the maximum speed-up that can be obtained by parallelizing 25% of the code?
  1 / (100% - 25%) = 1 / (1.0 - 0.25) = 1.3X

▪ What's the maximum speed-up that can be obtained by parallelizing 90% of the code?
  1 / (100% - 90%) = 1 / (1.0 - 0.90) = 10.0X

[Figure: bar chart comparing total serial runtime against total parallel runtime when 50%, 25%, and 90% of the code is parallelized.]
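A minimal sketch in C (not from the slides; the function name amdahl_max_speedup is hypothetical) that reproduces these three estimates:

#include <stdio.h>

/* Maximum speed-up when a fraction p of the runtime is parallelized
   and the parallel part is assumed to take zero time (infinite workers). */
static double amdahl_max_speedup(double p) {
    return 1.0 / (1.0 - p);
}

int main(void) {
    double fractions[] = {0.50, 0.25, 0.90};
    for (int i = 0; i < 3; i++)
        printf("parallelize %.0f%% -> max speed-up %.1fX\n",
               fractions[i] * 100.0, amdahl_max_speedup(fractions[i]));
    return 0;  /* prints 2.0X, 1.3X, and 10.0X */
}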
INTRODUCTION TO OPENACC

OpenACC is a directives-based programming approach to parallel computing designed for performance and portability on CPUs and GPUs for HPC.

Add Simple Compiler Directive:

main()
{
  <serial code>
  #pragma acc kernels
  {
    <parallel code>
  }
}


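As a usage note (an assumption, not stated on this slide): with the NVIDIA HPC SDK compilers, OpenACC directives are typically enabled with the -acc flag, and -Minfo=accel asks the compiler to report which regions it parallelized, e.g.:

nvc -acc -Minfo=accel main.c
nvfortran -acc -Minfo=accel main.f90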
3 WAYS TO ACCELERATE APPLICATIONS

Applications can be accelerated in three ways:

▪ Libraries: easy to use, most performance
▪ Compiler Directives (OpenACC): easy to use, portable code
▪ Programming Languages: most performance, most flexibility
OPENACC PORTABILITY
Describing a generic parallel machine

▪ OpenACC is designed to be portable to many existing and future parallel platforms
▪ The programmer need not think about specific hardware details, but rather express the parallelism in generic terms
▪ An OpenACC program runs on a host (typically a CPU) that manages one or more parallel devices (GPUs, etc.). The host and device(s) are logically thought of as having separate memories.

[Figure: a host with host memory connected to a device with device memory.]
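Because host and device memories are logically separate, OpenACC provides data directives for moving arrays between them. A minimal sketch (not from the slides; the array name x is hypothetical):

#include <stdlib.h>

int main(void) {
    int n = 1 << 20;
    float *x = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) x[i] = 1.0f;

    /* copy(x[0:n]) copies x to device memory on entry to the region
       and copies it back to host memory on exit. */
    #pragma acc data copy(x[0:n])
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; i++)
            x[i] = 2.0f * x[i];
    }

    free(x);
    return 0;
}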
OPENACC
Three major strengths

Incremental ● Single Source ● Low Learning Curve
OPENACC
Incremental: Enhance Sequential Code

▪ Maintain existing sequential code
▪ Add annotations to expose parallelism
▪ After verifying correctness, annotate more of the code

The workflow: begin with a working sequential code; parallelize it with OpenACC; rerun the code to verify correct behavior, removing or altering OpenACC code as needed.

#pragma acc parallel loop
for( i = 0; i < N; i++ )
{
  < loop code >
}
OPENACC
Single Source

▪ Rebuild the same code on multiple architectures
▪ Compiler determines how to parallelize for the desired machine
▪ Sequential code is maintained

The compiler can ignore your OpenACC code additions, so the same code can be used for parallel or sequential execution:

int main(){
  ...
  #pragma acc parallel loop
  for(int i = 0; i < N; i++)
    < loop code >
}

Supported platforms include POWER, Sunway, x86 CPU, x86 Xeon Phi, NVIDIA GPU, and PEZY-SC.
OPENACC
Low Learning Curve

▪ OpenACC is meant to be easy to use, and easy to learn
▪ Programmer remains in familiar C, C++, or Fortran
▪ No reason to learn low-level details of the parallel hardware

The programmer gives hints to the compiler about which parts of the code to parallelize. The compiler then generates parallelism for the target hardware:

int main(){

  <sequential code>

  #pragma acc kernels  /* compiler hint */
  {
    <parallel code>
  }
}
OPENACC
Three major strengths, recapped

▪ Incremental: maintain existing sequential code; add annotations to expose parallelism; after verifying correctness, annotate more of the code
▪ Single Source: rebuild the same code on multiple architectures; the compiler determines how to parallelize for the desired machine; sequential code is maintained
▪ Low Learning Curve: OpenACC is meant to be easy to use and easy to learn; the programmer remains in familiar C, C++, or Fortran, with no reason to learn low-level details of the hardware
GAUSSIAN 16 (Mike Frisch, Ph.D., President and CEO, Gaussian, Inc.):
"Using OpenACC allowed us to continue development of our fundamental algorithms and software capabilities simultaneously with the GPU-related work. In the end, we could use the same code base for SMP, cluster/network and GPU parallelism. PGI's compilers were essential to the success of our efforts."

ANSYS FLUENT (Sunil Sathe, Lead Software Developer, ANSYS Fluent):
"We've effectively used OpenACC for heterogeneous computing in ANSYS Fluent with impressive performance. We're now applying this work to more of our models and new platforms."

VASP (Prof. Georg Kresse, Computational Materials Physics, University of Vienna):
"For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar and in some cases better than CUDA C, and OpenACC dramatically decreases GPU development and maintenance efforts. We're excited to collaborate with NVIDIA and PGI as an early adopter of CUDA Unified Memory."
OPENACC SUCCESSES

▪ LSDalton (Quantum Chemistry, Aarhus University): 12X speedup in 1 week
▪ PowerGrid (Medical Imaging, University of Illinois): 40 days to 2 hours
▪ COSMO (Weather and Climate, MeteoSwiss, CSCS): 40X speedup, 3X energy efficiency
▪ INCOMP3D (CFD, NC State University): 4X speedup
▪ NekCEM (Comp Electromagnetics, Argonne National Lab): 2.5X speedup, 60% less energy
▪ MAESTRO and CASTRO (Astrophysics, Stony Brook University): 4.4X speedup, 4 weeks effort
▪ CloverLeaf (Comp Hydrodynamics, AWE): 4X speedup, single CPU/GPU code
▪ FINE/Turbo (CFD, NUMECA International): 10X faster routines, 2X faster app
OPENACC RESOURCES
Guides ● Talks ● Tutorials ● Videos ● Books ● Spec ● Code Samples ● Teaching Materials ● Events ● Success Stories ● Courses ● Slack ● Stack Overflow

▪ Resources: https://www.openacc.org/resources
▪ Success Stories: https://www.openacc.org/success-stories
▪ FREE Compilers and Tools: https://www.openacc.org/tools
▪ Events: https://www.openacc.org/events
▪ Community Slack: https://www.openacc.org/community#slack
EXPRESSING PARALLELISM WITH OPENACC

CODING WITH OPENACC
Array pairing example (C)

void pairing(int *input, int *output, int N){
  for(int i = 0; i < N; i++)
    output[i] = input[i*2] + input[i*2+1];
}

input:  6 3 10 7 2 4 3 8 9 2 0 1
output: 9 17 6 11 11 1
CODING WITH OPENACC
Array pairing example (Fortran)

subroutine pairing(input, output, N)
  do i = 1, N
    output(i) = input(i*2-1) + input(i*2)  ! pair i is elements (2i-1, 2i) with 1-based indexing
  end do
end subroutine

input:  6 3 10 7 2 4 3 8 9 2 0 1
output: 9 17 6 11 11 1
CODING WITH OPENACC
Array pairing example - parallel (C)

void pairing(int *input, int *output, int N){
  #pragma acc parallel loop
  for(int i = 0; i < N; i++)
    output[i] = input[i*2] + input[i*2+1];
}

input:  6 3 10 7 2 4 3 8 9 2 0 1
output: 9 17 6 11 11 1
CODING WITH OPENACC
Array pairing example - parallel (Fortran)

subroutine pairing(input, output, N)
  !$acc parallel loop
  do i = 1, N
    output(i) = input(i*2-1) + input(i*2)  ! pair i is elements (2i-1, 2i) with 1-based indexing
  end do
end subroutine

input:  6 3 10 7 2 4 3 8 9 2 0 1
output: 9 17 6 11 11 1
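A minimal driver (not from the slides) showing how the C version of pairing might be called; note that input must hold 2*N elements:

#include <stdio.h>

void pairing(int *input, int *output, int N){
    #pragma acc parallel loop
    for(int i = 0; i < N; i++)
        output[i] = input[i*2] + input[i*2+1];
}

int main(void) {
    int input[12] = {6, 3, 10, 7, 2, 4, 3, 8, 9, 2, 0, 1};
    int output[6];

    pairing(input, output, 6);  /* N = 6 pairs, input holds 2*N = 12 values */

    for (int i = 0; i < 6; i++)
        printf("%d ", output[i]);  /* prints: 9 17 6 11 11 1 */
    printf("\n");
    return 0;
}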
DATA DEPENDENCIES
Not all loops are parallel

void pairing(int *a, int N){
  for(int i = 1; i < N; i++)
    a[i] = a[i] + a[i-1];  /* running (prefix) sum */
}

[Figure: a running sum of a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Executed sequentially (i=1 through i=9), the array becomes 1, 3, 6, 10, 15, 21, 28, 36, 45, 55; each iteration depends on the result of the previous one.]
DATA DEPENDENCIES
Not all loops are parallel

void pairing(int *a, int N){
  #pragma acc parallel loop
  for(int i = 1; i < N; i++)
    a[i] = a[i] + a[i-1];
}

If we attempted to parallelize this loop we would get wrong answers due to a forward dependency.

[Figure: sequential execution yields 1, 3, 6, 10, 15, 21, 28, 36, 45, 55; the parallel execution shown produces a correct result only for the first updated element, with the remaining eight wrong.]
DATA DEPENDENCIES
Not all loops are parallel

void pairing(int *a, int N){
  #pragma acc parallel loop
  for(int i = 1; i < N; i++)
    a[i] = a[i] + a[i-1];
}

Even changing how the iterations are parallelized will not make this loop safe to parallelize.

[Figure: with a different iteration schedule, the first five updates (1, 3, 6, 10, 15, 21) happen to match the sequential results, but the last four elements come out as 13, 21, 30, 40 instead of 28, 36, 45, 55.]
DATA DEPENDENCIES
Not all loops are parallel

subroutine pairing(a, N)
  do i = 2, N  ! start at 2 so a(i-1) stays in bounds for a 1-based array
    a(i) = a(i) + a(i-1)
  end do
end subroutine

[Figure: as in the C version, the sequential running sum of a = (1, 2, ..., 10) is 1, 3, 6, 10, 15, 21, 28, 36, 45, 55.]
DATA DEPENDENCIES
Not all loops are parallel

subroutine pairing(a, N)
  !$acc parallel loop
  do i = 2, N
    a(i) = a(i) + a(i-1)
  end do
end subroutine

If we attempted to parallelize this loop we would get wrong answers due to a forward dependency.

[Figure: same behavior as the C version; the parallel execution shown produces a correct result only for the first updated element.]
DATA DEPENDENCIES
Not all loops are parallel

subroutine pairing(a, N)
  !$acc parallel loop
  do i = 2, N
    a(i) = a(i) + a(i-1)
  end do
end subroutine

Even changing how the iterations are parallelized will not make this loop safe to parallelize.

[Figure: same behavior as the C version; a different iteration schedule still leaves the last four elements wrong (13, 21, 30, 40 instead of 28, 36, 45, 55).]
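By contrast, some dependency patterns can be expressed in a way OpenACC parallelizes safely. A minimal sketch (not from the slides): a sum over an array carries a dependency through the variable sum, but declaring it as a reduction lets the compiler combine per-worker partial sums correctly:

#include <stdio.h>

int main(void) {
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int sum = 0;

    /* The reduction clause tells the compiler each worker may keep a
       private partial sum, to be combined at the end of the loop. */
    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < 10; i++)
        sum += a[i];

    printf("sum = %d\n", sum);  /* prints: sum = 55 */
    return 0;
}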
MODULE 1 REVIEW

CLOSING SUMMARY
Module One: Introduction

▪ Parallel programming is a technique of utilizing modern hardware to do lots of work all at once.
▪ Amdahl's law is the gravity of parallel programming: break this law at your own peril.
▪ Not all loops are parallel, but many can be rewritten to be parallelizable.
▪ OpenACC is a high-level model for generating parallel code from serial loops.
THANK YOU
