
CS516: Parallelization of Programs

Introduction

Vishwesh Jatala
Assistant Professor
Department of CSE
Indian Institute of Technology Bhilai
[email protected]

2023-24 W
What is the output?
Outline of Today’s Lecture

■ Why?
■ What?
■ How?

Moore’s Law

■ The number of transistors on an IC doubles about every two years.
Moore’s Law Effect

Single-core performance scaling ran into power and heat limits, which drove the shift to multicore processors.
Parallel Architectures are Everywhere!

Parallel Hardware

■ Distributed CPUs
■ Multicores
■ GPUs
Hardware and Software

[Diagram: a sequential program runs on a single-core CPU (Core, L1 Cache, L2 Cache, DRAM) and produces its output.]
Hardware and Software

[Diagram: a multi-core CPU with four cores (Core1-Core4), each with a private L1 Cache, sharing an L2 Cache and DRAM.]
Will the same sequential program run any faster here?
Professor P

15 questions, 300 answer sheets
Professor P’s Teaching Assistants

TA#1, TA#2, TA#3
Benefits of Parallel Programming

■ Fast: less execution time
■ Saves money
■ Solves larger problems
Parallel Programming Applications

■ Deep learning and machine learning
■ Medical imaging
■ Climate modeling
■ Computational fluid dynamics (CFD)
Parallel Programming Applications

■ OpenAI used a supercomputer with more than 285,000 CPU cores, 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server.
Challenges!

Example-1

Sequential Execution

for (int i = 0; i < 5; i++)
    A[i] = i;

Core-0: A[0]=0, A[1]=1, A[2]=2, A[3]=3, A[4]=4
Example-1

for (int i = 0; i < 5; i++)
    A[i] = i;

Parallel Execution

Core-0   Core-1   Core-2   Core-3   Core-4
A[0]=0   A[1]=1   A[2]=2   A[3]=3   A[4]=4
Example-2

Sequential Execution

int count = 0;
for (int i = 0; i < 5; i++)
    A[i] = count++;

Core-0: A[0]=0, A[1]=1, A[2]=2, A[3]=3, A[4]=4
Example-2

int count = 0;
for (int i = 0; i < 5; i++)
    A[i] = count++;

Parallel Execution (one possible outcome)

Core-0   Core-1   Core-2   Core-3   Core-4
A[0]=1   A[1]=2   A[2]=0   A[3]=4   A[4]=3
Challenges:

Detecting Parallelism is Hard!

Example-3: Sequential Version

int sum = 0;
for (i = 0; i < n; i++) {
    x = f(i);
    sum = sum + x;
}

A Sequential Program for Sum

Core-0: 1 4 3 9 2 8 5 1 1 6 2 7 2 5 0 4 1 8 6 5 1 2 3 9

sum = 95
Example-3: Parallel Version-1

my_sum = 0;
my_first_i = . . . ;
my_last_i = . . . ;
for (my_i = my_first_i; my_i < my_last_i; my_i++) {
    my_x = f(my_i);
    my_sum += my_x;
}

Core-0   Core-1   Core-2   Core-3   Core-4   Core-5   Core-6   Core-7
1 4 3    9 2 8    5 1 1    6 2 7    2 5 0    4 1 8    6 5 1    2 3 9

my_sum:  8   19   7   15   7   13   12   14
Example-3: Parallel Version-1

if (I’m the master core) {
    sum = my_sum;
    for (each core other than myself) {
        receive value from core;
        sum += value;
    }
} else {
    send my_sum to the master;
}

Core-0   Core-1   Core-2   Core-3   Core-4   Core-5   Core-6   Core-7
1 4 3    9 2 8    5 1 1    6 2 7    2 5 0    4 1 8    6 5 1    2 3 9

my_sum:  8   19   7   15   7   13   12   14

sum = 95 (8+19+7+15+7+13+12+14)
Example-3: Parallel Version-2

[Figure: the partial sums are combined pairwise in a tree rather than one at a time at the master core.]
Parallel Version-1 or Version-2?

■ Both have the same number of operations
■ Version-1’s final sum is sequential
■ Version-2 exposes parallelism in the final sum
Challenges:

Detecting Parallelism is Hard!

Communication

Synchronization

Example-4:

int sum = 0;
for (i = 0; i < n; i++) {
    x = f(i);
    sum = sum + x;
}
Challenges:

Detecting Parallelism is Hard!

Communication

Synchronization

Load Balancing

What did we learn so far?

■ Parallel hardware is inevitable
■ Sequential programs are inefficient
■ Parallelization is promising
■ Challenges!
Let’s discuss the details from the next class!
Course Outline (Part-1)

■ Introduction (this lecture)
■ Overview of parallel architectures and programming models
■ Amdahl's law and performance
■ Parallel programming
  ❑ GPUs and CUDA programming
  ❑ Optimizations
■ Case studies
■ Extracting parallelism from sequential programs automatically
Course Logistics

■ Lecture Hours:
❑ Mon, Tue, Thursday 10:30 am - 11:25 am

■ Course Website: Canvas platform


❑ Lecture notes
❑ Assignments
❑ Project
❑ Discussions
❑ Marks

Course Logistics: Evaluation Scheme

■ Evaluation scheme (subject to minor changes):


❑ Exams: ~40%
❑ Project: ~35%
❑ Attendance: ~10%
❑ Assignments (Paper presentation?): 15%

■ Attendance
❑ 0% - 50%: 0 Marks
❑ >50%: Marks will be awarded out of 10 accordingly.
❑ Example:
■ Total sessions: 16 (50% threshold = 8 sessions)
■ #sessions attended = 7 (<50%), marks = 0
■ #sessions attended = 10 (62.5%), marks = (10-8)*10/8 = 2.5
Course Logistics: References

■ Lecture notes will be available on Canvas
■ Reference material will be provided on Canvas
■ Textbook for extracting parallelism:
  ❑ Randy Allen, Ken Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, Morgan Kaufmann, 2001
Course Logistics: Tools

■ Platforms:
  ❑ Prefer Google Colab for GPUs and CUDA programming.
  ❑ A demo session will be held.
Course Logistics

■ Evaluation Policy:
❑ Acknowledge all the sources
❑ Do not cheat

Outcome of the Course?

■ State-of-the-art techniques in parallel computing
■ Develop parallel programming skills
■ Transferable skills:
  ❑ Parallel programming is used in multiple disciplines
  ❑ Industries
  ❑ Education and research
■ Handle projects
■ High-performance computing
Course Logistics

Office: B-010

[email protected]

References

■ https://www.cse.iitd.ac.in/~soham/COL380/page.html
■ https://s3.wp.wsu.edu/uploads/sites/1122/2017/05/6-
9-2017-slides-vFinal.pptx
■ Miscellaneous resources on the internet

Thank you!
