1. Introduction

The document outlines the logistics and grading policy for the Parallel Computing course (CS 633) taught by Preeti Malakar at IIT Kanpur, including class hours, office hours, and communication protocols. It details the structure of assignments, attendance requirements, and the importance of academic integrity, particularly regarding plagiarism and the use of AI tools. The document also introduces key concepts in parallel computing, including multicore systems, performance measures, and programming models.


Parallel Computing (CS 633)

January 6, 2025

Preeti Malakar
[email protected]
Logistics
• Class hours: MW 3:30 – 5:00 PM (L16)
• Office hour: W 5:00 – 6:00 PM (KD 221)
• https://www.cse.iitk.ac.in/users/cs633/2024-25-2
– Lectures will be uploaded after every class
• Extra class/quiz/doubts: Saturday 11 AM – 12 PM
• Announcements/uploads on
– MooKIT
– Course email alias
• Email to the instructor should always be prefixed with [CS633] in the subject
Switch OFF All Devices

Grading Policy
• 75% attendance is compulsory for this course
• Participate actively in class
Lectures
• Lecture slides are pointers for the topic
– They won’t be as verbose as a book!
• In case you miss a class, please ensure you are
up to date with the lecture content
– Either ask your friend
– Or, ask the instructor (Saturday class)

Assignment

• One programming assignment in C


• In a group (group size = 4 or 5)
– Send group member information by Jan 14 via
Google forms (link will be shared on Jan 8)
– Clearly include names, roll numbers and IITK email IDs
• The mode of submission will be explained in due course
Assignment

• Timeline: Early February to Early March


• Credit for early submission
• Penalty for late submission
• Cannot be completed in a day!
• Discussion is NOT allowed outside your group
– You are responsible for keeping your code and report within your group only

Plagiarism

Plagiarism will NOT be tolerated


Use of AI tools is NOT allowed

Lecture 1

Introduction
Multicore Era

CPU evolution:
• Intel 4004 (1971): single core, single chip
• Cray X-MP (1982): single core, multiple chips
• Hydra (2000): multiple cores, single chip
• IBM POWER4 (2001): multiple cores, multiple chips
Moore’s Law (1965)
Number of transistors in a chip doubles every 18 months

[Source: Wikipedia]

“However, it must be programmed with a more complicated parallel programming model to obtain maximum performance.”
Trends

[Source: M. Frans Kaashoek, MIT]


top500.org (Nov’24)

~ $600 million
~ 7300 sq. ft.
~ 22 MW power
~ 23000 L water
Top #1 Supercomputer
https://www.top500.org/resources/top-systems/

green500.org (Nov’23)

Metric of interest: Performance per Watt


https://hpl-mxp.org/

Making of a Supercomputer

Source: energy.gov
Greenest Data Centre?

Source: MIT TR 06/19

“The 149,000 square foot facility built on a hillside overlooking the UC Berkeley campus and San Francisco Bay will house one of the most energy-efficient computing centers anywhere, tapping into the region’s mild climate to cool the supercomputers at the National Energy Research Scientific Computing Center (NERSC) and eliminating the need for mechanical cooling.”

https://www.science.org/content/article/climate-change-threatens-supercomputers
Top Supercomputers from India (Nov’23)

2024…

Supercomputing in India [topsc.cdacb.in, Jul’24]

Source: www.iitk.ac.in
Credit: Ashish Kuvelkar, CDAC
National Supercomputing Mission Sites

Big Compute

Massively Parallel Codes

Climate simulation of Earth [Credit: NASA]


Discretization

Gridded mesh for a global model [Credit: Tompkins, ICTP]

Numerical Weather Models

• Use numerical methods to solve equations that govern atmospheric processes (see the sketch below)
• Are based on fluid dynamics and depend on observations of meteorological variables
• Are used to obtain nowcast/forecast
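To make this concrete, here is a minimal sketch (illustrative only, not course code; the equation, grid size, and constants are assumptions) of solving one governing equation numerically: an explicit finite-difference update of the 1D heat equation on a discretized grid, the same pattern a weather model applies over its gridded mesh.

/* Illustrative sketch: explicit finite-difference update for the
 * 1D heat equation du/dt = alpha * d2u/dx2. Grid size, step counts,
 * and constants are made up for the example. */
#include <stdio.h>

#define N 100          /* number of grid points (illustrative) */
#define STEPS 1000     /* number of time steps (illustrative) */

int main(void) {
    double u[N], unew[N];
    const double alpha = 0.1, dx = 1.0 / N;
    const double dt = 0.4 * dx * dx / alpha;   /* satisfies stability limit */

    for (int i = 0; i < N; i++)                /* initial condition: a hot spot */
        u[i] = (i == N / 2) ? 1.0 : 0.0;

    for (int s = 0; s < STEPS; s++) {
        for (int i = 1; i < N - 1; i++)        /* stencil update on interior points */
            unew[i] = u[i] + alpha * dt / (dx * dx)
                             * (u[i-1] - 2.0 * u[i] + u[i+1]);
        unew[0] = unew[N-1] = 0.0;             /* fixed boundary values */
        for (int i = 0; i < N; i++)
            u[i] = unew[i];
    }
    printf("u[N/2] after %d steps: %f\n", STEPS, u[N/2]);
    return 0;
}

Each grid point only reads its neighbours, which is why such stencil updates parallelize well once the grid is partitioned across processes.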
Massively Parallel Simulations

Self-healing material simulation


[Nomura et al., “Nanocarbon synthesis by high-temperature oxidation of nanoparticles”, Scientific Reports, 2016]
Massively Parallel Analysis

[Nomura et al., “Nanocarbon synthesis by high-temperature oxidation of nanoparticles”, Scientific Reports, 2016]
Massively Parallel Codes

Cosmological simulation [Credit: ANL]


Massively Parallel Analysis
Virgo Consortium

Computational Science

[Source: Culler, Singh and Gupta]


Big Data

Output Data

• High-energy physics: 10 PB / year – Higgs boson simulation (Source: CERN)
• Cosmology: 2 PB / simulation, scaled to 786K cores on Mira – Q Continuum simulation (Source: Salman Habib et al.)
• Climate/weather: 240 TB / simulation – Hurricane simulation (Source: NASA)
Input Data

[Credit: World Meteorological Organization]


System Architecture Trends

[Credit: Pavan Balaji@ATPESC’17]


I/O trends

NERSC I/O trends [Credit: www.nersc.gov]


Compute vs. I/O trends
[Chart: I/O vs. FLOPS (Byte/FLOP) for the #1 supercomputer in the TOP500 list, 1997–2018, on a log scale from 1.00E-06 to 1.00E-03]
Why Parallel?

[Figure: a computation that takes 20 hours serially runs in 2 hours in parallel]
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.

– Almasi and Gottlieb (1989)

Speedup
Example – Sum of squares of N numbers

Serial:
    for i = 1 to N
        sum += a[i] * a[i]
Cost: O(N)

Parallel (P processes):
    for i = 1 to N/P
        sum += a[i] * a[i]
    collate result
Cost: O(N/P) + communication time
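The parallel column above maps directly onto MPI. Below is a minimal C sketch (illustrative; N, the data values, and the uniform chunking are assumptions): each of the P processes sums the squares of its N/P elements, and MPI_Reduce collates the partial sums (the communication cost in the slide's model).

/* Illustrative MPI sketch of the parallel sum of squares. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000  /* total number of elements (illustrative) */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process handles a contiguous chunk of roughly N/P elements;
       the last rank picks up the remainder. a[i] = 1.0 stands in for data. */
    long chunk = N / nprocs;
    long start = rank * chunk;
    long end   = (rank == nprocs - 1) ? N : start + chunk;

    double local_sum = 0.0;
    for (long i = start; i < end; i++) {
        double a_i = 1.0;               /* placeholder for a[i] */
        local_sum += a_i * a_i;
    }

    /* Collate the partial sums on rank 0. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of squares = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}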
Performance Measure
• Speedup: S_P = Time(1 processor) / Time(P processors)

• Efficiency: E_P = S_P / P
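As a quick sanity check of these definitions, the snippet below computes S_P and E_P for P = 2 using times from the table on the next slide (the slide rounds to 1.9 and 0.95):

/* Tiny illustrative helper: speedup and efficiency from measured times. */
#include <stdio.h>

int main(void) {
    double t1 = 0.025;   /* time on 1 processor (sec), from the next slide */
    double tp = 0.013;   /* time on P processors (sec) */
    int p = 2;

    double speedup = t1 / tp;          /* S_P = T_1 / T_P */
    double efficiency = speedup / p;   /* E_P = S_P / P   */
    printf("S_%d = %.2f, E_%d = %.2f\n", p, speedup, p, efficiency);
    return 0;
}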
Parallel Performance (Parallel Sum)
Parallel efficiency of summing 10^7 doubles

#Processes   Time (sec)   Speedup   Efficiency
     1         0.025        1.0        1.00
     2         0.013        1.9        0.95
     4         0.010        2.5        0.63
     8         0.009        2.8        0.35
    12         0.007        3.6        0.30
Ideal Speedup
[Plot: speedup vs. number of processors, with linear, superlinear, and sublinear speedup curves]
Issue – Scalability

[Source: M. Frans Kaashoek, MIT]


Scalability Bottleneck

Performance of a weather simulation application


Scalability and Performance

C vs. Python Parallel Performance

Performance Analysis of C and Python Parallel Implementations on a Multicore System Using Particle Simulation, 2024
Parallelism
A parallel computer is a collection of processing
elements that communicate and cooperate to solve
large problems fast.

– Almasi and Gottlieb (1989)

Distributed Memory Systems

• Networked systems
• Distributed memory
  – Local memory
  – Remote memory
• Parallel file system

[Diagram: a node and a cluster]
Parallel Programming Models
Libraries       MPI, TBB, Pthread, OpenMP, …
New languages   Haskell, X10, Chapel, …
Extensions      Coarray Fortran, UPC, Cilk, OpenCL, …

• Shared memory
– OpenMP, Pthreads, …
• Distributed memory
– MPI, UPC, …
• Hybrid
– MPI + OpenMP (see the sketch below)
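As a sketch of the hybrid model (illustrative, not course-provided code), the following combines MPI across processes with OpenMP threads inside each process:

/* Illustrative hybrid MPI + OpenMP hello. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* Request thread support suitable for OpenMP regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("MPI rank %d, OpenMP thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

Typically built with an MPI compiler wrapper plus the OpenMP flag (e.g. mpicc -fopenmp) and launched with an MPI launcher, with OMP_NUM_THREADS controlling the threads per rank.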
This course …

Large-scale Parallel Computing

• Message passing
• Parallel algorithms
• Designing parallel codes
• Performance analysis
Message Passing Paradigm

• Point-to-point (P2P) communications (see the sketch below)
• Collective communications
• Algorithms
• Performance
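A minimal P2P sketch (illustrative; the payload and tag are arbitrary): rank 0 sends an integer to rank 1 through a matched MPI_Send/MPI_Recv pair. Run with at least two processes.

/* Illustrative MPI point-to-point example. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;                        /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}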
Profiling

Parallel I/O
[Diagram: compute node racks connect via bridge nodes to an IB network, then to I/O nodes and a GPFS filesystem; 2 GB/s links (not shared), 4 GB/s links (shared), 128:1 compute-to-I/O node ratio]
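A minimal MPI-IO sketch of parallel I/O to a shared file (illustrative; the filename, block size, and data are assumptions): each rank writes its own disjoint block at a rank-dependent offset, so writes never overlap.

/* Illustrative MPI-IO example: each rank writes one block of a shared file. */
#include <mpi.h>

#define COUNT 4  /* integers written per rank (illustrative) */

int main(int argc, char **argv) {
    int rank;
    MPI_File fh;
    int buf[COUNT];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < COUNT; i++)
        buf[i] = rank * COUNT + i;             /* made-up data */

    /* All ranks open the same file; rank-dependent offsets keep writes disjoint. */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(int);
    MPI_File_write_at(fh, offset, buf, COUNT, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}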
Job Scheduling

[Diagram: users submit jobs, which the scheduler maps onto nodes (Wikipedia)]

Example of real supercomputer activity: Argonne National Laboratory Theta jobs
Supercomputer Activity

A graphical representation of all jobs running on the supercomputer

Parallel Deep Learning

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis


Reference Material

• DE Culler, JP Singh and A Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1998.
• A Grama, A Gupta, G Karypis and V Kumar, Introduction to Parallel Computing, 2nd Ed., Addison-Wesley, 2003.
• Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W. Walker and Jack Dongarra, MPI – The Complete Reference, Second Edition, Volume 1: The MPI Core.
• Bill Gropp, Using MPI, Third Edition, The MIT Press, 2014.
• Research papers
