SlideShare a Scribd company logo
Understanding and optimizing
parallelism in NumPy-based programs
Ralf Gommers
21 April 2022
First make it work, then make it fast
>>> %timeit main()
50.1 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> # ... perform some optimizations
>>> %timeit main()
9.58 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> # break out your profiler (e.g., py-spy), optimize some more
>>> %timeit main()
2.83 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
`htop` output
Approaches for performant numerical
code (single-threaded)
Vectorization Use compiled code
Python compilers Python interpreters
Pythran
CPython
Plus Cinder, Pyston, and more -- very experimental,
and limited gains for numerical code
multiprocessing & multithreading
A key issue: oversubscription
Package A sees N CPU cores, and decides to use them all:
A key issue: oversubscription
Package B, which uses package A, or the end user decides to use
multiprocessing, 1 process per core:
The more CPU cores a machine has, the worse the effect is!
Parallel APIs & behavior: NumPy
NumPy is single-threaded, no code in NumPy is written for parallel execution.
However, most numpy.linalg functions (those using BLAS or LAPACK) execute in
parallel. They use all available cores on a machine.
NumPy does release the GIL wherever it can.
numpy.random has specific APIs to allow users to:
(a) Obtain independent streams for random number generation across
processes (local or distributed)
(b) Perform multithreaded random number generation
Parallel APIs & behavior: SciPy
SciPy is single-threaded by default (same as NumPy)
Calls to functionality using BLAS or LAPACK is again multithreaded:
● primarily in scipy.linalg and scipy.sparse.linalg,
● also higher-level functionality using linear algebra under the hood:
kernel density estimation, multivariate distributions etc. in scipy.stats,
vector quantization in scipy.cluster, interpolators in scipy.interpolate,
optimizers in scipy.optimize, and more
Some APIs have a workers=1 keyword, which allows the user to control the
number of processes or threads. Or pass in a custom Pool.
scipy.fft provides a context manager:
Parallel APIs & behavior: SciPy
An example using workers=:
Parallel APIs & behavior: scikit-learn
Scikit-learn is mostly single-threaded by default.
However, more and more functionality uses OpenMP for automatic
parallelization. This defaults to the number of virtual (not physical) CPU cores.
Many scikit-learn APIs offer a n_jobs= keyword to let user enable multiple
threads or processes via joblib.
Scikit-learn implements fairly complex control of NumPy/SciPy’s BLAS and
LAPACK libraries to prevent oversubscription in the presence of
multiprocessing on top of multi-threading. This is done via the threadpoolctl
package.
Controlling parallelism - packages
Dependencies (Conda)
Controlling parallelism - packages
Dependencies (PyPI)
Controlling parallelism - packages
Conda PyPI
Tuning the default behavior
Default behavior is inconsistent: too aggressive for linear algebra, and too
conservative for workers (SciPy) and n_jobs (scikit-learn)!
OpenBLAS, MKL and OpenMP don’t have a nice API, only environment variables:
For scikit-learn you can explicitly choose a backend (but defaults are usually fine):
Tuning the default behavior
NumPy, SciPy and scikit-learn all recommend using threadpoolctl in case you
want more granular control over threading behavior of BLAS, LAPACK and
OpenMP libraries (or cannot set environment variables):
A pitfall on multi-tenant machines
Multi-tenant machines: N “vCPU” (virtual CPU) cores for you, M in total.
CircleCI gives you 2 cores for a CI job, on a 64 core machine (and
os.cpu_count() reports 64). Set OPENBLAS_NUM_THREADS=2 to avoid problems!
GitHub Actions, Azure DevOps and other services are better behaved.
The impact can be severe:
Parallel random number generation
Parallel random number generation
First what not to do – simply drawing random numbers in different
subprocesses will give you the same numbers in each process:
Parallel random number generation
Use SeedSequence to obtain independent streams easily:
Parallel random number generation
Second option: use the .jumped() method of BitGenerator instances to obtain
independent streams easily:
Parallel random number generation
Where is NumPy going - technical
Interoperability
Array API standard support
Extensibility
Easier custom dtypes
Performance
SIMD acceleration on:
x86, arm64, PPC, …?
C++
Just dipping our toes in the
water here - so far it was just
Python and C
Platform support
PPC, AIX, s390x,
cross-compiling to embedded
ARM systems, ...
Type annotations
Main namespace annotations
just completed
Note what is not on this list: auto-parallelization
Resources to learn more
Scikit-learn:
https://scikit-learn.org/stable/computing/parallelism.html
https://joblib.readthedocs.io/en/latest/parallel.html
SciPy:
http://scipy.github.io/devdocs/dev/toolchain.html#openmp-support
http://scipy.github.io/devdocs/search.html?q=workers
NumPy:
https://numpy.org/doc/stable/reference/random/parallel.html
Relevant paper: Composable Multi-Threading and Multi-Processing for Numeric Libraries
Find me at: ralf.gommers@gmail.com, rgommers, ralfgommers
Thank you!

More Related Content

What's hot (20)

【Unity道場Houdini編】Houdini Engine とプロシージャル法
【Unity道場Houdini編】Houdini Engine とプロシージャル法【Unity道場Houdini編】Houdini Engine とプロシージャル法
【Unity道場Houdini編】Houdini Engine とプロシージャル法
UnityTechnologiesJapan002
 
RENDERING 最適化「禍つヴァールハイト」
RENDERING 最適化「禍つヴァールハイト」RENDERING 最適化「禍つヴァールハイト」
RENDERING 最適化「禍つヴァールハイト」
KLab Inc. / Tech
 
Loisirs et activités
Loisirs et activitésLoisirs et activités
Loisirs et activités
Noemí
 
GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)
GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)
GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)
Hironori Washizaki
 
Tableau connecteurs logiques fle
Tableau connecteurs logiques fleTableau connecteurs logiques fle
Tableau connecteurs logiques fle
jeanphilippeguy
 
Vocabulaire art+ comment décrire un tableau
Vocabulaire art+ comment décrire un tableauVocabulaire art+ comment décrire un tableau
Vocabulaire art+ comment décrire un tableau
lebaobabbleu
 
PyTorch under the hood
PyTorch under the hoodPyTorch under the hood
PyTorch under the hood
Christian Perone
 
Les pronoms compléments COI/COI/EN/Y/ TONIQUES
Les pronoms compléments COI/COI/EN/Y/ TONIQUESLes pronoms compléments COI/COI/EN/Y/ TONIQUES
Les pronoms compléments COI/COI/EN/Y/ TONIQUES
lolyska
 
競技プログラミングでの線型方程式系
競技プログラミングでの線型方程式系競技プログラミングでの線型方程式系
競技プログラミングでの線型方程式系
tmaehara
 
見よう見まねでやってみる2D流体シミュレーション
見よう見まねでやってみる2D流体シミュレーション見よう見まねでやってみる2D流体シミュレーション
見よう見まねでやってみる2D流体シミュレーション
KLab Inc. / Tech
 
Guide du community manager en herbe
Guide du community manager en herbeGuide du community manager en herbe
Guide du community manager en herbe
Mouna Benkaddour
 
dplyrとは何だったのか
dplyrとは何だったのかdplyrとは何だったのか
dplyrとは何だったのか
yutannihilation
 
ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK
ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK
ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK
Takashi Yoshinaga
 
【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜
【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜
【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜
Unity Technologies Japan K.K.
 
OpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming IntroductionOpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming Introduction
Champ Yen
 
Longue liste verbes à prépositions
Longue liste verbes à prépositionsLongue liste verbes à prépositions
Longue liste verbes à prépositions
MSblog
 
게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)
게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)
게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)
Hyeyon Kwon
 
【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜
【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜
【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜
モノビット エンジン
 
Stratégie de marketing cas Eden hotels
Stratégie de marketing  cas Eden hotelsStratégie de marketing  cas Eden hotels
Stratégie de marketing cas Eden hotels
Fethi Ferhane
 
【Unity道場Houdini編】Houdini Engine とプロシージャル法
【Unity道場Houdini編】Houdini Engine とプロシージャル法【Unity道場Houdini編】Houdini Engine とプロシージャル法
【Unity道場Houdini編】Houdini Engine とプロシージャル法
UnityTechnologiesJapan002
 
RENDERING 最適化「禍つヴァールハイト」
RENDERING 最適化「禍つヴァールハイト」RENDERING 最適化「禍つヴァールハイト」
RENDERING 最適化「禍つヴァールハイト」
KLab Inc. / Tech
 
Loisirs et activités
Loisirs et activitésLoisirs et activités
Loisirs et activités
Noemí
 
GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)
GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)
GQM+Strategiesの説明 岸田 智子(クニエ)、新谷 勝利(早稲田大学)
Hironori Washizaki
 
Tableau connecteurs logiques fle
Tableau connecteurs logiques fleTableau connecteurs logiques fle
Tableau connecteurs logiques fle
jeanphilippeguy
 
Vocabulaire art+ comment décrire un tableau
Vocabulaire art+ comment décrire un tableauVocabulaire art+ comment décrire un tableau
Vocabulaire art+ comment décrire un tableau
lebaobabbleu
 
Les pronoms compléments COI/COI/EN/Y/ TONIQUES
Les pronoms compléments COI/COI/EN/Y/ TONIQUESLes pronoms compléments COI/COI/EN/Y/ TONIQUES
Les pronoms compléments COI/COI/EN/Y/ TONIQUES
lolyska
 
競技プログラミングでの線型方程式系
競技プログラミングでの線型方程式系競技プログラミングでの線型方程式系
競技プログラミングでの線型方程式系
tmaehara
 
見よう見まねでやってみる2D流体シミュレーション
見よう見まねでやってみる2D流体シミュレーション見よう見まねでやってみる2D流体シミュレーション
見よう見まねでやってみる2D流体シミュレーション
KLab Inc. / Tech
 
Guide du community manager en herbe
Guide du community manager en herbeGuide du community manager en herbe
Guide du community manager en herbe
Mouna Benkaddour
 
dplyrとは何だったのか
dplyrとは何だったのかdplyrとは何だったのか
dplyrとは何だったのか
yutannihilation
 
ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK
ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK
ノンプログラミングで始めるAR (HoloLens 2 / ARCore / ARKit) 開発 with MRTK
Takashi Yoshinaga
 
【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜
【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜
【Unity道場】AssetGraph入門 〜ノードを駆使しててUnityの面倒な手作業を自動化する方法〜
Unity Technologies Japan K.K.
 
OpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming IntroductionOpenGL ES 2.x Programming Introduction
OpenGL ES 2.x Programming Introduction
Champ Yen
 
Longue liste verbes à prépositions
Longue liste verbes à prépositionsLongue liste verbes à prépositions
Longue liste verbes à prépositions
MSblog
 
게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)
게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)
게임 유저 행동을 분석해야 하는 이유 (텐투플레이 웨비나)
Hyeyon Kwon
 
【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜
【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜
【GCC18】PUBGライクなゲームをUnityだけで早く確実に作る方法 〜ひとつのUnity上でダミークライアントを100個同時に動かす〜
モノビット エンジン
 
Stratégie de marketing cas Eden hotels
Stratégie de marketing  cas Eden hotelsStratégie de marketing  cas Eden hotels
Stratégie de marketing cas Eden hotels
Fethi Ferhane
 

Similar to Parallelism in a NumPy-based program (20)

Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Intel® Software
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
Jeff Larkin
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
Nitin Kumar
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best Practices
Jeff Larkin
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
Ian Ozsvald
 
Elasticwulf Pycon Talk
Elasticwulf Pycon TalkElasticwulf Pycon Talk
Elasticwulf Pycon Talk
Peter Skomoroch
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
Ganesan Narayanasamy
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Akihiro Hayashi
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Akihiro Hayashi
 
Balancing Power & Performance Webinar
Balancing Power & Performance WebinarBalancing Power & Performance Webinar
Balancing Power & Performance Webinar
Qualcomm Developer Network
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient code
Alessio Coltellacci
 
Effective Benchmarks
Effective BenchmarksEffective Benchmarks
Effective Benchmarks
Workhorse Computing
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
Travis Oliphant
 
Multicore
MulticoreMulticore
Multicore
Birgit Plötzeneder
 
Distributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with RayDistributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with Ray
Jan Margeta
 
Exploring Gpgpu Workloads
Exploring Gpgpu WorkloadsExploring Gpgpu Workloads
Exploring Gpgpu Workloads
Unai Lopez-Novoa
 
Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
klepsydratechnologie
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
Dmitri Nesteruk
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
PeterAndreasEntschev
 
Joblib: Lightweight pipelining for parallel jobs (v2)
Joblib:  Lightweight pipelining for parallel jobs (v2)Joblib:  Lightweight pipelining for parallel jobs (v2)
Joblib: Lightweight pipelining for parallel jobs (v2)
Marcel Caraciolo
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Intel® Software
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
Jeff Larkin
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
Nitin Kumar
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best Practices
Jeff Larkin
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
Ian Ozsvald
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
Ganesan Narayanasamy
 
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU SelectionMachine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
Akihiro Hayashi
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Akihiro Hayashi
 
Use Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient codeUse Data-Oriented Design to write efficient code
Use Data-Oriented Design to write efficient code
Alessio Coltellacci
 
Scale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyDataScale up and Scale Out Anaconda and PyData
Scale up and Scale Out Anaconda and PyData
Travis Oliphant
 
Distributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with RayDistributed computing and hyper-parameter tuning with Ray
Distributed computing and hyper-parameter tuning with Ray
Jan Margeta
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
Dmitri Nesteruk
 
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDSDistributed Multi-GPU Computing with Dask, CuPy and RAPIDS
Distributed Multi-GPU Computing with Dask, CuPy and RAPIDS
PeterAndreasEntschev
 
Joblib: Lightweight pipelining for parallel jobs (v2)
Joblib:  Lightweight pipelining for parallel jobs (v2)Joblib:  Lightweight pipelining for parallel jobs (v2)
Joblib: Lightweight pipelining for parallel jobs (v2)
Marcel Caraciolo
 
Ad

More from Ralf Gommers (13)

Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdf
Ralf Gommers
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with Python
Ralf Gommers
 
Python array API standardization - current state and benefits
Python array API standardization - current state and benefitsPython array API standardization - current state and benefits
Python array API standardization - current state and benefits
Ralf Gommers
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with Pythran
Ralf Gommers
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
Ralf Gommers
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond code
Ralf Gommers
 
PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019
Ralf Gommers
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decade
Ralf Gommers
 
The evolution of array computing in Python
The evolution of array computing in PythonThe evolution of array computing in Python
The evolution of array computing in Python
Ralf Gommers
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts
Ralf Gommers
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS Forum
Ralf Gommers
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_session
Ralf Gommers
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and Code
Ralf Gommers
 
Reliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdfReliable from-source builds (Qshare 28 Nov 2023).pdf
Reliable from-source builds (Qshare 28 Nov 2023).pdf
Ralf Gommers
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with Python
Ralf Gommers
 
Python array API standardization - current state and benefits
Python array API standardization - current state and benefitsPython array API standardization - current state and benefits
Python array API standardization - current state and benefits
Ralf Gommers
 
Building SciPy kernels with Pythran
Building SciPy kernels with PythranBuilding SciPy kernels with Pythran
Building SciPy kernels with Pythran
Ralf Gommers
 
Standardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
Ralf Gommers
 
Strengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond codeStrengthening NumPy's foundations - growing beyond code
Strengthening NumPy's foundations - growing beyond code
Ralf Gommers
 
PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019PyData NYC whatsnew NumPy-SciPy 2019
PyData NYC whatsnew NumPy-SciPy 2019
Ralf Gommers
 
Inside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decadeInside NumPy: preparing for the next decade
Inside NumPy: preparing for the next decade
Ralf Gommers
 
The evolution of array computing in Python
The evolution of array computing in PythonThe evolution of array computing in Python
The evolution of array computing in Python
Ralf Gommers
 
__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts__array_function__ conceptual design & related concepts
__array_function__ conceptual design & related concepts
Ralf Gommers
 
NumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS ForumNumPy Roadmap presentation at NumFOCUS Forum
NumPy Roadmap presentation at NumFOCUS Forum
Ralf Gommers
 
NumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_sessionNumFOCUS_Summit2018_Roadmaps_session
NumFOCUS_Summit2018_Roadmaps_session
Ralf Gommers
 
SciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and CodeSciPy 1.0 and Beyond - a Story of Community and Code
SciPy 1.0 and Beyond - a Story of Community and Code
Ralf Gommers
 
Ad

Recently uploaded (20)

How John started to like TDD (instead of hating it) (ViennaJUG, June'25)
How John started to like TDD (instead of hating it) (ViennaJUG, June'25)How John started to like TDD (instead of hating it) (ViennaJUG, June'25)
How John started to like TDD (instead of hating it) (ViennaJUG, June'25)
Nacho Cougil
 
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdfICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
M. Luisetto Pharm.D.Spec. Pharmacology
 
zOS CommServer support for the Network Express feature on z17
zOS CommServer support for the Network Express feature on z17zOS CommServer support for the Network Express feature on z17
zOS CommServer support for the Network Express feature on z17
zOSCommserver
 
Internship in South western railways on software
Internship in South western railways on softwareInternship in South western railways on software
Internship in South western railways on software
abhim5889
 
Oliveira2024 - Combining GPT and Weak Supervision.pdf
Oliveira2024 - Combining GPT and Weak Supervision.pdfOliveira2024 - Combining GPT and Weak Supervision.pdf
Oliveira2024 - Combining GPT and Weak Supervision.pdf
GiliardGodoi1
 
Risk Management in Software Projects: Identifying, Analyzing, and Controlling...
Risk Management in Software Projects: Identifying, Analyzing, and Controlling...Risk Management in Software Projects: Identifying, Analyzing, and Controlling...
Risk Management in Software Projects: Identifying, Analyzing, and Controlling...
gauravvmanchandaa200
 
Marketing And Sales Software Services.pptx
Marketing And Sales Software Services.pptxMarketing And Sales Software Services.pptx
Marketing And Sales Software Services.pptx
julia smits
 
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdfHow a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
mary rojas
 
Delivering More with Less: AI Driven Resource Management with OnePlan
Delivering More with Less: AI Driven Resource Management with OnePlan Delivering More with Less: AI Driven Resource Management with OnePlan
Delivering More with Less: AI Driven Resource Management with OnePlan
OnePlan Solutions
 
AI Alternative - Discover the best AI tools and their alternatives
AI Alternative - Discover the best AI tools and their alternativesAI Alternative - Discover the best AI tools and their alternatives
AI Alternative - Discover the best AI tools and their alternatives
AI Alternative
 
Techdebt handling with cleancode focus and as risk taker
Techdebt handling with cleancode focus and as risk takerTechdebt handling with cleancode focus and as risk taker
Techdebt handling with cleancode focus and as risk taker
RajaNagendraKumar
 
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdfHow to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
QuickBooks Training
 
Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...
Rishab Acharya
 
aswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjar
aswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjaraswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjar
aswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjar
muhammadalikhanalikh1
 
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATIONAI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION
miso_uam
 
War Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona ToolkitWar Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona Toolkit
Sveta Smirnova
 
iOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod KumariOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod Kumar
Pramod Kumar
 
Shortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome ThemShortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome Them
TECH EHS Solution
 
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire
 
Secure and Simplify IT Management with ManageEngine Endpoint Central.pdf
Secure and Simplify IT Management with ManageEngine Endpoint Central.pdfSecure and Simplify IT Management with ManageEngine Endpoint Central.pdf
Secure and Simplify IT Management with ManageEngine Endpoint Central.pdf
Northwind Technologies
 
How John started to like TDD (instead of hating it) (ViennaJUG, June'25)
How John started to like TDD (instead of hating it) (ViennaJUG, June'25)How John started to like TDD (instead of hating it) (ViennaJUG, June'25)
How John started to like TDD (instead of hating it) (ViennaJUG, June'25)
Nacho Cougil
 
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdfICDL FULL STANDARD  2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
ICDL FULL STANDARD 2025 Luisetto mauro - Academia domani- 55 HOURS LONG pdf
M. Luisetto Pharm.D.Spec. Pharmacology
 
zOS CommServer support for the Network Express feature on z17
zOS CommServer support for the Network Express feature on z17zOS CommServer support for the Network Express feature on z17
zOS CommServer support for the Network Express feature on z17
zOSCommserver
 
Internship in South western railways on software
Internship in South western railways on softwareInternship in South western railways on software
Internship in South western railways on software
abhim5889
 
Oliveira2024 - Combining GPT and Weak Supervision.pdf
Oliveira2024 - Combining GPT and Weak Supervision.pdfOliveira2024 - Combining GPT and Weak Supervision.pdf
Oliveira2024 - Combining GPT and Weak Supervision.pdf
GiliardGodoi1
 
Risk Management in Software Projects: Identifying, Analyzing, and Controlling...
Risk Management in Software Projects: Identifying, Analyzing, and Controlling...Risk Management in Software Projects: Identifying, Analyzing, and Controlling...
Risk Management in Software Projects: Identifying, Analyzing, and Controlling...
gauravvmanchandaa200
 
Marketing And Sales Software Services.pptx
Marketing And Sales Software Services.pptxMarketing And Sales Software Services.pptx
Marketing And Sales Software Services.pptx
julia smits
 
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdfHow a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
How a Staff Augmentation Company IN USA Powers Flutter App Breakthroughs.pdf
mary rojas
 
Delivering More with Less: AI Driven Resource Management with OnePlan
Delivering More with Less: AI Driven Resource Management with OnePlan Delivering More with Less: AI Driven Resource Management with OnePlan
Delivering More with Less: AI Driven Resource Management with OnePlan
OnePlan Solutions
 
AI Alternative - Discover the best AI tools and their alternatives
AI Alternative - Discover the best AI tools and their alternativesAI Alternative - Discover the best AI tools and their alternatives
AI Alternative - Discover the best AI tools and their alternatives
AI Alternative
 
Techdebt handling with cleancode focus and as risk taker
Techdebt handling with cleancode focus and as risk takerTechdebt handling with cleancode focus and as risk taker
Techdebt handling with cleancode focus and as risk taker
RajaNagendraKumar
 
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdfHow to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
How to Generate Financial Statements in QuickBooks Like a Pro (1).pdf
QuickBooks Training
 
Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...Online Queue Management System for Public Service Offices [Focused on Municip...
Online Queue Management System for Public Service Offices [Focused on Municip...
Rishab Acharya
 
aswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjar
aswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjaraswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjar
aswjkdwelhjdfshlfjkhewljhfljawerhwjarhwjkahrjar
muhammadalikhanalikh1
 
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATIONAI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION
AI-ASSISTED METAMORPHIC TESTING FOR DOMAIN-SPECIFIC MODELLING AND SIMULATION
miso_uam
 
War Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona ToolkitWar Story: Removing Offensive Language from Percona Toolkit
War Story: Removing Offensive Language from Percona Toolkit
Sveta Smirnova
 
iOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod KumariOS Developer Resume 2025 | Pramod Kumar
iOS Developer Resume 2025 | Pramod Kumar
Pramod Kumar
 
Shortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome ThemShortcomings of EHS Software – And How to Overcome Them
Shortcomings of EHS Software – And How to Overcome Them
TECH EHS Solution
 
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire Unlocking the Future of Tech Talent with AI-Powered Hiring Solution...
GirikHire
 
Secure and Simplify IT Management with ManageEngine Endpoint Central.pdf
Secure and Simplify IT Management with ManageEngine Endpoint Central.pdfSecure and Simplify IT Management with ManageEngine Endpoint Central.pdf
Secure and Simplify IT Management with ManageEngine Endpoint Central.pdf
Northwind Technologies
 

Parallelism in a NumPy-based program

  • 1. Understanding and optimizing parallelism in NumPy-based programs Ralf Gommers 21 April 2022
  • 2. First make it work, then make it fast >>> %timeit main() 50.1 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) >>> # ... perform some optimizations >>> %timeit main() 9.58 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) >>> # break out your profiler (e.g., py-spy), optimize some more >>> %timeit main() 2.83 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  • 4. Approaches for performant numerical code (single-threaded) Vectorization Use compiled code Python compilers Python interpreters Pythran CPython Plus Cinder, Pyston, and more -- very experimental, and limited gains for numerical code
  • 6. A key issue: oversubscription Package A sees N CPU cores, and decides to use them all:
  • 7. A key issue: oversubscription Package B, which uses package A, or the end user decides to use multiprocessing, 1 process per core: The more CPU cores a machine has, the worse the effect is!
  • 8. Parallel APIs & behavior: NumPy NumPy is single-threaded, no code in NumPy is written for parallel execution. However, most numpy.linalg functions (those using BLAS or LAPACK) execute in parallel. They use all available cores on a machine. NumPy does release the GIL wherever it can. numpy.random has specific APIs to allow users to: (a) Obtain independent streams for random number generation across processes (local or distributed) (b) Perform multithreaded random number generation
  • 9. Parallel APIs & behavior: SciPy SciPy is single-threaded by default (same as NumPy) Calls to functionality using BLAS or LAPACK is again multithreaded: ● primarily in scipy.linalg and scipy.sparse.linalg, ● also higher-level functionality using linear algebra under the hood: kernel density estimation, multivariate distributions etc. in scipy.stats, vector quantization in scipy.cluster, interpolators in scipy.interpolate, optimizers in scipy.optimize, and more Some APIs have a workers=1 keyword, which allows the user to control the number of processes or threads. Or pass in a custom Pool. scipy.fft provides a context manager:
  • 10. Parallel APIs & behavior: SciPy An example using workers=:
  • 11. Parallel APIs & behavior: scikit-learn Scikit-learn is mostly single-threaded by default. However, more and more functionality uses OpenMP for automatic parallelization. This defaults to the number of virtual (not physical) CPU cores. Many scikit-learn APIs offer a n_jobs= keyword to let user enable multiple threads or processes via joblib. Scikit-learn implements fairly complex control of NumPy/SciPy’s BLAS and LAPACK libraries to prevent oversubscription in the presence of multiprocessing on top of multi-threading. This is done via the threadpoolctl package.
  • 12. Controlling parallelism - packages Dependencies (Conda)
  • 13. Controlling parallelism - packages Dependencies (PyPI)
  • 14. Controlling parallelism - packages Conda PyPI
  • 15. Tuning the default behavior Default behavior is inconsistent: too aggressive for linear algebra, and too conservative for workers (SciPy) and n_jobs (scikit-learn)! OpenBLAS, MKL and OpenMP don’t have a nice API, only environment variables: For scikit-learn you can explicitly choose a backend (but defaults are usually fine):
  • 16. Tuning the default behavior NumPy, SciPy and scikit-learn all recommend using threadpoolctl in case you want more granular control over threading behavior of BLAS, LAPACK and OpenMP libraries (or cannot set environment variables):
  • 17. A pitfall on multi-tenant machines Multi-tenant machines: N “vCPU” (virtual CPU) cores for you, M in total. CircleCI gives you 2 cores for a CI job, on a 64 core machine (and os.cpu_count() reports 64). Set OPENBLAS_NUM_THREADS=2 to avoid problems! GitHub Actions, Azure DevOps and other services are better behaved. The impact can be severe:
  • 19. Parallel random number generation First what not to do – simply drawing random numbers in different subprocesses will give you the same numbers in each process:
  • 20. Parallel random number generation Use SeedSequence to obtain independent streams easily:
  • 21. Parallel random number generation Second option: use the .jumped() method of BitGenerator instances to obtain independent streams easily:
  • 23. Where is NumPy going - technical Interoperability Array API standard support Extensibility Easier custom dtypes Performance SIMD acceleration on: x86, arm64, PPC, …? C++ Just dipping our toes in the water here - so far it was just Python and C Platform support PPC, AIX, s390x, cross-compiling to embedded ARM systems, ... Type annotations Main namespace annotations just completed Note what is not on this list: auto-parallelization
  • 24. Resources to learn more Scikit-learn: https://scikit-learn.org/stable/computing/parallelism.html https://joblib.readthedocs.io/en/latest/parallel.html SciPy: http://scipy.github.io/devdocs/dev/toolchain.html#openmp-support http://scipy.github.io/devdocs/search.html?q=workers NumPy: https://numpy.org/doc/stable/reference/random/parallel.html Relevant paper: Composable Multi-Threading and Multi-Processing for Numeric Libraries
  • 25. Find me at: [email protected], rgommers, ralfgommers Thank you!