Parsl:
Pervasive Parallel Programming in Python
18 October 2019
Daniel S. Katz
(d.katz@ieee.org, http://danielskatz.org, @danielskatz)
Assistant Director for Scientific
Software & Applications, NCSA
Research Associate Professor,
CS, ECE, iSchool
Parsl Team: Y. Babuji, A. Woodard,
Z. Li, D. S. Katz, B. Clifford, R. Kumar,
L. Lacinski, R. Chard, J. M. Wozniak,
I. Foster, M. Wilde, K. Chard
Supporting composition and parallelism in Python
Software is increasingly assembled rather than written
• High-level language (e.g., Python) to integrate and wrap components from
many sources
Parallel and distributed computing is no longer a niche area
• Increasing data sizes combined with plateauing sequential processing power
• Parallel hardware (e.g., accelerators) and distributed computing systems
Parsl allows for the natural expression of parallelism in such a way that
programs can express opportunities for parallelism that can then be
realized, at execution time, using different execution models on different
parallel platforms
Traditional workflow
• A set of tasks and dependencies between them
• Perhaps expressed as data structure, e.g. graph (DAG or cyclic)
• How is this different than a procedural computer program?
• At the level of a set of instructions:
• Dependencies are often explicit, perhaps like a compiled intermediate
representation
• Tasks are much longer (running time O(sec) – O(hr))
• At the level of a set of functions:
• Tasks are more well-defined (inputs, outputs)
• Tasks are often longer (running time O(sec) – O(hr))
https://danielskatzblog.wordpress.com/2018/01/08/expressing-workflows-as-code-vs-data/
https://danielskatzblog.wordpress.com/2019/02/05/using-workflows-expressed-as-code-and-workflows-expressed-as-data-together/
Traditional workflow expression
• Why express a workflow differently than a program?
• Program (script) is a natural way of expressing a workflow
• Easy to understand, easy to change
• Examples: shell scripts, programs in Parsl
• Parsl: functions used to identify components
• Expressing it as data corresponds to the compiled (assembly)
version of the workflow
• Maybe easier to execute?
• Perhaps easier to reproduce?
https://danielskatzblog.wordpress.com/2018/01/08/expressing-workflows-as-code-vs-data/
https://danielskatzblog.wordpress.com/2019/02/05/using-workflows-expressed-as-code-and-workflows-expressed-as-data-together/
Workflow as code vs workflow as data
• Goes back to workflow lifecycle concept
• Workflows follow a cycle:
• Experimentation/exploration phase (scientific hacking): workflow is extension of
thought processes of workflow maker
• Productization/dissemination phase: developer (or someone else) documents and
optimizes the workflow for wider and repeated use, then disseminates it
• This use by others can simply be use, or it can be further development.
• Different types of users have different needs
• Experts want to be able to do as much as possible
• Other users trade away complex features for simpler user interface
C. Wroe, C. Goble, et al., “Recycling workflows and services through discovery and reuse,”
CCPE v19, pp. 181-194, 2007. doi:10.1002/cpe.1050
Representative Parsl Use Cases
[Figure: DLHub, SwiftSeq, LSST-DESC (labels: Input, Output)]
DESC image simulation
[Figure labels: Catalog Simulator; Image Simulator (atmosphere, telescope, camera, ...); LSST Data Management Stack; fake observations; science]
Image credits: NASA/JPL-Caltech/ESO/R. Hurt; HSC Project / NAOJ; LSST Project
Credit: Antonio Villarreal
ImSim workflow
[Figure: 189 sensors x ~10,000s of instance catalogs (Catalog 1, Catalog 2, Catalog 3, ...; 189 tasks each) go through a Bundler, which produces node-sized bundles (64 tasks each) with a JSON description; the bundles are run by the Parsl Extreme Scale Executor (x 189 x ~10,000s)]
Runs shown: 256K cores for 3 days (x 4000 nodes); 128K cores for 3.5 days (x 2000 nodes)
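The bundling step on the ImSim slide can be sketched in a few lines. This is only an illustration, not the project's actual bundler: the function names `make_tasks` and `bundle` are hypothetical, and only the constants 189 (sensors per catalog) and 64 (tasks per bundle) come from the slide.

```python
from itertools import islice
from math import ceil

SENSORS_PER_CATALOG = 189  # from the slide: one task per sensor
TASKS_PER_BUNDLE = 64      # from the slide: node-sized bundles of 64 tasks


def make_tasks(n_catalogs):
    """Yield one (catalog, sensor) task for every sensor of every catalog."""
    for catalog in range(n_catalogs):
        for sensor in range(SENSORS_PER_CATALOG):
            yield (catalog, sensor)


def bundle(tasks, size=TASKS_PER_BUNDLE):
    """Group an iterable of tasks into lists of at most `size` tasks."""
    it = iter(tasks)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


n_catalogs = 3  # small illustrative value; the slide mentions ~10,000s
n_tasks = n_catalogs * SENSORS_PER_CATALOG
bundles = list(bundle(make_tasks(n_catalogs)))

# The number of bundles is the task count divided by 64, rounded up.
assert len(bundles) == ceil(n_tasks / TASKS_PER_BUNDLE)
print(len(bundles), "bundles for", n_tasks, "tasks")
```

Each bundle fills one 64-core node, which matches the core and node counts on the slide (256K cores on 4000 nodes, 128K cores on 2000 nodes).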
Representative Parsl use cases
 | DLHub | SwiftSeq | LSST-DESC
 | Machine Learning Inference | DNA Sequence Analysis | Simulated Sky Survey
O(Tasks) | Thousands | Thousands | Millions
O(Nodes) | Tens | Hundreds | Thousands
O(Duration) | Milliseconds-Seconds | Hours-Days | Hours-Day
Pattern | Bag-of-tasks | Dataflow | Dataflow
Requirements | Low latency bounds | High throughput | Extreme scale
Parsl Basics
Parsl: Interactive parallel programming in Python
Apps define opportunities for parallelism
• Python apps call Python functions
• Bash apps call external applications
Apps return “futures”: a proxy for a result
that might not yet be available
Apps run concurrently respecting data
dependencies. Natural parallel
programming!
Parsl scripts are independent of where
they run. Write once run anywhere!
pip install parsl
Try Parsl via Binder at bottom left of http://parsl-project.org
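The idea of a future as "a proxy for a result that might not yet be available" can be tried with the Python standard library alone. The sketch below deliberately uses `concurrent.futures` rather than Parsl, so that it runs without any installation; `slow_square` is a made-up stand-in for an app.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Stand-in for an app: pretend to work, then return a value."""
    time.sleep(0.1)
    return x * x


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a future immediately; the work runs concurrently.
    futures = [pool.submit(slow_square, i) for i in range(4)]
    # result() blocks until the corresponding value is available.
    results = [f.result() for f in futures]

print(results)
```

A Parsl app call behaves similarly from the caller's point of view: it returns at once with a future, and `result()` waits for the value.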
Expressing a many task workflow in Parsl
1) Wrap the science applications as Parsl Apps:
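from parsl import bash_app, python_app

@bash_app
def simulate(outputs=[]):
    # Run the external simulation, writing to the first output file
    return './simulation_app.exe {0}'.format(outputs[0])

@bash_app
def merge(inputs=[], outputs=[]):
    # Merge all input files into the first output file
    i = [str(f) for f in inputs]; o = outputs
    return './merge {1} {0}'.format(' '.join(i), o[0])

@python_app
def analyze(inputs=[]):
    # analysis_package stands for the user's own analysis code
    return analysis_package(inputs)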
Expressing a many task workflow in Parsl
2) Execute the parallel workflow by calling Apps:
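nsims = 3  # example value, not from the slides

sims = []
for i in range(nsims):
    sims.append(simulate(outputs=['sim-%s.txt' % i]))

# Named `merged` rather than `all` to avoid shadowing the Python builtin
merged = merge(inputs=[s.outputs[0] for s in sims],
               outputs=['all.txt'])
result = analyze(inputs=[merged.outputs[0]])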
[Figure: N simulate tasks produce sim-1.txt, sim-2.txt, ..., sim-N.txt; merge combines them into all.txt; analyze reads all.txt]
Decomposing dynamic parallel execution into a task-dependency
graph
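To make the task-dependency graph concrete, the simulate/merge/analyze workflow can be written down as data and ordered with the standard library's `graphlib` (Python 3.9+). This is only an illustration: Parsl builds its graph dynamically as apps are called, whereas the graph here is static, and `build_graph` and `execution_waves` are hypothetical helper names.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_graph(nsims):
    """Map each task name to the set of tasks it depends on."""
    sims = [f"simulate-{i}" for i in range(nsims)]
    graph = {name: set() for name in sims}
    graph["merge"] = set(sims)    # merge needs every simulation output
    graph["analyze"] = {"merge"}  # analyze needs the merged file
    return graph


def execution_waves(graph):
    """Return lists of tasks; tasks in the same list can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(build_graph(3))
for number, wave in enumerate(waves):
    print(number, wave)
```

All simulations land in the first wave because they have no dependencies, which is exactly the parallelism the apps expose.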
Parsl
Parsl scripts are execution provider independent
The same script can be run locally, on grids, clouds,
or supercomputers
Growing support for various schedulers and cloud
vendors
From Parsl docs
Separation of code and execution
Choose execution environment
at runtime. Parsl will direct
tasks to the configured
execution environment(s).
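A minimal sketch of what this separation can look like in practice, assuming a recent Parsl release is installed. The labels, the thread count, and the app `double` are illustrative choices, not taken from the slides; a cluster configuration would typically swap `LocalProvider` for a scheduler-specific provider.

```python
import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor, ThreadPoolExecutor
from parsl.providers import LocalProvider

# Configuration 1: run apps in local threads (e.g. on a laptop).
laptop = Config(
    executors=[ThreadPoolExecutor(label="threads", max_threads=4)]
)

# Configuration 2: pilot-job style execution, here still on the local machine.
pilot = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            provider=LocalProvider(init_blocks=1, max_blocks=1),
        )
    ]
)


@parsl.python_app
def double(x):
    """The app itself contains no reference to where it will run."""
    return 2 * x


if __name__ == "__main__":
    # Swap `laptop` for `pilot` without changing the app.
    parsl.load(laptop)
    print(double(21).result())
```

Only the object passed to `parsl.load` changes between environments; the app definition stays the same.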
Authentication and authorization
Authn/z is hard…
• 2FA, X509, GSISSH, etc.
Integration with Globus Auth
to support native app
integration for accessing
Globus (and other) services
Using scoped access tokens,
refresh tokens, delegation
support
Transparent (wide area) data management
Implicit data movement to/from
repositories, laptops,
supercomputers
Globus for third-party, high
performance and reliable data
transfer
• Support for site-specific DTNs
HTTP/FTP direct data staging
parsl_file = File('globus://EP/path/file')
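The URL scheme is what lets a file reference select a staging mechanism. The sketch below shows that idea using only the standard library; it is not Parsl's implementation, and the `STAGING_BY_SCHEME` table and its method names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to a staging method name.
STAGING_BY_SCHEME = {
    "globus": "globus-transfer",
    "http": "http-download",
    "https": "http-download",
    "ftp": "ftp-download",
    "file": "no-staging",
    "": "no-staging",  # plain local path
}


def describe(url):
    """Split a file URL into its parts and pick a staging method."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "endpoint": parts.netloc,
        "path": parts.path,
        "staging": STAGING_BY_SCHEME.get(parts.scheme, "unsupported"),
    }


info = describe("globus://EP/path/file")
print(info)
```

For the slide's example, `EP` is parsed as the endpoint and `/path/file` as the path on that endpoint.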
www.globus.org
Parsl Performance
Different types of scientific workloads
High-throughput workloads
• Protein docking, image processing, materials reconstructions
• Requirements: 1000s of tasks, 100s of nodes, reliability, usability,
monitoring, elasticity, etc.
Extreme-scale workloads
• Cosmology simulations, imaging the arctic, genomics analysis
• Requirements: millions of tasks, 1000s of nodes (100,000s cores)
Interactive and real-time workloads
• Materials science, cosmic ray shower analysis, machine learning inference
• Requirements: 10s of nodes, rapid response, pipelining
Different types of execution
High-throughput executor (HTEX)
• Pilot job-based model with multi-threaded manager deployed on workers
• Designed for ease of use, fault-tolerance, etc.
• <2000 nodes (~60K workers), ms tasks, task duration/nodes > 0.01
Extreme-scale executor (EXEX)
• Distributed MPI job manages execution. Manager rank communicates
workload to other worker ranks directly
• Designed for extreme scale execution on supercomputers
• >1000 nodes (>30K workers), ms tasks, >1 m task duration
Low-latency Executor (LLEX)
• Direct socket communication to workers, fixed resource pool, limited features
• 10s nodes, <1M tasks, <1m tasks
Short tasks scale to thousands of workers
Strong scaling: 50,000 tasks submitted
with increasing number of workers
* Fireworks only 5,000 tasks
HTEX and EXEX outperform other
Python-based approaches when >256
workers
Other approaches are limited to fewer
than 128 nodes; HTEX and EXEX
continue to scale
[Figure: two panels, 0s tasks and 1s tasks]
Executors scale to 2M tasks/256K workers
[Figure: two panels, 0s tasks and 1s tasks]
Weak scaling: 10 tasks per worker
HTEX and EXEX again outperform
other Python-based approaches up
to ~2M tasks
HTEX and EXEX scale to 2K nodes
(~65k workers) and 8K nodes
(~262K workers), respectively, with
>1K tasks/s
Parsl executors can provide low latency
• LLEX achieves low
(3.47ms) and
consistent latency
• HTEX (6.87ms) and
EXEX (9.83ms) are less
consistent
Scalability summary
• EXEX scales to over 250,000 workers across 8,000 nodes
• Both EXEX and HTEX deliver ~1200 tasks/s
• LLEX achieves an average latency of 3.47 ms with tight bounds
Framework | Max. number of workers | Max. number of nodes | Max. tasks/sec
Parsl-IPP | 2048 | 64 | 330
Parsl-HTEX | 65536 | 2048 | 1181
Parsl-EXEX | 262144 | 8192 | 1176
FireWorks | 1024 | 32 | 4
Dask distributed | 4096 | 128 | 2617
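Two quick pieces of arithmetic help when reading the table above: workers per node, and how long the 50,000 tasks of the strong-scaling test would take if a framework's peak rate were sustained. The second figure is only a lower bound under that assumption; the slides do not claim sustained peak throughput.

```python
# (framework, max workers, max nodes, max tasks/s) as listed in the table
RESULTS = [
    ("Parsl-IPP", 2048, 64, 330),
    ("Parsl-HTEX", 65536, 2048, 1181),
    ("Parsl-EXEX", 262144, 8192, 1176),
    ("FireWorks", 1024, 32, 4),
    ("Dask distributed", 4096, 128, 2617),
]

N_TASKS = 50_000  # task count of the strong-scaling experiment

summary = {}
for name, workers, nodes, rate in RESULTS:
    workers_per_node = workers / nodes
    seconds_at_peak = N_TASKS / rate  # assumes peak rate is sustained
    summary[name] = (workers_per_node, seconds_at_peak)
    print(f"{name:18s} {workers_per_node:5.0f} workers/node "
          f"{seconds_at_peak:10.1f} s at peak rate")
```

Every row works out to the same number of workers per node, so the maximum worker counts differ only because of how many nodes each framework reached.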
More Parsl Functionality
Interactive supercomputing in Jupyter notebooks
Monitoring and visualization
[Figure: screenshots of the workflow view and the task view]
DOE Distributed Computing & Data Ecosystem
(DCDE)
• A DOE group is identifying best practices and research challenges to create
and operate a DOE/SC wide federated Distributed Computing & Data
Ecosystem (DCDE)
• Future Lab Computing Working Group (FLC-WG)
• Initially working towards a pilot
• Using OAuth, working with Globus
• Test deployment at BNL
• Parsl is part of this effort, via initial work in linking ORNL and BNL
• We’ve added support for an OAuthSSHChannel
• Now being tested on test deployment
Multi-site execution
1. Loading Parsl configuration triggers:
   a) Creation of SSH channels
   b) Deployment of an interchange process onto login nodes
   c) Submission of pilot jobs that will connect to the interchange
2. Parsl submits tasks directly to interchange
3. Parsl uses Globus to stage data
[Figure labels: Interchange, Interchange, Parsl]
Multi-site execution
Too much small code
See demo instead
https://bit.ly/2Wsjlep
(code in https://github.com/Parsl/demo_multifacility)
Other functionality provided by Parsl
Globus. Delegated authentication
and wide area data management
Fault tolerance. Support for retries,
checkpointing, and memoization
Containers. Sandboxed execution
environments for workers and tasks
Data management. Automated
staging with HTTP, FTP, and Globus
Multi site. Combining
executors/providers for execution
across different resources
Elasticity. Automated resource
expansion/retraction based on
workload
Monitoring. Workflow and resource
monitoring and visualization
Reproducibility. Capture of workflow
provenance in the task graph
Jupyter integration. Seamless
description and management of
workflows
Resource abstraction. Block-based
model overlaying different providers
and resources
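Two of the fault-tolerance ideas listed above, retries and memoization, can be illustrated without Parsl. The sketch below uses plain Python decorators; `with_retries`, `flaky`, and `expensive` are hypothetical names, and Parsl's own mechanisms are configured differently.

```python
import functools


def with_retries(retries):
    """Re-run a failing function up to `retries` extra times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as err:  # retry on any failure
                    last_error = err
            raise last_error
        return wrapper
    return decorator


calls = {"flaky": 0, "expensive": 0}


@with_retries(retries=2)
def flaky():
    """Fails twice, then succeeds: a stand-in for a transient error."""
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


@functools.lru_cache(maxsize=None)
def expensive(x):
    """Memoized: repeated calls with the same argument reuse the result."""
    calls["expensive"] += 1
    return x ** 2


status = flaky()
values = [expensive(4), expensive(4)]
print(status, values, calls)
```

Memoization only makes sense for deterministic tasks, which connects to the "safe" property claimed in the summary.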
Summary
Parsl’s parallelism in Python
• Simple: minimal new constructs
• Safe: deterministic parallel programs through immutable
input/output objects, dependency task graph, etc.
• Scalable: efficient execution from laptops to the largest
supercomputers
• Flexible: programs composed from existing components and
then applied to different resources/workloads
Open source
https://github.com/Parsl/parsl
Questions?
http://parsl-project.org
Scientific Software Challenges and Community Responses
Daniel S. Katz
 
What do we need beyond a DOI?
What do we need beyond a DOI?What do we need beyond a DOI?
What do we need beyond a DOI?
Daniel S. Katz
 
Looking at Software Sustainability and Productivity Challenges from NSF
Looking at Software Sustainability and Productivity Challenges from NSFLooking at Software Sustainability and Productivity Challenges from NSF
Looking at Software Sustainability and Productivity Challenges from NSF
Daniel S. Katz
 
Parsl: Pervasive Parallel Programming in Python

  • 1. Parsl: Pervasive Parallel Programming in Python 18 October 2019 Daniel S. Katz (d.katz@ieee.org, http://danielskatz.org, @danielskatz) Assistant Director for Scientific Software & Applications, NCSA Research Associate Professor, CS, ECE, iSchool Parsl Team: Y. Babuji, A. Woodard, Z. Li, D. S. Katz, B. Clifford, R. Kumar, L. Lacinski, R. Chard, J. M. Wozniak, I. Foster, M. Wilde, K. Chard
  • 2. Supporting composition and parallelism in Python Software is increasingly assembled rather than written • High-level language (e.g., Python) to integrate and wrap components from many sources Parallel and distributed computing is no longer a niche area • Increasing data sizes combined with plateauing sequential processing power • Parallel hardware (e.g., accelerators) and distributed computing systems Parsl allows for the natural expression of parallelism in such a way that programs can express opportunities for parallelism that can then be realized, at execution time, using different execution models on different parallel platforms
  • 3. Traditional workflow • A set of tasks and dependencies between them • Perhaps expressed as a data structure, e.g. a graph (DAG or cyclic) • How is this different from a procedural computer program? • At the level of a set of instructions: • Dependencies are often explicit, perhaps like a compiled intermediate representation • Tasks are much longer (running time O(sec) – O(hr)) • At the level of a set of functions: • Tasks are more well-defined (inputs, outputs) • Tasks are often longer (running time O(sec) – O(hr)) https://danielskatzblog.wordpress.com/2018/01/08/expressing-workflows-as-code-vs-data/ https://danielskatzblog.wordpress.com/2019/02/05/using-workflows-expressed-as-code-and-workflows-expressed-as-data-together/
  • 4. Traditional workflow expression • Why express a workflow differently than a program? • Program (script) is a natural way of expressing a workflow • Easy to understand, easy to change • Examples: shell scripts, programs in Parsl • Parsl: functions used to identify components • Expressing it as data corresponds to the compiled (assembly) version of the workflow • Maybe easier to execute? • Perhaps easier to reproduce? https://danielskatzblog.wordpress.com/2018/01/08/expressing-workflows-as-code-vs-data/ https://danielskatzblog.wordpress.com/2019/02/05/using-workflows-expressed-as-code-and-workflows-expressed-as-data-together/
  • 5. Workflow as code vs workflow as data • Goes back to workflow lifecycle concept • Workflows follow a cycle: • Experimentation/exploration phase (scientific hacking): workflow is extension of thought processes of workflow maker • Productization/ dissemination phase: developer (or someone else) prepares for wider and repeated use by documentation and optimization then disseminates • This use by others can simply be use, or it can be further development. • Different types of users have different needs • Experts want to be able to do as much as possible • Other users trade away complex features for simpler user interface C. Wroe, C. Goble, et al., “Recycling workflows and services through discovery and reuse,” CCPE v19, pp. 181-194, 2007. doi:10.1002/cpe.1050
  • 6. Representative Parsl Use Cases (figure: input and output for DLHub, SwiftSeq, LSST-DESC)
  • 7. DESC image simulation (figure): Catalog Simulator -> Image Simulator (atmosphere, telescope, camera, ...) -> LSST Data Management Stack -> fake observations -> science. Image credits: NASA/JPL-Caltech/ESO/R. Hurt; HSC Project / NAOJ; LSST Project. Credit: Antonio Villarreal
  • 8. ImSim workflow (figure): a bundler groups 189 sensors x ~10,000s of instance catalogs into node-sized bundles (64 tasks each) with a JSON description; the bundles (Catalog 1, Catalog 2, Catalog 3, ...; 189 tasks each) run on the Parsl Extreme Scale Executor. Runs shown: 256K cores for 3 days (x 4000 nodes) and 128K cores for 3.5 days (x 2000 nodes)
  • 9. Representative Parsl use cases
                      DLHub                        SwiftSeq                LSST-DESC
      Description     Machine Learning Inference   DNA Sequence Analysis   Simulated Sky Survey
      O(Tasks)        Thousands                    Thousands               Millions
      O(Nodes)        Tens                         Hundreds                Thousands
      O(Duration)     Milliseconds-Seconds         Hours-Days              Hours-Day
      Pattern         Bag-of-tasks                 Dataflow                Dataflow
      Requirements    Low latency bounds           High throughput         Extreme scale
  • 11. Parsl: Interactive parallel programming in Python Apps define opportunities for parallelism • Python apps call Python functions • Bash apps call external applications Apps return “futures”: a proxy for a result that might not yet be available Apps run concurrently respecting data dependencies. Natural parallel programming! Parsl scripts are independent of where they run. Write once run anywhere! pip install parsl Try parsl via binder at bottom left of http://parsl-project.org
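The "future" idea on this slide can be tried out with Python's standard library alone. The sketch below does not use Parsl: it assumes only `concurrent.futures`, whose `Future` class Parsl's app futures build on, and `slow_square` is a hypothetical stand-in for an app.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_square(x):
    """Hypothetical stand-in for an app: pretend the work takes a while."""
    time.sleep(0.1)
    return x * x


def run_squares(values):
    """Launch one task per value and collect the results in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # submit() returns immediately with a Future, a proxy for the result.
        futures = [pool.submit(slow_square, v) for v in values]
        # result() blocks only until that particular value is available.
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_squares([1, 2, 3, 4]))
```

The four calls overlap in time because each `submit()` returns before its work finishes; the caller only waits when it asks for a value.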
  • 12. Expressing a many task workflow in Parsl 1) Wrap the science applications as Parsl Apps:
        from parsl.app.app import bash_app, python_app

        # Bash apps return the command line to run; Parsl fills in {outputs[0]}.
        @bash_app
        def simulate(outputs=[]):
            return './simulation_app.exe {outputs[0]}'

        @bash_app
        def merge(inputs=[], outputs=[]):
            i = inputs; o = outputs
            return './merge {1} {0}'.format(' '.join(i), o[0])

        # analysis_package is the slide's placeholder for the user's own analysis code.
        @python_app
        def analyze(inputs=[]):
            return analysis_package(inputs)
  • 13. Expressing a many task workflow in Parsl 2) Execute the parallel workflow by calling Apps:
        sims = []
        for i in range(nsims):
            sims.append(simulate(outputs=['sim-%s.txt' % i]))
        all = merge(inputs=[i.outputs[0] for i in sims], outputs=['all.txt'])
        result = analyze(inputs=[all.outputs[0]])
    Task graph (figure): simulate (x N) -> sim-1.txt, sim-2.txt, ..., sim-N.txt -> merge -> all.txt -> analyze
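The fan-out/fan-in shape of this workflow (many simulations, one merge, one analysis) can be sketched without Parsl or any external executables. The version below is an analogy built on the standard library's `concurrent.futures`, not Parsl's API: the task bodies are hypothetical in-memory stand-ins for the slide's applications, and the dependencies are resolved by hand with `result()`, whereas Parsl tracks them from the futures passed between apps.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(index):
    """Hypothetical stand-in for one simulation; returns a label, not a file."""
    return f"sim-{index}"


def merge(parts):
    """Combine all simulation outputs into a single string."""
    return " ".join(parts)


def analyze(merged):
    """Toy analysis: count how many simulation outputs were merged."""
    return len(merged.split())


def run_workflow(nsims):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Fan-out: the simulations are independent, so they can run concurrently.
        sims = [pool.submit(simulate, i) for i in range(nsims)]
        # Fan-in: merge needs every simulation result before it can start.
        merged = pool.submit(merge, [s.result() for s in sims])
        # analyze depends only on the merged output.
        return pool.submit(analyze, merged.result()).result()


if __name__ == "__main__":
    print(run_workflow(5))
```

The manual `result()` calls are the part a dataflow system removes: the script would hand futures straight to the next task and let the runtime decide when each task is ready.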
  • 14. Decomposing dynamic parallel execution into a task-dependency graph Parsl
  • 15. Parsl scripts are execution provider independent The same script can be run locally, on grids, clouds, or supercomputers Growing support for various schedulers and cloud vendors From Parsl docs
  • 16. Separation of code and execution Choose execution environment at runtime. Parsl will direct tasks to the configured execution environment(s).
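As a rough illustration of choosing the execution environment at runtime, the configuration sketch below defines two alternative Parsl configurations. It assumes a recent Parsl installation; the dictionary keys, executor labels, and sizes are hypothetical choices made for this example, not values taken from the talk.

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor, ThreadPoolExecutor
from parsl.providers import LocalProvider

# Hypothetical target names; labels and sizes are illustrative only.
CONFIGS = {
    # Run apps in a small thread pool inside the current Python process.
    "laptop": Config(
        executors=[ThreadPoolExecutor(label="threads", max_threads=4)],
    ),
    # Run apps through pilot-job workers started on the local machine.
    "single_node": Config(
        executors=[
            HighThroughputExecutor(
                label="htex_local",
                provider=LocalProvider(init_blocks=1, max_blocks=1),
            )
        ],
    ),
}
```

A script would then call `parsl.load(CONFIGS["laptop"])` (or another entry) once before invoking any apps; the app definitions themselves stay unchanged. Targeting a cluster would mean swapping in a scheduler-specific provider, which is site-dependent and left out here.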
  • 17. Authentication and authorization Authn/z is hard… • 2FA, X509, GSISSH, etc. Integration with Globus Auth to support native app integration for accessing Globus (and other) services Using scoped access tokens, refresh tokens, delegation support
  • 18. Transparent (wide area) data management Implicit data movement to/from repositories, laptops, supercomputers Globus for third-party, high performance and reliable data transfer • Support for site-specific DTNs HTTP/FTP direct data staging parsl_file = File('globus://EP/path/file') www.globus.org
  • 20. Different types of scientific workloads High-throughput workloads • Protein docking, image processing, materials reconstructions • Requirements: 1000s of tasks, 100s of nodes, reliability, usability, monitoring, elasticity, etc. Extreme-scale workloads • Cosmology simulations, imaging the arctic, genomics analysis • Requirements: millions of tasks, 1000s of nodes (100,000s cores) Interactive and real-time workloads • Materials science, cosmic ray shower analysis, machine learning inference • Requirements: 10s of nodes, rapid response, pipelining
  • 21. Different types of execution High-throughput executor (HTEX) • Pilot job-based model with multi-threaded manager deployed on workers • Designed for ease of use, fault-tolerance, etc. • <2000 nodes (~60K workers), ms tasks, task duration/nodes > 0.01 Extreme-scale executor (EXEX) • Distributed MPI job manages execution. Manager rank communicates workload to other worker ranks directly • Designed for extreme scale execution on supercomputers • >1000 nodes (>30K workers), ms tasks, >1 m task duration Low-latency Executor (LLEX) • Direct socket communication to workers, fixed resource pool, limited features • 10s nodes, <1M tasks, <1m tasks
  • 22. Short tasks scale to thousands of workers Strong scaling: 50,000 tasks submitted with increasing number of workers (FireWorks: only 5,000 tasks) • HTEX and EXEX outperform other Python-based approaches when >256 workers • Other approaches are limited to fewer than 128 nodes; HTEX and EXEX continue to scale (Figure panels: 0s tasks, 1s tasks)
  • 23. Executors scale to 2M tasks/256K workers Weak scaling: 10 tasks per worker • HTEX and EXEX again outperform other Python-based approaches up to ~2M tasks • HTEX and EXEX scale to 2K nodes (~65K workers) and 8K nodes (~262K workers), respectively, with >1K tasks/s (Figure panels: 0s tasks, 1s tasks)
  • 24. Parsl executors can provide low latency • LLEX achieves low (3.47 ms) and consistent latency • HTEX (6.87 ms) and EXEX (9.83 ms) are less consistent
  • 25. Scalability summary • EXEX scales to over 250,000 workers across 8,000 nodes • Both EXEX and HTEX deliver ~1200 tasks/s • LLEX achieves an average latency of 3.47 ms with tight bounds
      Framework          Max. number of workers   Max. number of nodes   Max tasks/sec
      Parsl-IPP          2,048                    64                     330
      Parsl-HTEX         65,536                   2,048                  1,181
      Parsl-EXEX         262,144                  8,192                  1,176
      FireWorks          1,024                    32                     4
      Dask distributed   4,096                    128                    2,617
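The worker and node maxima in the scalability summary can be cross-checked with a little arithmetic. The sketch below copies the five rows from the slide and divides workers by nodes; the variable and function names are made up for this example. Every row works out to 32 workers per node, so the frameworks differ in how many nodes they reached rather than in how densely each node was packed.

```python
# Rows copied from the scalability summary:
# (framework, max workers, max nodes, max tasks/sec)
RESULTS = [
    ("Parsl-IPP", 2048, 64, 330),
    ("Parsl-HTEX", 65536, 2048, 1181),
    ("Parsl-EXEX", 262144, 8192, 1176),
    ("FireWorks", 1024, 32, 4),
    ("Dask distributed", 4096, 128, 2617),
]


def workers_per_node(rows):
    """Map each framework name to its workers-per-node ratio."""
    return {name: workers // nodes for name, workers, nodes, _ in rows}


if __name__ == "__main__":
    for name, ratio in workers_per_node(RESULTS).items():
        print(f"{name}: {ratio} workers per node")
```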
  • 27. Interactive supercomputing in Jupyter notebooks
  • 29. DOE Distributed Computing & Data Ecosystem (DCDE) • A DOE group is identifying best practices and research challenges to create and operate a DOE/SC wide federated Distributed Computing & Data Ecosystem (DCDE) • Future Lab Computing Working Group (FLC-WG) • Initially working towards a pilot • Using OAuth, working with Globus • Test deployment at BNL • Parsl is part of this effort, via initial work in linking ORNL and BNL • We’ve added support for an OAuthSSHChannel • Now being tested on test deployment
  • 30. Multi-site execution 1. Loading Parsl configuration triggers: a) Creation of SSH channels b) Deployment of an interchange process onto login nodes c) Submission of pilot jobs that will connect to the interchange 2. Parsl submits tasks directly to interchange 3. Parsl uses Globus to stage data (Figure: Parsl connected to an interchange at each site)
  • 31. Multi-site execution Too much small code See demo instead https://bit.ly/2Wsjlep (code in https://github.com/Parsl/demo_multifacility)
  • 32. Other functionality provided by Parsl Globus. Delegated authentication and wide area data management Fault tolerance. Support for retries, checkpointing, and memoization Containers. Sandboxed execution environments for workers and tasks Data management. Automated staging with HTTP, FTP, and Globus Multi site. Combining executors/providers for execution across different resources Elasticity. Automated resource expansion/retraction based on workload Monitoring. Workflow and resource monitoring and visualization Reproducibility. Capture of workflow provenance in the task graph Jupyter integration. Seamless description and management of workflows Resource abstraction. Block-based model overlaying different providers and resources
  • 34. Parsl’s parallelism in Python • Simple: minimal new constructs • Safe: deterministic parallel programs through immutable input/output objects, dependency task graph, etc. • Scalable: efficient execution from laptops to the largest supercomputers • Flexible: programs composed from existing components and then applied to different resources/workloads