ORNL is managed by UT-Battelle LLC for the US Department of Energy
Towards an Infrastructure for Enabling Systematic Development and Research of Scientific Workflow Systems and Applications
Rafael Ferreira da Silva, Ph.D.
Senior Research Scientist
Data Lifecycle and Scalable Workflows
https://rafaelsilva.com – silvarf@ornl.gov
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S.
Department of Energy (DOE). The U.S. government retains and the publisher, by accepting the article for
publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide
license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S.
government purposes. DOE will provide public access to these results of federally sponsored research in
accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Scientific Workflows
Workflows are becoming more complex and require more sophisticated workflow management capabilities
Workflows can now analyze terabyte-scale datasets and comprise millions of individual tasks, each running from milliseconds to several hours, on distributed heterogeneous platforms
Catering to these workflow features and demands requires workflow management system (WMS) research and development at several levels, from algorithms and systems to user interfaces
Image credits: Thomas McCauley ©2018 CERN; ©2019 NASA
Overview of Scientific Workflows Research Challenges
[Figure: map of research challenges spanning Workflow Management Systems and Domain Science Applications. Labels in the figure: DataSpaces, Parallelism, Interoperability, Cloud, Sequence, Language, HPC, HTC, Structure, Composition, Design, Architecture, Provenance, Monitoring, NVRAM, In Situ, Burst Buffers, HDFS, In Transit, Execution, Clustering, Pilot Jobs, I/O, Energy, CO2, Budget, Network Transfers, Optimization, Exception, Anomaly, Failure, Scheduling / Resource Provisioning, Modeling and Simulation, Community Building, Education and Training.]
There is a myriad of workflow systems…
The workflow systems landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist
https://s.apache.org/existing-workflow-systems
https://github.com/pditommaso/awesome-pipeline
Characterization of Workflow Systems for Extreme-Scale Applications
https://doi.org/10.1016/j.future.2017.02.026
Framework for Enabling Workflow Research and Development
https://wfcommons.org
WfCommons is a framework that provides a collection of tools for analyzing workflow execution instances, producing realistic synthetic workflow instances, and simulating workflow executions
Open-source Python package for analyzing instances, generating workflow recipes, and generating synthetic, yet realistic, workflow instances
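The analyze-then-generate idea can be illustrated with a minimal sketch. It does not use the WfCommons API or its instance format: the toy records, the `summarize` and `generate` helpers, and the choice of a normal distribution for runtimes are all assumptions made for illustration only.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical toy execution instance: (task name, task category, runtime in seconds).
# These records and field names are illustrative, not the WfCommons instance format.
INSTANCE = [
    ("align_1", "align", 12.0),
    ("align_2", "align", 14.0),
    ("align_3", "align", 13.0),
    ("merge_1", "merge", 40.0),
    ("merge_2", "merge", 44.0),
]


def summarize(instance):
    """Analysis step: per-category task count, mean, and sample stdev of runtime."""
    by_category = defaultdict(list)
    for _name, category, runtime in instance:
        by_category[category].append(runtime)
    return {
        category: {
            "count": len(runtimes),
            "mean": statistics.mean(runtimes),
            "stdev": statistics.stdev(runtimes) if len(runtimes) > 1 else 0.0,
        }
        for category, runtimes in by_category.items()
    }


def generate(recipe, scale, seed=0):
    """Generation step: draw `scale` times as many tasks per category from a normal fit."""
    rng = random.Random(seed)
    synthetic = []
    for category, stats in recipe.items():
        for i in range(stats["count"] * scale):
            # Clamp to a small positive value so no synthetic task has a negative runtime.
            runtime = max(0.001, rng.gauss(stats["mean"], stats["stdev"]))
            synthetic.append((f"{category}_{i + 1}", category, runtime))
    return synthetic


recipe = summarize(INSTANCE)
synthetic_instance = generate(recipe, scale=10)
```

Here the summary plays the role of a very crude recipe: it captures only task counts and runtime statistics, whereas a realistic generator would also need to preserve the workflow's dependency structure and data sizes.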
Modeling and Simulation
https://wrench-project.org
Cyberinfrastructure Simulation Workbench
https://simgrid.org
Accurate and scalable simulation models of hardware/software stacks
Objective #1: Make it easy to develop simulators of complex cyberinfrastructure (CI) application executions
Done by providing high-level, reusable simulation abstractions
Objective #2: Produce accurate and scalable simulations
Done by building on SimGrid
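To give a feel for what simulating a workflow execution involves, here is a minimal, self-contained sketch. It uses neither WRENCH nor SimGrid: the `WORKFLOW` dictionary, the `simulate_makespan` function, and the greedy first-ready scheduling policy are assumptions chosen for illustration, and the platform model (identical cores, no I/O or network cost) is far simpler than what those tools provide.

```python
import heapq

# Hypothetical workflow: task -> (runtime in seconds, list of parent tasks).
WORKFLOW = {
    "split": (2.0, []),
    "work_a": (5.0, ["split"]),
    "work_b": (3.0, ["split"]),
    "work_c": (4.0, ["split"]),
    "join": (1.0, ["work_a", "work_b", "work_c"]),
}


def simulate_makespan(workflow, num_cores):
    """Simulate greedy execution of a task graph on identical cores (num_cores >= 1)."""
    remaining_parents = {t: len(parents) for t, (_rt, parents) in workflow.items()}
    children = {t: [] for t in workflow}
    for task, (_rt, parents) in workflow.items():
        for parent in parents:
            children[parent].append(task)

    ready = sorted(t for t, n in remaining_parents.items() if n == 0)
    running = []  # min-heap of (finish time, task)
    clock = 0.0
    free_cores = num_cores

    while ready or running:
        # Start as many ready tasks as there are free cores.
        while ready and free_cores > 0:
            task = ready.pop(0)
            heapq.heappush(running, (clock + workflow[task][0], task))
            free_cores -= 1
        # Advance the simulated clock to the next task completion.
        clock, finished = heapq.heappop(running)
        free_cores += 1
        # Release children whose parents have all completed.
        for child in children[finished]:
            remaining_parents[child] -= 1
            if remaining_parents[child] == 0:
                ready.append(child)
    return clock


makespans = {cores: simulate_makespan(WORKFLOW, cores) for cores in (1, 2, 3)}
```

Running the same workflow against different core counts is the kind of what-if question a simulator answers without touching a real platform.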
Workflows Community Summits
ExaWorks
Identify challenges and actionable directions for short- and long-term community efforts
Jan 2021: Bringing the Scientific Workflows Community Together
April 2021: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development
board of experts
http://workflowsri.org
https://exaworks.org
Workflows Community Summits (Jan 2021)
• Themes
– FAIR computational workflows
– Training and education for workflow users
– AI workflows
– Exascale challenges and beyond
– APIs, interoperability, reuse, and standards
– Building a workflows community
https://doi.org/10.5281/zenodo.4606958
Workflows Community Summits (April 2021)
• Topics
– Definition of common workflow patterns and benchmarks
– Identifying paths toward interoperability of workflow systems
– Improving workflow systems' interface with legacy and emerging HPC software and hardware stacks
https://doi.org/10.5281/zenodo.4915801
FAIR Computational Workflows
The FAIR principles have laid a foundation for sharing and publishing digital assets and, in particular, data
Scientific workflows should support the creation of FAIR data and themselves adhere to the FAIR principles
Challenges
Define FAIR principles for computational workflows, considering the complex lifecycle from specification to execution and data products
Define metrics to measure the FAIRness of a workflow
Potential Efforts
Outline rules for FAIR workflows
Define recommendations for FAIR workflow repositories
Automate FAIRness in workflows
AI/ML Workflows
Scientific workflows empowered with ML techniques largely differ from traditional scientific workflows running on HPC machines
Challenges
Lack of support for fine-grained data management features (file level) and versioning features
Lack of capabilities for enabling workflow steering and dynamic workflow execution
Integration of ML frameworks into the current HPC landscape
Potential Efforts
Develop use cases for sample problems with representative workflow structures and data types
Develop AI workflows that can benchmark HPC systems
Heterogeneous Computing
Challenges
Lack of support for heterogeneity of compute resources (CPUs, GPUs, FPGAs, RDMA, etc.)
Unfavorable design of resource descriptions and mechanisms for workflow users/systems
Potential Efforts
Specify community benchmarks for heterogeneous resources
Design common interfaces for enabling interoperability among resources
Can we allow users to describe their computational steps independently of the compute engine and/or hardware used for running them?
Science applications tend to have a longer lifetime than individual workflow technologies
This is partly due to the evolution of hardware capabilities, which forces changes in software architecture and application organization
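One way to picture an engine-independent step description is sketched below. The `Step` schema and the two renderer functions are hypothetical and are not taken from any workflow system discussed in the talk; the only real-world detail assumed is the standard Slurm `#SBATCH` directive syntax (`--job-name`, `--cpus-per-task`, `--time`, `--gres=gpu:N`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Engine-independent description of one computational step (hypothetical schema)."""
    name: str
    command: str
    cores: int = 1
    gpus: int = 0
    walltime_minutes: int = 10


def to_local_shell(step: Step) -> str:
    """Render the step for direct execution on a workstation; resource hints are ignored."""
    return step.command


def to_slurm_script(step: Step) -> str:
    """Render the same step as a Slurm batch script using #SBATCH directives."""
    hours, minutes = divmod(step.walltime_minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={step.name}",
        f"#SBATCH --cpus-per-task={step.cores}",
        f"#SBATCH --time={hours:02d}:{minutes:02d}:00",
    ]
    # Only request GPUs when the step actually needs them.
    if step.gpus > 0:
        lines.append(f"#SBATCH --gres=gpu:{step.gpus}")
    lines.append(step.command)
    return "\n".join(lines)


# The same description is rendered for two different execution targets.
train = Step(name="train", command="python train.py --epochs 5",
             cores=8, gpus=1, walltime_minutes=90)
local_command = to_local_shell(train)
slurm_script = to_slurm_script(train)
```

The point of the sketch is the separation: the user writes `Step` once, and each backend decides how, or whether, to honor the resource hints.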
Co-Design of Workflow Systems and Distributed Computing Infrastructures
Challenges
Theoreticians produce results that are never used by practitioners, and conversely practitioners use approaches that may be vastly sub-optimal because they are not informed by any theory
Potential Efforts
Develop a co-design methodology for bridging theoretical results and practical implementations
Simulation-driven development of simulated platforms to quantify potential improvements yielded by theoretical results
The disconnect between theoretical and practical work is an impediment to the advancement of distributed computing
Thank you!
Questions?
Ad

Recommended

A Semantic-Based Approach to Attain Reproducibility of Computational Environm...
A Semantic-Based Approach to Attain Reproducibility of Computational Environm...
Idafen Santana Pérez
 
Thrombus Training Dec. 2013
Thrombus Training Dec. 2013
CREATIS
 
Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)
Logilab
 
What's New in Cytoscape
What's New in Cytoscape
Keiichiro Ono
 
Scientific
Scientific
marpierc
 
Building Reproducible Network Data Analysis / Visualization Workflows
Building Reproducible Network Data Analysis / Visualization Workflows
Keiichiro Ono
 
The Exascale Computing Project and the future of HPC
The Exascale Computing Project and the future of HPC
inside-BigData.com
 
seanresume15-a
seanresume15-a
Sean Lynch
 
A Query Model for Ad Hoc Queries using a Scanning Architecture
A Query Model for Ad Hoc Queries using a Scanning Architecture
Flurry, Inc.
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
Eric Chiang
 
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
Flurry, Inc.
 
Pegasus-Poster-2016-final-v2
Pegasus-Poster-2016-final-v2
Samrat Jha
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
Keiichiro Ono
 
Converting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objects
Khalid Belhajjame
 
r1501e
r1501e
George Vamos
 
A Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its Extensions
Khalid Belhajjame
 
Keyur_Joshi_resume - Copy
Keyur_Joshi_resume - Copy
Keyur Joshi
 
Network Visualization and Analysis with Cytoscape
Network Visualization and Analysis with Cytoscape
Alexander Pico
 
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
Chunlei Wu
 
RELIANCE ROHub hackathon
RELIANCE ROHub hackathon
Raul Palma
 
NextGenML
NextGenML
Moldovan Radu Adrian
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
Databricks
 
Sparkr sigmod
Sparkr sigmod
waqasm86
 
Weave GitOps - continuous delivery for any Kubernetes
Weave GitOps - continuous delivery for any Kubernetes
Weaveworks
 
FAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
Carole Goble
 
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
Rafael Ferreira da Silva
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
inside-BigData.com
 
An Overview of VIEW
An Overview of VIEW
Shiyong Lu
 
Scientific workflow-overview-2012-01-rev-2
Scientific workflow-overview-2012-01-rev-2
Terence Critchlow
 

More Related Content

What's hot (16)

A Query Model for Ad Hoc Queries using a Scanning Architecture
A Query Model for Ad Hoc Queries using a Scanning Architecture
Flurry, Inc.
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
Eric Chiang
 
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
Flurry, Inc.
 
Pegasus-Poster-2016-final-v2
Pegasus-Poster-2016-final-v2
Samrat Jha
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
Keiichiro Ono
 
Converting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objects
Khalid Belhajjame
 
r1501e
r1501e
George Vamos
 
A Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its Extensions
Khalid Belhajjame
 
Keyur_Joshi_resume - Copy
Keyur_Joshi_resume - Copy
Keyur Joshi
 
Network Visualization and Analysis with Cytoscape
Network Visualization and Analysis with Cytoscape
Alexander Pico
 
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
Chunlei Wu
 
RELIANCE ROHub hackathon
RELIANCE ROHub hackathon
Raul Palma
 
NextGenML
NextGenML
Moldovan Radu Adrian
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
Databricks
 
Sparkr sigmod
Sparkr sigmod
waqasm86
 
Weave GitOps - continuous delivery for any Kubernetes
Weave GitOps - continuous delivery for any Kubernetes
Weaveworks
 
A Query Model for Ad Hoc Queries using a Scanning Architecture
A Query Model for Ad Hoc Queries using a Scanning Architecture
Flurry, Inc.
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
Eric Chiang
 
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
Flurry, Inc.
 
Pegasus-Poster-2016-final-v2
Pegasus-Poster-2016-final-v2
Samrat Jha
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
Keiichiro Ono
 
Converting scripts into reproducible workflow research objects
Converting scripts into reproducible workflow research objects
Khalid Belhajjame
 
A Sightseeing Tour of Prov and Some of its Extensions
A Sightseeing Tour of Prov and Some of its Extensions
Khalid Belhajjame
 
Keyur_Joshi_resume - Copy
Keyur_Joshi_resume - Copy
Keyur Joshi
 
Network Visualization and Analysis with Cytoscape
Network Visualization and Analysis with Cytoscape
Alexander Pico
 
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
BioThings API: Promoting Best-practices via a Biomedical API Development Ecos...
Chunlei Wu
 
RELIANCE ROHub hackathon
RELIANCE ROHub hackathon
Raul Palma
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
Databricks
 
Sparkr sigmod
Sparkr sigmod
waqasm86
 
Weave GitOps - continuous delivery for any Kubernetes
Weave GitOps - continuous delivery for any Kubernetes
Weaveworks
 

Similar to Towards an Infrastructure for Enabling Systematic Development and Research of Scientific Workflow Systems and Applications (20)

FAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
Carole Goble
 
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
Rafael Ferreira da Silva
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
inside-BigData.com
 
An Overview of VIEW
An Overview of VIEW
Shiyong Lu
 
Scientific workflow-overview-2012-01-rev-2
Scientific workflow-overview-2012-01-rev-2
Terence Critchlow
 
FAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
Carole Goble
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
dgarijo
 
Towards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard Representations
dgarijo
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
dgarijo
 
Invited Talk for EUDAT Workshop in Barcelona
Invited Talk for EUDAT Workshop in Barcelona
Ilkay Altintas, Ph.D.
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
dgarijo
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
Richard Zijdeman
 
2016-10-20 BioExcel: Advances in Scientific Workflow Environments
2016-10-20 BioExcel: Advances in Scientific Workflow Environments
Stian Soiland-Reyes
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
Rafael Ferreira da Silva
 
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Carole Goble
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
Paolo Romano
 
Opportunities and Challenges for Running Scientific Workflows on the Cloud
Opportunities and Challenges for Running Scientific Workflows on the Cloud
lyingcom
 
FAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
FAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
Carole Goble
 
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
Rafael Ferreira da Silva
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
inside-BigData.com
 
An Overview of VIEW
An Overview of VIEW
Shiyong Lu
 
Scientific workflow-overview-2012-01-rev-2
Scientific workflow-overview-2012-01-rev-2
Terence Critchlow
 
FAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
EOSC-Life Workflow Collaboratory
EOSC-Life Workflow Collaboratory
Carole Goble
 
Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
dgarijo
 
Towards Workflow Ecosystems Through Semantic and Standard Representations
Towards Workflow Ecosystems Through Semantic and Standard Representations
dgarijo
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
dgarijo
 
Invited Talk for EUDAT Workshop in Barcelona
Invited Talk for EUDAT Workshop in Barcelona
Ilkay Altintas, Ph.D.
 
From Scientific Workflows to Research Objects: Publication and Abstraction of...
From Scientific Workflows to Research Objects: Publication and Abstraction of...
dgarijo
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
Richard Zijdeman
 
2016-10-20 BioExcel: Advances in Scientific Workflow Environments
2016-10-20 BioExcel: Advances in Scientific Workflow Environments
Stian Soiland-Reyes
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
Rafael Ferreira da Silva
 
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Carole Goble
 
Scientific Workflows: what do we have, what do we miss?
Scientific Workflows: what do we have, what do we miss?
Paolo Romano
 
Opportunities and Challenges for Running Scientific Workflows on the Cloud
Opportunities and Challenges for Running Scientific Workflows on the Cloud
lyingcom
 
FAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Ad

More from Rafael Ferreira da Silva (20)

Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Rafael Ferreira da Silva
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Rafael Ferreira da Silva
 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Rafael Ferreira da Silva
 
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Rafael Ferreira da Silva
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Rafael Ferreira da Silva
 
WRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation Workbench
Rafael Ferreira da Silva
 
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
Rafael Ferreira da Silva
 
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Rafael Ferreira da Silva
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific Workflows
Rafael Ferreira da Silva
 
Analysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTC
Rafael Ferreira da Silva
 
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Rafael Ferreira da Silva
 
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Rafael Ferreira da Silva
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computations
Rafael Ferreira da Silva
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
Rafael Ferreira da Silva
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Rafael Ferreira da Silva
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Rafael Ferreira da Silva
 
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
Rafael Ferreira da Silva
 
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Rafael Ferreira da Silva
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
Rafael Ferreira da Silva
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Rafael Ferreira da Silva
 
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Rafael Ferreira da Silva
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Rafael Ferreira da Silva
 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Rafael Ferreira da Silva
 
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Rafael Ferreira da Silva
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Rafael Ferreira da Silva
 
WRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation Workbench
Rafael Ferreira da Silva
 
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
Rafael Ferreira da Silva
 
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Rafael Ferreira da Silva
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific Workflows
Rafael Ferreira da Silva
 
Analysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTC
Rafael Ferreira da Silva
 
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Automating Real-time Seismic Analysis Through Streaming and High Throughput W...
Rafael Ferreira da Silva
 
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud a...
Rafael Ferreira da Silva
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computations
Rafael Ferreira da Silva
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
Rafael Ferreira da Silva
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Rafael Ferreira da Silva
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Rafael Ferreira da Silva
 
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
Rafael Ferreira da Silva
 
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Rafael Ferreira da Silva
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
Rafael Ferreira da Silva
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Rafael Ferreira da Silva
 
Ad

Recently uploaded (20)

Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
NTT DATA Technology & Innovation
 
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
Edge AI and Vision Alliance
 
June Patch Tuesday
June Patch Tuesday
Ivanti
 
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
SOFTTECHHUB
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
Enabling BIM / GIS integrations with Other Systems with FME
Enabling BIM / GIS integrations with Other Systems with FME
Safe Software
 
Edge-banding-machines-edgeteq-s-200-en-.pdf
Edge-banding-machines-edgeteq-s-200-en-.pdf
AmirStern2
 
Creating Inclusive Digital Learning with AI: A Smarter, Fairer Future
Creating Inclusive Digital Learning with AI: A Smarter, Fairer Future
Impelsys Inc.
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik
 
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
Edge AI and Vision Alliance
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
 
Data Validation and System Interoperability
Data Validation and System Interoperability
Safe Software
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
Safe Software
 
Securing Account Lifecycles in the Age of Deepfakes.pptx
Securing Account Lifecycles in the Age of Deepfakes.pptx
FIDO Alliance
 
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
Can We Use Rust to Develop Extensions for PostgreSQL? (POSETTE: An Event for ...
NTT DATA Technology & Innovation
 
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
Edge AI and Vision Alliance
 
June Patch Tuesday
June Patch Tuesday
Ivanti
 
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
AudGram Review: Build Visually Appealing, AI-Enhanced Audiograms to Engage Yo...
SOFTTECHHUB
 
Artificial Intelligence in the Nonprofit Boardroom.pdf
Artificial Intelligence in the Nonprofit Boardroom.pdf
OnBoard
 
AI vs Human Writing: Can You Tell the Difference?
AI vs Human Writing: Can You Tell the Difference?
Shashi Sathyanarayana, Ph.D
 
Enabling BIM / GIS integrations with Other Systems with FME
Enabling BIM / GIS integrations with Other Systems with FME
Safe Software
 
Edge-banding-machines-edgeteq-s-200-en-.pdf
Edge-banding-machines-edgeteq-s-200-en-.pdf
AmirStern2
 
Creating Inclusive Digital Learning with AI: A Smarter, Fairer Future
Creating Inclusive Digital Learning with AI: A Smarter, Fairer Future
Impelsys Inc.
 
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc Webinar - 2025 Global Privacy Survey
TrustArc
 
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC and Open Hackathons Monthly Highlights June 2025
OpenACC
 
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik - Passionate Tech Enthusiast
Raman Bhaumik
 
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
Edge AI and Vision Alliance
 
Providing an OGC API Processes REST Interface for FME Flow
Providing an OGC API Processes REST Interface for FME Flow
Safe Software
 
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
Tech-ASan: Two-stage check for Address Sanitizer - Yixuan Cao.pdf
caoyixuan2019
 
Data Validation and System Interoperability
Data Validation and System Interoperability
Safe Software
 
Mastering AI Workflows with FME - Peak of Data & AI 2025
Mastering AI Workflows with FME - Peak of Data & AI 2025
Safe Software
 
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
No-Code Workflows for CAD & 3D Data: Scaling AI-Driven Infrastructure
Safe Software
 
Securing Account Lifecycles in the Age of Deepfakes.pptx
Securing Account Lifecycles in the Age of Deepfakes.pptx
FIDO Alliance
 
MuleSoft for AgentForce : Topic Center and API Catalog
MuleSoft for AgentForce : Topic Center and API Catalog
shyamraj55
 

Towards an Infrastructure for Enabling Systematic Development and Research of Scientific Workflow Systems and Applications

  • 1. ORNL is managed by UT-Battelle LLC for the US Department of Energy Towards an Infrastructure for Enabling Systematic Development and Research of Scientific Workflow Systems and Applications Rafael Ferreira da Silva, Ph.D. Senior Research Scientist Data Lifecycle and Scalable Workflows https://rafaelsilva.com – [email protected] This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The U.S. government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
  • 2. Scientific Workflows. Workflows are becoming more complex and require more sophisticated workflow management capabilities. Workflows can now analyze terabyte-scale datasets and comprise millions of individual tasks that execute for milliseconds up to several hours on distributed heterogeneous platforms. Catering to these workflow features and demands requires workflow management system (WMS) research and development at several levels, from algorithms and systems to user interfaces. (Image credits: Thomas McCauley ©2018 CERN; ©2019 NASA.)
  • 3. Overview of Scientific Workflows Research Challenges (diagram labels): DataSpaces, Parallelism, Interoperability, Cloud, Sequence, Language, HPC, HTC, Structure, Composition, Design, Architecture, Provenance, Monitoring, NVRAM, In Situ, Burst Buffers, HDFS, In Transit, Execution, Clustering, Pilot Jobs, I/O, Energy, CO2, Budget, Network Transfers, Optimization, Exception, Anomaly, Failure, Scheduling / Resource Provisioning, Modeling and Simulation, Community Building, Education and Training, Workflow Management Systems, Domain Science Applications.
  • 4. There is a myriad of workflow systems… The workflow systems landscape is segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. https://s.apache.org/existing-workflow-systems https://github.com/pditommaso/awesome-pipeline
  • 5. Characterization of Workflow Systems for Extreme-Scale Applications: https://doi.org/10.1016/j.future.2017.02.026
  • 6. Framework for Enabling Workflow Research and Development (https://wfcommons.org). WfCommons is a framework that provides a collection of tools for analyzing workflow execution instances, producing realistic synthetic workflow instances, and simulating workflow executions. It includes an open source Python package to analyze instances, generate workflow recipes, and generate synthetic, yet realistic, workflow instances.
  • 7. Modeling and Simulation. WRENCH (https://wrench-project.org): Cyberinfrastructure Simulation Workbench. SimGrid (https://simgrid.org): accurate and scalable simulation models of hardware/software stacks. Objective #1: make it easy to develop simulators of complex cyberinfrastructure (CI) application executions, done by providing high-level, reusable simulation abstractions. Objective #2: produce accurate and scalable simulations, done by building on SimGrid.
  • 8. Workflows Community Summits (ExaWorks). Identify challenges and actionable directions for short- and long-term community efforts. Jan 2021: Bringing the Scientific Workflows Community Together. April 2021: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development. Board of experts. http://workflowsri.org https://exaworks.org
  • 9. Workflows Community Summits (Jan 2021). Themes: FAIR computational workflows; training and education for workflow users; AI workflows; exascale challenges and beyond; APIs, interoperability, reuse, and standards; building a workflows community. https://doi.org/10.5281/zenodo.4606958 ExaWorks: http://workflowsri.org https://exaworks.org
  • 10. Workflows Community Summits (April 2021). Topics: definition of common workflow patterns and benchmarks; identifying paths toward interoperability of workflow systems; improving workflow systems' interface with legacy and emerging HPC software and hardware stacks. https://doi.org/10.5281/zenodo.4915801 ExaWorks: http://workflowsri.org https://exaworks.org
  • 11. FAIR Computational Workflows. The FAIR principles have laid a foundation for sharing and publishing digital assets, and in particular data. Scientific workflows should support the creation of FAIR data and themselves adhere to the FAIR principles. Challenges: define FAIR principles for computational workflows, considering the complex lifecycle from specification to execution and data products; define metrics to measure the FAIRness of a workflow. Potential efforts: outline rules for FAIR workflows; define recommendations for FAIR workflow repositories; automate FAIRness in workflows.
  • 12. AI/ML Workflows. Scientific workflows empowered with ML techniques differ substantially from traditional scientific workflows running on HPC machines. Challenges: lack of support for fine-grained (file-level) data management features and versioning features; lack of capabilities for enabling workflow steering and dynamic workflow execution; integration of ML frameworks into the current HPC landscape. Potential efforts: develop use cases for sample problems with representative workflow structures and data types; develop AI workflows that can benchmark HPC systems.
  • 13. Heterogeneous Computing. Challenges: lack of support for heterogeneity of compute resources (CPUs, GPUs, FPGAs, RDMA, etc.); unfavorable design of resource descriptions and mechanisms for workflow users/systems. Potential efforts: specify community benchmarks for heterogeneous resources; design common interfaces for enabling interoperability among resources. Can we allow users to describe their computational steps independently of the compute engine and/or hardware used to run them? Science applications tend to have a longer lifetime than individual workflow technologies. This is partly due to the evolution of hardware capabilities, which forces changes in software architecture and application organization.
  • 14. Co-Design of Workflow Systems and Distributed Computing Infrastructures. Challenges: theoreticians produce results that are never used by practitioners, and conversely practitioners use approaches that may be vastly sub-optimal because they are not informed by any theory. Potential efforts: develop a co-design methodology for bridging theoretical results and practical implementations; simulation-driven development of simulated platforms to quantify potential improvements yielded by theoretical results. The disconnect between theoretical and practical works is an impediment to the advancement of distributed computing.
  • 15. Towards an Infrastructure for Enabling Systematic Development and Research of Scientific Workflow Systems and Applications. Rafael Ferreira da Silva, Ph.D., Senior Research Scientist, Data Lifecycle and Scalable Workflows. https://rafaelsilva.com – silvarf@ornl.gov. Thank you! Questions?
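
The generate-then-simulate loop described on slides 6 and 7 can be sketched in a few lines of plain Python. This is an illustrative toy, not the WfCommons or WRENCH API: the `Task` class, `generate_fork_join`, and `simulate_makespan` are hypothetical names, the runtime ranges are made up, and the scheduler is a simple greedy list scheduler rather than a SimGrid-backed model.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    """One node of a synthetic workflow DAG (hypothetical structure)."""
    name: str
    runtime: float                      # seconds, drawn at generation time
    parents: list = field(default_factory=list)


def generate_fork_join(num_workers, seed=0):
    """Build a synthetic fork-join workflow: split -> N workers -> merge."""
    rng = random.Random(seed)           # seeded so instances are reproducible
    split = Task("split", rng.uniform(1, 5))
    workers = [
        Task(f"worker_{i}", rng.uniform(10, 60), [split.name])
        for i in range(num_workers)
    ]
    merge = Task("merge", rng.uniform(1, 5), [w.name for w in workers])
    return [split, *workers, merge]     # list is already in topological order


def simulate_makespan(tasks, num_cores):
    """Greedy list scheduling over a topologically ordered task list.

    Each task is placed on the core that frees up earliest and starts
    once that core is free and all of its parents have finished.
    """
    core_free_at = [0.0] * num_cores
    finish = {}
    for task in tasks:
        ready = max((finish[p] for p in task.parents), default=0.0)
        core = min(range(num_cores), key=lambda c: core_free_at[c])
        start = max(ready, core_free_at[core])
        finish[task.name] = start + task.runtime
        core_free_at[core] = finish[task.name]
    return max(finish.values())


# Compare the simulated makespan of one synthetic instance across platforms.
workflow = generate_fork_join(num_workers=8, seed=42)
for cores in (1, 4, 8):
    print(cores, "cores:", round(simulate_makespan(workflow, cores), 1), "s")
```

Even this toy shows why simulation is useful for the co-design question on slide 14: the same synthetic instance can be replayed against different platform sizes to quantify how much a scheduling idea would help before anything is deployed.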