2016-09-04 BioExcel SIG, ECCB, Amsterdam
Advances in Scientific
Workflow Environments
Carole Goble, Stian Soiland-Reyes
The University of Manchester
carole.goble@manchester.ac.uk
http://esciencelab.org.uk/
What is a Workflow?
• Orchestrating multiple
computational tasks
• Managing the control and
data flow between them
• In a world that is
homogeneous or
heterogeneous
• Tasks
– Local / remote
– Local / third party
– White, grey or black boxes
– Reliable / fragile
– Reserved / dynamic
– Various underpinning
infrastructure
– Various access controls
BioExcel: Biomolecular recognition
What is a Workflow?
Automation
– Automate computational aspects
– Repetitive pipelines, sweep campaigns
Scaling – compute cycles
– Make use of computational infrastructure
& handle large data
Abstraction – people cycles
– Shield complexity and incompatibilities
– Report, re-use, evolve, share, compare
– Repeat – Tweak – Repeat
– First class commodities
Provenance - reporting
– Capture, report and utilize log and data
lineage auto-documentation
– Traceable evolution, audit, transparency
– Compare
With thanks to Bertram Ludascher: WORKS 2015 Keynote
Findable
Accessible
Interoperable
Reusable
(Reproducible)
https://pegasus.isi.edu/2016/02/11/pegasus-powers-ligo-gravitational-waves-detection-analysis/
Laser Interferometer Gravitational-Wave
Observatory – first detection of gravitational
waves from colliding black holes
Morphological, hemodynamic and
structural analyses linked to aneurysm
genesis, growth and rupture.
[Susheel Varma] http://www.vph-share.eu/
http://taverna.org.uk
Galaxy
https://usegalaxy.org/
Marine metagenomics
+ Bespoke Scripts
[Rob Finn]
Open PHACTS
https://www.knime.org/
BioExcel
workflow
https://www.openphacts.org/
Targets
Pharmacological queries
target, compound and pathway data
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115460
Scripts, Ensemble toolkit, execution patterns
http://www.extasy-project.org/
http://www.myexperiment.org
WF Zoo
Advances in Scientific Workflow Environments
Workflow Patterns, templates
Data
wrangling
& analytics
Simulations
Instrument
pipelines
+
+
http://tpeterka.github.io/maui-project/
The Future of Scientific Workflows, Report of DOE Workshop 2015,
http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
Garijo et al., Common Motifs in Scientific Workflows: An Empirical Analysis, FGCS, 36, July 2014, 338–351
Workflow Patterns, templates
• Long running and complex code
• Tunable parameters and input sets
• Simulation sweeps / iterations
• Ensembles, comparisons
• Tricky set-ups, human-in-the-loop
interaction
• Computational steering
• In situ workflows – multiple tasks, same
box, within fixed time
– data locality.
– human-in-the-loop.
– capture provenance.
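A simulation sweep over tunable parameters, as listed above, can be sketched in a few lines of Python. This is an illustrative sketch only: `run_simulation`, the parameter names and their values are hypothetical stand-ins, not taken from any system named in these slides.

```python
from itertools import product

def run_simulation(temperature, timestep):
    """Hypothetical stand-in for a long-running simulation code."""
    return {"temperature": temperature, "timestep": timestep,
            "score": temperature * timestep}

def sweep(param_grid):
    """Run one simulation per combination of tunable parameters."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        results.append(run_simulation(**dict(zip(names, values))))
    return results

# Three temperatures x two timesteps gives an ensemble of six runs.
grid = {"temperature": [280, 300, 320], "timestep": [1, 2]}
ensemble = sweep(grid)

# Comparing ensemble members: pick the run with the highest (made-up) score.
best = max(ensemble, key=lambda run: run["score"])
```

A workflow system would typically dispatch each combination to separate compute resources; the loop here runs them one after another to keep the sketch short.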
Traction + Examples
Reuse behaviours
Exploratory vs Production
Different kinds of user / deployment
Developer – User Ratios
Biologist, Developer, Computational Scientist
WFMS Zoo
Existing computational research
workflow systems
https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems
“Multi-scale” WFMS
• Workflow
Management
System
– Its design and reporting
environment
– Its execution
environment
• The tasks
– tools, codes and services
and their execution
environments
• Stack layer
– App level, infrastructure
level
Component making
Tasks loosely coupled through files,
• execute on geographically distributed
clusters, clouds, grids across systems
• execute on multiple facilities
• call host services (web / grid services)
DAIC
Distributed Area/Instrument
Computing
“Multi-scale” WFMS
Tasks tightly coupled
• exchanging info over memory/storage
• network of supercomputers
• In situ workflows – multiple tasks, same
box, within fixed time
HPC
Interoperability
Portability
Granularity
Maintenance
Workflow Environment Ecosystem
Copernicus workflow engine for
parallel adaptive molecular dynamics
• Peer-to-peer distributed
computing platform
– high-level parallelization of
statistical sampling problems
• Consolidation of heterogeneous
compute resources
• Automatic resource matching of
jobs against compute resources
• Automatic fault tolerance of
distributed work
• Workflow execution engine to
define a problem (reporting) and
trace its results live (provenance)
• Flexible plugin facilities
– programs to be integrated to the
workflow execution engine
Free Energy
Workflow using
GROMACS
http://copernicus-computing.org/
COMPSs/PyCOMPSs:
Programmer Productivity
framework
• Sequential programming
– Parallelisation and
distribution heavy-lifting
– Dependency detection
• Infrastructure unaware
– Abstract application from
underlying infrastructure
– Portability
• Standard Programming
Languages
– Java, Python, C/C++
• No (or few!) APIs
– Standard Java
Shield the
user/programmer
Exposure to the
infrastructure
System Design
Manage/minimize data transfers
Stop Press!
GUIs not essential!
• Canvas, drag-drop blocks, arrows,
run button
• Command-line & embedding in
developer or user applications
Scripts can be workflows!
• WMS<->Scripts
• Script vs Workflows/ASAP:
– Automation: *****
– Scaling: **
– Abstraction: *
– Provenance: **
Work close to a problem-
specific ad-hoc data model
Domain Specific Language
"programming-lite" scripts
• wire with declarative
"makefile"-like DAG
Plus
• procedural scripting and
expressions in languages
like Javascript and Python
Nextflow, Snakemake,
Common Workflow Language
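The combination of declarative, "makefile"-like wiring with procedural scripting can be sketched as follows. This is a minimal illustration in plain Python, not the syntax of Nextflow, Snakemake or the Common Workflow Language; the task names and the `WORKFLOW` table are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Procedural part: plain functions; each receives the outputs of its dependencies.
def prepare(inputs):
    return "structure"

def simulate(inputs):
    return inputs["prepare"] + " -> trajectory"

def analyse(inputs):
    return inputs["simulate"] + " -> energies"

# Declarative part: task name -> (function, names of upstream tasks).
WORKFLOW = {
    "prepare": (prepare, []),
    "simulate": (simulate, ["prepare"]),
    "analyse": (analyse, ["simulate"]),
}

def run(workflow):
    """Execute every task after the tasks it depends on."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        outputs[name] = func({dep: outputs[dep] for dep in deps})
    return outputs

outputs = run(WORKFLOW)
```

Keeping the wiring in a data structure rather than in the call order is what lets an engine reorder, parallelise or partially re-run steps.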
GUIs Are Essential :-)
take-up by the user base
Workflowising script software eco-systems
prime example: provenance
ASAP
• common,
interoperable
provenance recording
– W3C PROV
ASAP
• YesWorkflow.org
– Annotations in script
yield workflow view
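How annotations in a script can yield a workflow view is sketched below. This is not the YesWorkflow tool itself: it is a small hand-written parser that assumes only the `@begin`, `@in`, `@out` and `@end` comment tags, and the annotated script and its block names are made up for illustration.

```python
import re

# A hypothetical script whose comments carry workflow annotations.
SCRIPT = '''
# @begin clean_data
# @in raw_file
# @out clean_table
# ... cleaning code ...
# @end clean_data

# @begin fit_model
# @in clean_table
# @out model
# ... fitting code ...
# @end fit_model
'''

TAG = re.compile(r"#\s*@(begin|in|out|end)\s+(\w+)")

def extract_blocks(source):
    """Collect the declared inputs and outputs of each annotated block."""
    blocks, current = {}, None
    for line in source.splitlines():
        match = TAG.search(line)
        if not match:
            continue
        tag, name = match.groups()
        if tag == "begin":
            current = name
            blocks[current] = {"in": [], "out": []}
        elif tag == "end":
            current = None
        elif current is not None:
            blocks[current][tag].append(name)
    return blocks

def data_edges(blocks):
    """Link blocks where an output name of one is an input name of another."""
    edges = []
    for producer, declared in blocks.items():
        for consumer, wanted in blocks.items():
            for name in declared["out"]:
                if name in wanted["in"]:
                    edges.append((producer, name, consumer))
    return edges

blocks = extract_blocks(SCRIPT)
edges = data_edges(blocks)
```

The resulting edges form the workflow graph; the script itself is never executed to obtain it.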
ASAP
• Library profilers
– noWorkflow
• runtime provenance
recorders
– Sumatra, RDataTracker
Provenance the link between computation and results
W3C PROV model standard
record for reporting
compare diffs/discrepancies
provenance analytics
track changes, adapt
partial repeat/reproduce
carry attributions
compute credits
compute data quality/trust
select data to keep/release
optimisation and debugging
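Recording provenance and then using it, for example to trace which inputs a result depends on, can be sketched as below. The relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `startedAtTime`) come from W3C PROV; everything else, including the `ProvenanceLog` class, the file names and the agent name, is a hypothetical illustration and not a standard serialisation.

```python
from datetime import datetime, timezone

class ProvenanceLog:
    """Tiny in-memory record that borrows W3C PROV relation names."""

    def __init__(self):
        self.relations = []  # (relation, subject, object) triples

    def record_step(self, activity, inputs, outputs, agent):
        stamp = datetime.now(timezone.utc).isoformat()
        self.relations.append(("startedAtTime", activity, stamp))
        self.relations.append(("wasAssociatedWith", activity, agent))
        for entity in inputs:
            self.relations.append(("used", activity, entity))
        for entity in outputs:
            self.relations.append(("wasGeneratedBy", entity, activity))

    def lineage(self, entity):
        """Return every upstream entity that `entity` depends on."""
        upstream, frontier = set(), [entity]
        while frontier:
            current = frontier.pop()
            for rel, subj, activity in self.relations:
                if rel != "wasGeneratedBy" or subj != current:
                    continue
                for rel2, subj2, source in self.relations:
                    if rel2 == "used" and subj2 == activity and source not in upstream:
                        upstream.add(source)
                        frontier.append(source)
        return upstream

log = ProvenanceLog()
log.record_step("prepare_pdb", ["raw.pdb"], ["clean.pdb"], agent="alice")
log.record_step("run_gromacs", ["clean.pdb", "params.mdp"], ["traj.xtc"], agent="alice")

# Data lineage: everything the trajectory was derived from.
sources = log.lineage("traj.xtc")
```

The same triples could support the other uses listed above, such as carrying attributions (via `wasAssociatedWith`) or deciding which steps need to be repeated.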
Metadata propagation – where was the
physical sample collected, and who
should be attributed?
Task-based abstractions: simplifying
provenance using motifs and tool
annotations
“Free energy calculation” rather than 5
steps including preparation of PDB files
and GROMACS execution
Provenance the link between workflow variants
and workflow reuse and repurpose
W3C PROV model standard?
record for reporting
compare diffs/discrepancies
provenance analytics
track changes, adapt
carry attributions
compute design credits
versioning, forking, cloning
Nested workflows
functions by stealth
Copy and paste fragmentation
Designing for reuse
Find and Go
Software practices
Systematic reuse
Guidelines for persistently identifying
software using DataCite
https://epubs.stfc.ac.uk/work/24058274
https://www.force11.org/software-citation-principles
ASAP Wfms for FAIR Science
Automate: workflows,
programs and services folks
already use or want to use
Scale: Enable computational
productivity
Abstract: Enable human
productivity
Provenance: Record and use
Usability
Workflow Plugged in Code
Reporting Comparison
Thanks to Bertram Ludascher
Dependency Management
Codes Behaviours & Reliability
● Task-specific “mini-workflow”
fragments
– e.g. using Gromacs, CPMD,
HADDOCK
● Packaged
– EGI VM images and Docker
containers
● Backed by existing registries
– ELIXIR’s bio.tools and EGI App DB
● Instantiated as cloud instances
– private (OpenNebula, OpenStack)
– public (e.g. Amazon AWS)
Application Building Blocks
BioExcel Virtualised Software Library
“transversal workflow units”, higher level operations
BioExcel Use cases
● Genomics
● Ensembl Molecular
simulations
● Free Energy simulations
● Multiscale modelling of
molecular basis for odor
and taste
● Biomolecular recognition
● Pharmacological queries
● Virtual Screening
Finding valid pathways through free-energy
landscapes: implementation of the “string of
swarms” method using Copernicus as a
workflow manager, and GROMACS as a
compute engine.
Workflow Interoperability.
• Common format for bioinformatics tool &
workflow execution
• Community based standards effort
• Designed for clusters & clouds
• Supports the use of containers (e.g. Docker)
• Specify data dependencies between steps
• Scatter/gather on steps
• Nest workflows in steps
• Develop your pipeline on your local computer
(optionally with Docker)
• Execute on your research cluster or in the cloud
• Deliver to users via workbenches
• EDAM ontology (ELIXIR-DK) to specify file
formats and reason about them: “FASTQ
Sanger” encoding is a type of FASTQ file
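The scatter/gather pattern named above can be illustrated in plain Python. This is a sketch of the idea only, not Common Workflow Language syntax; `count_gc`, `scatter_gather` and the example sequences are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def count_gc(sequence):
    """Hypothetical per-item step: count G and C bases in one sequence."""
    return sum(1 for base in sequence if base in "GC")

def scatter_gather(step, items, max_workers=4):
    """Run `step` once per item (scatter), collect results in input order (gather)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(step, items))

sequences = ["GATTACA", "GGCC", "ATAT"]
gc_counts = scatter_gather(count_gc, sequences)

# A downstream step depends on the gathered list as a whole.
total_gc = sum(gc_counts)
```

In a workflow description the same step definition is reused unchanged; only the wiring states that it should be applied to each element of an input array.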
Workflow Research Object Bundle
researchobject.org
Belhajjame et al. (2015) Using a suite of ontologies for preserving workflow-centric research objects,
J. Web Semantics, doi:10.1016/j.websem.2015.01.003
application/vnd.wf4ever.robundle+zip
Z. Zhao et al., “Workflow bus for e-Science”, in IEEE e-Science 2006, Amsterdam
2007
2015
http://bioexcel.eu/events/bioexcel-workflow-training-for-computational-biomolecular-research/
Adam Hospital (IRB), Anna Montras (IRB), Stian Soiland-Reyes (UNIMAN), Alexandre Bonvin
(UU), Adrien Melquiond (UU), Josep Lluís Gelpí (BSC), Daniele Lezzi (BSC), Steven Newhouse
(EBI), Jose A. Dianes (EBI), Mark Abraham (KTH), Rossen Apostolov (KTH), Emiliano Ippoliti
(Jülich), Adam Carter (UEDIN), Darren J. White (UEDIN)
Slides: Bertram Ludascher, Ewa Deelman, Vasa Curcin, Paolo Missier, Pinar Alper, Susheel
Varma, Rob Finn, Michael Crusoe, Rizos Sakellariou
Sign up
ASAP!
Bonus Slides
Ad

More Related Content

What's hot (20)

The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
Carole Goble
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
Carole Goble
 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...
FAIRDOM
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
Carole Goble
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
Carole Goble
 
FAIRy Stories
FAIRy StoriesFAIRy Stories
FAIRy Stories
Carole Goble
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
Carole Goble
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Carole Goble
 
FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)
Carole Goble
 
ROHub
ROHubROHub
ROHub
Raul Palma
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
Carole Goble
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
Carole Goble
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
Carole Goble
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBI
Martin Scharm
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
Raul Palma
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better Research
Carole Goble
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
FAIRDOM
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
Carole Goble
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
Carole Goble
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems Biology
FAIRDOM
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
Carole Goble
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
Carole Goble
 
Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...Reproducibility of model-based results: standards, infrastructure, and recogn...
Reproducibility of model-based results: standards, infrastructure, and recogn...
FAIRDOM
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
Carole Goble
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
Carole Goble
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Carole Goble
 
FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)
Carole Goble
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
Carole Goble
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
Carole Goble
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
Carole Goble
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBI
Martin Scharm
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
Raul Palma
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better Research
Carole Goble
 
Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...Citing data in research articles: principles, implementation, challenges - an...
Citing data in research articles: principles, implementation, challenges - an...
FAIRDOM
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
Carole Goble
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
Carole Goble
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems Biology
FAIRDOM
 

Viewers also liked (10)

Capturing the context: one small(ish step for modellers, one giant leap for m...
Capturing the context: one small(ish step for modellers, one giant leap for m...Capturing the context: one small(ish step for modellers, one giant leap for m...
Capturing the context: one small(ish step for modellers, one giant leap for m...
FAIRDOM
 
Improving the management of computational models.
Improving the management of computational models.Improving the management of computational models.
Improving the management of computational models.
FAIRDOM
 
FAIR data and model management for systems biology (and SOPs too!)
FAIR data and model management for systems biology (and SOPs too!)FAIR data and model management for systems biology (and SOPs too!)
FAIR data and model management for systems biology (and SOPs too!)
FAIRDOM
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.
FAIRDOM
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.
FAIRDOM
 
Report of the second FAIRDOM foundry
Report of the second FAIRDOM foundryReport of the second FAIRDOM foundry
Report of the second FAIRDOM foundry
FAIRDOM
 
Licensing, Citation and Sustainability.
Licensing, Citation and Sustainability.Licensing, Citation and Sustainability.
Licensing, Citation and Sustainability.
FAIRDOM
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
FAIRDOM
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
FAIRDOM
 
Precision Medicine in Oncology Informatics
Precision Medicine in Oncology InformaticsPrecision Medicine in Oncology Informatics
Precision Medicine in Oncology Informatics
Warren Kibbe
 
Capturing the context: one small(ish step for modellers, one giant leap for m...
Capturing the context: one small(ish step for modellers, one giant leap for m...Capturing the context: one small(ish step for modellers, one giant leap for m...
Capturing the context: one small(ish step for modellers, one giant leap for m...
FAIRDOM
 
Improving the management of computational models.
Improving the management of computational models.Improving the management of computational models.
Improving the management of computational models.
FAIRDOM
 
FAIR data and model management for systems biology (and SOPs too!)
FAIR data and model management for systems biology (and SOPs too!)FAIR data and model management for systems biology (and SOPs too!)
FAIR data and model management for systems biology (and SOPs too!)
FAIRDOM
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.
FAIRDOM
 
Making your data good enough for sharing.
Making your data good enough for sharing.Making your data good enough for sharing.
Making your data good enough for sharing.
FAIRDOM
 
Report of the second FAIRDOM foundry
Report of the second FAIRDOM foundryReport of the second FAIRDOM foundry
Report of the second FAIRDOM foundry
FAIRDOM
 
Licensing, Citation and Sustainability.
Licensing, Citation and Sustainability.Licensing, Citation and Sustainability.
Licensing, Citation and Sustainability.
FAIRDOM
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
FAIRDOM
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
FAIRDOM
 
Precision Medicine in Oncology Informatics
Precision Medicine in Oncology InformaticsPrecision Medicine in Oncology Informatics
Precision Medicine in Oncology Informatics
Warren Kibbe
 
Ad

Similar to Advances in Scientific Workflow Environments (20)

2016-10-20 BioExcel: Advances in Scientific Workflow Environments
2016-10-20 BioExcel: Advances in Scientific Workflow Environments2016-10-20 BioExcel: Advances in Scientific Workflow Environments
2016-10-20 BioExcel: Advances in Scientific Workflow Environments
Stian Soiland-Reyes
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
Pinar Alper
 
Taverna workflows in the cloud
Taverna workflows in the cloudTaverna workflows in the cloud
Taverna workflows in the cloud
myGrid team
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
Globus
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and Atlas
DataWorks Summit
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Timothy McPhillips
 
eResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software developmenteResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software development
Andrea Wiggins
 
Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...
Ákos Horváth
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Ed Dodds
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
Ilkay Altintas, Ph.D.
 
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, RomeWorkflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Workflows, provenance and reporting: a lifecycle perspective at BIH 2013, Rome
Carole Goble
 
Introduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialIntroduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 Tutorial
Globus
 
IncQuery_presentation_Incose_EMEA_WSEC.pptx
IncQuery_presentation_Incose_EMEA_WSEC.pptxIncQuery_presentation_Incose_EMEA_WSEC.pptx
IncQuery_presentation_Incose_EMEA_WSEC.pptx
IncQuery Labs
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
David Wallom
 
Scientific
Scientific Scientific
Scientific
marpierc
 
The Taverna Software Suite
The Taverna Software SuiteThe Taverna Software Suite
The Taverna Software Suite
myGrid team
 
Introduction to Web Services - Architecture
Introduction to Web Services - ArchitectureIntroduction to Web Services - Architecture
Introduction to Web Services - Architecture
Matrix823409
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
Marco Quartulli
 
2016-10-20 BioExcel: Advances in Scientific Workflow Environments
2016-10-20 BioExcel: Advances in Scientific Workflow Environments2016-10-20 BioExcel: Advances in Scientific Workflow Environments
2016-10-20 BioExcel: Advances in Scientific Workflow Environments
Stian Soiland-Reyes
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
Pinar Alper
 
Taverna workflows in the cloud
Taverna workflows in the cloudTaverna workflows in the cloud
Taverna workflows in the cloud
myGrid team
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
Globus
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and Atlas
DataWorks Summit
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Timothy McPhillips
 
eResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software developmenteResearch workflows for studying free and open source software development
eResearch workflows for studying free and open source software development
Andrea Wiggins
 
Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...Next-Generation Completeness and Consistency Management in the Digital Threa...
Next-Generation Completeness and Consistency Management in the Digital Threa...
Ákos Horváth
 
Advances in Scientific Workflow Environments

  • 1. 2016-09-04 BioExcel SIG, ECCB, Amsterdam Advances in Scientific Workflow Environments Carole Goble, Stian Soiland-Reyes The University of Manchester carole.goble@manchester.ac.uk http://esciencelab.org.uk/
  • 2. What is a Workflow? • Orchestrating multiple computational tasks • Managing the control and data flow between them • In a world that is homogeneous or heterogeneous • Tasks – Local / remote – Local / third party – White, grey or black boxes – Reliable / fragile – Reserved / dynamic – Various underpinning infrastructure – Various access controls BioExcel: Biomolecular recognition
  • 3. What is a Workflow? Automation – Automate computational aspects – Repetitive pipelines, sweep campaigns Scaling – compute cycles – Make use of computational infrastructure & handle large data Abstraction – people cycles – Shield complexity and incompatibilities – Report, re-use, evolve, share, compare – Repeat – Tweak – Repeat – First class commodities Provenance – reporting – Capture, report and utilize log and data lineage auto-documentation – Traceable evolution, audit, transparency – Compare With thanks to Bertram Ludascher: WORKS 2015 Keynote Findable Accessible Interoperable Reusable (Reproducible)
  • 5. Morphological, hemodynamic and structural analyses linked to aneurysm genesis, growth and rupture. [Susheel Varma] http://www.vph-share.eu/ http://taverna.org.uk
  • 7. Marine metagenomics + Bespoke Scripts [Rob Finn]
  • 8. Open PHACTS https://www.knime.org/ BioExcel workflow https://www.openphacts.org/ Targets Pharmacological queries target, compound and pathway data http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115460
  • 9. Scripts, Ensemble toolkit, execution patterns http://www.extasy-project.org/
  • 12. Workflow Patterns, templates Data wrangling & analytics Simulations Instrument pipelines + + http://tpeterka.github.io/maui-project/ The Future of Scientific Workflows, Report of DOE Workshop 2015, http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
  • 13. Workflow Patterns, templates Data wrangling & analytics Simulations Instrument pipelines + + Garijo et al., Common Motifs in Scientific Workflows: An Empirical Analysis, FGCS, 36, July 2014, 338–351
  • 14. Workflow Patterns, templates • Long running and complex code • Tunable parameters and input sets • Simulation sweeps / iterations • Ensembles, comparisons • Tricky set-ups, human-in-the-loop interaction • Computational steering • In situ workflows – multiple tasks, same box, within fixed time – data locality. – human-in-the-loop. – capture provenance. Data wrangling & analytics Simulations Instrument pipelines + +
  • 15. Traction + Examples Reuse behaviours Exploratory vs Production Different kinds of user / deployment Developer – User Ratios Biologist Developer Computational Scientist
  • 16. Existing computational research workflow systems https://github.com/common-workflow- WFMS Zoo
  • 17. Existing computational research workflow systems https://github.com/common-workflow-
  • 18. Existing computational research workflow systems https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems
  • 19. “Multi-scale” WFMS • Workflow Management System – Its design and reporting environment – Its execution environment • The tasks – tools, codes and services and their execution environments • Stack layer – App level, infrastructure level
  • 20. Component making Tasks loosely coupled through files, • execute on geographically distributed clusters, clouds, grids across systems • execute on multiple facilities • call host services (web / grid services) DAIC Distributed Area/Instrument Computing “Multi-scale” WFMS Tasks tightly coupled • exchanging info over memory/storage • network of supercomputers • In situ workflows – multiple tasks, same box, within fixed time HPC Interoperability Portability Granularity Maintenance
  • 22. Copernicus workflow engine for parallel adaptive molecular dynamics • Peer-to-peer distributed computing platform – high-level parallelization of statistical sampling problems • Consolidation of heterogeneous compute resources • Automatic resource matching of jobs against compute resources • Automatic fault tolerance of distributed work • Workflow execution engine to define a problem (reporting) and trace its results live (provenance) • Flexible plugin facilities – programs to be integrated to the workflow execution engine Free Energy Workflow using GROMACS http://copernicus-computing.org/
  • 23. COMPs/PyCOMPs: Programmer Productivity framework • Sequential programming – Parallelisation and distribution heavy-lifting – Dependency detection • Infrastructure unaware – Abstract application from underlying infrastructure – Portability • Standard Programming Languages – Java, Python, C/C++ • No (or few!) APIs – Standard Java
  • 24. Shield the user/programmer Exposure to the infrastructure System Design Manage/minimize data transfers
  • 25. Stop Press! GUIs not essential! • Canvas, drag-drop blocks, arrows, run button • Command-line & embedding in developer or user applications Scripts can be workflows! • WMS <-> Scripts • Script vs Workflows/ASAP: – Automation: ***** – Scaling: ** – Abstraction: * – Provenance: **
  • 26. Stop Press! GUIs not essential! • Canvas, drag-drop blocks, arrows, run button • Command-line & embedding in developer or user applications Scripts can be workflows! • WMS <-> Scripts • Script vs Workflows/ASAP: – Automation: ***** – Scaling: ** – Abstraction: * – Provenance: ** Work close to a problem-specific ad-hoc data model Domain Specific Language "programming-lite" scripts • wire with declarative "makefile"-like DAG Plus • procedural scripting and expressions in languages like JavaScript and Python Nextflow, Snakemake, Common Workflow Language
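
The "makefile"-like DAG wiring named on slide 26 can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the syntax of Nextflow, Snakemake or CWL: the `rules` table, the rule names and the `run` helper are all hypothetical. Each rule declares its inputs (the declarative part) and a small Python callable (the procedural part); the standard-library `graphlib` module then derives an execution order from the declared dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules: output name -> (names of inputs, action to compute it).
rules = {
    "trimmed": (["raw"], lambda raw: raw.strip()),
    "upper": (["trimmed"], lambda trimmed: trimmed.upper()),
    "report": (["trimmed", "upper"], lambda t, u: f"{t} -> {u}"),
}


def run(rules, sources):
    """Run every rule once, in an order that respects the declared inputs."""
    # Map each output to the set of names it depends on.
    graph = {out: set(inputs) for out, (inputs, _) in rules.items()}
    order = list(TopologicalSorter(graph).static_order())
    values = dict(sources)  # source data has no rule; it is supplied up front
    for name in order:
        if name in rules:
            inputs, action = rules[name]
            values[name] = action(*(values[i] for i in inputs))
    return values, order


values, order = run(rules, {"raw": "  acgt \n"})
print(order)
print(values["report"])
```

The user never states the execution order; it follows from the declared inputs, which is the property the slide attributes to DAG-style scripting languages.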
  • 27. GUIs Are Essential: take-up by the user base
  • 28. Workflowising script software eco-systems prime example: provenance ASAP • common, interoperable provenance recording – W3C PROV ASAP • YesWorkflow.org – Annotations in script yield workflow view ASAP • Library profilers – noWorkflow • runtime provenance recorders – Sumatra, RDataTracker
  • 29. Provenance the link between computation and results W3C PROV model standard record for reporting compare diffs/discrepancies provenance analytics track changes, adapt partial repeat/reproduce carry attributions compute credits compute data quality/trust select data to keep/release optimisation and debugging Metadata propagation – where was the physical sample collected, and who should be attributed? Task-based abstractions: simplifying provenance using motifs and tool annotations “Free energy calculation” rather than 5 steps including preparation of PDB files and GROMACS execution
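
As a rough illustration of provenance as "the link between computation and results" (slide 29), the sketch below records which activity used and generated which data item, then walks that record backwards from a result. It borrows the terms entity, activity, used and generated from the W3C PROV vocabulary, but it is plain Python, not a PROV serialisation or a real provenance library; the `Recorder` class, its methods and the step names are hypothetical.

```python
import hashlib


def digest(value):
    """Short content-based identifier for a data item (illustrative only)."""
    return hashlib.sha256(repr(value).encode()).hexdigest()[:12]


class Recorder:
    """Hypothetical in-memory provenance log for workflow steps."""

    def __init__(self):
        self.entities = {}   # entity id -> value
        self.used = []       # (activity name, input entity id)
        self.generated = {}  # output entity id -> activity name

    def entity(self, value):
        eid = digest(value)
        self.entities[eid] = value
        return eid

    def run(self, activity, func, *input_ids):
        """Execute one step and record what it used and what it generated."""
        result = func(*(self.entities[i] for i in input_ids))
        out = self.entity(result)
        for i in input_ids:
            self.used.append((activity, i))
        self.generated[out] = activity
        return out

    def lineage(self, eid):
        """List the activities upstream of an entity, nearest first."""
        trail, seen, frontier = [], set(), [eid]
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            activity = self.generated.get(current)
            if activity is not None:
                trail.append(activity)
                frontier.extend(i for a, i in self.used if a == activity)
        return trail


rec = Recorder()
raw = rec.entity([3, 1, 2])
ordered = rec.run("sort", sorted, raw)
top = rec.run("take_max", lambda xs: xs[-1], ordered)
print(rec.lineage(top))
```

A record of this shape is enough to support several uses listed on the slide, such as reporting how a result was derived or deciding which steps to repeat after an input changes.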
  • 30. Provenance the link workflow variants and workflow reuse and repurpose W3C PROV model standard? record for reporting compare diffs/discrepancies provenance analytics track changes, adapt carry attributions compute design credits versioning, forking, cloning Nested workflows functions by stealth Copy and paste fragmentation Designing for reuse Find and Go Software practices Systematic reuse Guidelines for persistently identifying software using DataCite https://epubs.stfc.ac.uk/work/24058274 https://www.force11.org/software-citation-principles
  • 31. ASAP Wfms for FAIR Science Automate: workflows, programs and services folks already use or want to use Scale: Enable computational productivity Abstract: Enable human productivity Provenance: Record and use Usability Workflow Plugged in Code Reporting Comparison Thanks to Bertram Ludascher
  • 33. ● Task-specific “mini-workflow” fragments – e.g. using Gromacs, CPMD, HADDOCK ● Packaged – EGI VM images and Docker containers ● Backed by existing registries – ELIXIR’s bio.tools and EGI App DB ● Instantiated as cloud instances – private (OpenNebula, OpenStack) – public (e.g. Amazon AWS) Application Building Blocks BioExcel Virtualised Software Library “transversal workflow units”, higher level operations
  • 34. BioExcel Use cases ● Genomics ● Ensembl Molecular simulations ● Free Energy simulations ● Multiscale modelling of molecular basis for odor and taste ● Biomolecular recognition ● Pharmacological queries ● Virtual Screening
  • 35. Finding valid pathways through free-energy landscapes: implementation of the “string of swarms” method using Copernicus as a workflow manager, and GROMACS as a compute engine.
  • 36. Workflow Interoperability. • Common format for bioinformatics tool & workflow execution • Community based standards effort • Designed for clusters & clouds • Supports the use of containers (e.g. Docker) • Specify data dependencies between steps • Scatter/gather on steps • Nest workflows in steps • Develop your pipeline on your local computer (optionally with Docker) • Execute on your research cluster or in the cloud • Deliver to users via workbenches • EDAM ontology (ELIXIR-DK) to specify file formats and reason about them: “FASTQ Sanger” encoding is a type of FASTQ file
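
The "scatter/gather on steps" pattern from slide 36 can be illustrated in plain Python: one step is applied independently to every item of an input list (scatter) and the per-item results are collected in input order (gather). This is not CWL syntax; the `gc_content` step, the `scatter_gather` helper and the sample reads are made up for the sketch, and a thread pool stands in for whatever cluster or cloud backend a workflow engine would use.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Example step: fraction of G or C bases in one sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def scatter_gather(step, inputs, max_workers=4):
    """Apply `step` to each input concurrently and gather results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `inputs`.
        return list(pool.map(step, inputs))


reads = ["GGCC", "ATAT", "GATC"]
fractions = scatter_gather(gc_content, reads)
mean_gc = sum(fractions) / len(fractions)  # a downstream step on the gathered list
print(fractions, mean_gc)
```

Because the step is written for a single item, the same definition can run on one input locally or be fanned out over many, which is the portability argument the slide makes.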
  • 37. Workflow Research Object Bundle researchobject.org Belhajjame et al. (2015) Using a suite of ontologies for preserving workflow-centric research objects, J. Web Semantics doi:10.1016/j.websem.2015.01.003 application/vnd.wf4ever.robundle+zip
  • 38. Z. Zhao et al., “Workflow bus for e-Science”, in IEEE e-Science 2006, Amsterdam
  • 40. http://bioexcel.eu/events/bioexcel-workflow-training-for-computational-biomolecular-research/ Adam Hospital (IRB), Anna Montras (IRB), Stian Soiland-Reyes (UNIMAN), Alexandre Bonvin (UU), Adrien Melquiond (UU), Josep Lluís Gelpí (BSC), Daniele Lezzi (BSC), Steven Newhouse (EBI), Jose A. Dianes (EBI), Mark Abraham (KTH), Rossen Apostolov (KTH), Emiliano Ippoliti (Jülich), Adam Carter (UEDIN), Darren J. White (UEDIN) Slides: Bertram Ludascher, Ewa Deelman, Vasa Curcin, Paolo Missier, Pinar Alper, Susheel Varma, Rob Finn, Michael Crusoe, Rizos Sakellariou Sign up ASAP!