Linking Scientific Instruments and Computation:
Patterns, Technologies, Experiences
Ian Foster
The University of Chicago
Argonne National Laboratory
foster@anl.gov
Crescat scientia; vita excolatur
https://arxiv.org/abs/2204.05128
https://arxiv.org/abs/2208.09513
A new generation of
scientific instruments
New sensors produce data at high
velocities and in large volumes
New methods and structures are
required to capture and process
data, and to feed back to sensors
Increasing need to harness HPC,
cloud, edge computers
An instrument becomes a set of
flows, overlaid on distributed
physical resources and software
Mark Boland, https://bit.ly/3cfSosk, 2017
Example: High-energy diffraction microscopy
Example: Ptychographic reconstruction
Example: Serial synchrotron crystallography
A modular, extensible approach to creating and running flows
Flows
Capture useful patterns as
sequences of actions.
Resource-independent
Action providers
Implement actions.
Resource-independent
Compute Action Provider: Run function at A.
Transfer Action Provider: Transfer from A to B.
Search Action Provider: Publish metadata.
…
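The split between flows and action providers can be pictured with a small Python toy. This is only a sketch and not the Globus implementation: the `Action` class, the `run_flow` function, and the three provider functions are hypothetical names invented here. The point it illustrates is that the flow names only an action type and its parameters, while the provider registry decides what code runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """One step of a flow: a provider name plus parameters (hypothetical)."""
    provider: str
    params: dict


# Hypothetical stand-ins for action providers. Real providers would call
# remote services; these only describe what they would do.
def compute(params: dict) -> str:
    return f"ran {params['function']} at {params['endpoint']}"


def transfer(params: dict) -> str:
    return f"moved {params['path']} from {params['source']} to {params['destination']}"


def search(params: dict) -> str:
    return f"published metadata for {params['path']}"


PROVIDERS: Dict[str, Callable[[dict], str]] = {
    "compute": compute,
    "transfer": transfer,
    "search": search,
}


def run_flow(actions: List[Action]) -> List[str]:
    """Run the actions in order, dispatching each one to its provider."""
    return [PROVIDERS[action.provider](action.params) for action in actions]


# The flow is just data: A and B are placeholders bound by the caller.
flow = [
    Action("transfer", {"path": "scan.tiff", "source": "A", "destination": "B"}),
    Action("compute", {"function": "reconstruct", "endpoint": "B"}),
    Action("search", {"path": "scan.tiff"}),
]
log = run_flow(flow)
```

Adding a new kind of action in this toy means registering one more function in `PROVIDERS`; existing flows are untouched.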
Fabric
Implements auth, data, and
compute APIs for
manipulating resources
Authenticate user.
Delegate credentials.
Manage file transfers.
Run jobs on computers.
Access data catalog.
…
Builds on cloud-hosted Globus automation services
[Figure: Globus automation services (Triggers, Flows, Timers, Queues) connect a Microscope and an Analysis Computer. Each flow run is a sequence of steps; an action queue holds entries 1 to 4.]
Event
Type: creation
Match: *tiff
Action
Type: user selection
Data: <feature extraction>
Options: approve/reject
Action
Type: transfer
From: microscope
To: analysis computer
https://arxiv.org/abs/2204.05128
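The trigger shown on this slide (event type "creation", match "*tiff", followed by a transfer from the microscope to the analysis computer) can be sketched with the standard-library `fnmatch` module. The event dictionaries, file paths, and function names below are assumptions made for illustration; the slide does not show how the real trigger service represents events.

```python
from fnmatch import fnmatch

# Trigger settings as they appear on the slide.
TRIGGER = {"event_type": "creation", "match": "*tiff"}


def should_start_flow(event: dict, trigger: dict = TRIGGER) -> bool:
    """Return True when the event type and file name both match the trigger."""
    return (
        event["type"] == trigger["event_type"]
        and fnmatch(event["path"], trigger["match"])
    )


def transfer_action(event: dict) -> dict:
    """Build the transfer action from the slide for one matching file."""
    return {
        "type": "transfer",
        "from": "microscope",
        "to": "analysis computer",
        "path": event["path"],
    }


# Hypothetical file-system events observed at the instrument.
events = [
    {"type": "creation", "path": "/data/run1/frame_0001.tiff"},
    {"type": "modification", "path": "/data/run1/frame_0001.tiff"},
    {"type": "creation", "path": "/data/run1/notes.txt"},
]

# Only the newly created TIFF file enqueues an action.
queue = [transfer_action(e) for e in events if should_start_flow(e)]
```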
Capture flows
in reusable
forms
In various ways:
- YAML documents
- Python “Gladier” SDK
- Web authoring
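As a rough illustration of a flow captured as a reusable document, the sketch below writes a three-step flow as a Python dictionary in a state-machine style (`StartAt`, `States`, `Next`, `End`) and serializes it to JSON; the same structure could be written in YAML. The action URLs, state names, and parameter names are placeholders chosen for this example, not values taken from the slides, and should be checked against the Globus Flows documentation before use.

```python
import json

# Illustrative flow document. The URLs under example.org are placeholders,
# and the parameter names are assumptions, not a documented schema.
flow_definition = {
    "Comment": "Transfer new data, analyze it, publish metadata",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/transfer",
            "Parameters": {
                "source.$": "$.input.source",
                "destination.$": "$.input.destination",
            },
            "ResultPath": "$.TransferResult",
            "Next": "AnalyzeData",
        },
        "AnalyzeData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/compute",
            "Parameters": {"function.$": "$.input.function"},
            "ResultPath": "$.AnalysisResult",
            "Next": "PublishMetadata",
        },
        "PublishMetadata": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/search",
            "Parameters": {"subject.$": "$.input.destination"},
            "ResultPath": "$.PublishResult",
            "End": True,
        },
    },
}


def state_order(definition: dict) -> list:
    """Follow the Next links from StartAt and return the state names in order."""
    order, name = [], definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


document = json.dumps(flow_definition, indent=2)  # the reusable form
steps = state_order(flow_definition)
```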
Customize
flow to
application
Specialize
flow to
resources
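One way to picture specialization: the flow keeps symbolic references, and an input document supplied at run time binds them to concrete resources. The sketch below assumes a convention in which a parameter key ending in `.$` holds a `$.a.b` style path into the input; the helper names and the resource names are hypothetical.

```python
def lookup(document: dict, path: str):
    """Resolve a '$.a.b' style path against a nested dictionary."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = document
    for key in path[2:].split("."):
        value = value[key]
    return value


def specialize(parameters: dict, flow_input: dict) -> dict:
    """Replace every 'name.$' reference with the value it points to."""
    resolved = {}
    for key, value in parameters.items():
        if key.endswith(".$"):
            resolved[key[:-2]] = lookup(flow_input, value)
        else:
            resolved[key] = value  # literal values pass through unchanged
    return resolved


# Resource-independent parameters for a transfer step.
template = {
    "source.$": "$.input.instrument",
    "destination.$": "$.input.analysis",
    "recursive": True,
}

# Two hypothetical sites reuse the same template with different inputs.
site_one = specialize(
    template, {"input": {"instrument": "beamline-store", "analysis": "hpc-scratch"}}
)
site_two = specialize(
    template, {"input": {"instrument": "microscope-pc", "analysis": "lab-cluster"}}
)
```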
Check flow
status
Execute
specialized
flow
Examine flow
actions
Identify failed
actions
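Checking a run's status and identifying failed actions amounts to scanning a per-action log. The record layout and status strings below are assumptions for illustration; the slides do not show the format the real service returns.

```python
def run_status(actions: list) -> str:
    """Summarize a run: FAILED if any action failed, SUCCEEDED if all are done."""
    statuses = {action["status"] for action in actions}
    if "FAILED" in statuses:
        return "FAILED"
    if actions and statuses == {"SUCCEEDED"}:
        return "SUCCEEDED"
    return "ACTIVE"


def failed_actions(actions: list) -> list:
    """Return the names of the actions that failed, in execution order."""
    return [action["name"] for action in actions if action["status"] == "FAILED"]


# Hypothetical log for one flow run.
run_log = [
    {"name": "TransferData", "status": "SUCCEEDED"},
    {"name": "AnalyzeData", "status": "FAILED", "detail": "job exceeded walltime"},
    {"name": "PublishMetadata", "status": "INACTIVE"},
]

status = run_status(run_log)
failures = failed_actions(run_log)
```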
[Figure: flow diagrams for three applications: serial synchrotron crystallography, ptychography, and high energy diffraction microscopy. Labels in the figure: AI model training; AI model deployment; data collection & transfer; Cerebras; catalog & publish; detector, injector, x-ray, target; FAIR data; data reduction, refine structures; AI accelerators, HPC; ptychographic reconstruction; data collection & transfer (raw); data collection & transfer (position); AI accelerators.]
Flows have been developed for light source
data analysis, biomedical and materials
science data ingest, on-demand simulation, …
Determining protein structures 10-100x faster
“These data services have taken the
time to solve a structure from
weeks to days and now to hours”
Darren Sherrell, SBC beamline
scientist, APS Sector 19
• Developed a new automation pipeline to
collect, analyze, and visualize data,
solve protein structures, and load results into a
searchable portal for real-time feedback
• Achieved a 10-100x speedup in time to
solution of protein structures at an APS beamline
• Leveraged unique DOE facilities at the Advanced
Photon Source (SBC Sector 19) and ALCF
(Theta/ThetaGPU, Petrel, and Data Portals)
• Deposited first results in open repositories
Automation pipeline
(Chard, Vescovi, Foster, Blaiszik, Sherrell, Joachimiak, et al.)
[Figure: pipeline linking APS and ALCF, using ALCF Theta, Petrel, and Data Portals]
Flow invocations 2020-21 for five APS experiments
Numbers vary due to facility and experimental schedules.
We collect detailed performance data on flows
https://arxiv.org/abs/2204.05128
Transfer, compute,
and cataloging
costs for median
flows
Round-trip latencies for various action providers
• Current architecture
has ~1 sec minimum
latency due to cloud
interaction
• funcX latencies higher
due to polling strategy
• Both can be improved
as needed
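The effect of a polling strategy on round-trip latency can be shown with simple arithmetic: a client that polls every p seconds first sees a result at the next multiple of p after the result is ready. The sketch below uses the roughly 1 second cloud overhead quoted above; the task duration and the two polling intervals are illustrative assumptions, not measurements from the slides.

```python
import math


def observed_latency(task_seconds: float, poll_interval: float,
                     cloud_overhead: float = 1.0) -> float:
    """Time until a client polling at a fixed interval first sees the result."""
    ready_at = cloud_overhead + task_seconds
    polls_needed = math.ceil(ready_at / poll_interval)
    return polls_needed * poll_interval


# A 0.2 s task with ~1 s of cloud overhead is ready after 1.2 s.
slow_polling = observed_latency(0.2, poll_interval=2.0)  # seen at the 2.0 s poll
fast_polling = observed_latency(0.2, poll_interval=0.5)  # seen at the 1.5 s poll
```

In this model, shortening the polling interval moves the observed latency toward the time at which the result is actually ready, at the cost of more requests.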
We build on a universal auth, compute, & data fabric
Globus
Auth
Authentication and delegation mechanisms to control
what happens where
Run functions anywhere funcX deployed
Access data anywhere Globus Connect deployed
* See also: Integrated Research Infrastructure, computing continuum, grid
Globus
Connect
As of 4/2022
Globus hybrid “SaaS” model: Data fabric
Globus hybrid “SaaS” model: Compute fabric
Customer owned and
administered computer
with funcX agent
running on it
funcX service orchestrates function
execution via communication with
funcX agent
[Figure: User-defined flows call Globus Automation Services, which reach Globus-accessible storage and computing (10,000s of systems) through APIs. Systems shown: Argonne Leadership Computing Facility (Polaris, Theta, Eagle store); Laboratory Computing Resource Center (Bebop cluster); Advanced Photon Source (APS computing, Orthros cluster, APS DM system); portal servers. Key: funcX agent, Globus Connect agent.]
Building computationally-enhanced instruments:
There is much more to be done!
• We have worked so far with light sources and data ingest
pipelines
• We are pleased with adaptability and reliability
• Work required in capability (e.g., iteration) and performance
• Others are applying tools to microscopes and other
instruments
• New action providers are needed for instrument control
• We are eager to find partners who want to work with us on
developing and/or applying these methods and tools!
Thanks to talented colleagues!
Linking Scientific Instruments & HPC: Patterns, Technologies, Experiences
https://arxiv.org/abs/2204.05128
Raf Vescovi, Ryan Chard, Nick Saint, Ben Blaiszik, Jim Pruyne, Tekin Bicer, Alex Lavens, Zhengchun Liu, Mike Papka, Suresh Narayanan, Nicholas Schwarz, Kyle Chard
Globus Automation Services: Research process automation across the space-time continuum
https://arxiv.org/abs/2208.09513
Rachana Ananthakrishnan, Josh Bryan, Kyle Chard, Ryan Chard, Kurt McKee, Jim Pruyne, Brigitte Raumann
And sponsors
And the rest of the ALCF, APS, & Globus teams
Recap: Enabling
new instruments
Reusable flows
composed from an
extensible set of
actions
Built on global
auth, compute, data
fabric
Join us in applying
these methods!
https://arxiv.org/abs/2204.05128
https://arxiv.org/abs/2208.09513
https://www.globus.org/platform/services/flows

Editor's Notes

  • #3: Probe. Instrument. Meter.
  • #4-#6: Metacomputing revisited. 10^10 x faster, 10^5 x more tasks, 10^6 x more data. Link HPC, AI, instruments. c still 3 x 10^8 m/s.
  • #28: Need to mention other Braid people! Eliu Huerta, Bogdan Nicolae, Justin Wozniak. MENTION Eliu work?