Delivering a Campus Research Data
Service with Globus
MAGIC Meeting
Ian Foster
May 7, 2014
Give me your data,
your terabytes,
Your huddled files
yearning to
breathe free …
Building campus research
data services
“It’s deja vu all over again.”
Yogi Berra
Globus Toolkit
Globus Online
Globus
Globus
What is Globus (today)?
Big data transfer
and sharing…
…simply, securely, and fast…
…directly from your own
storage systems
Reliable, secure, high-performance
file transfer and synchronization
• “Fire-and-forget”
transfers
• Automatic fault
recovery
• Seamless security
integration
• Powerful GUI
and APIs
Data Source → Data Destination
1. User initiates transfer request
2. Globus moves and syncs files
3. Globus notifies user
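The three steps on this slide can be illustrated with a small, self-contained sketch. It is not the Globus implementation or API: the in-memory dictionaries standing in for storage systems, the `sync` and `make_flaky_copy` helpers, and the retry limit are all assumptions made for illustration. It only shows the general idea of a "fire-and-forget" request that skips files already in sync, retries after a fault, and hands back a report that stands in for the notification.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Content hash used to decide whether two copies are identical."""
    return hashlib.sha256(data).hexdigest()


def make_flaky_copy(failures_per_file: int):
    """Hypothetical copy function that fails a fixed number of times per file."""
    attempts = {}

    def copy(source, dest, path):
        attempts[path] = attempts.get(path, 0) + 1
        if attempts[path] <= failures_per_file:
            raise IOError(f"simulated fault while copying {path}")
        dest[path] = source[path]

    return copy


def sync(source, dest, copy, max_retries=5):
    """Copy files that are missing or different at dest, retrying on faults."""
    report = {"copied": [], "skipped": [], "failed": []}
    for path, data in sorted(source.items()):
        # Synchronization: skip files whose content already matches.
        if path in dest and checksum(dest[path]) == checksum(data):
            report["skipped"].append(path)
            continue
        for _ in range(max_retries):
            try:
                copy(source, dest, path)
            except IOError:
                continue  # automatic fault recovery: try again
            if checksum(dest[path]) == checksum(data):
                report["copied"].append(path)
                break
        else:
            report["failed"].append(path)
    return report  # stands in for the notification sent to the user


source = {"a.dat": b"alpha", "b.dat": b"beta"}
dest = {"a.dat": b"alpha"}
report = sync(source, dest, make_flaky_copy(failures_per_file=2))
print(report)
```

The caller submits one request and reads one report; the retries happen inside `sync`, which is the property the "fire-and-forget" bullet describes.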
Simple, secure sharing off existing
storage systems
Data Source
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus and accesses shared file
• Easily share large data
with any user or group
• No cloud storage
required
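A minimal sketch of the sharing model described above, assuming a simple in-memory access list. The `SharedEndpoint` class, its method names, and the `"r"`/`"w"` permission strings are hypothetical and are not the Globus sharing API. The point is that only access rules are recorded; the files stay where they are.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEndpoint:
    """Hypothetical model: access rules over files that stay on existing storage."""
    owner: str
    groups: dict = field(default_factory=dict)  # group name -> set of user names
    rules: list = field(default_factory=list)   # (principal, path prefix, permissions)

    def share(self, principal, path, permissions="r"):
        """Grant a user or group the given permissions on a folder."""
        self.rules.append((principal, path.rstrip("/") + "/", permissions))

    def can(self, user, path, permission):
        """Check whether a user holds a permission on a path."""
        if user == self.owner:
            return True
        for principal, prefix, perms in self.rules:
            is_member = principal == user or user in self.groups.get(principal, set())
            if is_member and (path + "/").startswith(prefix) and permission in perms:
                return True
        return False


endpoint = SharedEndpoint(owner="userA", groups={"lab": {"userB", "userC"}})
endpoint.share("lab", "/data/run1", "r")       # step 1: User A shares with a group
print(endpoint.can("userB", "/data/run1/file.h5", "r"))   # step 3: User B reads
print(endpoint.can("userB", "/data/run1/file.h5", "w"))
print(endpoint.can("userD", "/data/run1/file.h5", "r"))
```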
15,000
registered users
8,000
active endpoints
(in the past year)
3 billion
files transferred
Globus status and publication plans
Globus is enabling…
Study of the structure
and evolution of
galaxies, the nature
of dark energy, and
cosmological history
of the universe
Sloan Digital Sky Survey
Source: University of Utah
Joel Brownstein
University of Utah
Globus is enabling…
Development
of numerical
simulations of
severe storms
for improved
responsiveness
to weather
events
Weather Research and Forecasting Model
Source: UCAR
Ann Syrowski
University of Illinois
Globus is enabling…
Pediatric brain
research by
enhancing
analysis of
genetic material
in pursuit of the
underlying
cause
Communication impairment by genetic variants
Source: Wikimedia Commons
William Dobyns
U. Washington
Globus increasingly used to build
campus-wide data service
Source: University of Nebraska
Holland Computing Center
Enable campus computing
facilities to better utilize
high performance network
infrastructure
Typical deployment
Science
DMZ
+
Globus
Omaha Core
Holland Computing Center
Internet2 via GPN
East/West
Campus Networks
(firewalls + IDS)
Lincoln Core Router
2x 10 Gigabit
DYNES
Equipment
UNL Science DMZ
Campus Network
Researchers
WDM
Composite Traffic
100 Gigabit
100 Gigabit Capable
West Campus
Border Router
10x CMS Data
Transfer Nodes
Omaha
HPC
Clusters
100 Gigabit Capable
East Campus
Border Router
perfSONAR
+ BRO IDS
additions
10 Gigabit
4x 10 Gigabit
100 Gigabit
perfSONAR
Bro IDS
Future Redundant
I2 Path (2015+)
Lincoln Core Switch
(CMS and HPC clusters) Center for
Brain Imaging
and Behavior
10x 10 Gigabit
Internet2 via CIC
Composite Traffic
100 Gigabit
Source:
University of Nebraska
Holland Computing Center
Instruments are increasingly driving the
need for broader data service deployments
Next Gen
Sequencer
Light Sheet Microscope
MRI Advanced
Light Source
Globus enables users to manage data as
research requirements scale up or down
Research Computing HPC Cluster
Lab Server
Campus Home Filesystem
Desktop Workstation
Personal Laptop
XSEDE Resource
Public Cloud
Globus product
development
highlights in 2013-14
Sharing generally available
Much improved Web UI
Globus Connect Server
• Native RPM and Debian packaging
• Improved configuration management
• Multi-server setup
• OAuth support
Management console: “Flight Control”
Amazon S3 Endpoints
85
U.S. campuses
We are a non-profit, delivering a
production-grade service to the
non-profit research community
Our challenge:
Sustainability
Globus Provider Subscriptions
• Managed Endpoints
– Priority support
– Management console
– Usage reports
– Mass Storage System optimization
– Host shared endpoints
– Integration support
• Plus Subscriptions
– Create and manage shared endpoints
– Personal transfers
• Branded Web Site
• Alternate Identity Provider (InCommon is standard)
https://www.globus.org/provider-plans
NET+ Globus
• Internet2 members get discounted
Globus Provider subscriptions
• Completing “Service Validation” phase
– Sponsors:
Cornell, U.Michigan, Yale, U.Missouri, and
U.Chicago
• Available to “Early Adopters” soon
Bridging the gap to sustainability
• $500,000 from Sloan Foundation
• Recognition of what it takes to
“cross the chasm”
• Funds non-R&D
activities
– User Support
– Operations
– Marketing
Globus Behind the Scenes
Identity, Group, Profile
Management Services
…
Sharing Service
Transfer Service
Globus Toolkit
GlobusConnect
Globus Platform-as-a-Service
Identity, Group, Profile
Management Services
…
Sharing Service
Transfer Service
Globus Toolkit
GlobusAPIs
GlobusConnect
globus
genomics
Flexible, scalable, affordable
genomics analysis
for all biologists
+
Data management
PaaS
Next-gen sequence
analysis SaaS
+
Scalable IaaS
Globus Genomics on AWS
Exome: $3 – $20
Whole Genome: $20 – $50
RNA-Seq: <$5
Alternatives are at 10-20x
Dobyns Lab
Exome analysis
20x speed-up
Next: 50x
Cox Lab
Consensus variant calling
134 samples; 4 days
<0.01% Mendel error rate
Next: 13,000 samples
Campus Data Service User Stories
• “I need a good place to store / backup / archive
my (big) research data, at a reasonable price.”
• “I need to easily, quickly, and reliably move or
mirror portions of my data to other places.”
• “I need a way to easily and securely share my
data with my colleagues at other institutions.”
• “I want to publish my data.”
• “I want to discover published data.”
An all-too familiar tale …
Data is:
Identified
Described
Curated
Verifiable
Accessible
Preserved
What does it mean to publish?
I can:
Search
Browse
Access
the data
What does it mean to discover?
Globus
data
publication
services
Announcing…
Metadata
Access Control
License
Storage
Curation
Workflow
Policies
Collection
Teeing Up a Few Terms …
[Diagram: each Dataset pairs Data with Metadata; Datasets are grouped into a Collection, which sits within a Community]
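The terms on this slide (Community, Collection, Dataset, and the policies attached to a Collection) can be written down as a small data model. This is a sketch under assumptions: the class and field names are hypothetical, and the real service's schema is not shown in the slides. The example values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset is data plus metadata."""
    data_files: list
    metadata: dict


@dataclass
class Policies:
    """Policies set on a collection (field names are assumptions)."""
    required_metadata: list
    access_control: str       # e.g. "public" or "restricted"
    curation_workflow: bool   # True if a curator must approve submissions
    license: str
    storage: str


@dataclass
class Collection:
    """A collection is a set of datasets governed by one set of policies."""
    name: str
    policies: Policies
    datasets: list = field(default_factory=list)

    def add(self, dataset):
        missing = [key for key in self.policies.required_metadata
                   if key not in dataset.metadata]
        if missing:
            raise ValueError(f"missing required metadata: {missing}")
        self.datasets.append(dataset)


@dataclass
class Community:
    """A community contains collections."""
    name: str
    collections: list = field(default_factory=list)


policies = Policies(required_metadata=["title", "authors"],
                    access_control="restricted",
                    curation_workflow=True,
                    license="example-license",
                    storage="example-storage")
collection = Collection(name="Example Collection", policies=policies)
community = Community(name="Example Community", collections=[collection])
collection.add(Dataset(data_files=["results.csv"],
                       metadata={"title": "Example", "authors": ["A. Author"]}))
print(len(community.collections[0].datasets))
```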
Argonne Storage
Univ. of Chicago Argonne IIT UIUC
Demo Scenario
Participants: Scientist, Argonne Curator, Shared Endpoint
1. Publish Data
2. Describe Submission
3. Assemble Dataset (Transfer Data)
4. Curate Dataset
5. Search
6. Download
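The submission half of this scenario (describe, assemble, curate, publish) behaves like a small state machine in which write access to the submission endpoint changes as the submission advances. The sketch below is an assumption-laden illustration: the `Submission` class, its state names, and the placeholder identifier are hypothetical and do not reflect the service's actual code.

```python
class Submission:
    """Hypothetical state machine: describe -> assemble -> curate -> published."""

    def __init__(self, submitter, curators):
        self.submitter = submitter
        self.curators = set(curators)
        self.state = "describe"
        self.metadata = {}
        self.files = []
        self.identifier = None

    def _check(self, expected_state, allowed):
        if self.state != expected_state:
            raise RuntimeError(f"expected state {expected_state!r}, got {self.state!r}")
        if not allowed:
            raise PermissionError("user may not perform this step")

    def can_write(self, user):
        # Only the submitter can write, and only while assembling.
        return self.state == "assemble" and user == self.submitter

    def describe(self, user, metadata):
        self._check("describe", user == self.submitter)
        self.metadata.update(metadata)
        self.state = "assemble"

    def add_file(self, user, path):
        if not self.can_write(user):
            raise PermissionError("submission endpoint is not writable for this user")
        self.files.append(path)

    def submit(self, user):
        # After submission the dataset becomes immutable and awaits curation.
        self._check("assemble", user == self.submitter)
        self.state = "curate"

    def approve(self, user, identifier):
        self._check("curate", user in self.curators)
        self.identifier = identifier
        self.state = "published"


submission = Submission(submitter="scientist", curators=["curator"])
submission.describe("scientist", {"title": "Example dataset"})
submission.add_file("scientist", "results.csv")
submission.submit("scientist")
submission.approve("curator", identifier="example-identifier")
print(submission.state, submission.identifier)
```

Keeping the permission check in `can_write` makes the rule "only the submitter writes, and only before submission" a single line that every file operation goes through.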
Login with Campus or Globus Identity
46
Start a New Submission
47
Describe Submission
48
Dublin Core + Scientific Metadata
Assemble Dataset and Transfer to
Submission Endpoint
49
Grant Submission License
50
Recap: Globus Data Publication
• SaaS for publishing large research data
• Bring your own storage
• Extensible metadata
• Publication and curation workflows
• Public and restricted collections
• Rich discovery model
Curation Workflow
52
Submission is now Published with DOI
53
Search Published Datasets by
Collection
54
Search Published Datasets across
Collections
55
Discovering a Published Dataset
56
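The editor's notes describe discovery queries that combine keywords with key-value range conditions, for example energy density greater than 1500 together with "microcapsules". A minimal sketch of that kind of filter follows. The record layout, the `search` function, and the sample titles are assumptions; only the `energy-density: 2000` pair and the search terms come from the notes. Relevance ranking, which the notes also mention, is not modeled here.

```python
def in_range(value, low=None, high=None):
    """True if value lies strictly between the bounds; None means unbounded."""
    return (low is None or value > low) and (high is None or value < high)


def search(datasets, keywords=(), ranges=None):
    """Return titles of datasets matching every keyword and every range condition."""
    ranges = ranges or {}
    hits = []
    for dataset in datasets:
        text = " ".join([dataset["title"], *dataset["tags"]]).lower()
        if not all(keyword.lower() in text for keyword in keywords):
            continue
        values = dataset["key_values"]
        if all(key in values and in_range(values[key], low, high)
               for key, (low, high) in ranges.items()):
            hits.append(dataset["title"])
    return hits


# Hypothetical records; titles and the second record's values are invented.
datasets = [
    {"title": "Dataset A", "tags": ["microcapsules", "Li-ion"],
     "key_values": {"energy-density": 2000}},
    {"title": "Dataset B", "tags": ["microcapsules"],
     "key_values": {"energy-density": 1200}},
    {"title": "Dataset C", "tags": ["autonomic"],
     "key_values": {}},
]

results = search(datasets, keywords=["microcapsules"],
                 ranges={"energy-density": (1500, None)})
print(results)
```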
Find the Published Dataset
57
Download the Published Dataset
58
Locally Downloaded Dataset
59
Looking for 3-5 early adopters
Summer:
Use and
provide
feedback
on alpha
Fall:
Test beta on
your campus
Winter:
Celebrate
General
Availability
Spring:
Tell us about it
at GlobusWorld
2015!
Thank you to our sponsors!
U . S . D E P A R T M E N T O F
ENERGY

Editor's Notes

  • #3: Review what the Globus team has done over the past year. Announce an exciting new capability.
  • #12: Joel Brownstein is the data archivist of the Sloan Digital Sky Survey-IV. Transfers daily telescope observations to the University of Utah. There they have a large cluster to run their various data reduction pipelines. Using the Globus command-line interface within their Python API. Joel has moved more than 70 TB of data so far.
  • #13: Ann develops numerical simulations of severe storms using the Weather Research and Forecasting (WRF) model. Uses several HPC facilities throughout the country. Moved more than 100 TB of data using Globus (50 TB last January alone!). Moves data between various XSEDE resources, NCSA's mass storage system, and PSC's data archiver.
  • #14: Collects tissue samples from young patients and their families and then extracts, sequences, and analyzes the genetic material to understand underlying cause of disease. Uses Globus to move NGS data to and from public clouds where he runs analysis pipelines. More on Bill’s work later on in this talk (under Globus Genomics).
  • #22: Can use standard tools such as apt and yum to deploy. Uses configuration file. Allows incremental config changes. Multiple I/O nodes. ID node (MyProxy). Web node (OAuth).
  • #23: Allows site administrators to monitor traffic to/from their site. Ultimately will allow for control.
  • #30: Geoffrey Moore
  • #31: Highlight CI Connect. Highlight XSEDE’s planned adoption of user, group and profile management.
  • #32: Highlight CI Connect; coming up in Rob Gardner’s talk. Highlight XSEDE’s planned adoption of user, group and profile management.
  • #36: Competitive TCO. Alternatives are campus computing cores and commercial sequence analysis services.
  • #45: Collection is a set of Datasets. Dataset is data + metadata. Collection is within a Community. Policies on a Collection: Metadata, Access control, Curation workflow, License, Storage.
  • #46: Demo scenario: A scientist, referred to throughout as “the Scientist” and associated with the user Blaiszik, has just published a paper associated with his research on nanoscale materials. He now wants to go ahead and publish the data associated with this publication. Using the Globus publication system, he is able to select the Argonne community and the Center for Nanoscale Materials (CNM) collection. He chooses to publish his data. He describes the submission with both publication (Dublin Core) and scientific metadata. The CNM collection has been preconfigured with its own storage provided at Argonne. As part of this submission, a unique endpoint is created for “the Scientist”; the endpoint is created so that only “the Scientist” can write to it. “The Scientist” assembles his dataset on this endpoint by transferring files from one or more locations. He can assemble this dataset over a long period of time and can return to the submission workflow when he is happy with the submission. The CNM collection has also been preconfigured with a workflow requiring that an Argonne curator must approve the submission. A curator, referred to throughout as “the Curator” and associated with the user Chard, is able to view and edit the metadata and files of the dataset. Once approved, the submission is published in the CNM collection with a DOI. Other users (with permission to view the collection) can then discover published datasets by their DOI or by using the Globus discovery interface to find datasets by their metadata. These users can choose to browse published datasets and download datasets to other resources (including local resources).
  • #47: Users can log in using any of their linked Globus identities, e.g., campus credentials (via InCommon), Google account, XSEDE account, ...
  • #48: The first step of submission is to select a collection. In this case “the Scientist” selects the “Center for Nanoscale Materials”, as this is the department through which he conducted his research. Note: “the Scientist” can only see collections he is allowed to publish to.
  • #49: “The Scientist” must first describe the dataset he is publishing. There are two types of metadata required for submission to the CNM collection: 1) Dublin Core and 2) scientific metadata. These metadata requirements are defined by the collection and can be configured depending on the domain. Additional pages can also be defined. Here, “the Scientist” enters information about the authors, their ORCID (a unique researcher identity), the submission title, the date of publication, the accompanying publication to which this dataset is related, and the DOI for that publication. Note: “the Scientist” has missed an ORCID for one of his co-authors.
  • #50: Using the familiar Globus interface, “the Scientist” is able to select files from multiple sources and transfer them to his unique submission endpoint (publish#submission_11). This submission endpoint is created on shared Argonne storage resources, but is initially accessible only to “the Scientist”. The dataset may be assembled over any period of time. “The Scientist” can create new files and folders on the endpoint and he can arrange these files in any hierarchy. At the completion of the submission, the permissions on the endpoint will be changed such that the dataset is immutable. “The Scientist” will be given read access to the dataset; collection curators will also be given read access to the data so that they can view the contents.
  • #51: Having verified the submission, “the Scientist” must grant the submission license. This license is again configured by the collection (i.e., each collection can customize its individual license), and allows the submitting user to grant rights to the collection (CNM) and the Globus system to manage and disseminate the dataset based on the agreed-upon policies.
  • #53: The Argonne CNM collection has defined a workflow that requires a curator to view and approve all submissions. The curation workflow enables the curator to view the submitted files and to edit the submitted metadata.
  • #54: At this point, the dataset is now published in the collection with a unique DOI (handle in this case) for other researchers to reference this published dataset. Access to the dataset (both metadata and files) is changed to reflect the policies of the collection. Access may be restricted to particular users, or groups of users, or it may be made public for any user to access.
  • #55: “The Researcher” chooses to search for all published data in the CNM collection. The results show a brief summary of each published dataset including information about the publication time, collection, summary of number of files, name, authors, description and a set of keyword tags as well as key-value tags. Each of these fields can be used to search for a particular dataset.
  • #56: Knowing that other collections may well have datasets of interest, “the Researcher” may broaden the search context to all accessible collections and search for datasets related to “Li-ion” and “autonomic”. Here, the results show datasets from 2 collections: the CNM and the Chemical Sciences and Engineering collection (red boxes). Results are ranked according to their relevance to the search.
  • #57: Going further, “the Researcher” can use different queries such as key-value and ranges. In this case, “the Researcher” searches for energy density > 1500 and microcapsules, and finds the dataset previously published in this demo with an associated key-value pair of energy-density:2000 that fits the range query criteria.
  • #58: Having found the desired published dataset, “the Researcher” can navigate to the summary page.
  • #59: The summary page shows a summary of the dataset and the list of files. “The Researcher” can choose to download individual files, browse the dataset using Globus, or download the entire dataset. Ability to view the dataset and download files is governed by the access control on the collection and permissions associated with “the Researcher”.
  • #60: Finally, “the Researcher” can view the downloaded dataset on their desktop PC.