Virtual Science in the Cloud
Roy Williams
California Institute of Technology
The New Science
  • humans
  • clouds
  • sensors
  • beginner to expert
  • sharing
  • logins and access
  • click to code to workflow
  • personal storage
  • big data and replication
  • compute and scaling
  • software as component
  • interoperability
  • survey and event
  • control or autonomous
  • Compute Services
  • Registry
  • Getting Data
Service Oriented Architecture
  • 1. publish: service publishes its service contract to the registry
  • 2. find: client finds the service contract in the registry
  • 3. bind: client binds to the service (request, response)
Principle: Click or Code
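The publish, find, bind cycle can be sketched in a few lines. This is a toy in-memory illustration, not the VO registry interface: `ServiceContract`, `Registry`, `echo_position`, and the capability tag `"conesearch"` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceContract:
    """What a service publishes: a name, a capability tag, and a callable endpoint."""
    name: str
    capability: str
    endpoint: Callable[[dict], dict]


class Registry:
    """Toy in-memory registry; a real one would be a network service."""

    def __init__(self) -> None:
        self._contracts: Dict[str, ServiceContract] = {}

    def publish(self, contract: ServiceContract) -> None:
        # publish: the service makes its contract discoverable.
        self._contracts[contract.name] = contract

    def find(self, capability: str) -> List[ServiceContract]:
        # find: the client looks up contracts by capability.
        return [c for c in self._contracts.values() if c.capability == capability]


def echo_position(request: dict) -> dict:
    """Hypothetical service body: returns the requested position and no objects."""
    return {"ra": request["ra"], "dec": request["dec"], "objects": []}


registry = Registry()
registry.publish(ServiceContract("demo-cone", "conesearch", echo_position))

# bind: the client takes the first match and exchanges request and response.
service = registry.find("conesearch")[0]
response = service.endpoint({"ra": 180.0, "dec": 2.0})
```

The client never hard-codes which service it talks to; it only knows the capability it wants, which is what lets the same lookup be driven by a click or by code.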
VO Data Services
  • Cone Search
      • radius + position returns a list of objects encoded as VOTable
  • Simple Image Access Protocol
  • Simple Spectrum Access Protocol
      • spectra have subtleties, so the protocol is more complicated
  • Astronomical Data Query Language
      • For database queries
      • Core SQL functions plus astronomy-specific extensions
      • Sky region, Xmatch
  • Table Access Protocol
      • Exposes relational databases
      • What tables
      • What table schema
      • Here is a query in ADQL
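As a rough illustration of the two query styles, the sketch below builds a Cone Search request URL and an ADQL query for the same circular sky region. The `RA`, `DEC`, and `SR` parameters (decimal degrees) come from the Simple Cone Search standard; the base URL, the table name `catalog.objects`, and the column names `ra` and `dec` are assumptions made for the example.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """Build a Simple Cone Search request: position and radius in decimal degrees."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


def adql_cone_query(table: str, ra_deg: float, dec_deg: float,
                    radius_deg: float, top: int = 10) -> str:
    """Build an ADQL query selecting rows inside a circular sky region."""
    return (
        f"SELECT TOP {top} * FROM {table} "
        f"WHERE 1=CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


# Hypothetical endpoint and table, for illustration only.
url = cone_search_url("http://example.org/cone", 180.0, 2.0, 0.1)
query = adql_cone_query("catalog.objects", 180.0, 2.0, 0.1)
```

The URL form suits a quick lookup, while the ADQL string would be sent to a Table Access Protocol service when the question needs joins or extra constraints.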
VO Compute Services
  • Asynchronous
      • May not get immediate answer, just get a place to check back
  • Security
      • Expensive resources, big requests, sequestered data
      • Strong or Weak or None
  • Scalable
      • Graduated path to powerful computation and big data
  • Cloud store
      • VOSpace
      • Sharable
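The asynchronous pattern, where a request returns a place to check back instead of an answer, might look like the following toy sketch. `AsyncComputeService`, the state names, and the pretend workload (summing a list) are hypothetical; a real service would run the job on a remote resource and be polled over HTTP.

```python
import itertools
from typing import Dict, Optional


class AsyncComputeService:
    """Toy asynchronous service: submit returns at once with a job id to check back on."""

    def __init__(self, polls_until_done: int = 3) -> None:
        self._polls_until_done = polls_until_done
        self._ids = itertools.count(1)
        self._jobs: Dict[str, dict] = {}

    def submit(self, params: dict) -> str:
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"params": params, "polls": 0, "result": None}
        return job_id  # the "place to check back"

    def status(self, job_id: str) -> str:
        job = self._jobs[job_id]
        job["polls"] += 1
        if job["polls"] >= self._polls_until_done:
            # Stand-in for real work finishing on the compute resource.
            job["result"] = sum(job["params"]["values"])
            return "COMPLETED"
        return "EXECUTING"

    def result(self, job_id: str) -> Optional[int]:
        return self._jobs[job_id]["result"]


service = AsyncComputeService(polls_until_done=3)
job_id = service.submit({"values": [1, 2, 3]})

# The client polls until the job reports completion.
history = []
while True:
    state = service.status(job_id)
    history.append(state)
    if state == "COMPLETED":
        break

answer = service.result(job_id)
```

A production client would sleep between polls and give up after a timeout; both are omitted here to keep the sketch deterministic.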
VO Registry
publish -- find -- bind
  • Registry Metadata
      • Descriptions of data collections, data delivery services, organizations, etc.
      • Based on Dublin Core with astronomy-specific extensions
      • Represented as XML schema; extensible
  • Contents stored in Resource Registries
      • Registries exchange metadata records through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Distributed Registry
  • Astrogrid
  • CfA
  • NCSA
  • CDS
  • ESO
  • STScI/JHU
  • NOAO
  • Caltech
  • HEASARC
  • JapanVO
Ongoing harvesting March 07 (CfA, ESO, NOAO soon)
Semantics & Search
  • Identifiers: ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_374875-722
  • Free tags: beard Fred pudding
  • Controlled Vocab (UCD): phot.flux;em.ir
  • Controlled Vocab interop (SKOS)
  • Ontology: Greek isA Man, Socrates isA Greek, so Socrates isA Man
  • Data Models: Each sky position will have a circular positional error estimate ...
  • Text markup: Outflows from <object>NGC 666</object> are irregular ...
  • Schema: Columns are Magnitude, Position, Identifier, ...
  • Metadata (registry) forms: Full Registry: true; ManagedAuthorities: authority, nasa.heasarc
  • Formal service description
Cloud Based Tools
  • code & presentation
  • data
Open SkyQuery.net
VO Astronomical Crossmatch Service
  • Query builder
  • Presentation
  • Execution
  • Query planning
  • Query execution
Workflow
  • International
  • Authors
  • Subscribers
  • GCN
  • Broker
  • annotation from archives
  • Astronomers, Amateurs, Students
  • Microlensing, Optical transients, Radio transients, X-ray transients, Gamma transients
  • skyalert.org
  • Events and annotation disseminated to subscribers in real time with intelligence
  • Follow-up Scheduler
  • Telescope, Telescope, Telescope
Skyalert
  • Push-based workflow
  • Can be cyclic
  • Portfolio aggregation by citation
  • Annotation as software components
  • Stream owner builds template
  • Django, Python, jQuery
  • now 4 developers via SVN
Skyalert Stream Registry
... will be VO registry
Roles (skyalert.org)
  • 1. browse (human or robot): query, human computing, WWT/Google
  • 2. subscribe (human or robot)
  • 3. author (human or robot)
  • 4. annotate: contrib software components; archive, mining
  • Diagram labels: push, inject, web, portfolios db, IM/tweet/email/TCP, triggers, actions
skyalert.org
Cyclic workflow graph
  • Trigger
      • CRTS["Geometry"]["Moon angle"] > 30 and SDSS["Photoprimary"]["g-magnitude"] < 18
  • Action
      • annotator
      • followup request
      • dynamically loads module
      • run(triggerEvent, portfolio): <business logic>
      • can build event and inject recursively
      • send message
  • Alerts and event cascade
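The trigger and action pair on this slide maps naturally onto Python. The sketch below assumes a portfolio is a nested dictionary keyed by stream, parameter group, and parameter name; that layout, the `on_event` dispatcher, and the body of `run` are assumptions for illustration, not Skyalert's actual code.

```python
from typing import Callable, Dict, List

# A portfolio here is a nested dict: stream name -> parameter group -> parameter -> value.
Portfolio = Dict[str, Dict[str, Dict[str, float]]]


def trigger(portfolio: Portfolio) -> bool:
    """The slide's trigger expression, with stream names as dictionary keys."""
    try:
        return (
            portfolio["CRTS"]["Geometry"]["Moon angle"] > 30
            and portfolio["SDSS"]["Photoprimary"]["g-magnitude"] < 18
        )
    except KeyError:
        # An annotation has not arrived yet, so the trigger cannot fire.
        return False


def run(trigger_event: dict, portfolio: Portfolio) -> str:
    """Hypothetical action body standing in for <business logic>."""
    return f"follow-up requested for {trigger_event['id']}"


def on_event(event: dict, portfolio: Portfolio,
             actions: List[Callable[[dict, Portfolio], str]]) -> List[str]:
    """Evaluate the trigger and, if it fires, run every registered action."""
    if not trigger(portfolio):
        return []
    return [action(event, portfolio) for action in actions]


portfolio = {
    "CRTS": {"Geometry": {"Moon angle": 45.0}},
    "SDSS": {"Photoprimary": {"g-magnitude": 17.2}},
}
messages = on_event({"id": "event-1"}, portfolio, [run])
incomplete = on_event({"id": "event-2"}, {"CRTS": {"Geometry": {"Moon angle": 45.0}}}, [run])
```

Because an action may itself build a new event and inject it, the same dispatcher can be re-entered, which is what makes the workflow graph cyclic.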
Skyalert-LSST
skyalert.org
  • Test run for LSST mobile app
  • Data service from CRTS and Skyalert
  • gets JSON event list via http
  • LSST building skyalert clone
  • Pasadena and Tucson both get events by Jabber/XMPP
  • “Unknown” is now choice of: Cataclysmic Variable, Supernova, Blazar Outburst, Active Galactic Nucleus Variability, UV Ceti Variable, Asteroid, Variable, Mira Variable, High Proper Motion Star, Comet, Eclipsing Variable, Gamma Ray Burst Afterglow, Microlensing, Nova, Planetary Microlensing, RR Lyrae Variable, Tidal Disruption Flare
Tier1 and Tier2 Event Nodes
Evolving in IVOA
  • Diagram labels: Brokering, Tier1, Tier2, Authoring, Distribution, Jabber/XMPP or raw socket
  • Registry: Stream definitions, Event Servers
  • Tier1: Immediate Forwarding, Reliable?, Topology?
  • Tier2: Subscription, Repository, Query, Portfolio, Registry, Machine Learning, Substreams, etc.
NSF Teragrid
  • World’s largest open distributed cyberinfrastructure
  • 11 Resource Provider sites, >2 Petaflop HPC & >27000 CPUs, >3 Petabyte disk, >60 PB tape
  • Fast network, Visualization, experiments (VMs, GPUs, FPGAs)
  • For US researchers and their collaborators through national peer-review process
Teragrid 2002
  • job submission and queueing (Condor, PBS, ..)
  • login node
  • 100s of nodes
  • user
  • purged /scratch
  • parallel I/O
  • parallel file system
  • /home
  • global file system
  • metadata node
  • Unix, Globus, C++, ssh, files, MPI, PBS, make
Architectures 2010
  • Science Gateway (no architecture!)
  • Node farm (condor)
  • Parallel computing
      • Message-passing MPI
      • Shared memory
  • Graphics Processing Units
      • 10^4 independent tiny threads
  • Data Intensive
      • Flash memory (TG/UCSD)
      • Graywulf (JHU/Pannstarrs)
  • Immediate resources
Science Gateways
  • Biology and Biomedicine Science Gateway
  • Open Life Sciences Gateway
  • The Telescience Project
  • Grid Analysis Environment (GAE)
  • Neutron Science Instrument Gateway
  • TeraGrid Visualization Gateway, ANL
  • BIRN
  • Open Science Grid (OSG)
  • Special PRiority and Urgent Computing Environment (SPRUCE)
  • National Virtual Observatory (NVO)
  • Arroyo Adaptive Optics
  • Linked Environments for Atmospheric Discovery (LEAD)
  • Computational Chemistry Grid (GridChem)
  • Computational Science and Engineering Online (CSE-Online)
  • GEON (GEOsciences Network)
  • Network for Earthquake Engineering Simulation (NEES)
  • SCEC Earthworks Project
  • Network for Computational Nanotechnology and nanoHUB
  • GIScience Gateway (GISolve)
  • Gridblast Bioinformatics Gateway
  • Earth Systems Grid
  • Astrophysical Data Repository (Cornell)
Slide courtesy of Nancy Wilkins-Diehr
GPU for molecular modelling
Pannstarrs PS1
  • compute
  • User facing: SQL/casjobs, workbench, privacy/share, stored queries
  • Data valet: load/validate, merge, crawl, replicate, log
  • workflow, workflow
  • data: head/slice, hot/warm/cold
  • Fault tolerance: multiple replication, fault workflow
  • Cost and energy carefully considered
  • Future: Hadoop/Mapreduce
Cloud Supercomputing?
Teragrid/Globus vs Cloud/Amazon MI
  • Both ways to get wholesale computing
  • Both provide IaaS, Infrastructure as a Service
  • Virtual Machine more popular than CTSS stack
  • What about parallelism? I/O speed? GPUs? etc.
  • Watch 3leaf and ScaleMP for these
Science and Web 2.0
  • Easy for groups to form and collaborate
  • Integrates with user workspace
      • iGoogle and OpenSocial
      • alongside other aspects of their lives
  • Use existing tools
      • SlideShare, blogs, google gadgets, facebook, Gwave, Flickr, YouTube
  • Sharing workspace
  • Electronic log
  • Provenance
  • Virtual Data as “equivalent script”
Science and Web 2.0
  • Server delivers only code
  • Browser makes presentation
  • Ajax and Ajaj and Http “long poll”
  • jQuery and Google toolkit
  • see WWT and GSky in Skyalert
  • “Everything is a wiki”
      • or a wave?
      • Visible/editable by group/s
Adaptive Optics Gateway
  • Adaptive optics simulations
  • 30-meter telescope
  • Planet finding coronagraph
  • 4-day run for 4-sec!
  • Parallel parameter sweeps
  • proposed upgrade of the Palomar AO system to a 56x56 subaperture system
Arroyo
Arroyo Gateway Architecture
  • 1. Use HTML/JS from webserver to create job definition.
  • 2. Daemon is polling & sees new job, makes local space for it.
  • 3. Start job on compute resource & update job status.
  • 4. Fetch & update status of running job. Repeat.
  • 5. Output to remote space.
  • 6. Daemon copies output from remote to local, updates job status.
  • 7. User fetches results from webserver.
  • Diagram labels: webserver (Django, MySQL; job definitions and status), daemon, local space for results, remote space for results, retail, wholesale, wholesale computing
RW and J. Bunn
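The gateway's daemon loop can be sketched as a small state machine over a job table. This is a toy version under stated assumptions: plain dictionaries stand in for the MySQL job table and the two result spaces, the status names are invented, and the pretend job finishes after a single polling pass.

```python
from typing import Dict, List

# Stand-ins for the webserver's job table and the two result spaces.
jobs: Dict[int, dict] = {}
remote_space: Dict[int, str] = {}
local_space: Dict[int, str] = {}


def create_job(job_id: int, definition: dict) -> None:
    """The web form records a job definition for the daemon to find."""
    jobs[job_id] = {"definition": definition, "status": "new"}


def daemon_poll() -> List[str]:
    """One polling pass of the daemon; returns a log of what it did."""
    log = []
    for job_id, job in jobs.items():
        if job["status"] == "new":
            # Make local space, start the job, update status.
            local_space[job_id] = ""
            job["status"] = "running"
            log.append(f"started {job_id}")
        elif job["status"] == "running":
            # The pretend job finishes here and writes to the remote space.
            remote_space[job_id] = f"output for {job['definition']['name']}"
            job["status"] = "finished-remote"
            log.append(f"finished {job_id}")
        elif job["status"] == "finished-remote":
            # Copy output from remote to local, update status.
            local_space[job_id] = remote_space[job_id]
            job["status"] = "done"
            log.append(f"copied {job_id}")
    return log


def fetch_results(job_id: int) -> str:
    """The user fetches results through the webserver."""
    return local_space[job_id]


create_job(1, {"name": "ao-sim"})
passes = [daemon_poll() for _ in range(3)]
result = fetch_results(1)
```

Keeping all state in the job table means the webserver never talks to the compute resource directly; only the daemon crosses that boundary.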
Pegasus workflow
E. Deelman
E. Deelman, G. Berriman, RW, et al
LIGO Grid
  • Condor/DAGMan
  • now 45,000 jobs per month
  • Pegasus for load balancing?
Asynchronous services: User needs feedback
  • AJAJ (AJAX but with JSON)
  • Detailed progress reports during run
  • Strong/weak security model with certificates
Wide-area Mosaicking
  • 158 feet
  • Griffith Observatory, Los Angeles
Citizen Science
Human Volunteers
  • Science Layer
      • Describe what you see in image
      • Each person has level of expertise
      • How to use results most effectively
      • Galaxyzoo.org, citizensky.org good models
  • Game Layer
      • Makes people come back
      • Top 10 ranking etc.
      • Anonymous partner a la gwap.com
Human Volunteer Evidence
Donalek et al, arXiv:0810.4945 [astro-ph]
  • 4 of 10 say artifact: artifact
RW and C. Donalek
Classic Machine Learning
  • Metric in “Feature Space”
  • Relevance Vector Machine (Tipping)
  • Feature Vectors
  • Learning from Training set
  • Picking relevant lessons
RW and J. Beck
New Machine Learning: Information Fusion
  • Data Portfolios
      • selected from known set of object types
  • Evidence object
      • set of class/prob
      • and prior assumptions
      • may be correlated priors
  • Annotator builds evidence
      • from portfolio
      • may include other evidence
  • Inference (= Expert System)
      • Combines evidence with cost-benefit
      • Builds Importance
      • Alchemy
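One simple way to picture combining evidence objects is Bayes' rule over a fixed set of classes. The sketch below is an illustration only, not the method used in the talk: it assumes the evidence objects are independent (the slide notes that priors may in fact be correlated), and the function `fuse`, the class names, and every number are hypothetical.

```python
from typing import Dict, List


def fuse(prior: Dict[str, float], evidence: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine a prior with per-class likelihoods by Bayes' rule, assuming independence."""
    posterior = dict(prior)
    for likelihood in evidence:
        for cls in posterior:
            # Classes an evidence object says nothing about are left unchanged.
            posterior[cls] *= likelihood.get(cls, 1.0)
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("evidence rules out every class")
    return {cls: p / total for cls, p in posterior.items()}


# Hypothetical numbers, for illustration only.
prior = {"supernova": 0.25, "blazar": 0.25, "asteroid": 0.25, "artifact": 0.25}
evidence = [
    {"supernova": 0.8, "blazar": 0.2, "asteroid": 0.1, "artifact": 0.1},
    {"supernova": 0.5, "blazar": 0.5, "asteroid": 0.05, "artifact": 0.5},
]
posterior = fuse(prior, evidence)
best_class = max(posterior, key=posterior.get)
```

Each annotator contributes one likelihood dictionary, so new kinds of evidence can be added without changing the fusion step.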
Influence Diagrams
Automated Decision through Tripod of Data
  • Archive
      • nearby radio source escalates p(blazar)
      • nearby galaxy escalates p(supernova)
  • Human
      • Crowded field? Artifact present?
      • Can make follow-up observation
  • Machine
      • Fuzzy center escalates p(host galaxy)
      • Moving source escalates p(asteroid)
      • Robotic follow-up observation
  • Diagram labels: decision, human, archive, machine, learning
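The idea that a piece of evidence "escalates" a class probability and thereby changes a follow-up decision can be sketched as an expected-benefit calculation. This is a minimal illustration, assuming a multiplicative escalation factor and a single observing cost; the functions `escalate`, `expected_benefit`, and `decide`, and all numbers, are hypothetical.

```python
from typing import Dict


def escalate(probabilities: Dict[str, float], cls: str, factor: float) -> Dict[str, float]:
    """Multiply one class probability by a factor, then renormalize."""
    updated = dict(probabilities)
    updated[cls] *= factor
    total = sum(updated.values())
    return {name: p / total for name, p in updated.items()}


def expected_benefit(probabilities: Dict[str, float], benefit: Dict[str, float]) -> float:
    """Expected scientific benefit of a follow-up, weighted by class probability."""
    return sum(p * benefit.get(name, 0.0) for name, p in probabilities.items())


def decide(probabilities: Dict[str, float], benefit: Dict[str, float], cost: float) -> str:
    """Follow up only when expected benefit exceeds the cost of observing."""
    return "follow-up" if expected_benefit(probabilities, benefit) > cost else "wait"


# Hypothetical numbers, for illustration only.
p = {"supernova": 0.2, "blazar": 0.2, "asteroid": 0.6}
benefit = {"supernova": 10.0, "blazar": 5.0, "asteroid": 1.0}
cost = 4.0

before = decide(p, benefit, cost)
p_archive = escalate(p, "supernova", 4.0)  # archive: a nearby galaxy was found
after = decide(p_archive, benefit, cost)
```

With these numbers the archive annotation alone flips the decision from waiting to requesting a follow-up, which is the kind of automated decision the diagram describes.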
User Interface (wrong)
  • Register
  • Wait for account
  • Read about XML
  • Read about web services
  • Learn to use VO Registry
  • Translate VOTable format
  • Ask for help
  • Finally get some help
  • and now do some science....
User interface (right)
in Darwinian evolution every small change must give benefit
  • Anonymous
  • Web form
  • some science....
  • Register
  • more science....
  • Run bigger job
  • hey this is interesting ....
  • Learn the VO structure
  • Power user
be careful with complex authentication!
Steering the Ship
  • Short term Pragmatism
      • useful tools now
      • simple protocols (eg cone search)
      • “just use RA and Dec”
  • vs
  • Long term Architecture
      • modular suite of interoperable tools
      • sophisticated protocols (eg skynode)
      • sophisticated Space-Time coordinates
Interfaces
  • A Data Model is a bridge from community to computers
What is a Data Center?
  • machines
  • services
  • doesn’t matter where or how
  • testing testing testing
  • do we have enough power and HVAC?
Complex science, Complex machines
  • Separate science user from complexity
      • Must have domain science context
  • Making simple things simple but
      • Power to scale up
      • Drill-down if wanted
  • Machines are not the objective
      • Science through data, compute, sharing
eScience is for People, right?
  • Getting Started
  • Help Desk
  • Forum
  • Documentation
  • Knowledge Base
  • Calendar
  • Contact Us
  • Social Media
  • Blog/newsfeed
  • Campus Champions
  • Summer Schools
  • Advanced Support for Developers
  • Education