HPC, Grid and Cloud Computing
   - The Past, Present and Future


Jason Shih
Academia Sinica Grid Computing
FBI 極簡主義 (Minimalism), Nov 3rd, 2010
Outline


 Trend in HPC
 Grid: eScience Research @ PetaScale
 Cloud Hype and Observation
 Future Exploration Path of Computing
 Summary
About ASGC

 Max CERN/T1-ASGC point-to-point inbound: 9.3 Gbps!
 1. Most reliable T1: 98.83%!
 2. Very highly performing and most stable site in CCRC08!
 Asia Pacific Regional Operation Center
 Grid Application Platform: a lightweight problem-solving
 framework (Best Demo Award of EGEE’07!)
 Avian Flu Drug Discovery

A Worldwide Grid Infrastructure
 >280 sites, >45 countries
 >80,000 CPUs, >20 PetaBytes
 >14,000 users, >200 VOs
 >250,000 jobs/day

Large Hadron Collider (LHC)
 100 meters underground
 27 km in circumference; located near Geneva
Emerging Trends and Technologies: 2009-2010
Hype Cycle for Storage Technologies - 2010
Trend in High Performance Computing
Ugly? Performance of HPC Clusters

 272 (52%) of the world’s fastest clusters have efficiency lower
 than 80% (Rmax/Rpeak)
 Only 115 (18%) can drive over 90% of theoretical peak
 (a minimal efficiency-calculation sketch follows the figure note below)

 [Figure: sampling from the Top500 HPC clusters; trend of cluster
 efficiency, 2005-2009]
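
As a concrete illustration, the efficiency quoted above is simply the ratio of measured to theoretical peak performance. The following minimal Python sketch computes and bins that ratio; the cluster entries are illustrative placeholders, not actual Top500 data.

    # Minimal sketch: compute cluster efficiency (Rmax/Rpeak) and bin it.
    # The sample values are illustrative, not actual Top500 entries.
    clusters = [
        {"name": "A", "rmax_tflops": 825.0, "rpeak_tflops": 1028.0},
        {"name": "B", "rmax_tflops": 433.0, "rpeak_tflops": 933.0},
        {"name": "C", "rmax_tflops": 191.0, "rpeak_tflops": 204.0},
    ]
    for c in clusters:
        c["efficiency"] = c["rmax_tflops"] / c["rpeak_tflops"]  # Rmax / Rpeak

    below_80 = sum(1 for c in clusters if c["efficiency"] < 0.80)
    at_or_above_90 = sum(1 for c in clusters if c["efficiency"] >= 0.90)
    print(f"{below_80}/{len(clusters)} clusters below 80% efficiency")
    print(f"{at_or_above_90}/{len(clusters)} clusters at or above 90% of peak")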
Performance and Efficiency
 20% of the top-performing clusters contribute 60% of total
 computing power (27.98 PF)
 5 clusters have efficiency below 30%
Impact Factor: Interconnectivity
     - Capacity and Cluster Efficiency

 Over 52% of clusters are based on GbE
   with efficiency of only around 50%
 InfiniBand is adopted by ~36% of HPC clusters
HPC Cluster - Interconnect Using IB
 SDR, DDR and QDR in Top500
   Promising efficiency >= 80%
   Majority of IB-ready clusters adopt
   DDR (87%) (Nov 2009)
   Contributing 44% of total computing
   power
     ~28 Pflops
   Avg efficiency ~78%
Trend in HPC Interconnects: InfiniBand Roadmap
Common semantics



 Programmer productivity
 Ease of deployment
 HPC filesystems are more mature, with a wider feature set:
   Highly concurrent reads and writes
   In the comfort zone of programmers (vs. cloudFS)
 Wide support, adoption and acceptance possible
   pNFS is working toward equivalence
   Reuse of standard data management tools
     Backup, disaster recovery and tiering
Evolution of Processors
Trend in HPC
Some Observations & Looking to the Future (I)
 Computing Paradigm
  (Almost) Free FLOPS
  (Almost) Free Logic Operations
  Data Access (Memory) Is a Major Bottleneck
  Synchronization Is the Most Expensive Operation
  Data Communication Is a Big Factor in Performance
  I/O Is Still a Major Programming Consideration
  MPI Coding Is the Motherhood of Large-Scale Computing (see the
  communication sketch after this list)
  Computing in Conjunction with Massive Data Management
  Finding Parallelism Is Not the Whole Issue in Programming:
  Data Layout
  Data Movement
  Data Reuse
  Frequency of Interconnected Data Communication
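
Since MPI and data movement dominate at scale, a minimal sketch may help. The example below (assuming mpi4py and NumPy are available; neither is named in the original slides) posts a non-blocking halo exchange and overlaps it with local computation, which is exactly the synchronization and communication cost the list above highlights.

    from mpi4py import MPI
    import numpy as np

    # Minimal sketch: 1-D ring halo exchange with non-blocking messages,
    # overlapping communication with computation on the local interior.
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    local = np.full(1_000_000, float(rank))  # local block of a distributed array
    halo = np.empty(1, dtype=np.float64)     # ghost cell from the left neighbor

    # Post communication first, then do useful work while it proceeds.
    reqs = [comm.Isend(local[-1:], dest=right),
            comm.Irecv(halo, source=left)]
    interior = local[1:-1].sum()             # computation hides transfer latency
    MPI.Request.Waitall(reqs)                # synchronize only when data is needed

    print(f"rank {rank}: got halo value {halo[0]} from rank {left}")

Run with, e.g., mpirun -np 4 python halo.py.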
Some Observations & Looking to the Future (II)
 Emerging New Possibilities
   Massive “Small” Computing Elements with On-Board Memory
   Computing Nodes Can Be Configured Dynamically (Including
   Failure Recovery)
   Network Switches (Within an On-Site Complex) Will Nearly Match
   Memory Performance
   Parallel I/O Support for Massively Parallel Systems
   Asynchronous Computing/Communication Operations
   Sophisticated Data Pre-fetch Schemes (Hardware/Algorithm); see
   the sketch after this list
   Automated Dynamic Load-Balancing Methods
   Very High-Order Difference Schemes (Also Implicit Methods)
   Full Coupling of Formerly Split Operators
   Fine Numerical Computational Grids (Grid Number > 10,000)
   Full Simulation of Proteins
   Full Coupling of Computational Models
   Grid Computing for All
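
The asynchronous-operation and pre-fetch items above boil down to overlapping data fetch with computation. Here is a minimal software sketch of that idea, a double-buffered prefetcher; the fetch function only fabricates data and stands in for a real disk or network read.

    import concurrent.futures as cf

    # Minimal sketch of software prefetching: while chunk i is processed,
    # chunk i+1 is already being fetched in a background thread.
    def fetch(i):
        return [float(i)] * 100_000        # placeholder for an expensive read

    def process(chunk):
        return sum(chunk)

    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch, 0)     # prefetch the first chunk
        total = 0.0
        for i in range(1, 5):
            chunk = future.result()        # blocks only if the prefetch lags
            future = pool.submit(fetch, i) # overlap the next fetch with compute
            total += process(chunk)
        total += process(future.result())  # drain the final chunk
    print(f"total = {total}")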
Some Observations & Looking to the Future (III)


          Systems will get more complicated &
      computing tools will get more sophisticated:


          Vendor support & user readiness?
Grid: eScience Research @ PetaScale
WLCG Computing Model
   - The Tier Structure
 Tier-0 (CERN)
   Data recording
   Initial data reconstruction
   Data distribution
 Tier-1 (11 countries)
   Permanent storage
   Re-processing
   Analysis
 Tier-2 (~130 sites)
   Simulation
   End-user analysis
Enabling Grids for E-sciencE




 Archeology
 Astronomy
 Astrophysics
 Civil Protection
 Comp. Chemistry
 Earth Sciences
 Finance
 Fusion
 Geophysics
 High Energy Physics
 Life Sciences
 Multimedia
 Material Sciences
 …

Objectives

 Build a sustainable research and collaboration
 infrastructure
 Support e-Science research in data-intensive
 sciences and applications that require cross-disciplinary
 distributed collaboration
ASGC Milestones
 Operational since the deployment of LCG0 in 2002
 ASGC CA established in 2005 (joined IGTF the same year)
 Tier-1 Center responsibility started in 2005
 The federated Taiwan Tier-2 center (Taiwan Analysis Facility, TAF)
 is also collocated at ASGC
 Representative of the EGEE Asia e-Science Federation since joining
 EGEE in 2004
 Providing Asia Pacific Regional Operation Center (APROC)
 services to the region-wide WLCG/EGEE production infrastructure
 since 2005
 Initiated the Avian Flu Drug Discovery Project in collaboration
 with EGEE in 2006
 Started the EUAsiaGrid Project in April 2008
LHC First Beam – Computing at the Petascale

 ATLAS: General Purpose, pp, heavy ions
 CMS: General Purpose, pp, heavy ions
 LHCb: B-physics, CP Violation
 ALICE: Heavy ions, pp
Size of LHC Detector

 ATLAS detector: 7,000 tons,
 25 meters in height, 45 meters in length
 [Figure: the ATLAS and CMS detectors shown to scale against
 CERN Bld. 40]
Standard Cosmology

 Good model from 0.01 sec after the Big Bang
 Supported by considerable observational evidence
 [Figure: history of the universe, with energy, density and
 temperature falling over time]

Elementary Particle Physics

 From the Standard Model into the unknown: towards energies of
 1 TeV and beyond: the Terascale
 Towards Quantum Gravity
 From the unknown into the unknown...

 http://www.damtp.cam.ac.uk/user/gr/public/bb_history.html
 UNESCO Information Preservation debate, April 2007,
 Jamie.Shiers@cern.ch
WLCG Timeline

 First beam in the LHC, Sep. 10, 2008
 Severe incident after 3 weeks of operation (3.5 TeV)
Petabyte Scale Data Challenges

 Why Petabyte?
  Experiment computing models
  Comparison with conventional data management
 Challenges
  Performance: LAN and WAN activities
    Sufficient bandwidth between CPU farms
    Eliminate uplink bottlenecks (switch tiers)
  Fast response to critical events
    Fabric infrastructure & service level agreements
  Scalability and manageability
    Robust DB engine (Oracle RAC)
    Knowledge base (KB) and adequate administration (training)
Tier Model and Data Management Components
Disk Pool Configuration
     - T1 MSS (CASTOR)
Distribution of Free Capacity
     - Per Disk Server vs. per Pool
Storage Server Generation
     - Drive vs. Net Capacity (RAID 6)

 [Chart: net capacity per disk server across hardware generations:
 15 TB/DS, 21 TB/DS, 31 TB/DS and 40 TB/DS]
IDC Collocation
 Facility installation completed on Mar 27th
 Tape system delayed until after Apr 9th
   Realignment
   RMA for faulty parts
Storage Farm
 ~110 RAID subsystems deployed since 2003
 Supporting both Tier-1 and Tier-2 storage fabric
 DAS connections to front-end blade servers
   Flexible switching of front-end servers according to
   performance requirements
   4-8 Gb Fibre Channel connectivity
Computing/Storage System Infrastructure
Throughput of WLCG Experiments
 Throughput defined as job efficiency x number of running jobs
 (a trivial calculation sketch follows below)
 The characteristics of the 4 LHC experiments show that the
 inefficiency is due to poor coding.
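
Taking that definition at face value, aggregate throughput is just the product of the two factors. A trivial Python sketch with illustrative placeholder numbers (not measured WLCG figures):

    # Sketch of the definition above: throughput = job efficiency x running jobs.
    # All per-experiment numbers are illustrative placeholders.
    running_jobs = {"ALICE": 8000, "ATLAS": 25000, "CMS": 22000, "LHCb": 6000}
    job_efficiency = {"ALICE": 0.85, "ATLAS": 0.90, "CMS": 0.80, "LHCb": 0.92}

    for exp, jobs in running_jobs.items():
        print(f"{exp}: {job_efficiency[exp] * jobs:,.0f} effective job slots")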
Reliability From Different Perspectives
Storage Fabric Management
     – The Challenges: Event Management
Open Cloud Consortium



Cloud Hype and Observation
Cloud Hype
 Metacomputing (~1987, L. Smarr)
 Grid Computing (~1997, I. Foster, C. Kesselman)
 Cloud Computing (~2007, E. Schmidt?)
Type of Infrastructure

 Proprietary solutions by public providers
   Turnkey solutions developed internally, as the providers own
   the software and hardware solution/tech.
 Cloud-specific support
   Developers of specific hardware and/or software
   solutions that are utilized by service providers or used
   internally when building private clouds
 Traditional providers
   Leverage or tweak their existing ...
Grid and Cloud:
     Comparison
 Cost & Performance
 Scale & Usability
 Service Mapping
 Interoperability
 Application Scenarios
Cloud Computing:
      “X” as a Service

 Type of Cloud
 Layered Service Model
 Reference Model
Virtualization is not Cloud Computing

 Performance overhead
 Full virtualization (FV) vs. paravirtualization (PV)
   Disk I/O and network throughput (VM scalability); a
   measurement sketch follows the reference below

Ref: Linux-based virtualization for HPC clusters.
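
To make the overhead concrete, here is a minimal sequential-write micro-benchmark one could run both on bare metal and inside an FV or PV guest for comparison; the file size, block size and path are arbitrary choices, not values from the reference above.

    import os, time

    # Minimal sketch: sequential-write throughput, run on host and in a guest.
    def write_throughput_mb_s(path="bench.tmp", total_mb=256, block_kb=1024):
        block = b"\0" * (block_kb * 1024)
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(total_mb * 1024 // block_kb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())   # force data to disk, not just the page cache
        elapsed = time.time() - start
        os.remove(path)
        return total_mb / elapsed

    print(f"sequential write: {write_throughput_mb_s():.1f} MB/s")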
Cloud Infrastructure
     Best Practices & Real-World Performance

 Start up: 60 VMs ~ 44s
 Restart: 30 VMs ~ 27s
 Deletion: 60 VMs ~ <5s
 Migrate
   30 VMs ~ 26.8s
   60 VMs ~ 40s
   120 VMs ~ 89s
 Stop
   30 VMs ~ 27.4s
   60 VMs ~ 26s
   120 VMs ~ 57s
 (a timing-harness sketch follows below)
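
Timings like those above can be collected with a simple wall-clock harness around the cloud stack's CLI. In this sketch, "cloudctl" and its subcommands are hypothetical placeholders, not a real tool; substitute whatever your stack actually provides.

    import subprocess, time

    # Minimal sketch: time a bulk VM operation end to end.
    # NOTE: "cloudctl" is a hypothetical placeholder CLI.
    def time_operation(verb, vm_names):
        start = time.time()
        procs = [subprocess.Popen(["cloudctl", verb, name]) for name in vm_names]
        for p in procs:
            p.wait()               # block until every VM completes the operation
        return time.time() - start

    vms = [f"vm{i:03d}" for i in range(30)]
    for verb in ("start", "stop", "migrate"):
        print(f"{verb} x{len(vms)}: {time_operation(verb, vms):.1f}s")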
Virtualization: HEP Best Practices
Grid over Cloud or
     Cloud over Grid?
Power Consumption Challenge
Conclusion: My Opinion


 Future of Computing: Technology Push & Demand Pull
 Emergence of a new science paradigm
 Virtualization: a promising technology, but currently
 overemphasized
 Green: cloud service transparency & a common
 platform
  More computing power ~ power consumption
  challenge
 Private clouds will be the predominant model
  Commercial (public) clouds are not expected to evolve fast
Acknowledgment


 Thanks for valuable discussions/input from TCloud
 (Cloud OS: Elaster)
 Professional technical support from Silvershine
 Tech. at the beginning of the collaboration.

The interesting thing about Cloud Computing is that we’ve
defined Cloud Computing to include everything that we
already do... I don’t understand what we would do
differently in the light of Cloud Computing other than
change the wording of some of our ads.
      Larry Ellison, quoted in the Wall Street Journal, Sep 26, 2008
Issues

 Scalability?
   Infrastructure operation vs. performance
 Assessment
 Application-aware cloud services
 Cost analysis
 Data center power usage: PUE (a sample calculation
 follows below)
 Cloud myths
 Top 10 Cloud Computing Trends
   http://www.focus.com/articles/hosting-bandwidth/
   top-10-cloud-computing-trends/
 Use Cases & Best Practices
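
PUE (power usage effectiveness) is total facility power divided by IT equipment power, with 1.0 as the ideal. A tiny sketch with illustrative wattages (placeholders, not measured data-center figures):

    # PUE = total facility power / IT equipment power (1.0 is ideal).
    # The loads below are illustrative placeholders.
    def pue(total_facility_kw, it_equipment_kw):
        return total_facility_kw / it_equipment_kw

    it_load_kw = 500.0        # servers, storage, network gear
    overhead_kw = 350.0       # cooling, UPS losses, lighting
    print(f"PUE = {pue(it_load_kw + overhead_kw, it_load_kw):.2f}")  # -> 1.70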
Issues (II)


 Volunteer computing (BOINC)?
   Total capacity & performance
   Success stories & research disciplines
 What’s hindering cloud adoption? Try humans.
   http://gigaom.com/cloud/whats-hindering-cloud-
   adoption-how-about-humans/
 Future projections?
   Service readiness? Service levels? Technical barriers?
