GRID COMPUTING Faisal N. Abu-Khzam & Michael A. Langston, University of Tennessee
Outline Hour 1: Introduction. Break. Hour 2: Using the Grid. Break. Hour 3: Ongoing Research. Q&A Session.
Hour 1: Introduction What is Grid Computing? Who Needs It? An Illustrative Example Grid Users Current Grids
What is Grid Computing? Computational Grids Homogeneous (e.g., Clusters) Heterogeneous (e.g., with one-of-a-kind instruments) Cousins of Grid Computing Methods of Grid Computing
Computational Grids A network of geographically distributed resources including computers, peripherals, switches, instruments, and data. Each user should have a single login account to access all resources. Resources may be owned by diverse organizations.
Computational Grids Grids are typically managed by gridware. Gridware can be viewed as a special type of middleware that enables sharing and manages grid components based on user requirements and resource attributes (e.g., capacity, performance, availability).
Cousins of Grid Computing Parallel Computing Distributed Computing Peer-to-Peer Computing Many others: Cluster Computing, Network Computing, Client/Server Computing, Internet Computing, etc...
Distributed Computing People often ask: Is Grid Computing a fancy new name for the concept of distributed computing? In general, the answer is “no.” Distributed Computing is most often concerned with distributing the load of a program across two or more processes.
Peer-to-Peer Computing Sharing of computer resources and services by direct exchange between systems. Computers can act as clients or servers depending on which role is most efficient for the network.
Methods of Grid Computing Distributed Supercomputing High-Throughput Computing On-Demand Computing Data-Intensive Computing Collaborative Computing Logistical Networking
Distributed Supercomputing Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer. Tackle problems that cannot be solved on a single system.
High-Throughput Computing Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work.
On-Demand Computing Uses grid capabilities to meet short-term requirements for resources that are not locally accessible. Models real-time computing demands.
Data-Intensive Computing The focus is on synthesizing new information from data that is maintained in geographically distributed repositories, digital libraries, and databases. Particularly useful for distributed data mining.
Collaborative Computing Concerned primarily with enabling and enhancing human-to-human interactions.  Applications are often structured in terms of a virtual shared space.
Logistical Networking Global scheduling and optimization of data movement. Contrasts with traditional networking, which does not explicitly model storage resources in the network.  Called "logistical" because of the analogy it bears with the systems of warehouses, depots, and distribution channels.
Who Needs Grid Computing? A chemist may utilize hundreds of processors to screen thousands of compounds per hour. Teams of engineers worldwide pool resources to analyze terabytes of structural data. Meteorologists seek to visualize and analyze petabytes of climate data with enormous computational demands.
An Illustrative Example Tiffany Moisan, a NASA research scientist, collected microbiological samples in the tidewaters around Wallops Island, Virginia. She needed the high-performance microscope located at the National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego.
Example (continued) She sent the samples to San Diego and used NPACI’s Telescience Grid and NASA’s Information Power Grid (IPG) to view and control the output of the microscope from her desk on Wallops Island. Thus, in addition to viewing the samples, she could move the platform holding them and make adjustments to the microscope.
Example (continued) The microscope produced a huge dataset of images. This dataset was stored using a storage resource broker on NASA’s IPG. Moisan was able to run algorithms on this very dataset while watching the results in real time.
Grid Users Grid developers Tool developers Application developers End Users System Administrators
Grid Developers A very small group. Implementers of a grid “protocol” who provide the basic services required to construct a grid.
Tool Developers Implement the programming models used by application developers. Implement basic services similar to conventional computing services: User authentication/authorization Process management Data access and communication
Tool Developers Also implement new (grid) services such as: Resource location Fault detection Security Electronic payment
Application Developers Construct grid-enabled applications for end-users who should be able to use these applications without concern for the underlying grid. Provide programming models that are appropriate for grid environments and services that programmers can rely on when developing (higher-level) applications.
System Administrators Balance local and global concerns. Manage grid components and infrastructure. Some tasks are still not well delineated due to the high degree of sharing required.
Some Highly-Visible Grids The NSF PACI/NCSA Alliance Grid. The NSF PACI/SDSC NPACI Grid. The NASA Information Power Grid (IPG). The Distributed Terascale Facility (DTF) Project.
DTF Currently being built by NSF’s Partnerships for Advanced Computational Infrastructure (PACI). A collaboration: NCSA, SDSC, Argonne, and Caltech will work in conjunction with IBM, Intel, Qwest Communications, Myricom, Sun Microsystems, and Oracle.
DTF Expectations A 40-billion-bits-per-second optical network (called TeraGrid) is to link computers, visualization systems, and data at four sites. Performs 11.6 trillion calculations per second. Stores more than 450 trillion bytes of data.
GRID COMPUTING BREAK
Hour 2: Using the Grid Globus  Condor Harness Legion IBP NetSolve Others
Globus A collaboration of Argonne National Laboratory’s Mathematics and Computer Science Division, the University of Southern California’s Information Sciences Institute, and the University of Chicago's Distributed Systems Laboratory. Started in 1996 and has gained popularity year after year.
Globus A project to develop the underlying technologies needed for the construction of computational grids. Focuses on execution environments for integrating widely-distributed computational platforms, data resources, displays, special instruments and so forth.
The Globus Toolkit The Globus Resource Allocation Manager (GRAM)   Creates, monitors, and manages services. Maps requests to local schedulers and computers.   The Grid Security Infrastructure (GSI) Provides authentication services.
The Globus Toolkit The Monitoring and Discovery Service (MDS) Provides information about system status, including server configurations, network status, locations of replicated datasets, etc. Nexus and globus_io provide communication services for heterogeneous environments.
The Globus Toolkit Global Access to Secondary Storage (GASS) Provides data movement and access mechanisms that enable remote programs to manipulate local data. Heartbeat Monitor (HBM)  Used by both system administrators and ordinary users to detect failure of system components or processes.
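For flavor, here is roughly how a job was submitted through GRAM from the command line in Globus toolkits of this era. This is a hedged sketch drawn from contemporary tutorials: the host name is hypothetical and exact flags varied by release.

    # Authenticate once through GSI, then hand GRAM an RSL job description.
    grid-proxy-init
    globusrun -o -r gridnode.cs.utk.edu '&(executable=/bin/hostname)(count=1)'

The quoted string is a Resource Specification Language (RSL) description, which GRAM maps onto whatever local scheduler runs at the contacted resource.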
Condor The Condor project started in 1988 at the University of Wisconsin-Madison. The main goal is to develop tools to support High Throughput Computing on large collections of distributively owned computing resources.
Condor Runs on a cluster of workstations to glean wasted CPU cycles. A “Condor pool” consists of any number of machines, of possibly different architectures and operating systems, that are connected by a network. Condor pools can share resources through a Condor feature called flocking.
The Condor Pool Software Job management services: Supports requests about the job queue. Puts a job on hold. Enables the submission of new jobs. Provides information about jobs that have already finished. A machine with job management installed is called a submit machine.
The Condor Pool Software Resource management: Keeps track of available machines. Performs resource allocation and scheduling. Machines with resource management installed are called execute machines. A machine could be a “submit” and an “execute” machine simultaneously.
Condor-G A version of Condor that uses Globus to submit jobs to remote resources. Allows users to monitor jobs submitted through the Globus toolkit. Can be installed on a single machine, so there is no need to have a Condor pool installed. (A sample submit file follows.)
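To make this concrete, below is a minimal sketch of a Condor submit description file of the period; the file and program names are hypothetical. A Condor-G job differed mainly in naming a Globus resource (via a globus universe) rather than relying on the local pool.

    # job.submit -- minimal vanilla-universe job; all names are hypothetical
    universe   = vanilla
    executable = analyze
    arguments  = input.dat
    output     = analyze.out
    error      = analyze.err
    log        = analyze.log
    queue

Running condor_submit job.submit on a submit machine queues the job; condor_q then shows its progress as Condor matches it to an idle execute machine.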
Legion An object-based metasystems software project designed at the University of Virginia to support millions of hosts and trillions of objects linked together with high-speed links. Allows groups of users to construct shared virtual work spaces, collaborate on research, and exchange information.
Legion An open system designed to encourage third party development of new or updated applications, run-time library implementations, and core components. The key feature of Legion is its object-oriented approach.
Harness A Heterogeneous Adaptable Reconfigurable Networked System. A collaboration between Oak Ridge National Lab, the University of Tennessee, and Emory University. Conceived as a natural successor to the PVM project.
Harness An experimental system based on a highly customizable, distributed virtual machine (DVM) that can run on anything from a supercomputer to a PDA. Built on three key areas of research: Parallel Plug-in Interface, Distributed Peer-to-Peer Control, and Multiple DVM Collaboration.
IBP The Internet Backplane Protocol (IBP) is a middleware for managing and using remote storage.  It was devised at the University of Tennessee to support Logistical Networking in large scale, distributed systems and applications.
IBP Named because it was designed to enable applications to treat the Internet as if it were a processor backplane.  On a processor backplane, the user has access to memory and peripherals, and can direct communication between them with DMA.
IBP IBP gives the user access to remote storage and standard Internet resources (e.g. content servers implemented with standard sockets) and can direct communication between them with the IBP API.
IBP By providing a uniform, application-independent interface to storage in the network, IBP makes it possible for applications of all kinds to use logistical networking to exploit data locality and more effectively manage buffer resources.
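The capability pattern at the heart of the IBP API can be illustrated with a small, self-contained C toy. Everything below is a local stand-in: the real IBP_allocate/IBP_store/IBP_load calls operate on network depots and have richer signatures than these simplified analogues.

    /* A toy model of the IBP pattern: "allocate" a byte array on a depot,
       receive a capability for it, and use the capability to store and
       load data. All types and functions here are hypothetical stand-ins
       for the real IBP client API. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct { char *bytes; size_t size; } Allocation;  /* depot-side byte array */
    typedef struct { Allocation *target; } Capability;        /* read/write handle */

    /* "Allocate" space on the (simulated) depot, returning a capability. */
    static Capability toy_allocate(size_t size) {
        Allocation *a = malloc(sizeof *a);
        a->bytes = calloc(1, size);
        a->size = size;
        return (Capability){ a };
    }

    static void toy_store(Capability cap, const char *data, size_t n) {
        memcpy(cap.target->bytes, data, n < cap.target->size ? n : cap.target->size);
    }

    static void toy_load(Capability cap, char *buf, size_t n) {
        memcpy(buf, cap.target->bytes, n < cap.target->size ? n : cap.target->size);
    }

    int main(void) {
        Capability cap = toy_allocate(64);                  /* cf. IBP_allocate */
        toy_store(cap, "staged near the consumer", 25);     /* cf. IBP_store */
        char buf[64] = {0};
        toy_load(cap, buf, 25);                             /* cf. IBP_load */
        printf("loaded: %s\n", buf);
        free(cap.target->bytes);
        free(cap.target);
        return 0;
    }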
NetSolve A client-server-agent model. Designed for solving complex scientific problems in a loosely-coupled heterogeneous environment.
The NetSolve Agent A “resource broker” that represents the gateway to the NetSolve system. Maintains an index of the available computational resources and their characteristics, in addition to usage statistics.
The NetSolve Agent Accepts requests for computational services from the client API and dispatches them to the best-suited server. Runs on Linux and UNIX.
The NetSolve Client Provides access to remote resources through simple and intuitive APIs. Runs on a user’s local system. Contacts the NetSolve system through the agent, which in turn returns the server that can best service the request. Runs on Linux, UNIX, and Windows.
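A hedged sketch of client usage: NetSolve's blocking C interface centered on a single call, netsl(), which names the remote problem and passes its arguments. The header name, problem name, and argument list below are illustrative, not taken from the NetSolve distribution.

    /* Illustrative NetSolve client call; the problem name "matmul" is
       hypothetical. netsl() blocks until the agent-chosen server returns
       a result (or a negative error status). */
    #include "netsolve.h"   /* assumed client header name */

    int remote_matmul(int n, double *A, double *B, double *C) {
        int status = netsl("matmul()", n, A, B, C);
        if (status < 0)
            return -1;      /* NetSolve reported an error */
        return 0;
    }

The point of the design is visible here: the caller never names a server; the agent picks one, which is what makes the brokering transparent.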
The NetSolve Server The computational backbone of the system. A daemon process that awaits client requests. Runs on different platforms: a single workstation, cluster of workstations, symmetric multiprocessors (SMPs), or massively parallel processors (MPPs).
The NetSolve Server A key component of the server is the Problem Description File (PDF).  With the PDF, routines local to a given server are made available to clients throughout the NetSolve system.
The PDF Template
PROBLEM Program Name
    …
LIB Supporting Library Information
    …
INPUT specifications
    …
OUTPUT specifications
    …
CODE
Network Weather Service Supports grid technologies. Uses sensor processes to monitor CPU loads and network traffic. Uses statistical models on the collected data to generate a forecast of future behavior. NetSolve is currently integrating NWS into its agent.
Gridware Collaborations NetSolve is using Globus' "Heartbeat Monitor" to detect failed servers. A NetSolve client that allows access to Globus is now in testing. Legion has adopted NetSolve’s client-user interface to leverage its metacomputing resources. The NetSolve client uses Legion’s data-flow graphs to keep track of data dependencies.
Gridware Collaborations NetSolve can access Condor pools among its computational resources. IBP-enabled clients and servers allow NetSolve to allocate and schedule storage resources as part of its resource brokering. This improves fault tolerance.
GRID COMPUTING BREAK
Hour 3: Ongoing Research Motivation. Special Projects (ongoing work at Tennessee). General Issues (open questions of interest to the entire research community).
Motivation Computer speed doubles every 18 months. Network speed doubles every 9 months. Graph from Scientific American (Jan. 2001) by Cleo Vilett; source: Vinod Khosla of Kleiner Perkins Caufield & Byers.
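Worked out, these rates compound quickly: over six years, a processor speed that doubles every 18 months improves by about 2^4 = 16x, while a network speed that doubles every 9 months improves by about 2^8 = 256x. Bandwidth thus grows roughly 16x relative to compute, which is why tying existing platforms together keeps getting more attractive than building a single faster platform.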
Special Projects The SInRG Project: Grid Service Clusters (GSCs), data switches. Incorporating Hardware Acceleration. Unbridled Parallelism: [email_address] and [email_address] , the Vertex Cover solver. Security.
The SInRG Project
The Grid Service Cluster The basic grid building block. Each GSC will use the same software infrastructure as is now being deployed on the national Grid, but tuned to take advantage of the highly structured and controlled design of the cluster.   Some GSCs are general-purpose and some are special-purpose.
The Grid Service Cluster
An advanced data switch The components that make up a GSC must be able to access each other at very high speeds and with guaranteed Quality of Service (QoS). Links of at least 1 Gbps assure QoS in many circumstances simply by overprovisioning.
Computational Ecology GSC Collaboration between computer science and mathematical ecology. 8-processor Symmetric Multi-Processor (SMP). Initial in-core memory (RAM) is approximately 4 gigabytes. Out-of-core data storage unit provides a minimum of 450 gigabytes.
Medical Imaging GSC Collaboration between computer science and the medical school. High-end graphics workstations. Distinguished by the need to have these workstations attached as directly as possible to the switch to facilitate interactive manipulation of the reconstructed images.
Molecular Design GSC Collaboration between computer science and chemical engineering. Data visualization laboratory. 32 dual processors. High-performance switch.
Machine Design GSC Collaboration between computer science and electrical engineering. 12 Unix-based CAD workstations. 8 Linux boxes with Pilchard boards. Investigating the potential of reconfigurable computing in grid environments.
Machine Design GSC
Types of Hardware General-purpose hardware: can implement any function. ASICs: hardware that can implement only a specific application. FPGAs: reconfigurable hardware that can implement any function.
The FPGA FPGAs offer reprogrammability, which allows an optimal logic design for each function to be implemented. Hardware implementations offer acceleration over software implementations run on general-purpose processors.
The Pilchard Environment Developed at the Chinese University of Hong Kong. Plugs into a 133 MHz RAM DIMM slot and is an example of “programmable active memory.” Pilchard is accessed through memory read/write operations. Higher bandwidth and lower latency than other environments.
Objectives Evaluate utility of NetSolve gridware. Determine effectiveness of hardware acceleration in this environment. Provide an interface for the remote use of FPGAs. Allow users to experiment and gauge whether a given problem would benefit from hardware acceleration.
Sample Implementations Fast Fourier Transform (FFT) Data Encryption Standard algorithm (DES) Image backprojection algorithm A variety of combinatorial algorithms
Implementation Techniques Two types of functions are implemented: a software version, which runs on the PC’s processor, and a hardware version, which runs in the FPGA. To implement the hardware version of the function, VHDL code is needed.
The Hardware Function Implemented in VHDL or some other hardware description language. The VHDL code is then mapped onto the FPGA (synthesis). CAD tools help make mapping decisions based on constraints such as: chip area, I/O pin counts, routing resources and topologies, partitioning, resource usage minimization.
The Hardware Function Result of synthesis is a configuration file (bit stream). This file defines how the FPGA is to be reprogrammed in order to implement the new desired functionality. To run, a copy of the configuration file must be loaded on the FPGA.
Behind the Scenes (diagram) The original slide is a workflow figure; its surviving labels trace the flow: a VHDL programmer writes VHDL code, which synthesis turns into a configuration file; a software programmer supplies the software function; the server administrator installs both functions, with their PDFs and libraries, on the NetSolve server; clients then send requests and receive results. (A hypothetical client-side sketch follows.)
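From the client's point of view, a hypothetical pairing of PDFs might expose the same routine in two flavors; the problem names below are invented for illustration, following the netsl() sketch given earlier.

    /* Hypothetical problem names: only the Pilchard-equipped Machine
       Design server publishes the "_hw" PDF, so the agent routes those
       requests to it; any server may run the "_sw" version. */
    status = netsl("fft_hw()", n, signal, spectrum);   /* FPGA-accelerated */
    status = netsl("fft_sw()", n, signal, spectrum);   /* software fallback */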
Conclusions Hardware acceleration is offered to both local and remote users. Resources are available through an efficient and easy-to-use interface. A development environment is provided for devising and testing a wide variety of software, hardware and hybrid solutions.
Unbridled Parallelism Sometimes the overhead of gridware is unneeded. Well-known examples include [email_address] and [email_address] . We’re currently building a Vertex Cover solver with multiple levels of acceleration.
A Naked SSH Approach A bit of blasphemy: the anti-gridware paradigm. Our work raises several questions: When does it make sense? How much efficiency are we gaining? What are its limitations?
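As a minimal sketch of the idea, assuming nothing beyond ssh keys and a shared file layout, a driver program can fan work out to a host list directly; the host names and the solver command are placeholders.

    /* Naked-ssh dispatch: no gridware, just ssh. Hosts and the remote
       command are hypothetical placeholders. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const char *hosts[] = { "node01", "node02", "node03" };
        char cmd[256];
        for (size_t i = 0; i < sizeof hosts / sizeof hosts[0]; i++) {
            /* ssh -f backgrounds the session so the loop does not block */
            snprintf(cmd, sizeof cmd,
                     "ssh -f %s './vc_solver part%zu.in'", hosts[i], i);
            if (system(cmd) != 0)
                fprintf(stderr, "dispatch to %s failed\n", hosts[i]);
        }
        return 0;
    }

The appeal is exactly the absence of machinery; the cost, per the questions above, is that scheduling, fault detection, and result collection all become the user's problem.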
Grid Security Algorithm complexity theory. Verifiability. Concealment. Cryptography and checkpointing. Corroboration. Scalability. Voting and spot-checking. Fault tolerance. Reliability.
Some General Issues Grid architecture. Resource management. QoS mechanisms. Performance monitoring. Fault tolerance.
References URL to these slides: http://www.cs.utk.edu/~abukhzam/grid-tutorial.htm Condor: http://www.cs.wisc.edu/condor Globus: http://www.globus.org
References NetSolve: http://icl.cs.utk.edu/netsolve Harness: NWS: http://nws.cs.ucsb.edu/ SInRG: http://icl.cs.utk.edu/sinrg
GRID COMPUTING END

Editor's Notes

  • Motivation: Because network speed is increasing faster than platform speeds, it is easier to tie platforms together than to build new platforms.
  • The SInRG Project: On-campus infrastructure, funded by NSF, to mirror technologies and interdisciplinary collaborations.
  • The Grid Service Cluster: Regarded as the basic grid building block. Some are general-purpose and some are special-purpose. GSCs will use the same software infrastructure as is now being deployed on the national Grid, but tuned to take advantage of the highly structured and controlled design of the cluster.
  • An advanced data switch: Because a GSC is built around a single advanced data switch, communication services, such as advanced forms of QoS, can be easily implemented within a GSC even though similar services would be problematic across an advanced network.
  • The Pilchard Environment: Pilchard is a sardine-like fish common to the Mediterranean area.
  • Behind the Scenes: If a user requests the software version, the NetSolve agent may send the request to any server. But if the user requests the hardware version, then (as currently configured) the agent will send the request to the Machine Design server. The hardware version has the potential to run several hundred times faster than the software version.
  • Grid Security: Langston, Dongarra, Plank, Beck, and Eijkhout; a big project underway at Tennessee to incorporate grid security. Concealment: the entire problem is concealed. Corroboration: verify that something is true; send data out, apply cryptography, and check that the right data comes back. Checkpointing: more checkpointing enhances reliability and fault tolerance. There is much overlap among all of the above. Voting: with many processors, some of them untrusted, give all of them the same computation and compare the answers; those that disagree with the majority are thrown out. Scalability: all of the above are concerned with scaling as grids get bigger.
  • Some General Issues: (1) Grid architecture: What's the right structure? How should grids be organized? Which strategy is better? Harness looks like it's not really popular, but Globus is extremely popular. Is it going to be like cars or like PCs? (2) Resource management: like NetSolve, where everybody can sign on? How do you handle Trojan horses? NetSolve ignores them, because code runs on the server machine and is never sent to it, so it can't be infected; Condor actually passes code, so it is vulnerable to Trojan horses. How much time does a user get on a machine? How much space? Are there clear priority structures? (3) QoS mechanisms: What's the right metric? Throughput? Elapsed time? (4) Performance monitoring: What are you monitoring? These systems measure how busy and loaded a machine is; how do you monitor a machine without affecting its performance? (5) Fault tolerance: this will always be an issue in grid computing.