Science in the Clouds: History, Challenges, and Opportunities

This document discusses the history, challenges, and opportunities of cloud computing and distributed computing. It begins by introducing cloud computing concepts like rapid access to virtual resources and the need for flexible applications. Next, it summarizes examples of cloud providers like Amazon EC2 and their pricing models. The document then discusses how clouds relate to and improve upon earlier concepts like utility computing, grids, and clusters. It raises open questions around data security, economic models, and the risks of centralized resources. Finally, it proposes abstraction models like All-Pairs and Wavefront that can help non-experts efficiently utilize large cloud resources for tasks like biometrics research.


Science in the Clouds: History, Challenges, and Opportunities

Douglas Thain, University of Notre Dame
GeoClouds Workshop, 17 September 2009


1

http://www.cse.nd.edu/~ccl

The Cooperative Computing Lab


We collaborate with people who have large-scale computing problems.
We build new software and systems to help them achieve meaningful goals.
We run a production computing system used by people at ND and elsewhere.
We conduct computer science research, informed by real-world experience, with an impact upon problems that matter.
3

Clouds in the Hype Cycle

[Figure: Gartner Hype Cycle Report, 2009.]

What is cloud computing?


A cloud provides rapid, metered access to a virtually unlimited set of resources. This has two significant impacts on users:
- End users must have an economic model for the work that they want to accomplish.
- Apps must be flexible enough to work with an arbitrary number and kind of resources.
5

Example: Amazon EC2 Sep 2009


(simplified slightly for discussion) Small: 1 core, 1.7GB RAM, 160GB disk
10 cents/hour

Large: 2 cores, 7.5GB RAM, 850GB disk


40 cents/hour

Extra Large: 4 cores, 15 GB, 1690GB disk


80 cents/hour

And the Simple Storage Service:


15 cents per GB-month stored 17 cents per GB transferred (outside of EC2) 1 cent per 1000 write operations 1 cent per 10000 read operations
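To make the economic model concrete, here is a back-of-the-envelope cost estimator in Python using the September 2009 rates above; the example workload (instance-hours, storage, transfer) is hypothetical.

    # Sep 2009 EC2/S3 rates from the list above.
    INSTANCE_RATE = {"small": 0.10, "large": 0.40, "xlarge": 0.80}  # $/hour
    STORE_RATE = 0.15   # $ per GB-month stored in S3
    XFER_RATE = 0.17    # $ per GB transferred out of EC2

    def estimate_cost(instance, hours, gb_months_stored, gb_transferred_out):
        # Rough dollar cost of a batch workload (ignores request fees).
        return (INSTANCE_RATE[instance] * hours
                + STORE_RATE * gb_months_stored
                + XFER_RATE * gb_transferred_out)

    # Hypothetical example: 2,000 Small instance-hours, 100 GB stored
    # for a month, 50 GB of results downloaded.
    print("$%.2f" % estimate_cost("small", 2000, 100, 50))  # -> $223.50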
6

Is Cloud Computing New?


Not entirely; rather, a combination of the old ideas of utility computing and distributed computing.

1960 MULTICS
1980 The Cambridge Ring
1987 Condor Distributed Batch System
1990s Clusters, Beowulf, MPI, NOW
1995 Globus, Grid Computing
1999 SETI@home
2001 TeraGrid
2004 Sun Rents CPUs at $1/hour
2006 Amazon EC2 and S3
7

Clouds Trade CapEx for OpEx

[Figure: cost vs. time, comparing the capital expense plus OpEx of ownership against the OpEx of cloud computing, which runs at roughly 2X the OpEx of ownership.]
8

What about grid computing?


A vision much like clouds:
- A worldwide framework that would make massive-scale computing as easy to use as an electrical socket.

The more modest realization:
- A means for accessing remote computing facilities in their native form, usually for CPU-intensive tasks.

The social context:
- Large collaborative efforts between computer scientists and computer-savvy fields, particularly physics and astronomy.

9

Clouds vs Grids
Grids provide a job execution interface:
- Run program P on input A, return the output.
- Allows the system to maximize utilization and hide failures, but provides few performance guarantees and inaccurate metering.

Clouds provide resource allocation:
- Create a VM with 2 GB of RAM for 7 days.
- Gives predictable performance and accurate metering, but exposes problems to the user.
- Can be used to build interactive services.

But then: how do I run 1M jobs on 100 servers?
10

[Diagram: the user submits 1M jobs to a grid computing layer, which provides job execution, dispatches jobs, and manages load; beneath it, a cloud computing layer provides resource allocation, from which 100 CPUs are allocated.]
11

[Diagram: allocate 100 cores from the cloud, create a Condor pool with 100 nodes on them, then run 1M jobs through the pool.]
12

Clouds Solve Some Grid Problems


Application compatibility is simplified:
- You provide a VM for Linux 2.3.4.1.2.

Performance is reasonably predictable:
- 10% variations rather than orders of magnitude.

Fewer administrative headaches for the lone user:
- A credit card swipe instead of a certificate.

13

But, Problems New and Old:


- How do I reliably execute 1M jobs?
- Can I share resources and data with others in the cloud?
- How do I authenticate others in the cloud?
- Unfortunately, location still matters: can we make applications efficiently span multiple cloud providers?
- Can we join existing centers with clouds?

(These are all problems contemplated by grid computing.)
14

More Open Questions


- Can I afford to move my data into the cloud? Can I afford to get it out?
- Do I trust the cloud to secure my data?
- How do I go about constructing an economic model for my research?
- Are there social or technical dangers in putting too many eggs in one basket?
- Is pay-as-you-go the proper model for research?
- Should universities get out of the data center business?
15

Clusters, clouds, and grids give us access to unlimited CPUs. How do we write programs that can run effectively in large systems?

16

MapReduce( S, M, R )

[Diagram: the map function M converts each member of the set S into key-value pairs; the pairs are grouped by key (Key0 through KeyN); and the reduce function R collapses each key's values into an output (O0, O1, ... On).]

17
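To pin down the semantics, here is a minimal single-process sketch of the MapReduce pattern in Python; it captures only the model (map, group by key, reduce), not the distribution, storage, or fault tolerance of a real implementation.

    from itertools import groupby

    def mapreduce(S, M, R):
        # Map each item of S to a list of (key, value) pairs.
        pairs = [kv for item in S for kv in M(item)]
        # Group the pairs by key, then reduce each group with R.
        pairs.sort(key=lambda kv: kv[0])
        return {key: R(key, [v for _, v in group])
                for key, group in groupby(pairs, key=lambda kv: kv[0])}

    # Example: word count over a toy data set.
    docs = ["the cloud", "the grid and the cloud"]
    M = lambda doc: [(word, 1) for word in doc.split()]
    R = lambda key, counts: sum(counts)
    print(mapreduce(docs, M, R))
    # -> {'and': 1, 'cloud': 2, 'grid': 1, 'the': 3}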

Of course, not all science fits into the Map-Reduce model!

18

Example: Biometrics Research


Goal: design a robust face comparison function.

[Figure: F scores a pair of images of the same person at 0.97, and a pair of images of different people at 0.05.]
19

Similarity Matrix Construction

[Figure: a similarity matrix under construction, with 1.0 along the diagonal and pairwise scores such as 0.8, 0.1, and 0.0 filling in off the diagonal.]

Challenge workload:
- 60,000 iris images, 1 MB each
- 0.02 s per F
- 833 CPU-days
- 600 TB of I/O
20

I have 60,000 iris images acquired in my research lab. I want to reduce each one to a feature space, and then compare all of them to each other. I want to spend my time doing science, not struggling with computers.

I have a laptop. I own a few machines. I can buy time from Amazon or TeraGrid.

Now What?

21

22

23

24

Non-Expert User Using 500 CPUs


Try 1: Each F is a batch job.
Failure: dispatch latency >> F runtime.

Try 2: Each row is a batch job.
Failure: too many small operations on the file system.

Try 3: Bundle all files into one package.
Failure: everyone loads 1 GB at once.

Try 4: User gives up and attempts to solve an easier or smaller problem.

[Diagrams: a head node (HN) dispatching F jobs to a pool of CPUs under each strategy.]
25

Observation
In a given field of study, many people repeat the same pattern of work many times, making slight changes to the data and algorithms. If the system knows the overall pattern in advance, then it can do a better job of executing it reliably and efficiently. If the user knows in advance what patterns are allowed, then they have a better idea of how to construct their workloads.
26

Abstractions for Distributed Computing


Abstraction: a declarative specification of the computation and data of a workload.
- A restricted pattern, not meant to be a general-purpose programming language.
- Uses data structures instead of files.
- Provides users with a bright path.
- Regular structure makes it tractable to model and predict performance.
27

Working with Abstractions


[Diagram: the user states AllPairs( A, B, F ) over compact data structures (A1 ... An, B1 ... Bn); a custom workflow engine translates the abstraction into jobs for a cloud or grid.]
28

All-Pairs Abstraction
AllPairs( set A, set B, function F ) returns matrix M where M[i][j] = F( A[i], B[j] ) for all i,j
[Diagram: F compares every element A1 ... An of set A against every element B1 ... Bn of set B, filling in the result matrix M.]

Command-line invocation: allpairs A B F.exe

29
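The semantics fit in a few lines; this serial Python sketch is mine, not the production engine (which also handles data placement, job blocking, and failures), but it states exactly what the abstraction computes.

    def allpairs(A, B, F):
        # M[i][j] = F(A[i], B[j]) for all i, j.
        return [[F(a, b) for b in B] for a in A]

    # Toy stand-in for the biometric comparison function (hypothetical data).
    F = lambda x, y: 1.0 if x == y else 0.0
    print(allpairs(["iris1", "iris2"], ["iris1", "iris3"], F))
    # -> [[1.0, 0.0], [0.0, 0.0]]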

How Does the Abstraction Help?


The custom workflow engine:
- Chooses the right data transfer strategy.
- Chooses the right number of resources.
- Chooses the blocking of functions into jobs.
- Recovers from a large number of failures.
- Predicts overall runtime accurately.

All of these tasks are nearly impossible for arbitrary workloads, but are tractable (not trivial) to solve for a specific abstraction.
30

31

Choose the Right # of CPUs

32

Resources Consumed

33

All-Pairs in Production
Our All-Pairs implementation has provided over 57 CPU-years of computation to the ND biometrics research group over the last year.

Largest run so far: 58,396 irises from the Face Recognition Grand Challenge, the largest experiment ever run on publicly available data. Competing biometric research relies on samples of 100-1000 images, which can miss important population effects.

Reduced computation time from 833 days to 10 days, making it feasible to repeat multiple times for a graduate thesis. (We can go faster yet.)

34

35

Are there other abstractions?

36

Wavefront( matrix M, function F(x,y,d) )
returns matrix M such that M[i,j] = F( M[i-1,j], M[i,j-1], M[i-1,j-1] )

[Diagram: given the first row and column of M as boundary values, the computation sweeps the matrix as a diagonal wavefront from M[0,0] outward; each cell is produced by F from its x (left), y (below), and d (diagonal) neighbors.]

37
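A serial Python sketch of the recurrence (assuming the first row and column of M are given as boundary values); the point of the distributed engine is that all cells on an anti-diagonal are independent and can run in parallel.

    def wavefront(M, F):
        # Fill an n x n matrix in place, given its first row and column:
        # M[i][j] = F(M[i-1][j], M[i][j-1], M[i-1][j-1]).
        n = len(M)
        for i in range(1, n):
            for j in range(1, n):
                M[i][j] = F(M[i-1][j], M[i][j-1], M[i-1][j-1])
        return M

    # Toy example: boundary row and column preset to zero.
    n = 5
    M = [[0] * n for _ in range(n)]
    F = lambda x, y, d: max(x, y, d) + 1
    print(wavefront(M, F)[n-1][n-1])  # -> 7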

Applications of Wavefront
Bioinformatics:
- Compute the alignment of two large DNA strings in order to find similarities between species. Existing tools do not scale up to complete DNA strings.

Economics:
- Simulate the interaction between two competing firms, each of which has an effect on resource consumption and market price. E.g., when will we run out of oil?

Applies to any kind of optimization problem solvable with dynamic programming.


38

Problem: Dispatch Latency


Even with an infinite number of CPUs, dispatch latency controls the total execution time: O(n) in the best case. However, job dispatch latency in an unloaded grid is about 30 seconds, which may outweigh the runtime of F. (For a 500x500 Wavefront, the 999 sequential diagonal waves alone would take over 8 hours at 30 seconds each.) Things get worse when queues are long!

Solution: build a lightweight task dispatch system. (Idea from Falkon@UC.)
39

[Diagram: the wavefront engine queues tasks onto a work queue and collects completed tasks; 1000s of workers are dispatched to the cloud. Detail of a single worker:
put F.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt]
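A minimal sketch of the master/worker pattern, using local Python threads as stand-ins for remote workers; the real Work Queue system instead ships files to workers on cloud nodes with put/exec/get, but the dispatch structure is the same.

    import queue, threading

    tasks = queue.Queue()     # master -> workers
    results = queue.Queue()   # workers -> master

    def worker():
        # Each worker repeatedly takes a task (function plus input),
        # runs it, and returns the result; None means shut down.
        while True:
            item = tasks.get()
            if item is None:
                break
            F, x = item
            results.put((x, F(x)))

    pool = [threading.Thread(target=worker) for _ in range(4)]
    for t in pool:
        t.start()

    for x in range(10):       # the engine queueing tasks
        tasks.put((lambda v: v * v, x))
    for _ in pool:            # one shutdown sentinel per worker
        tasks.put(None)
    for t in pool:
        t.join()

    while not results.empty():
        print(results.get())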
40

Problem: Performance Variation


Tasks can be delayed for many reasons:
- Heterogeneous hardware.
- Interference with disk/network.
- Policy-based suspension.

Any delayed task in Wavefront has a cascading effect on the rest of the workload.

Solution (Fast Abort): keep statistics on task runtimes, and abort those that lie significantly outside the mean. Prefer to assign jobs to machines with a fast history.
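A sketch of the Fast Abort idea in Python; the threshold (mean plus three standard deviations) and the minimum sample count are my assumptions, not the system's published constants.

    import statistics

    class FastAbort:
        # Track completed task runtimes; flag running tasks that stray
        # far above the mean so they can be aborted and resubmitted.
        def __init__(self, k=3.0, min_samples=10):
            self.k = k                      # tolerance, in standard deviations
            self.min_samples = min_samples  # history needed before aborting
            self.runtimes = []

        def record(self, seconds):
            self.runtimes.append(seconds)

        def should_abort(self, elapsed):
            if len(self.runtimes) < self.min_samples:
                return False
            mean = statistics.mean(self.runtimes)
            stdev = statistics.pstdev(self.runtimes)
            return elapsed > mean + self.k * stdev

    fa = FastAbort()
    for t in [10, 11, 9, 10, 12, 10, 9, 11, 10, 10]:
        fa.record(t)
    print(fa.should_abort(30))  # True: a straggler worth resubmitting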
41

500x500 Wavefront on ~200 CPUs

42

Wavefront on a 200-CPU Cluster

43

Wavefront on a 32-Core CPU

44

The Genome Assembly Problem


AGTCGATCGATCGATAATCGATCCTAGCTAGCTACGA

Chemical sequencing breaks the genome into millions of reads, 100s of bytes long:

AGTCGATCGATCGAT
TCGATAATCGATCCTAGCTA
AGCTAGCTACGA

Computational assembly then overlaps the reads to reconstruct the original sequence.


45

Sample Genomes

Genome                 Reads   Data    Pairs   Sequential Time
A. gambiae scaffold    101K    80MB    738K    12 hours
A. gambiae complete    180K    1.4GB   12M     6 days
S. bicolor simulated   7.9M    5.7GB   84M     30 days
46

Some-Pairs Abstraction
SomePairs( set A, list (i,j), function F(x,y) ) returns list of F( A[i], A[j] )
[Diagram: SomePairs compares only the listed index pairs, e.g. (1,2), (2,1), (2,3), (3,3), applying F to the corresponding elements of A rather than to all combinations.]
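As with All-Pairs, the semantics are nearly a one-liner; this serial Python sketch (not the distributed engine) shows what SomePairs computes.

    def somepairs(A, pairs, F):
        # Return F(A[i], A[j]) for each requested index pair (i, j).
        return [F(A[i], A[j]) for (i, j) in pairs]

    # Toy overlap score on hypothetical reads.
    reads = ["AGTC", "GTCA", "TCAG"]
    F = lambda x, y: sum(a == b for a, b in zip(x, y))
    print(somepairs(reads, [(0, 1), (2, 2)], F))  # -> [0, 4]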
47

Distributed Genome Assembly


[Diagram: a somepairs master queues tasks onto a work queue and collects completed tasks; 100s of workers are dispatched to Notre Dame, Purdue, and Wisconsin. Detail of a single worker:
put align.exe
put in.txt
exec align.exe <in.txt >out.txt
get out.txt]

48

Small Genome (101K reads)

49

Medium Genome (180K reads)

50

Large Genome (7.9M)

51

What's the Upshot?

We can do full-scale assemblies as a routine matter on existing conventional machines. Our solution is faster (in wall-clock time) than the next-fastest assembler run on 1024 nodes of a BG/L. You could almost certainly do better with a dedicated cluster and a fast interconnect, but such systems are not universally available. Our solution opens up research in assembly to labs with NASCAR rather than Formula One hardware.
52

What if your application doesn't fit a regular pattern?

53

Makeflow
part1 part2 part3: input.data split.py
    ./split.py input.data

out1: part1 mysim.exe
    ./mysim.exe part1 >out1

out2: part2 mysim.exe
    ./mysim.exe part2 >out2

out3: part3 mysim.exe
    ./mysim.exe part3 >out3

result: out1 out2 out3 join.py
    ./join.py out1 out2 out3 > result
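Makeflow's model is Make-style dependency rules; this toy Python executor (my sketch, not Makeflow itself) shows the scheduling idea behind the file above: a rule may run as soon as all of its sources exist, and independent ready rules could run in parallel on a cluster or cloud.

    import os, subprocess

    # (targets, sources, command) triples mirroring the Makeflow above;
    # assumes input.data, split.py, mysim.exe, join.py are present.
    rules = [
        (["part1", "part2", "part3"], ["input.data", "split.py"],
         "./split.py input.data"),
        (["out1"], ["part1", "mysim.exe"], "./mysim.exe part1 >out1"),
        (["out2"], ["part2", "mysim.exe"], "./mysim.exe part2 >out2"),
        (["out3"], ["part3", "mysim.exe"], "./mysim.exe part3 >out3"),
        (["result"], ["out1", "out2", "out3", "join.py"],
         "./join.py out1 out2 out3 > result"),
    ]

    done = {f for _, srcs, _ in rules for f in srcs if os.path.exists(f)}
    pending = list(rules)
    while pending:
        # A rule is ready when all of its sources have been produced.
        ready = [r for r in pending if all(s in done for s in r[1])]
        if not ready:
            raise RuntimeError("stuck: missing inputs or a dependency cycle")
        for rule in ready:
            targets, _, command = rule
            subprocess.run(command, shell=True, check=True)  # run locally
            done.update(targets)
            pending.remove(rule)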

54

Makeflow Implementation
bfile: afile prog
    prog afile >bfile

[Diagram: a makeflow master queues tasks onto a work queue and collects completed tasks; 100s of workers are dispatched to the cloud. Detail of a single worker:
put prog
put afile
exec prog afile > bfile
get bfile]

Two optimizations:
- Cache inputs and outputs.
- Dispatch tasks to nodes that already hold the data.

55

Experience with Makeflow


Still in initial deployment, so no big results to show just yet.
- Easy to test and debug on a desktop machine or a multicore server.
- The workload says nothing about the distributed system. (This is good.)
- Graduate students in bioinformatics got codes running at production speeds on hundreds of nodes in less than a week.
56

Abstractions as a Social Tool


Collaboration with outside groups is how we encounter the most interesting, challenging, and important problems in computer science. However, often neither side understands which details are essential or non-essential:
- Can you deal with files that have upper-case letters?
- Oh, by the way, we have 10 TB of input, is that ok? (A little bit of an exaggeration.)

An abstraction is an excellent chalkboard tool:
- Accessible to anyone with a little bit of mathematics.
- Makes it easy to see what must be plugged in.
- Forces out essential details: data size, execution time.
57

Conclusion
Grids, clouds, and clusters provide enormous computing power, but are very challenging to use effectively.

An abstraction provides a robust, scalable solution to a narrow category of problems; each requires different kinds of optimizations.

Limiting expressive power results in systems that are usable, predictable, and reliable.

Is there a menu of abstractions that would satisfy many consumers of clouds?
58

Acknowledgments
Cooperative Computing Lab: http://www.cse.nd.edu/~ccl

Faculty: Patrick Flynn, Nitesh Chawla, Kenneth Judd, Scott Emrich

Grad Students: Chris Moretti, Hoang Bui, Li Yu, Mike Olson, Michael Albrecht

Undergrads: Mike Kelly, Rory Carmichael, Mark Pasquier, Christopher Lyon, Jared Bulosan

NSF Grants CCF-0621434, CNS-0643229


59
