Microsoft
Large Databases
and
Grid Computing
Jim Gray
Microsoft Research
Gray@Microsoft.com
http://research.Microsoft.com/~gray
Presentation to
Kaiser Information Management Briefing
21 May 2003
About me
• At Microsoft Research (located in San Francisco)
• A database researcher
– IBM, Tandem, DEC, Microsoft
• Work on Scalable Systems
– Building supercomputers
from commodity components.
• Do academic/government things too
– PITAC, GriPhyn TAB, NSF/CISE,
Library of Congress, …
• For the last 4 years,
been working with the astronomy community
to build the World Wide Telescope.
Agenda
• TerraServer
– What it is
– What we learned
– What we are doing now.
• SkyServer / WWT
– What it is
– What we learned
– What we are doing now
• Grid Computing
– General comments
– Build a web service
TerraServer
TerraService.net
• A photo of the United States
– 1 meter resolution (photographic/topographic)
– USGS data
– Some demographic data (BestPlaces.net)
– Home sales data
– Linked to Encarta Encyclopedia
• 15 TB raw, 6 TB cooked (grows 10GB/w)
• Point, Pan, zoom interface
• Among top 1,000 websites
– 40k visitors/day
– 4M queries/day
– 3 B page views (in 5 years)
• All in an SQL database
TerraServer Statistics
[Timeline figure, June ‘98 to Dec ‘02 (milestones: June ‘98, Jan ‘99, Jan ‘00, May ‘00, Sept ’01, Dec ‘02)]
Platforms: 1 Server / Win NT 4.0 EE; 2nd Server / Win 2k DataCenter; 4 Node / Win2k Datacenter Failover Cluster
Databases: SQL 7.0 (.75 TB to 1.5 TB Db; 173 m, 217 m Rows), then SQL 2000 (.8 TB to 2.0 TB Db; 231 m, 298 m, 755 m, 900 m Rows)
Metric          Daily Average   Peak Day      June 1998 - Oct 2002
Unique Users    40,011          277,292       63,656,904
Page Views      1,266,838       2,401,209     2,015,539,605
Image Tiles     3,735,789       10,475,674    5,943,641,024
Db Queries      4,484,089       12,388,104    7,134,186,170
Bytes Xfered    70 GB           163 GB        108 TB
TerraServer Cluster
[Diagram: SQLInst1 and SQLInst2, with disk volumes labeled E through S]
One SQL database per rack
Each rack contains 4.5 TB
1 rack not in picture
18.0 TB total
Meta Data
Stored on 101 GB
“Fast, Small Disks”
(18 x 18.2 GB)
Imagery Data
Stored on 4,339 GB
“Slow, Big Disks”
(15 x 73.8 GB)
Added 90 72.8 GB
Disks in Feb 2001
to create 18 TB SAN
8 Compaq DL360 “Photon” Web Servers
Fiber SAN
Switches
4 Compaq ProLiant 8500 Db Servers
Cluster Configuration
[Diagram labels]
Compaq SAN switch by Brocade Communications
Compaq StorageWorks MA8000/HSG80 Controllers (3)
Compaq ProLiant 8500 (4)
Internet
Microsoft Corporate LAN
Extreme Networks Summit 48 Switch
Summit 7i Switch (2)
Cisco 12000 Internet Router
Compaq DL360 (6) (Windows 2000 Web Servers)
TerraServer.microsoft.com
Compaq DL360 (10)
Database Cluster
ADIC LTO Tape Library
TerraServer SAN
TerraServer Becomes a Web Service
TerraServer.net -> TerraService.Net
• Web server is for people.
• Web Service is for programs
– The end of screen scraping
– No faking a URL:
pass real parameters.
– No parsing the answer:
data formatted into your
address space.
• Hundreds of users but a
specific example:
– US Department of Agriculture
And now.. 4 slides from the “customer”
who built a portal using TerraService
Data Gateway Functional
Overview
Navigation
Service
Catalog
Service
Ship
Service
<<Requests Products>>
Item
Broker
Customer Orders
Data
XML
Order
Placer
Listen for OrderPlacer Raised
Event
Select sequenced Item
Output XML
raise event: stats.delivery start
validate (dtd)
Insert into SQL
@@Identity / GUID to client
return est time
raise OrderMgr.event
Order
Database
Selects from
XML Request for data
Logger
Called by anyone
raises to stats svc
ASP
XML
XML
Soil Data
Viewer
[Map figure: USDA NRCS soil map, scale 1:15840, with a scale bar in feet. Legend: Landunits, Fields Within Buffer, Buffer Area Within Fields, Pipelines]
Geospatial
Data
Acknowledges item ready for delivery
Data
Services
Package
Service
Send order info
FTP
Services
Rimage
CD
Service
Product Catalog Updates
Billing
Services
NCGC - Fort Worth, Texas
ITC - Fort Collins, Colorado
Terra
Service
Custom End Product
Web Soil Data Viewer
XML Soil Report
Soil Interpretation Map
ESRI
Spatial Data Engine
WebSDV
ArcIMS Connector
Connects to
ArcIMS;
communication is
done through
ArcIMS XML (AXL)
Retrieves and
processes Soils
Data from the
NASIS relational
Database
Image Retriever
IMSNavigator
Generates maps
(JPGs) using
ArcIMS
Retrieves
imagery from
the Microsoft
TerraServer
Terraserver
Geospatial
Data
Business
Rules
National Soils
Data
Database Server - Microsoft SQL Server
Database Server - ESRI Spatial Data Server
Web Server - COM+ Applications
Microsoft Terraserver
Brief tour of TerraService
• Show map service
• Show some methods
• See
TerraService.NET:
An Introduction to Web Services
Tom Barclay; Jim Gray; Eric Strand; Steve Ekblad; Jeffrey Richter,
MSR TR 2002-53, pp 13, June 2002
What We Learned
• You can build and manage a very popular website
with relatively little effort
(if you do it right and have Tom Barclay)
• Loading 20 TB takes a lot of energy
• And you get to do it many times -- automate
• Tape and tape software are problematic
• Triplexed and snapshot disks work
(we have never had to use them, but..)
• The internet gives you 2-9’s
Servers can run at 4 9’s easily, 5 9’s with effort.
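To make the "9's" concrete, here is a minimal sketch that converts a number of nines into expected downtime per year. It assumes the usual reading that N nines means availability of 1 - 10^-N; the function name is illustrative and not from the talk.

```python
HOURS_PER_YEAR = 365 * 24

def downtime_hours_per_year(nines: int) -> float:
    """Expected downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 2 nines -> 1% of the time down
    return unavailability * HOURS_PER_YEAR

# The Internet (2 nines) versus servers (4 nines, or 5 with effort).
for n in (2, 4, 5):
    print(f"{n} nines: about {downtime_hours_per_year(n):.2f} hours down per year")
```

On this reading the network path, at roughly 88 hours a year, dominates what a user sees, however well the servers run.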
What we are doing now.
• Building with 3K$ 2TB bricks
• 4 bricks = 1 backend
• Triplexing systems
• Duplexing sites.
• 4*3*2 = 24k$ for Geoplex
• Very simple operations model
• See:
• “TeraScale SneakerNet:
Using Inexpensive Disks for Backup, Archiving, and Data Exchange,”
Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan Vandenberg, pp. 1-8, May 2002
Agenda
• TerraServer
– What it is
– What we learned
– What we are doing now.
• SkyServer / WWT
– What it is
– What we learned
– What we are doing now
• Grid Computing
– General comments
– Build a web service
SkyServer
SkyServer.SDSS.org
• Like the TerraServer,
but looking the other way:
a picture of ¼ of the
universe
• Pixels +
Data Mining
• Astronomers get about 400
attributes for each “object”
• Get Spectrograms
for 1% of the objects
Why Astronomy Data?
•It has no commercial value
–No privacy concerns
–Can freely share results with others
–Great for experimenting with algorithms
•It is real and well documented
–High-dimensional data (with confidence intervals)
–Spatial data
–Temporal data
•Many different instruments from
many different places and
many different times
•Federation is a goal
•The questions are interesting
–How did the universe form?
•There is a lot of it (petabytes)
IRAS 100µm
ROSAT ~keV
DSS Optical
2MASS 2µm
IRAS 25µm
NVSS 20cm
WENSS 92cm
GB 6cm
Demo of SkyServer
• Shows standard web server
• Pixel/image data
• Point and click
• Explore one object
• Explore sets of objects (data mining)
Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/
Premise: Most data is (or could be) online
So, the Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio..
– As deep as the best instruments (2 years ago).
– It is up when you are up.
The “seeing” is always great
(no working at night, no clouds, no moons, no..).
– It’s a smart telescope:
links objects and data to literature on them.
Time and Spectral Dimensions
The Multiwavelength Crab Nebula
X-ray,
optical,
infrared, and
radio
views of the nearby
Crab Nebula, which is
now in a state of
chaotic expansion after
a supernova explosion
first sighted in 1054
A.D. by Chinese
Astronomers.
Slide courtesy of Robert Brunner @ CalTech.
Crab star
1053 AD
Federation
Data Federations of Web Services
• Massive datasets live near their owners:
– Near the instrument’s software pipeline
– Near the applications
– Near data knowledge and curation
– Super Computer centers become Super Data Centers
• Each Archive publishes a web service
– Schema: documents the data
– Methods on objects (queries)
• Scientists get “personalized” extracts
• Uniform access to multiple Archives
– A common global schema
Grid and Web Services Synergy
• I believe the Grid will be many web services
that share data (computrons are free)
• IETF standards provide
– Naming
– Authorization / Security / Privacy
– Distributed Objects
Discovery, Definition, Invocation, Object Model
– Higher level services: workflow, transactions, DB,..
• Synergy: commercial Internet & Grid tools
Web Services: The Key?
• Web SERVER:
– Given a url + parameters
– Returns a web page (often dynamic)
• Web SERVICE:
– Given an XML document (SOAP message)
– Returns an XML document
– Tools make this look like an RPC.
• F(x,y,z) returns (u, v, w)
– Distributed objects for the web.
– + naming, discovery, security,..
• Internet-scale
distributed computing
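As an illustration of "given an XML document, returns an XML document", the sketch below builds a SOAP 1.1 request for a call F(x, y, z) and unpacks a reply into a plain dictionary, which is roughly what the tools do behind the RPC facade. The application namespace `urn:example:demo`, the method name `F`, and both helper functions are hypothetical; no real service or toolkit API is implied.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:demo"  # hypothetical application namespace

def build_request(method: str, params: dict) -> bytes:
    """Wrap a method call and its named parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")

def parse_message(xml_bytes: bytes) -> dict:
    """Return the named values carried by the first element in the SOAP Body."""
    root = ET.fromstring(xml_bytes)
    body = root.find(f"{{{SOAP_NS}}}Body")
    payload = body[0]  # the method (or method response) element
    # Strip the "{namespace}" prefix from each tag to get the bare field name.
    return {child.tag.split("}", 1)[-1]: child.text for child in payload}

request = build_request("F", {"x": 1, "y": 2, "z": 3})
print(parse_message(request))
```

The caller never parses HTML or fakes a URL: parameters go in as named fields and results come back as named fields.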
[Diagram: your program calls a Web Service and gets data in your address space; by contrast, your program fetches pages from a Web Server]
SkyQuery: a prototype
• Defining Astronomy Objects and Methods.
• Federated 3 Web Services (fermilab/sdss, jhu/first, Cal Tech/dposs)
multi-survey cross-match
Distributed query optimization (T. Malik, T. Budavari, Alex Szalay @ JHU)
http://SkyQuery.net/
• My first web service (cutout + annotated SDSS images) online
– http://skyservice.pha.jhu.edu/devel/ImgCutout/chart.asp
• WWT is a great Web Services (.Net) application
– Federating heterogeneous data sources.
– Cooperating organizations
– An Information At Your Fingertips challenge.
Demo of Image Cutout Service
• Shows image cutout
• Show project and debugging project
• Show hello World
• Show “theAnswer” method
SkyQuery (http://skyquery.net/)
• Distributed Query tool using a set of services
• Feasibility study, built in 6 weeks from scratch
– Tanu Malik (JHU CS grad student)
– Tamas Budavari (JHU astro postdoc)
• Implemented in C# and .NET
• Allows queries like:
SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o,
TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t)<3.5
AND AREA(181.3,-0.76,6.5)
AND o.type=3 and (o.I - t.m_j)>2
SkyNode Basic Web Services
• Metadata information about resources
– Waveband
– Sky coverage
– Translation of names to universal dictionary (UCD)
• Simple search patterns on the resources
– Cone Search
– Image mosaic
– Unit conversions
• Simple filtering, counting, histogramming
• On-the-fly recalibrations
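A cone search, the simplest of the search patterns listed above, returns every object within a given angular radius of a sky position. The sketch below is one plausible in-memory version, assuming positions are given as right ascension and declination in degrees; the catalog rows and function names are made up for illustration and are not the SkyNode interface.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def cone_search(catalog, ra, dec, radius_deg):
    """Return the catalog rows that lie within radius_deg of (ra, dec)."""
    return [obj for obj in catalog
            if angular_separation_deg(ra, dec, obj["ra"], obj["dec"]) <= radius_deg]

# Hypothetical catalog rows; a real SkyNode would query its database instead.
catalog = [
    {"objId": 1, "ra": 181.30, "dec": -0.76},
    {"objId": 2, "ra": 181.35, "dec": -0.80},
    {"objId": 3, "ra": 185.00, "dec": 2.00},
]
hits = cone_search(catalog, ra=181.3, dec=-0.76, radius_deg=0.1)
print([obj["objId"] for obj in hits])
```

A production service would push the filter into the database with a spatial index rather than scanning every row.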
Portals: Higher Level Services
• Built on Atomic Services
• Perform more complex tasks
• Examples
– Automated resource discovery
– Cross-identifications
– Photometric redshifts
– Outlier detections
– Visualization facilities
• Goal:
– Build custom portals in days from existing building blocks
(like today in IRAF or IDL)
Architecture Image cutout
SkyNode
SDSS
SkyNode
2Mass
SkyNode
First
SkyQuery
Web Page
Summary So Far
• Some real web services deployed today
• Easy to build & deploy
• Services publish data, Portals unify it
• Tools really work!
• I’m using C# and the foundation classes of
Visual Studio, a great tool!
• A nice book explaining the ideas:
(.Net Framework Essentials, Thai, Lam isbn 0-596-00302-1)
Possible Relevance to You
• This web service stuff is REAL
• If you have a class,
It is a way to publish data:
Internet
Intranet
• It is a way to find data
data comes with schema
no more screen scraping/parsing
• Business model unclear
– Your ideas go here.
[Diagram: your program gets data in your address space from a Web Service]
What We Learned
• Web services really are a breakthrough.
• Data mining worked beautifully. See
“Data Mining the SDSS SkyServer Database,”
J. Gray, D. Slutz, A. Szalay, A. Thakar, P. Kuntz, C. Stoughton, MSR TR 2002-1, pp1-40,
2002.
• You can operate a system in Chicago
from San Francisco –
Terminal Server is wonderful.
• The Internet gives you 2 9’s of availability
• TeraScale SneakerNet works well
What we are doing now.
• Loading more data (next data release)
• Preparing for the next generation
• Building the WWT
• Web Services for the Virtual Observatory,
Alexander S. Szalay, Tamás Budavári, Tanu Malik, Jim Gray, and Ani Thakar,
SPIE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa,
Hawaii,
• Petabyte Scale Data Mining: Dream or Reality?,
Alexander S. Szalay; Jim Gray; Jan vandenBerg, SPIE Astronomy Telescopes
and Instruments, 22-28 August 2002, Waikoloa, Hawaii,
• Online Scientific Data Curation, Publication, and Archiving
Jim Gray; Alexander S. Szalay; Ani R. Thakar; Christopher Stoughton; Jan
vandenBerg, SPIE Astronomy Telescopes and Instruments, 22-28 August 2002,
Waikoloa, Hawaii,
Agenda
• TerraServer
– What it is
– What we learned
– What we are doing now.
• SkyServer / WWT
– What it is
– What we learned
– What we are doing now
• Grid Computing
– General comments
– Build a web service
The Grid
• Computation Grid: harvest Internet cpus.
• Data Grid: Share files
• Application Grid: Web services
• Access Grid: teleconferencing
The Microsoft View
• Web Services will subsume the Grid
–The Grid will be data and services
not renting cycles
• OGSA: evolution of Globus Toolkit to Web
services concepts and technologies…
• Lots of encouragement from
Microsoft, IBM, Oracle, Sun
• GGF as forum for discussion
Engagement with Grid Community
• Goal: GXA as infrastructure for Grids
• Working with Globus & GGF
– Funding work at Argonne National Lab (Globus)
– Globus Toolkit 3, and CondorG on Windows
• http://www.globus.org/win-alpha/ (we sponsored this)
– OGSA for .NET (prototyping)
• http://www.globus.org/ogsa/
– Also OGSI.NET at U. VA is very interesting
• http://www.cs.virginia.edu/~gsw2c/ogsi.net.html
– GGF
• Active membership
• HPC .net kit – see http://www.microsoft.com/HPC
– Part of .net server scale out development
– Includes MPI-CH 1.2.4, distributed job scheduler,…
– Thomas Sterling, Beowulf on Windows, MIT Press 2001
What’s Microsoft Doing
• Mostly .NET, W3C standards, web services, …
• I think SkyQuery is the best web service (grid
app) in GriPhyN today.
• My stuff is grid computing
• But…
• Globus (GT3), OGSA, and CondorG ported to
Windows (we sponsored it)
• We have a HPC toolkit: MPI-CH 1.2.4
• See
http://www.microsoft.com/windows2000/hpc/ for
many useful links
I Can Talk About Computing on
Demand But… Best to read
• Distributed Computing Economics,
Jim Gray, MSR-TR-2003-24, March 2003
• The slides that follow are based on that
paper.
Distributed Computing Economics
• Why is Seti@Home a great idea?
• Why is Napster a great deal?
• Why is the Computational Grid uneconomic?
• When does computing on demand work?
• What is the “right” level of abstraction?
• Is the Access Grid the real killer app?
Computing is Free
• Computers cost 1k$ (if you shop right)
• So 1 cpu day == 1$
• If you pay the phone bill (and I do)
Internet bandwidth costs 50 … 500$/mbps/m
(not including routers and management).
• So 1GB costs 1$ to send and 1$ to receive
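The two unit prices on this slide can be rederived with a little arithmetic. The sketch below assumes a 3-year amortization for the 1k$ computer and a fully used link billed over a 30-day month; neither assumption is stated on the slide, and the function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month

def cpu_cost_per_day(price_usd: float, lifetime_years: float = 3.0) -> float:
    """Capital cost per day of a computer amortized over its lifetime."""
    return price_usd / (lifetime_years * 365)

def wan_cost_per_gb(usd_per_mbps_month: float) -> float:
    """Cost to move 1 GB over a link rented per Mbps per month, at full utilization."""
    bytes_per_mbps_month = 1e6 / 8 * SECONDS_PER_MONTH
    gb_per_mbps_month = bytes_per_mbps_month / 1e9
    return usd_per_mbps_month / gb_per_mbps_month

print(f"cpu day: {cpu_cost_per_day(1000):.2f} $")
for rent in (50, 500):
    print(f"{rent} $/Mbps/month -> {wan_cost_per_gb(rent):.2f} $/GB")
```

Under these assumptions a cpu day comes to about 0.91$, and a gigabyte costs between about 0.15$ and 1.5$ at full utilization, so the slide's round figure of 1$ per GB sits toward the expensive end or allows for a partly idle link.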
Why is Seti@Home a Good Deal?
• Sending 300 KB costs 3e-4$
• User computes for ½ day: benefit 5e-1$
• ROI: 1500:1
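The ratio follows directly from the unit prices of 1$ per GB sent and 1$ per cpu day. The short sketch below redoes the arithmetic; the constant and function names are illustrative only.

```python
USD_PER_GB_SENT = 1.0   # from the "Computing is Free" slide
USD_PER_CPU_DAY = 1.0   # from the "Computing is Free" slide

def roi(bytes_sent: float, cpu_days: float) -> float:
    """Value of the donated cpu time divided by the cost of shipping the work unit."""
    cost = bytes_sent / 1e9 * USD_PER_GB_SENT
    benefit = cpu_days * USD_PER_CPU_DAY
    return benefit / cost

seti = roi(bytes_sent=300e3, cpu_days=0.5)
print(f"Seti@Home ROI is about {seti:.0f}:1")
```

Half a cpu day is worth 0.5$ against 3e-4$ of bandwidth, which gives a ratio in the high hundreds to low thousands, the same order of magnitude as the slide's rounded 1500:1.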
Why is Napster a Good Deal?
• Send 5 MB costs 5e-3$
• ½ a penny per song
• Both sender and receiver can afford it.
• Same logic powers web sites (Yahoo!...):
– 1e-3$/page view advertising revenue
– 1e-5$/page view cost of serving web page
– 100:1 ROI
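The same unit price explains the song and page-view numbers. A minimal check, assuming decimal units (1 GB = 1000 MB) and using illustrative names:

```python
def transfer_cost_usd(megabytes: float, usd_per_gb: float = 1.0) -> float:
    """Cost to send a payload at the slide's price of 1$ per GB."""
    return megabytes / 1000 * usd_per_gb

song_cost = transfer_cost_usd(5)   # one 5 MB song
page_roi = 1e-3 / 1e-5             # ad revenue per view / serving cost per view

print(f"song: {song_cost * 100:.1f} cents")
print(f"page view ROI is about {page_roi:.0f}:1")
```

Both cases work for the same reason: the bytes moved are cheap relative to the value delivered per transfer.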
The Cost of Computing:
Computers are NOT free!
• Capital Cost of a TpcC
system is mostly
storage and
storage software (database)
• IBM 32 cpu, 512 GB ram
2,500 disks, 43 TB
(680,613 tpmC @ 11.13 $/tpmc available 11/08/03)
http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf
• A 7.5M$ super-computer
• Total Data Center Cost:
40% capital &facilities
60% staff
(includes app development)
TpcC Cost Components DB2/AIX
http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf
cpu/mem
29%
storage
61%
software
10%
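The 7.5M$ figure is the benchmark throughput multiplied by the price per tpmC, and the pie-chart percentages split that total. A quick check of the arithmetic, with variable names chosen for illustration:

```python
tpmc = 680_613          # reported throughput
usd_per_tpmc = 11.13    # reported price/performance

system_price = tpmc * usd_per_tpmc
print(f"system price is about {system_price / 1e6:.2f} M$")

# Cost shares from the TpcC Cost Components chart.
shares = {"cpu/mem": 0.29, "storage": 0.61, "software": 0.10}
for component, share in shares.items():
    print(f"{component:9s} about {system_price * share / 1e6:.2f} M$")
```

Storage at 61% works out to roughly 4.6M$ of the total, which is the point of the slide: the capital cost is mostly disks and the software that manages them.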
Computing Equivalents
1 $ buys
• 1 day of cpu time
• 4 GB ram for a day
• 1 GB of network bandwidth
• 1 GB of disk storage
• 10 M database accesses
• 10 TB of disk access (sequential)
• 10 TB of LAN bandwidth (bulk)
Some consequences
• Beowulf networking is
10,000x cheaper than WAN networking
factors of 10^5 matter.
• The cheapest and fastest way to move a
Terabyte cross country is sneakernet.
24 hours = 4 MB/s
50$ shipping vs 1,000$ wan cost.
• Sending 10PB CERN data via network
is silly:
buy disk bricks in Geneva,
fill them,
ship them – one way.
TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange
Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg
Microsoft Technical Report, May 2002, MSR-TR-2002-54
http://research.microsoft.com/research/pubs/view.aspx?tr_id=569
How Do You Move A Terabyte?
Context      Speed Mbps   Rent $/month   $/Mbps   $/TB Sent   Time/TB
Home phone   0.04         40             1,000    3,086       6 years
Home DSL     0.6          70             117      360         5 months
T1           1.5          1,200          800      2,469       2 months
T3           43           28,000         651      2,010       2 days
OC3          155          49,000         316      976         14 hours
100 Mbps     100          -              -        -           1 day
Gbps         1000         -              -        -           2.2 hours
OC 192       9600         1,920,000      200      617         14 minutes
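The Time/TB and $/TB columns can be reproduced from the speed and the monthly rent. The sketch below assumes 1 TB = 8e12 bits, a fully used link, and a 30-day month; those assumptions are mine, chosen because they match the slide's figures, and the function name is illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month
BITS_PER_TB = 8e12                  # assumption: decimal terabyte

def move_terabyte(speed_mbps: float, rent_usd_per_month: float):
    """Return (seconds, dollars) needed to send 1 TB over a rented link."""
    seconds = BITS_PER_TB / (speed_mbps * 1e6)
    cost = rent_usd_per_month * seconds / SECONDS_PER_MONTH
    return seconds, cost

links = {"Home phone": (0.04, 40), "T1": (1.5, 1_200), "OC 192": (9600, 1_920_000)}
for name, (mbps, rent) in links.items():
    seconds, cost = move_terabyte(mbps, rent)
    print(f"{name:10s} {seconds / 86400:10.3f} days  {cost:8.0f} $")
```

Even the fastest link in the table costs hundreds of dollars per terabyte, which is the background for the slide deck's 50$ sneakernet shipping comparison.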
Computational Grid Economics
• To the extent that computational grid is like
Seti@Home or ZetaNet or Folding@home
or… it is a great thing
• To the extent that the computational grid is MPI
or data analysis, it fails on economic grounds:
move the programs to the data, not the data to
the programs.
• The Internet is NOT the cpu backplane.
• The USG should not hide this economic fact
from the academic/scientific research
community.
Computing on Demand
• Was called outsourcing / service bureaus in my
youth. CSC and IBM did it.
• Payroll is standard outsource.
• Now we have Hotmail, Salesforce.com,
Oracle.com,….
• Works for standard apps.
• Airlines outsource reservations.
Banks outsource ATMs.
• But Amazon, Amex, Wal-Mart, ...
Can’t outsource their core competence.
• So, COD works for commoditized services.
• It is not a new way of doing things: think payroll.
What’s the right abstraction level for
Internet Scale Distributed Computing?
• Disk block? No, too low.
• File? No, too low.
• Database? No, too low.
• Application? Yes, of course.
– Blast search
– Google search
– Send/Get eMail
– Portals that federate astronomy archives
(http://skyQuery.Net/)
• Web Services (.NET, EJB, OGSA)
give this abstraction level.
Access Grid
• Q: What comes after the telephone?
• A: eMail?
• A: Instant messaging?
• Both seem retro technology: text & emoticons.
• Access Grid
could revolutionize human communication.
• But, it needs a new idea.
• Q: What comes after the telephone?
Distributed Computing Economics
• Why is Seti@Home a great idea?
• Why is Napster a great deal?
• Why is the Computational Grid uneconomic?
• When does computing on demand work?
• What is the “right” level of abstraction?
• Is the Access Grid the real killer app?
Based on: Distributed Computing Economics,
Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24
http://research.microsoft.com/research/pubs/view.aspx?tr_id=655
Agenda
• TerraServer
– What it is
– What we learned
– What we are doing now.
• SkyServer / WWT
– What it is
– What we learned
– What we are doing now
• Grid Computing
– General comments
– Build a web service

More Related Content

PPT
My GIS Timeline
jeffhobbs
 
PPT
Computing Outside The Box
Ian Foster
 
PPTX
GeoServer Feature Frenzy
Jody Garnett
 
PDF
Using GeoServer for spatio-temporal data management with examples for MetOc a...
GeoSolutions
 
PDF
State of GeoServer 2013 (FOSS4G)
Jody Garnett
 
PPTX
Virtual Science in the Cloud
thetfoot
 
KEY
NASA SensorWeb Enterprise Services
Pat Cappelaere
 
PDF
NCGIC The Geospatial Revolution
Peter Batty
 
My GIS Timeline
jeffhobbs
 
Computing Outside The Box
Ian Foster
 
GeoServer Feature Frenzy
Jody Garnett
 
Using GeoServer for spatio-temporal data management with examples for MetOc a...
GeoSolutions
 
State of GeoServer 2013 (FOSS4G)
Jody Garnett
 
Virtual Science in the Cloud
thetfoot
 
NASA SensorWeb Enterprise Services
Pat Cappelaere
 
NCGIC The Geospatial Revolution
Peter Batty
 

Similar to WebServices_Grid.ppt (20)

PDF
Building a Server with FOSS4G Software
Randal Hale
 
PDF
GIS in the Rockies Geospatial Revolution
Peter Batty
 
PDF
GeoServer Orientation
Jody Garnett
 
PDF
GeoServer Past Present Future 2009
Justin Deoliveira
 
PPTX
UnderstandingServersforcommunication.pptx
piyushkumawat25
 
PPTX
02-UnderstandingServers.pptx
Vipin Singhal
 
PDF
Oct 2011 JPT DataStorage
JANERB
 
PPT
Open Source Databases And Gis
Kudos S.A.S
 
PDF
Mastering Geoserver Colin Henderson Henderson Colin
alwayssn
 
PPTX
Comprehensive Overview of the Geoweb
N/A
 
PPT
grid mining
ARNOLD
 
PDF
Web services based workflows to deal with 3D data
Jose Enrique Ruiz
 
PDF
GeoServer Feature FRENZY
GeoSolutions
 
PPT
Satellite Image Data Service
Gail Millin-Chalabi
 
PDF
MarineMap Technology
Colin Ebert
 
PDF
Big Data @ Bodensee Barcamp 2010
c1sc0
 
PDF
Impact of user concurrency in commonly used OGC map server implementations
Paula Díaz
 
PPTX
Making the Most of Raster Analysis with Living Atlas Data - Esri UC 2018
Aileen Buckley
 
PDF
Demystifying the Distributed Database Landscape (DevOps) (1).pdf
ScyllaDB
 
PPT
Uniting traditional GIS and mainstream IT
gssg
 
Building a Server with FOSS4G Software
Randal Hale
 
GIS in the Rockies Geospatial Revolution
Peter Batty
 
GeoServer Orientation
Jody Garnett
 
GeoServer Past Present Future 2009
Justin Deoliveira
 
UnderstandingServersforcommunication.pptx
piyushkumawat25
 
02-UnderstandingServers.pptx
Vipin Singhal
 
Oct 2011 JPT DataStorage
JANERB
 
Open Source Databases And Gis
Kudos S.A.S
 
Mastering Geoserver Colin Henderson Henderson Colin
alwayssn
 
Comprehensive Overview of the Geoweb
N/A
 
grid mining
ARNOLD
 
Web services based workflows to deal with 3D data
Jose Enrique Ruiz
 
GeoServer Feature FRENZY
GeoSolutions
 
Satellite Image Data Service
Gail Millin-Chalabi
 
MarineMap Technology
Colin Ebert
 
Big Data @ Bodensee Barcamp 2010
c1sc0
 
Impact of user concurrency in commonly used OGC map server implementations
Paula Díaz
 
Making the Most of Raster Analysis with Living Atlas Data - Esri UC 2018
Aileen Buckley
 
Demystifying the Distributed Database Landscape (DevOps) (1).pdf
ScyllaDB
 
Uniting traditional GIS and mainstream IT
gssg
 
Ad

Recently uploaded (20)

PPTX
Skill Development Program For Physiotherapy Students by SRY.pptx
Prof.Dr.Y.SHANTHOSHRAJA MPT Orthopedic., MSc Microbiology
 
PDF
High Ground Student Revision Booklet Preview
jpinnuck
 
PDF
PG-BPSDMP 2 TAHUN 2025PG-BPSDMP 2 TAHUN 2025.pdf
AshifaRamadhani
 
PPTX
family health care settings home visit - unit 6 - chn 1 - gnm 1st year.pptx
Priyanshu Anand
 
DOCX
UPPER GASTRO INTESTINAL DISORDER.docx
BANDITA PATRA
 
PPTX
vedic maths in python:unleasing ancient wisdom with modern code
mistrymuskan14
 
PPTX
Information Texts_Infographic on Forgetting Curve.pptx
Tata Sevilla
 
PPTX
Nursing Management of Patients with Disorders of Ear, Nose, and Throat (ENT) ...
RAKESH SAJJAN
 
PPTX
NOI Hackathon - Summer Edition - GreenThumber.pptx
MartinaBurlando1
 
PPTX
Dakar Framework Education For All- 2000(Act)
santoshmohalik1
 
PDF
The Picture of Dorian Gray summary and depiction
opaliyahemel
 
PDF
What is CFA?? Complete Guide to the Chartered Financial Analyst Program
sp4989653
 
DOCX
SAROCES Action-Plan FOR ARAL PROGRAM IN DEPED
Levenmartlacuna1
 
PPTX
HISTORY COLLECTION FOR PSYCHIATRIC PATIENTS.pptx
PoojaSen20
 
PDF
2.Reshaping-Indias-Political-Map.ppt/pdf/8th class social science Exploring S...
Sandeep Swamy
 
DOCX
Action Plan_ARAL PROGRAM_ STAND ALONE SHS.docx
Levenmartlacuna1
 
PDF
The Minister of Tourism, Culture and Creative Arts, Abla Dzifa Gomashie has e...
nservice241
 
PPTX
Understanding operators in c language.pptx
auteharshil95
 
PDF
3.The-Rise-of-the-Marathas.pdfppt/pdf/8th class social science Exploring Soci...
Sandeep Swamy
 
PPTX
PREVENTIVE PEDIATRIC. pptx
AneetaSharma15
 
Skill Development Program For Physiotherapy Students by SRY.pptx
Prof.Dr.Y.SHANTHOSHRAJA MPT Orthopedic., MSc Microbiology
 
High Ground Student Revision Booklet Preview
jpinnuck
 
PG-BPSDMP 2 TAHUN 2025PG-BPSDMP 2 TAHUN 2025.pdf
AshifaRamadhani
 
family health care settings home visit - unit 6 - chn 1 - gnm 1st year.pptx
Priyanshu Anand
 
UPPER GASTRO INTESTINAL DISORDER.docx
BANDITA PATRA
 
vedic maths in python:unleasing ancient wisdom with modern code
mistrymuskan14
 
Information Texts_Infographic on Forgetting Curve.pptx
Tata Sevilla
 
Nursing Management of Patients with Disorders of Ear, Nose, and Throat (ENT) ...
RAKESH SAJJAN
 
NOI Hackathon - Summer Edition - GreenThumber.pptx
MartinaBurlando1
 
Dakar Framework Education For All- 2000(Act)
santoshmohalik1
 
The Picture of Dorian Gray summary and depiction
opaliyahemel
 
What is CFA?? Complete Guide to the Chartered Financial Analyst Program
sp4989653
 
SAROCES Action-Plan FOR ARAL PROGRAM IN DEPED
Levenmartlacuna1
 
HISTORY COLLECTION FOR PSYCHIATRIC PATIENTS.pptx
PoojaSen20
 
2.Reshaping-Indias-Political-Map.ppt/pdf/8th class social science Exploring S...
Sandeep Swamy
 
Action Plan_ARAL PROGRAM_ STAND ALONE SHS.docx
Levenmartlacuna1
 
The Minister of Tourism, Culture and Creative Arts, Abla Dzifa Gomashie has e...
nservice241
 
Understanding operators in c language.pptx
auteharshil95
 
3.The-Rise-of-the-Marathas.pdfppt/pdf/8th class social science Exploring Soci...
Sandeep Swamy
 
PREVENTIVE PEDIATRIC. pptx
AneetaSharma15
 
Ad

WebServices_Grid.ppt

  • 1. Microsoft Large Databases and Grid Computing Jim Gray Microsoft Research [email protected] http://research.Microsoft.com/~gray Presentation to Kaiser Information Management Briefing 21 May 2003
  • 2. About me • in Microsoft research (located in San Francisco) • A database researcher – IBM, Tandem, DEC, Microsoft • Work on Scalable Systems – Building supercomputers from commodity components. • Do academic/government things too – PITAC, GriPhyn TAB, NSF/CISE, Library of Congress, … • For the last 4 years, been working with the astronomy community to build the World Wide Telescope.
  • 3. Agenda • TerraServer – What it is – What we learned – What we are doing now. • SkyServer / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service
  • 4. TerraServer TerraService.net • A photo of the United States – 1 meter resolution (photographic/topographic) – USGS data – Some demographic data (BestPlaces.net) – Home sales data – Linked to Encarta Encyclopedia • 15 TB raw, 6 TB cooked (grows 10GB/w) • Point, Pan, zoom interface • Among top 1,000 websites – 40k visitors/day – 4M queries/day – 3 B page views (in 5 years) • All in an SQL database
  • 5. TerraServer Statistics June ‘98 Jan ‘99 Jan ‘00 May ‘00 Sept ’01 Dec ‘02 SQL 7.0 1.0 TB Db SQL 2000 1.0 TB Db SQL 2000 1.2 TB Db SQL 2000 1.4 TB Db SQL 2000 2.0 TB Db SQL 2000 2.0 TB Db SQL 2000 2.0 TB Db 1 Server / Win NT 4.0 EE 2nd Server / Win 2k DataCenter 4 Node / Win2k Datacenter Failover Cluster SQL 7.0 1.0 TB Db 217 m Rows SQL 7.0 1 Server 1.5 TB Db SQL 2000 1 Server .8 TB Db 298 m Rows SQL 7.0 .75 TB Db 173 m Rows 755m Rows SQL 2000 .8 TB Db 231 m Rows 900 m Rows Unique Users Page Views Image Tiles Db Queries Bytes Xfered Daily Average 40,011 1,266,838 3,735,789 4,484,089 70 gb Peak Day 277,292 12,388,104 10,475,674 163 gb 2,401,209 June 1998 - Oct, 2002 63,656,904 2,015,539,605 5,943,641,024 7,134,186,170 108tb
  • 6. TerraServer Cluster SQLInst1 SQLInst2 F G L K P Q E E J J O O I H M N R S One SQL database per rack Each rack contains 4.5 TB 1 rack not in picture 18.0 TB total Meta Data Stored on 101 GB “Fast, Small Disks” (18 x 18.2 GB) Imagery Data Stored on 4 339 GB “Slow, Big Disks” (15 x 73.8 GB) Added 90 72.8 GB Disks in Feb 2001 to create 18 TB SAN 8 Compaq DL360 “Photon” Web Servers Fiber SAN Switches 4 Compaq ProLiant 8500 Db Servers
  • 7. Cluster Configuration 1 Compaq SAN switch by Brocade Communications Compaq StorageWorks MA8000/HSG80 Controllers (3) 2 3 Compaq ProLiant 8500 (4) Internet Internet Microsoft Corporat e LAN Extreme Networks Summit 48 Switch Summit 7i Switch (2) Cisco 12000 Internet Router Compaq DL360 (6) (Windows 2000 Web Servers) TerraServer.microsoft.com Compaq DL360 (10) Database Cluster ADIC LTO Tape Library TerraServer SAN
  • 8. TerraServer Becomes a Web Service TerraServer.net -> TerraService.Net • Web server is for people. • Web Service is for programs – The end of screen scraping – No faking a URL: pass real parameters. – No parsing the answer: data formatted into your address space. • Hundreds of users but a specific example: – US Department of Agriculture
  • 9. And now.. 4 slides from the “customer” who built a portal using TerraService
  • 10. Data Gateway Functional Overview Navigation Service Catalog Service Ship Service <<Requests Products>> Item Broker Customer Orders Data XML Order Placer Listen for OrderPlacer Raised Event Select sequenced Item Output XML rasie event : stats.delivery start validate (dtd) Insert into SQL @@Identity / GUID to client return est time raise OrderMgr.event Order Database Selects from XML Request for data Logger Called by anyone rasies to stats svc' ASP XML XML Soil Data Viewer 3 9 . 3 2 7 . 5 2 7 . 3 2 1 . 7 1 5 . 9 8 . 9 1 2 . 0 1 1 . 5 1 1 . 3 6 . 9 5 . 3 4 . 8 4 . 6 2 . 9 1 . 6 0 . 9 9 1 0 B 1 0 1 2 3 3 1 4 1 8 2 9 5 A 2 4 2 6 2 1 2 2 2 7 6 A 2 5 1 7 2 0 1 1 2 8 1 9 1 6 3 1 9 C 9 A 1 3 1 3 A 3 2 3 0 3 1 A 2 2 A 2 8 A 1 6 A 3 0 A 2 5 A L a n d u n i t s F i e l d s W i t h i n B u f f e r B u f f e r A r e a W i t h i n F i e l d s 5 A 6 A 1 0 B 1 8 2 0 2 4 2 5 2 6 2 7 2 8 2 9 3 0 3 0 A 3 1 3 1 A 3 2 P i p e l i n e s 9 7 2 0 0 0 0 2 0 0 0 4 0 0 0 F e e t N B u f f e r A r e a W i t h i n F i e l d s U S D A 1 : 1 5 8 4 0 N R C S Geospatial Data Acknowledges item ready for delivery Data Services Package Service Send order info FTP Services Rimage CD Service Product Catalog Updates Billing Services NCGC - Fort Worth, Texas ITC - Fort Collins, Colorado Terra Service
  • 11. Custom End Product Web Soil Data Viewer XML Soil Report Soil Interpretation Map
  • 12. ESRI Spatial Data Engine WebSDV ArcIMS Connector Connects to ArcIMS; communication is done through ArcIMS XML (AXL) Retrieves and processes Soils Data from the NASIS relational Database Image Retriever IMSNavigator Generates maps (JPGs) using ArcIMS Retrieves imagery from the Microsoft TerraServer Terraserver Geospatial Data Business Rules National Soils Data Database Server - Microsoft SQL Server Database Server - ESRI Spatial Data Server Web Server - COM+ Applications Microsoft Terraserver
  • 13. Brief tour of TerraService • Show map service • Show some methods • See TerraService.NET: An Introduction to Web Services Tom Barclay; Jim Gray; Eric Strand; Steve Ekblad; Jeffrey Richter, MSR TR 2002-53, pp 13, June 2002
  • 14. What We Learned • You can build and manage a very popular website with relatively little effort (if you do it right and have Tom Barclay) • Loading 20 TB takes a lot of energy • And you get to do it many times -- automate • Tape and tape software are problematic • Triplex and snap-shot disks works (we have never had to use it, but..) • The internet gives you 2-9’s Servers can run at 4 9’s easily, 5 9’s with effort.
  • 15. What we are doing now. • Building with 3K$ 2TB bricks • 4 bricks = 1 backend • Triplexing systems • Duplexing sites. • 4*3*2 = 24k$ for Geoplex • Very simple operations model • See: • “TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange,” Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan Vandenberg, pp. 1-8, May 2002
  • 16. Agenda • TerraServer – What it is – What we learned – What we are doing now. • SkyServer / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service
  • 17. SkyServer SkyServer.SDSS.org • Like the TerraServer, but looking the other way: a picture of ¼ of the universe • Pixels + Data Mining • Astronomers get about 400 attributes for each “object” • Get Spectrograms for 1% of the objects
  • 18. Why Astronomy Data? • It has no commercial value – No privacy concerns – Can freely share results with others – Great for experimenting with algorithms • It is real and well documented – High-dimensional data (with confidence intervals) – Spatial data – Temporal data • Many different instruments from many different places and many different times • Federation is a goal • The questions are interesting – How did the universe form? • There is a lot of it (petabytes) [Image panels: IRAS 100µ, ROSAT ~keV, DSS Optical, 2MASS 2µ, IRAS 25µ, NVSS 20cm, WENSS 92cm, GB 6cm]
  • 19. Demo of SkyServer • Shows standard web server • Pixel/image data • Point and click • Explore one object • Explore sets of objects (data mining)
  • 20. Virtual Observatory http://www.astro.caltech.edu/nvoconf/ http://www.voforum.org/ Premise: Most data is (or could be) online. So, the Internet is the world’s best telescope: – It has data on every part of the sky – In every measured spectral band: optical, x-ray, radio.. – As deep as the best instruments (2 years ago). – It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no..). – It’s a smart telescope: links objects and data to literature on them.
  • 21. Time and Spectral Dimensions The Multiwavelength Crab Nebulae X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese Astronomers. Slide courtesy of Robert Brunner @ CalTech. Crab star 1053 AD
  • 22. Federation Data Federations of Web Services • Massive datasets live near their owners: – Near the instrument’s software pipeline – Near the applications – Near data knowledge and curation – Super Computer centers become Super Data Centers • Each Archive publishes a web service – Schema: documents the data – Methods on objects (queries) • Scientists get “personalized” extracts • Uniform access to multiple Archives – A common global schema
  • 23. Grid and Web Services Synergy • I believe the Grid will be many web services that share data (computrons are free) • IETF standards provide – Naming – Authorization / Security / Privacy – Distributed Objects: Discovery, Definition, Invocation, Object Model – Higher level services: workflow, transactions, DB,.. • Synergy: commercial Internet & Grid tools
  • 24. Web Services: The Key? • Web SERVER: – Given a url + parameters – Returns a web page (often dynamic) • Web SERVICE: – Given a XML document (soap msg) – Returns an XML document – Tools make this look like an RPC. • F(x,y,z) returns (u, v, w) – Distributed objects for the web. – + naming, discovery, security,.. • Internet-scale distributed computing Your program Data In your address space Web Service Your program Web Server
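The "XML document in, XML document out, looks like an RPC" idea on this slide can be sketched in a few lines. This is a minimal illustration, not any real toolkit: `build_request`, `parse_response`, and `fake_service` are hypothetical names, the service runs in-process instead of over HTTP, and the arithmetic inside `F` is made up purely so there is something to return.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace; everything else here is an illustrative assumption.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def _envelope():
    """Create an empty SOAP-style envelope and return (envelope, body)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    return envelope, body


def build_request(method, params):
    """Wrap a method name and its named arguments in an XML request document."""
    envelope, body = _envelope()
    call = ET.SubElement(body, method)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Pull the named return values out of an XML response document."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Body")
    return {child.tag: child.text for child in body[0]}


def fake_service(request_xml):
    """Hypothetical stand-in for the remote service: XML in, XML out."""
    call = ET.fromstring(request_xml).find(f"{{{SOAP_NS}}}Body")[0]
    x, y, z = (float(call.find(name).text) for name in ("x", "y", "z"))
    envelope, body = _envelope()
    reply = ET.SubElement(body, call.tag + "Response")
    for name, value in (("u", x + y), ("v", y + z), ("w", x + z)):
        ET.SubElement(reply, name).text = repr(value)
    return ET.tostring(envelope, encoding="unicode")


def F(x, y, z):
    """Client-side proxy: to the caller this is just F(x, y, z) -> (u, v, w)."""
    reply = parse_response(fake_service(build_request("F", {"x": x, "y": y, "z": z})))
    return tuple(float(reply[key]) for key in ("u", "v", "w"))


print(F(1, 2, 3))
```

In a real deployment the call to `fake_service` would be an HTTP POST, and the proxy function would be generated by tooling from the service description instead of written by hand.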
  • 25. SkyQuery: a prototype • Defining Astronomy Objects and Methods. • Federated 3 Web Services (fermilab/sdss, jhu/first, Cal Tech/dposs) multi-survey cross-match Distributed query optimization (T. Malik, T. Budavari, Alex Szalay @ JHU) http://SkyQuery.net/ • My first web service (cutout + annotated SDSS images) online – http://skyservice.pha.jhu.edu/devel/ImgCutout/chart.asp • WWT is a great Web Services (.Net) application – Federating heterogeneous data sources. – Cooperating organizations – An Information At Your Fingertips challenge.
  • 26. Demo of Image Cutout Service • Shows image cutout • Show project and debugging project • Show hello World • Show “theAnswer” method
  • 27. SkyQuery (http://skyquery.net/) • Distributed Query tool using a set of services • Feasibility study, built in 6 weeks from scratch – Tanu Malik (JHU CS grad student) – Tamas Budavari (JHU astro postdoc) • Implemented in C# and .NET • Allows queries like: SELECT o.objId, o.r, o.type, t.objId FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<3.5 AND AREA(181.3,-0.76,6.5) AND o.type=3 and (o.I - t.m_j)>2
  • 28. SkyNode Basic Web Services • Metadata information about resources – Waveband – Sky coverage – Translation of names to universal dictionary (UCD) • Simple search patterns on the resources – Cone Search – Image mosaic – Unit conversions • Simple filtering, counting, histogramming • On-the-fly recalibrations
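A cone search, one of the simple search patterns listed above, returns every object within a given angular radius of a sky position. The sketch below assumes a tiny in-memory catalog with `ra` and `dec` in degrees; the rows are made up for illustration and the function names are hypothetical, not the SkyNode interface.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the catalog rows that lie within radius_deg of (ra, dec)."""
    return [obj for obj in catalog
            if angular_separation_deg(obj["ra"], obj["dec"], ra, dec) <= radius_deg]


# Made-up rows; a real service would query the archive's database instead.
catalog = [
    {"id": "a", "ra": 181.30, "dec": -0.76},
    {"id": "b", "ra": 181.35, "dec": -0.80},
    {"id": "c", "ra": 190.00, "dec": 5.00},
]

hits = cone_search(catalog, ra=181.3, dec=-0.76, radius_deg=0.1)
print([obj["id"] for obj in hits])
```

A linear scan is fine for a sketch; an archive holding hundreds of millions of objects would need a spatial index to answer the same question.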
  • 29. Portals: Higher Level Services • Built on Atomic Services • Perform more complex tasks • Examples – Automated resource discovery – Cross-identifications – Photometric redshifts – Outlier detections – Visualization facilities • Goal: – Build custom portals in days from existing building blocks (like today in IRAF or IDL)
  • 31. Summary So Far • Some real web services deployed today • Easy to build & deploy • Services publish data, Portals unify it • Tools really work! • I’m using C# and the foundation classes of VisualStudio, a great tool! • A nice book explaining the ideas: (.Net Framework Essentials, Thai, Lam isbn 0-596-00302-1)
  • 32. Possible Relevance to You • This web service stuff is REAL • If you have a class, it is a way to publish data: – Internet – Intranet • It is a way to find data: – data comes with schema – no more screen scraping/parsing • Business model unclear – Your ideas go here. [Figure labels: Your program, Data In your address space, Web Service]
  • 33. What We Learned • Web services really are a breakthrough. • Data mining worked beautifully. See “Data Mining the SDSS SkyServer Database,” J. Gray, D. Slutz, A. Szalay, A. Thakar, P. Kuntz, C. Stoughton, MSR TR 2002-1, pp 1-40, 2002. • You can operate a system in Chicago from San Francisco – Terminal Server is wonderful. • The Internet gives you 2 9’s of availability • TeraScale SneakerNet works well
  • 34. What we are doing now. • Loading more data (next data release) • Preparing for the next generation • Building the WWT • Web Services for the Virtual Observatory, Alexander S. Szalay, Tamás Budavári, Tanu Malik, Jim Gray, and Ani Thakar, SPIE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa, Hawaii • Petabyte Scale Data Mining: Dream or Reality?, Alexander S. Szalay; Jim Gray; Jan vandenBerg, SPIE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa, Hawaii • Online Scientific Data Curation, Publication, and Archiving, Jim Gray; Alexander S. Szalay; Ani R. Thakar; Christopher Stoughton; Jan vandenBerg, SPIE Astronomy Telescopes and Instruments, 22-28 August 2002, Waikoloa, Hawaii
  • 35. Agenda • TerraServer – What it is – What we learned – What we are doing now. • SkyServer / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service
  • 36. The Grid • Computation Grid: harvest Internet cpus. • Data Grid: Share files • Application Grid: Web services • Access Grid: teleconferencing
  • 37. The Microsoft View • Web Services will subsume the Grid –The Grid will be data and services not renting cycles • OGSA: evolution of Globus Toolkit to Web services concepts and technologies… • Lots of encouragement from Microsoft, IBM, Oracle, Sun • GGF as forum for discussion
  • 38. Engagement with Grid Community • Goal: GXA as infrastructure for Grids • Working with Globus & GGF – Funding work at Argonne National Lab (Globus) – Globus Toolkit 3, and CondorG on Windows • http://www.globus.org/win-alpha/ (we sponsored this) – OGSA for .NET (prototyping) • http://www.globus.org/ogsa/ – Also OGSI.NET at U. VA is very interesting • http://www.cs.virginia.edu/~gsw2c/ogsi.net.html – GGF • Active membership • HPC .net kit – see http://www.microsoft.com/HPC – Part of .net server scale out development – Includes MPI-CH 1.2.4, distributed job scheduler,… – Thomas Sterling, Beowulf on Windows, MIT Press 2001
  • 39. What’s Microsoft Doing • Mostly .NET, W3C standards, web services, … • I think SkyQuery is the best web service (grid app) in GriPhyN today. • My stuff is grid computing • But… • Globus (GT3), OGSA, and CondorG ported to Windows (we sponsored it) • We have a HPC toolkit: MPI-CH 1.2.4 • See http://www.microsoft.com/windows2000/hpc/ for many useful links
  • 40. I Can Talk About Computing on Demand But… Best to read • Distributed Computing Economics, Jim Gray, MSR-TR-2003-24, March 2003 • The slides that follow are based on that paper.
  • 41. Distributed Computing Economics • Why is Seti@Home a great idea? • Why is Napster a great deal? • Why is the Computational Grid uneconomic? • When does computing on demand work? • What is the “right” level of abstraction? • Is the Access Grid the real killer app?
  • 42. Computing is Free • Computers cost 1k$ (if you shop right) • So 1 cpu day == 1$ • If you pay the phone bill (and I do) Internet bandwidth costs 50 … 500$/mbps/m (not including routers and management). • So 1GB costs 1$ to send and 1$ to receive
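The two round numbers on this slide follow from short arithmetic, sketched below. The three-year amortization period, the 30-day month, and the fully utilized link are assumptions added here for illustration; they are not stated on the slide.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30            # assumption: round month
AMORTIZATION_DAYS = 3 * 365    # assumption: the computer is written off over three years


def cpu_day_cost(computer_price_usd, lifetime_days=AMORTIZATION_DAYS):
    """Dollars per day of owning one computer, spread evenly over its lifetime."""
    return computer_price_usd / lifetime_days


def wan_cost_per_gb(usd_per_mbps_month):
    """Dollars per gigabyte on a link billed per Mbps per month and kept full."""
    bytes_per_month = 1e6 / 8 * SECONDS_PER_DAY * DAYS_PER_MONTH  # 1 Mbps for a month
    gb_per_month = bytes_per_month / 1e9                          # about 324 GB
    return usd_per_mbps_month / gb_per_month


print(f"1k$ computer: {cpu_day_cost(1000):.2f} $/cpu day")
for rate in (50, 500):
    print(f"{rate} $/Mbps/month: {wan_cost_per_gb(rate):.2f} $/GB")
```

Under these assumptions the quoted 50 to 500 $/Mbps/month range works out to roughly 0.15 to 1.5 $/GB, so 1$ per GB is a round figure toward the expensive end; a link that is not kept full costs more per byte actually moved.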
  • 43. Why is Seti@Home a Good Deal? • Sending 300 KB costs 3e-4$ • User computes for ½ day: benefit 5e-1$ (half a cpu day at 1$ per cpu day) • ROI: 1500:1
  • 44. Why is Napster a Good Deal? • Send 5 MB costs 5e-3$ • ½ a penny per song • Both sender and receiver can afford it. • Same logic powers web sites (Yahoo!...): – 1e-3$/page view advertising revenue – 1e-5$/page view cost of serving web page – 100:1 ROI
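The return-on-investment figures on the last two slides can be reproduced from the earlier rules of thumb (1$ per GB sent, 1$ per cpu day). The sketch below uses only numbers from the slides and takes the benefit of half a cpu day as 0.5$; the function names are illustrative.

```python
USD_PER_GB_SENT = 1.0   # rule of thumb from the "Computing is Free" slide
USD_PER_CPU_DAY = 1.0   # rule of thumb from the same slide


def send_cost(num_bytes):
    """Dollar cost of sending num_bytes over the WAN."""
    return num_bytes / 1e9 * USD_PER_GB_SENT


def roi(benefit_usd, cost_usd):
    """Return on investment expressed as a benefit-to-cost ratio."""
    return benefit_usd / cost_usd


# Seti@Home: ship a 300 KB work unit, get half a cpu day of computing back.
seti_cost = send_cost(300e3)
seti_roi = roi(0.5 * USD_PER_CPU_DAY, seti_cost)

# Napster: one 5 MB song.
napster_cost = send_cost(5e6)

# Web site: advertising revenue per page view versus cost of serving the page.
web_roi = roi(1e-3, 1e-5)

print(f"Seti@Home: cost {seti_cost:.0e} $, ROI about {seti_roi:.0f}:1")
print(f"Napster: {napster_cost:.0e} $ per song")
print(f"Web page: ROI about {web_roi:.0f}:1")
```

The Seti@Home ratio comes out near 1,700:1, which the slide rounds to 1500:1. The point is the order of magnitude: the computation is worth thousands of times what the bytes cost to ship.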
  • 45. The Cost of Computing: Computers are NOT free! • Capital Cost of a TpcC system is mostly storage and storage software (database) • IBM 32 cpu, 512 GB ram, 2,500 disks, 43 TB (680,613 tpmC @ 11.13 $/tpmC, available 11/08/03) http://www.tpc.org/results/individual_results/IBM/IBMp690es_05092003.pdf • A 7.5M$ super-computer • Total Data Center Cost: 40% capital & facilities, 60% staff (includes app development) [Chart: TpcC Cost Components, DB2/AIX, from the TPC report above: cpu/mem 29%, storage 61%, software 10%]
  • 46. Computing Equivalents 1 $ buys • 1 day of cpu time • 4 GB ram for a day • 1 GB of network bandwidth • 1 GB of disk storage • 10 M database accesses • 10 TB of disk access (sequential) • 10 TB of LAN bandwidth (bulk)
  • 47. Some consequences • Beowulf networking is 10,000x cheaper than WAN networking; factors of 10^5 matter. • The cheapest and fastest way to move a Terabyte cross country is sneakernet. 24 hours = 4 MB/s; 50$ shipping vs 1,000$ WAN cost. • Sending 10PB CERN data via network is silly: buy disk bricks in Geneva, fill them, ship them – one way. TeraScale SneakerNet: Using Inexpensive Disks for Backup, Archiving, and Data Exchange, Jim Gray; Wyman Chong; Tom Barclay; Alex Szalay; Jan vandenBerg, Microsoft Technical Report, May 2002, MSR-TR-2002-54 http://research.microsoft.com/research/pubs/view.aspx?tr_id=569
  • 48. How Do You Move A Terabyte?
      Context      Speed (Mbps)  Rent ($/month)  $/Mbps  $/TB sent  Time/TB
      Home phone   0.04          40              1,000   3,086      6 years
      Home DSL     0.6           70              117     360        5 months
      T1           1.5           1,200           800     2,469      2 months
      T3           43            28,000          651     2,010      2 days
      OC3          155           49,000          316     976        14 hours
      OC 192       9600          1,920,000       200     617        14 minutes
      100 Mbps     100           -               -       -          1 day
      Gbps         1000          -               -       -          2.2 hours
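The time and cost columns of this table can be recomputed from the speed and monthly rent of each link. The sketch below assumes 1 TB = 8e12 bits, a 30-day month, and a link used at full rated speed for the whole transfer; the speeds and rents are the ones on the slide.

```python
BITS_PER_TB = 8e12                 # assumption: decimal terabyte
SECONDS_PER_MONTH = 30 * 86_400    # assumption: 30-day billing month


def seconds_per_tb(speed_mbps):
    """Seconds needed to push one terabyte through a link at its full speed."""
    return BITS_PER_TB / (speed_mbps * 1e6)


def usd_per_tb(speed_mbps, rent_usd_per_month):
    """Rent paid while the terabyte is in flight."""
    return seconds_per_tb(speed_mbps) / SECONDS_PER_MONTH * rent_usd_per_month


# (speed in Mbps, rent in $/month) as listed on the slide
links = {
    "Home phone": (0.04, 40),
    "Home DSL": (0.6, 70),
    "T1": (1.5, 1_200),
    "T3": (43, 28_000),
    "OC3": (155, 49_000),
    "OC 192": (9600, 1_920_000),
}

for name, (speed, rent) in links.items():
    days = seconds_per_tb(speed) / 86_400
    print(f"{name:<11} {days:10.3f} days  {usd_per_tb(speed, rent):8.0f} $/TB")
```

These assumptions reproduce the slide's $/TB column to the dollar (3,086 for the home phone line, 617 for OC 192), which suggests the table was built the same way.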
  • 49. Computational Grid Economics • To the extent that the computational grid is like Seti@Home or ZetaNet or Folding@home or… it is a great thing • To the extent that the computational grid is MPI or data analysis, it fails on economic grounds: move the programs to the data, not the data to the programs. • The Internet is NOT the cpu backplane. • The USG should not hide this economic fact from the academic/scientific research community.
  • 50. Computing on Demand • Was called outsourcing / service bureaus in my youth. CSC and IBM did it. • Payroll is standard outsource. • Now we have Hotmail, Salesforce.com, Oracle.com,…. • Works for standard apps. • Airlines outsource reservations. Banks outsource ATMs. • But Amazon, Amex, Wal-Mart, ... Can’t outsource their core competence. • So, COD works for commoditized services. • It is not a new way of doing things: think payroll.
  • 51. What’s the right abstraction level for Internet Scale Distributed Computing? • Disk block? No, too low. • File? No, too low. • Database? No, too low. • Application? Yes, of course. – Blast search – Google search – Send/Get eMail – Portals that federate astronomy archives (http://skyQuery.Net/) • Web Services (.NET, EJB, OGSA) give this abstraction level.
  • 52. Access Grid • Q: What comes after the telephone? • A: eMail? • A: Instant messaging? • Both seem retro technology: text & emoticons. • Access Grid could revolutionize human communication. • But, it needs a new idea. • Q: What comes after the telephone?
  • 53. Distributed Computing Economics • Why is Seti@Home a great idea? • Why is Napster a great deal? • Why is the Computational Grid uneconomic? • When does computing on demand work? • What is the “right” level of abstraction? • Is the Access Grid the real killer app? Based on: Distributed Computing Economics, Jim Gray, Microsoft Tech report, March 2003, MSR-TR-2003-24 http://research.microsoft.com/research/pubs/view.aspx?tr_id=655
  • 54. Agenda • TerraServer – What it is – What we learned – What we are doing now. • SkyServer / WWT – What it is – What we learned – What we are doing now • Grid Computing – General comments – Build a web service