when web services go bad steve loughran hp laboratories March 2002 notes from the field
why is it so hard? web service development = global distribution; global load; expectations of “web speed” development; HTTP at the bottom; integration with remote callers; service level agreements covering QoS: a new set of problems!
image storage and SVG rendering service XML-RPC POST SVG in payload render to JPEG return URL user fetch or print on press +1TB image store for customer photos
constraints SLA: high availability render times: 2s proof render 15s production basic asset store done: -SQL server IIS/ASP COM/MTS->Win2K/COM+  “real soon now” XML-RPC agreed on RPC mechanism wildly optimistic timescales
SVG into XML-RPC
<?xml version="1.0"?>
<methodCall version="1.0">
 <methodName>render_svg</methodName>
 <params>
  <param><value><string> 110 </string></value></param>
  <param><value><string> PD94bWwgdmVyc2lvbj0iMS4wIiBzdGFuZGFsb25lPSJubyIgPz4NCjwhRE9DVFlQRSBzdmcgUFVC TElDICItLy9XM0MvL0RURCBTVkcgMjAwMDExMDIvL0VOIg0KICJodHRwOi8vd3d3LnczLm9yZy9U Ui8yMDAwL0NSLVNWRy0yMDAwMTEwMi9EVEQvc3ZnLTIwMDAxMTAyLmR0ZCI+DQo8c3ZnIHdpZHRo PSIxMDI0cHQiIGhlaWdodD0iMTAyNHB0Ij4NCiA8ZyBpZD0icGljdHVyZSIgPjxpbWFnZSB3aWR0 aD0iMTAyNHB0IiBoZWlnaHQ9Ijc2OHB0Ig0KICAgeGxpbms6aHJlZj0iaHR0cDovL3N0ZWx2aW86 ODA4MC9zdW5zZXQuanBnIi8+PC9nPg0KPHRleHQgeD0iMjlwdCIgeT0iODVwdCINCiBzdHlsZT0i Zm9udC1mYW1pbHk6SGVsdmV0aWNhO2ZvbnQtc2l6ZTozNnB0O2ZpbGw6I2QwZDBkMDsiDQo+a2Vp dGggaXMgKG1heWJlKSBzZXh5PC90ZXh0Pg0KPC9zdmc+DQoNCg0K</string></value></param>
  <param><value><int> 72 </int></value></param>
 </params>
</methodCall>
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20001102//EN" "http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd">
<svg width="1024pt" height="768pt">
 <g id="picture" ><image width="1024pt" height="768pt" xlink:href="http://stelvio:8080/sunset.jpg"/></g>
 <text x="29pt" y="85pt" style="font-family:Helvetica;font-size:36pt;fill:#d0d0d0;"> Having an excellent holiday! </text>
</svg>
Response
<?xml version="1.0"?>
<methodResponse>
 <params>
  <param><value><string>2453</string></value></param>
  <param><value><string> http://stelvio:8080/hpsvgservlet/fetch?ref=_2_ec5132765b_4553.jpg-110 </string></value></param>
 </params>
</methodResponse>
what we ended up with [architecture diagram: ArrowPoint; Apache x2; Bluestone x4 (two groups); IIS/ASP x3; PDC + DNS; file store; SQL server x2; remote renderer; store]
what worked? Web Service Model! XML based RPC HTTP for transport SVG for request ANT build & deploy  JUnit for unit tests incremental development
(nearly) no more WORKSFORME all server-side problems could be replicated log all failures server side for easier post mortem SEH catching of all win32 renderer errors phase II: SEH errors => auto email to support alias just integration… networking, client code issues (proxies, wrong URL, authentication)
ANT + JUnit fully automated build process integrated JUnit testing automated deployment to local server stacks
security cookie authentication in common domain encrypt user, session in cookie restricted ports on beta site admin pages by IP addr sanitize incoming SVG XML: -catch file: references -downgrade http: access  -test XML &includes;
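The "catch file: references" check on incoming SVG could be sketched as below. This is a minimal illustration rather than the service's actual code: the class name `SvgSanitizer` and method `hasFileReference` are hypothetical, and the scan only looks at attribute values that start with `file:`, so references buried inside a `style` value or a stylesheet would need extra handling. External entity and DTD loading are switched off so that the parse itself cannot pull in local files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

/** Hypothetical helper: rejects SVG whose attributes point at file: URLs. */
class SvgSanitizer {

    /** Returns true if any attribute value in the document starts with "file:". */
    static boolean hasFileReference(String svg) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not resolve external entities or fetch external DTDs while parsing.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setExpandEntityReferences(false);

        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(svg.getBytes(StandardCharsets.UTF_8)));

        // Walk every element and inspect every attribute value.
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                String value = attrs.item(j).getNodeValue().trim().toLowerCase(Locale.ROOT);
                if (value.startsWith("file:")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A request whose SVG makes `hasFileReference` return true would be rejected before it reaches the renderer.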
configuration ASP: config in source: brittle, scaling issues Java: per cluster config files in the WAR +kept under SCM; cleaner -delays to change, scaling  Use a database ? Use a directory service?
what didn’t work: turning a web site into a web service
what went wrong FedEx cabling raid controller unreliable switch router config forgotten passwords IP address issues accidental deletion of 8GB test data (!) COM+ authentication DNS JVM ‘lockup’  time differences between servers cluster race conditions boot race conditions MP thread safety resource leakage errors in JSP/ASP show up after deployment
operations paranoia R&D not allowed near production boxes response to any security issue is “no live” when things don’t work, then they call us  escalated minor security niggles into blocking issues effect #1: threat to weekends effect #2: threat to schedule effect #3: we stopped telling them of “issues”
error messages java.net.NoRouteToHostException if you don’t have Java/C# engineers on the ops team, you get called in for every message fix #1: “defect” tracking of operations issues, from early days of the app fix #2: always identify the errant system in fault codes
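One way to read fix #2 is that every fault message names both the machine reporting the problem and the system it was trying to reach. The sketch below is an assumption about what that might look like; `FaultReport`, `describe` and the host names in the usage note are hypothetical.

```java
/** Hypothetical helper: builds fault text that names the systems involved. */
class FaultReport {

    /**
     * Formats a fault so operations can see which box reported it and which
     * remote system it could not use, without reading a stack trace.
     */
    static String describe(String reportingHost, String targetSystem, Throwable cause) {
        return "[" + reportingHost + " -> " + targetSystem + "] "
                + cause.getClass().getName() + ": " + cause.getMessage();
    }
}
```

For example, `FaultReport.describe("web01", "sqlserver-a:1433", new java.net.NoRouteToHostException("No route to host"))` yields `[web01 -> sqlserver-a:1433] java.net.NoRouteToHostException: No route to host`, which tells the ops team where to start looking.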
HTTP Content-length is all you get how do you propagate a transient failure? E_TRANSIENT_FAILURE => retry with exponential backoff -how do you really test this? -what about integrity?
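Retry with exponential backoff can be sketched as follows. `TransientFailureException` is a hypothetical stand-in for the slide's E_TRANSIENT_FAILURE, and the attempt limit and initial delay are illustrative parameters, not values from the project.

```java
import java.util.concurrent.Callable;

/** Hypothetical exception standing in for the slide's E_TRANSIENT_FAILURE. */
class TransientFailureException extends Exception {
    TransientFailureException(String message) {
        super(message);
    }
}

class RetryWithBackoff {

    /**
     * Runs the operation, retrying only on transient failures. The wait
     * between attempts doubles each time; any other exception propagates
     * immediately, and the last transient failure is rethrown once
     * maxAttempts calls have been made (at least one call is always made).
     */
    static <T> T call(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (TransientFailureException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: report the failure to the caller
                }
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }
}
```

On the testing question, one option is to pass in an operation that fails a fixed number of times and then succeeds, and check how many calls were made. On integrity, blind retries are only safe when the remote operation is idempotent.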
firefighting it takes 10 minutes to deploy it takes 30s to report a defect therefore, it must take 11 minutes to fix a bug and deploy the update … stay in control by never updating public servers more frequently than nightly
complexity is the enemy availability is the primary casualty of complexity
availability $A(X) \triangleq P(X \text{ is working})$, $0 \le A(X) \le 1$
service $S$ depends on services $s_1, \dots, s_n$: $A(S) = A(\text{node}) \cdot A(s_1) \cdot A(s_2) \cdots A(s_n)$
redundancy: $A(n_1 + n_2) = A(n_1) + A(n_2) - A(n_1) A(n_2)$
cost of redundancy depends on scalability of service: $O(1)$, $O(n)$, $O(n^2)$, $O(2^n)$ …
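The two formulas can be tried out with a few lines of Java. The class and method names are hypothetical, the 0.99 figures in the usage note are made-up inputs, and both formulas assume that failures are independent.

```java
/** Hypothetical helper implementing the slide's availability formulas. */
class Availability {

    /** Serial dependency: the node and every service it depends on must all work. */
    static double serial(double node, double... dependencies) {
        double availability = node;
        for (double dependency : dependencies) {
            availability *= dependency;
        }
        return availability;
    }

    /** Redundant pair: the service works if at least one of the two nodes works. */
    static double redundant(double a1, double a2) {
        return a1 + a2 - a1 * a2;
    }
}
```

A node at 0.99 that depends on three services, each at 0.99, comes out at about 0.9606 from `Availability.serial(0.99, 0.99, 0.99, 0.99)`, so every extra dependency costs availability. Two redundant nodes at 0.99 give 0.9999 from `Availability.redundant(0.99, 0.99)`.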
a modest proposal: deployment-centric software processes
operations  use cases/ XP stories update live server add new fonts bind to new database change account/passwords backup/restore system partition cluster multi-home the server diagnose intermittent failure these need support in software and/or process
deployment and operations  test cases probe for needed exes, COM objects validate remote sites visible check configuration treat all deployment issues as defects to track and to have test cases run the test cases at install time, and from the load balancing router
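Two of these probes, "probe for needed exes" and "validate remote sites visible", might look like the sketch below. `DeploymentProbes` and its methods are hypothetical, and the executable probe checks the exact file name given, so on Windows the caller would have to include the `.exe` suffix.

```java
import java.io.File;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical install-time checks that can be run as ordinary test cases. */
class DeploymentProbes {

    /** True if a file with this exact name exists and is executable somewhere on PATH. */
    static boolean executableOnPath(String name) {
        String path = System.getenv("PATH");
        if (path == null) {
            return false;
        }
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return true;
            }
        }
        return false;
    }

    /** True if a TCP connection to host:port can be opened within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // unknown host, refused, unreachable or timed out
        }
    }
}
```

Wrapped in JUnit tests, checks like these can run at install time and again whenever a deployment defect is reported.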
operations issues are defects treat all deployment issues as defects to track don’t just fix it once, as it will crop up again. you need regression tests you need a repository of defects for easy searching. or they will phone you at 3am
deploy early  deploy often involve operations in system design give them regular builds to deploy during early development use a tool like CruiseControl for continual integration and local deployment provide 5x6 support at this stage maintain slow bug fix times
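instrument with JMX (Java) WMI (.NET)
public class ServiceManager implements ServiceManagerMBean {
    // Service and ServiceManagerMBean are defined elsewhere in the application (not shown on the slide)
    protected Service _owner;
    protected int _serviceTransactionCount;
    protected int _serviceUseCount;
    protected double _sellThroughDollars;

    // record one transaction: update the running totals under the object lock
    public synchronized void bookSales(int count, double dollars) {
        _sellThroughDollars += dollars;
        _serviceUseCount += count;
        _serviceTransactionCount++;
    }

    // read-only management attribute; synchronized so it sees the latest total
    public synchronized Double getSellThroughDollars() {
        return Double.valueOf(_sellThroughDollars);
    }
}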
current work SmartFrog   Distributed deployment framework Configuration  is  Deployment declare desired state of machines Runtime handles it www.smartfrog.org
How to host big applications across distributed resources Automatically / Repeatably Dynamically Correctly Securely How to manage them from installation to removal How to make grid fabrics useful for classic server-side apps HPLabs research
SmartFrog Deployment Engine [diagram: three SmartFrog nodes, each with a SmartFrog daemon and SmartFrog components; description / code repositories; deploy descriptions; links labelled RMI and RMI / (SOAP)] declare desired state of the system runtime(s) instantiate components components configure, start and stop the apps
summary web services are more complex than web sites or intranet applications to deploy  developers:  deployment is (nearly) everything operations:  developers are not your enemy management: recognize the new problems; address them strive for simplicity
Ad

More Related Content

What's hot (20)

Automated Performance Testing With J Meter And Maven
Automated  Performance  Testing With  J Meter And  MavenAutomated  Performance  Testing With  J Meter And  Maven
Automated Performance Testing With J Meter And Maven
PerconaPerformance
 
Asynchronous programming in .net 4.5 with c#
Asynchronous programming in .net 4.5 with c#Asynchronous programming in .net 4.5 with c#
Asynchronous programming in .net 4.5 with c#
Binu Bhasuran
 
Loadrunner vs Jmeter
Loadrunner vs JmeterLoadrunner vs Jmeter
Loadrunner vs Jmeter
Atul Pant
 
C# 5 deep drive into asynchronous programming
C# 5 deep drive into asynchronous programmingC# 5 deep drive into asynchronous programming
C# 5 deep drive into asynchronous programming
Praveen Prajapati
 
FreeRTOS basics (Real time Operating System)
FreeRTOS basics (Real time Operating System)FreeRTOS basics (Real time Operating System)
FreeRTOS basics (Real time Operating System)
Naren Chandra
 
Jmeter
JmeterJmeter
Jmeter
Sun Technlogies
 
Reliability and clock synchronization
Reliability and clock synchronizationReliability and clock synchronization
Reliability and clock synchronization
Sri Manakula Vinayagar Engineering College
 
Salt conf15 presentation-william-cannon
Salt conf15 presentation-william-cannonSalt conf15 presentation-william-cannon
Salt conf15 presentation-william-cannon
William Cannon
 
QA. Load Testing
QA. Load TestingQA. Load Testing
QA. Load Testing
Alex Galkin
 
Load Runner
Load RunnerLoad Runner
Load Runner
Shama Ahsan
 
Load Runner
Load RunnerLoad Runner
Load Runner
Vladimir Soghoyan
 
Asynchronous t sql
Asynchronous t sqlAsynchronous t sql
Asynchronous t sql
Remus Rusanu
 
RTOS CASE STUDY OF CODING FOR SENDING APPLIC...
                                RTOS  CASE STUDY OF CODING FOR SENDING APPLIC...                                RTOS  CASE STUDY OF CODING FOR SENDING APPLIC...
RTOS CASE STUDY OF CODING FOR SENDING APPLIC...
JOLLUSUDARSHANREDDY
 
PHP North-East - Automated Deployment
PHP North-East - Automated DeploymentPHP North-East - Automated Deployment
PHP North-East - Automated Deployment
Michael Peacock
 
How to Simplify Load Testing: JMeter and Beyond
How to Simplify Load Testing: JMeter and BeyondHow to Simplify Load Testing: JMeter and Beyond
How to Simplify Load Testing: JMeter and Beyond
Andrey Pokhilko
 
LoadRunner walkthrough
LoadRunner walkthroughLoadRunner walkthrough
LoadRunner walkthrough
Bhuvaneswari Subramani
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
Docker, Inc.
 
Async-await best practices in 10 minutes
Async-await best practices in 10 minutesAsync-await best practices in 10 minutes
Async-await best practices in 10 minutes
Paulo Morgado
 
Introduction to Reactive Java
Introduction to Reactive JavaIntroduction to Reactive Java
Introduction to Reactive Java
Tomasz Kowalczewski
 
Reactive Java (GeeCON 2014)
Reactive Java (GeeCON 2014)Reactive Java (GeeCON 2014)
Reactive Java (GeeCON 2014)
Tomasz Kowalczewski
 
Automated Performance Testing With J Meter And Maven
Automated  Performance  Testing With  J Meter And  MavenAutomated  Performance  Testing With  J Meter And  Maven
Automated Performance Testing With J Meter And Maven
PerconaPerformance
 
Asynchronous programming in .net 4.5 with c#
Asynchronous programming in .net 4.5 with c#Asynchronous programming in .net 4.5 with c#
Asynchronous programming in .net 4.5 with c#
Binu Bhasuran
 
Loadrunner vs Jmeter
Loadrunner vs JmeterLoadrunner vs Jmeter
Loadrunner vs Jmeter
Atul Pant
 
C# 5 deep drive into asynchronous programming
C# 5 deep drive into asynchronous programmingC# 5 deep drive into asynchronous programming
C# 5 deep drive into asynchronous programming
Praveen Prajapati
 
FreeRTOS basics (Real time Operating System)
FreeRTOS basics (Real time Operating System)FreeRTOS basics (Real time Operating System)
FreeRTOS basics (Real time Operating System)
Naren Chandra
 
Salt conf15 presentation-william-cannon
Salt conf15 presentation-william-cannonSalt conf15 presentation-william-cannon
Salt conf15 presentation-william-cannon
William Cannon
 
QA. Load Testing
QA. Load TestingQA. Load Testing
QA. Load Testing
Alex Galkin
 
Asynchronous t sql
Asynchronous t sqlAsynchronous t sql
Asynchronous t sql
Remus Rusanu
 
RTOS CASE STUDY OF CODING FOR SENDING APPLIC...
                                RTOS  CASE STUDY OF CODING FOR SENDING APPLIC...                                RTOS  CASE STUDY OF CODING FOR SENDING APPLIC...
RTOS CASE STUDY OF CODING FOR SENDING APPLIC...
JOLLUSUDARSHANREDDY
 
PHP North-East - Automated Deployment
PHP North-East - Automated DeploymentPHP North-East - Automated Deployment
PHP North-East - Automated Deployment
Michael Peacock
 
How to Simplify Load Testing: JMeter and Beyond
How to Simplify Load Testing: JMeter and BeyondHow to Simplify Load Testing: JMeter and Beyond
How to Simplify Load Testing: JMeter and Beyond
Andrey Pokhilko
 
Prometheus design and philosophy
Prometheus design and philosophy   Prometheus design and philosophy
Prometheus design and philosophy
Docker, Inc.
 
Async-await best practices in 10 minutes
Async-await best practices in 10 minutesAsync-await best practices in 10 minutes
Async-await best practices in 10 minutes
Paulo Morgado
 

Viewers also liked (19)

Hadoop & Hep
Hadoop & HepHadoop & Hep
Hadoop & Hep
Steve Loughran
 
Benchmarking
BenchmarkingBenchmarking
Benchmarking
Steve Loughran
 
Deploying On EC2
Deploying On EC2Deploying On EC2
Deploying On EC2
Steve Loughran
 
Beyond Unit Testing
Beyond Unit TestingBeyond Unit Testing
Beyond Unit Testing
Steve Loughran
 
HA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkHA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talk
Steve Loughran
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!
Steve Loughran
 
Testing
TestingTesting
Testing
Steve Loughran
 
The Wondrous Curse of Interoperability
The Wondrous Curse of InteroperabilityThe Wondrous Curse of Interoperability
The Wondrous Curse of Interoperability
Steve Loughran
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
Steve Loughran
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
Steve Loughran
 
Hadoop Futures
Hadoop FuturesHadoop Futures
Hadoop Futures
Steve Loughran
 
New Roles In The Cloud
New Roles In The CloudNew Roles In The Cloud
New Roles In The Cloud
Steve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Steve Loughran
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
Steve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
Steve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
Steve Loughran
 
Application Architecture For The Cloud
Application Architecture For The CloudApplication Architecture For The Cloud
Application Architecture For The Cloud
Steve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
Steve Loughran
 
Hadoop gets Groovy
Hadoop gets GroovyHadoop gets Groovy
Hadoop gets Groovy
Steve Loughran
 
HA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkHA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talk
Steve Loughran
 
Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!Help! My Hadoop doesn't work!
Help! My Hadoop doesn't work!
Steve Loughran
 
The Wondrous Curse of Interoperability
The Wondrous Curse of InteroperabilityThe Wondrous Curse of Interoperability
The Wondrous Curse of Interoperability
Steve Loughran
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
Steve Loughran
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
Steve Loughran
 
New Roles In The Cloud
New Roles In The CloudNew Roles In The Cloud
New Roles In The Cloud
Steve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Steve Loughran
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
Steve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
Steve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
Steve Loughran
 
Application Architecture For The Cloud
Application Architecture For The CloudApplication Architecture For The Cloud
Application Architecture For The Cloud
Steve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
Steve Loughran
 
Ad

Similar to When Web Services Go Bad (20)

Open Source XMPP for Cloud Services
Open Source XMPP for Cloud ServicesOpen Source XMPP for Cloud Services
Open Source XMPP for Cloud Services
mattjive
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
Aman Balutia
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Puppet
 
086 Microsoft Application Platform 2009 2010
086 Microsoft Application Platform 2009 2010086 Microsoft Application Platform 2009 2010
086 Microsoft Application Platform 2009 2010
GeneXus
 
How to Configure the CA Workload Automation System Agent agentparm.txt File
How to Configure the CA Workload Automation System Agent agentparm.txt FileHow to Configure the CA Workload Automation System Agent agentparm.txt File
How to Configure the CA Workload Automation System Agent agentparm.txt File
CA Technologies
 
North east user group tour
North east user group tourNorth east user group tour
North east user group tour
10n Software, LLC
 
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdfDevfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
KAI CHU CHUNG
 
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
Mullaiselvan Mohan
 
.net Framework
.net Framework.net Framework
.net Framework
Rishu Mehra
 
SAMKUMAR- Sr.Linux SystemAdministrator (1)
SAMKUMAR- Sr.Linux SystemAdministrator (1)SAMKUMAR- Sr.Linux SystemAdministrator (1)
SAMKUMAR- Sr.Linux SystemAdministrator (1)
gandi samkumar
 
Auto sre with keptn
Auto sre with keptnAuto sre with keptn
Auto sre with keptn
LibbySchulze
 
Spring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour DallasSpring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour Dallas
VMware Tanzu
 
Surekha_haoop_exp
Surekha_haoop_expSurekha_haoop_exp
Surekha_haoop_exp
surekhakadi
 
NodeJS guide for beginners
NodeJS guide for beginnersNodeJS guide for beginners
NodeJS guide for beginners
Enoch Joshua
 
MS Cloud Day - Deploying and monitoring windows azure applications
MS Cloud Day - Deploying and monitoring windows azure applicationsMS Cloud Day - Deploying and monitoring windows azure applications
MS Cloud Day - Deploying and monitoring windows azure applications
Spiffy
 
Transcend Automation's Kepware OPC Products
Transcend Automation's Kepware OPC ProductsTranscend Automation's Kepware OPC Products
Transcend Automation's Kepware OPC Products
Baiju P.S.
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET Journal
 
Azure migration
Azure migrationAzure migration
Azure migration
Arnon Rotem-Gal-Oz
 
Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart Frog
Steve Loughran
 
Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)
Sujee Maniyam
 
Open Source XMPP for Cloud Services
Open Source XMPP for Cloud ServicesOpen Source XMPP for Cloud Services
Open Source XMPP for Cloud Services
mattjive
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Puppet
 
086 Microsoft Application Platform 2009 2010
086 Microsoft Application Platform 2009 2010086 Microsoft Application Platform 2009 2010
086 Microsoft Application Platform 2009 2010
GeneXus
 
How to Configure the CA Workload Automation System Agent agentparm.txt File
How to Configure the CA Workload Automation System Agent agentparm.txt FileHow to Configure the CA Workload Automation System Agent agentparm.txt File
How to Configure the CA Workload Automation System Agent agentparm.txt File
CA Technologies
 
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdfDevfest 2023 - Service Weaver Introduction - Taipei.pdf
Devfest 2023 - Service Weaver Introduction - Taipei.pdf
KAI CHU CHUNG
 
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
10 years in Network Protocol testing L2 L3 L4-L7 Tcl Python Manual and Automa...
Mullaiselvan Mohan
 
SAMKUMAR- Sr.Linux SystemAdministrator (1)
SAMKUMAR- Sr.Linux SystemAdministrator (1)SAMKUMAR- Sr.Linux SystemAdministrator (1)
SAMKUMAR- Sr.Linux SystemAdministrator (1)
gandi samkumar
 
Auto sre with keptn
Auto sre with keptnAuto sre with keptn
Auto sre with keptn
LibbySchulze
 
Spring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour DallasSpring and Pivotal Application Service - SpringOne Tour Dallas
Spring and Pivotal Application Service - SpringOne Tour Dallas
VMware Tanzu
 
Surekha_haoop_exp
Surekha_haoop_expSurekha_haoop_exp
Surekha_haoop_exp
surekhakadi
 
NodeJS guide for beginners
NodeJS guide for beginnersNodeJS guide for beginners
NodeJS guide for beginners
Enoch Joshua
 
MS Cloud Day - Deploying and monitoring windows azure applications
MS Cloud Day - Deploying and monitoring windows azure applicationsMS Cloud Day - Deploying and monitoring windows azure applications
MS Cloud Day - Deploying and monitoring windows azure applications
Spiffy
 
Transcend Automation's Kepware OPC Products
Transcend Automation's Kepware OPC ProductsTranscend Automation's Kepware OPC Products
Transcend Automation's Kepware OPC Products
Baiju P.S.
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET Journal
 
Taming Deployment With Smart Frog
Taming Deployment With Smart FrogTaming Deployment With Smart Frog
Taming Deployment With Smart Frog
Steve Loughran
 
Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)Iphone client-server app with Rails backend (v3)
Iphone client-server app with Rails backend (v3)
Sujee Maniyam
 
Ad

More from Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
Steve Loughran
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
Steve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
Steve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
Steve Loughran
 
Testing
TestingTesting
Testing
Steve Loughran
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
Steve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Steve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
Steve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
Steve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
Steve Loughran
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
Steve Loughran
 
YARN Services
YARN ServicesYARN Services
YARN Services
Steve Loughran
 
Datacentre stack
Datacentre stackDatacentre stack
Datacentre stack
Steve Loughran
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
Steve Loughran
 
2014 01-02-patching-workflow
2014 01-02-patching-workflow2014 01-02-patching-workflow
2014 01-02-patching-workflow
Steve Loughran
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
Steve Loughran
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
Steve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
Steve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
Steve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Steve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
Steve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
Steve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
Steve Loughran
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
Steve Loughran
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
Steve Loughran
 
2014 01-02-patching-workflow
2014 01-02-patching-workflow2014 01-02-patching-workflow
2014 01-02-patching-workflow
Steve Loughran
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
Steve Loughran
 


When Web Services Go Bad

  • 1. when web services go bad steve loughran hp laboratories March 2002 notes from the field
  • 2. global distribution global load expectations of “web speed” development http at the bottom integration with remote callers service level agreements covering QoS a new set of problems! why is it so hard? web service development =
  • 3. image storage and SVG rendering service XML-RPC POST SVG in payload render to JPEG return URL user fetch or print on press +1TB image store for customer photos
  • 4. constraints SLA: high availability render times: 2s proof render 15s production basic asset store done: -SQL server IIS/ASP COM/MTS->Win2K/COM+ “real soon now” XML-RPC agreed on RPC mechanism wildly optimistic timescales
  • 5. SVG into XML-RPC <?xml version="1.0"?><methodCall version="1.0"> <methodName>render_svg</methodName> <params> <param><value><string> 110 </string></value></param> <param><value> <string> PD94bWwgdmVyc2lvbj0iMS4wIiBzdGFuZGFsb25lPSJubyIgPz4NCjwhRE9DVFlQRSBzdmcgUFVC TElDICItLy9XM0MvL0RURCBTVkcgMjAwMDExMDIvL0VOIg0KICJodHRwOi8vd3d3LnczLm9yZy9U Ui8yMDAwL0NSLVNWRy0yMDAwMTEwMi9EVEQvc3ZnLTIwMDAxMTAyLmR0ZCI+DQo8c3ZnIHdpZHRo PSIxMDI0cHQiIGhlaWdodD0iMTAyNHB0Ij4NCiA8ZyBpZD0icGljdHVyZSIgPjxpbWFnZSB3aWR0 aD0iMTAyNHB0IiBoZWlnaHQ9Ijc2OHB0Ig0KICAgeGxpbms6aHJlZj0iaHR0cDovL3N0ZWx2aW86 ODA4MC9zdW5zZXQuanBnIi8+PC9nPg0KPHRleHQgeD0iMjlwdCIgeT0iODVwdCINCiBzdHlsZT0i Zm9udC1mYW1pbHk6SGVsdmV0aWNhO2ZvbnQtc2l6ZTozNnB0O2ZpbGw6I2QwZDBkMDsiDQo+a2Vp dGggaXMgKG1heWJlKSBzZXh5PC90ZXh0Pg0KPC9zdmc+DQoNCg0K</string></value> </param> <param><value><int> 72 </int></value></param> </params></methodCall> <?xml version="1.0" standalone="no" ?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20001102//EN" "http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd"> <svg width="1024pt" height="768pt"> <g id="picture" ><image width="1024pt" height="768pt" xlink:href="http://stelvio:8080/sunset.jpg"/></g> <text x="29pt" y="85pt" style="font-family:Helvetica;font-size:36pt;fill:#d0d0d0;"> Having an excellent holiday! </text> </svg>
  • 6. Response <?xml version="1.0"?> <methodResponse> <params> <param><value><string>2453</string></value></param> <param><value><string> http://stelvio:8080/hpsvgservlet/fetch?ref=_2_ec5132765b_4553.jpg-110 </string></value> </param> </params> </methodResponse>
  • 7. what we ended up with ArrowPoint Apache Apache Bluestone x4 Bluestone x4 IIS/ASP IIS/ASP IIS/ASP PDC + DNS file store SQL server SQL server remote renderer store
  • 8. what worked? Web Service Model! XML based RPC HTTP for transport SVG for request ANT build & deploy JUnit for unit tests incremental development
  • 9. (nearly) no more WORKSFORME all server-side problems could be replicated log all failures server side for easier post mortem SEH catching of all win32 renderer errors phase II: SEH errors => auto email to support alias just integration… networking, client code issues (proxies, wrong URL, authentication)
  • 10. ANT + JUnit fully automated build process integrated JUnit testing automated deployment to local server stacks
  • 11. security cookie authentication in common domain encrypt user, session in cookie restricted ports on beta site admin pages by IP addr sanitize incoming SVG XML: -catch file: references -downgrade http: access -test XML &includes;
  • 12. configuration ASP: config in source: brittle, scaling issues Java: per cluster config files in the WAR +kept under SCM; cleaner -delays to change, scaling Use a database ? Use a directory service?
  • 13. what didn’t work: turning a web site into a web service
  • 14. what went wrong FedEx cabling raid controller unreliable switch router config forgotten passwords IP address issues accidental deletion of 8GB test data (!) COM+ authentication DNS JVM ‘lockup’ time differences between servers cluster race conditions boot race conditions MP thread safety resource leakage errors in JSP/ASP show up after deployment
  • 15. operations paranoia R&D not allowed near production boxes response to any security issue is “no live” when things don’t work, then they call us escalated minor security niggles into blocking issues effect #1: threat to weekends effect #2: threat to schedule effect #3: we stopped telling them of “issues”
  • 16. error messages java.net.NoRouteToHostException if you don’t have Java/C# engineers on the ops team, you get called in for every message fix #1: “defect” tracking of operations issues, from early days of the app fix #2: always identify the errant system in fault codes
  • 17. HTTP Content-length is all you get how do you propagate a transient failure? E_TRANSIENT_FAILURE => retry with exponential backoff -how do you really test this? -what about integrity?
  • 18. firefighting it takes 10 minutes to deploy it takes 30s to report a defect therefore, it must take 11 minutes to fix a bug and deploy the update … stay in control by never updating public servers more frequently than nightly
  • 19. complexity is the enemy availability is the primary casualty of complexity availability $A(X) \equiv P(X \text{ is working})$, $0 \le A(X) \le 1$ service $S$ depends on services $s_1 \ldots s_n$: $A(S) = A(\text{node}) \cdot A(s_1) \cdot A(s_2) \cdots A(s_n)$ redundancy: $A(n_1 + n_2) = A(n_1) + A(n_2) - A(n_1)A(n_2)$ cost of redundancy depends on scalability of service: $O(1)$, $O(n)$, $O(n^2)$, $O(2^n)$ …
  • 20. a modest proposal: deployment-centric software processes
  • 21. operations use cases/ XP stories update live server add new fonts bind to new database change account/passwords backup/restore system partition cluster multi-home the server diagnose intermittent failure these need support in software and/or process
  • 22. deployment and operations test cases probe for needed exes, COM objects validate remote sites visible check configuration treat all deployment issues as defects to track and to have test cases run the test cases at install time, and from the load balancing router
  • 23. operations issues are defects treat all deployment issues as defects to track don’t just fix it once, as it will crop up again. you need regression tests you need a repository of defects for easy searching. or they will phone you at 3am
  • 24. deploy early deploy often involve operations in system design give them regular builds to deploy during early development use a tool like CruiseControl for continual integration and local deployment provide 5x6 support at this stage maintain slow bug fix times
  • 25. instrument with JMX (Java) WMI (.NET) public class ServiceManager implements ServiceManagerMBean { protected Service _owner; protected int _serviceTransactionCount; protected int _serviceUseCount; protected double _sellThroughDollars; public synchronized void bookSales(int count, double dollars) { _sellThroughDollars+=dollars; _serviceUseCount+=count; _serviceTransactionCount++; } public Double getSellThroughDollars() { return new Double(_sellThroughDollars); } }
  • 26. current work SmartFrog Distributed deployment framework Configuration is Deployment declare desired state of machines Runtime handles it www.smartfrog.org
  • 27. How to host big applications across distributed resources Automatically / Repeatably Dynamically Correctly Securely How to manage them from installation to removal How to make grid fabrics useful for classic server-side apps HPLabs research
  • 28. SmartFrog Deployment Engine SmartFrog Node SmartFrog Components Description / Code Repositories RMI RMI Deploy Descriptions SmartFrog Daemon SmartFrog Node SmartFrog Components SmartFrog Daemon SmartFrog Node SmartFrog Components SmartFrog Daemon RMI / (SOAP) declare desired state of the system runtime(s) instantiate components components configure, start and stop the apps
  • 29. summary web services are more complex than web sites or intranet applications to deploy developers: deployment is (nearly) everything operations: developers are not your enemy management: recognize the new problems; address them strive for simplicity
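
Slide 17 asks how a client should react to a transient failure, and suggests retrying with exponential backoff. The following is a minimal sketch of that idea, assuming the caller can tell a retryable error from a permanent one. `TransientFailureException`, `Backoff.retry`, and the attempt and delay values are hypothetical names chosen for illustration; they are not taken from the original service.

```java
import java.util.concurrent.Callable;

/** Hypothetical marker for a retryable error (stands in for E_TRANSIENT_FAILURE). */
class TransientFailureException extends Exception {
    TransientFailureException(String message) {
        super(message);
    }
}

/** Sketch of retry with exponential backoff; not the original project's code. */
final class Backoff {
    private Backoff() {
    }

    /**
     * Calls the operation until it succeeds or maxAttempts is reached.
     * Only TransientFailureException triggers a retry; any other exception
     * propagates immediately. The wait doubles after every failed attempt.
     */
    static <T> T retry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (TransientFailureException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(delay);
                delay *= 2; // exponential growth of the wait
            }
        }
    }

    /** Demo: an operation that fails twice, then succeeds on the third call. */
    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String url = retry(() -> {
            if (++calls[0] < 3) {
                throw new TransientFailureException("renderer busy");
            }
            return "rendered";
        }, 5, 10L);
        System.out.println(url + " after " + calls[0] + " calls");
    }
}
```

Passing the delay in as a parameter is one way to approach the slide's testing question: a test can use a one-millisecond delay and a scripted operation that fails a known number of times. The integrity question is not addressed here; retrying is only safe if the remote call can be repeated without side effects.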

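The availability formulas on slide 19 are easier to judge with numbers. As a worked sketch under assumed figures (the 0.99 values are illustrative, not measurements from the project), take a node and three dependencies that are each available 99% of the time. The redundancy formula also assumes the two copies fail independently.

```latex
% Serial dependencies: every component must be working.
A(S) = A(\text{node}) \cdot A(s_1) \cdot A(s_2) \cdot A(s_3)
     = 0.99^4 \approx 0.9606

% Redundancy: two independent copies n_1, n_2 of that service,
% each with availability 0.9606.
A(n_1 + n_2) = A(n_1) + A(n_2) - A(n_1)A(n_2)
             = 1 - (1 - 0.9606)^2 \approx 0.9984
```

Four components at 99% each leave the service unavailable about 3.9% of the time, which is the slide's point about complexity; duplicating the whole service brings that down to roughly 0.16%, at whatever cost the service's scalability imposes.
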
Editor's Notes

  • #2: 06/10/09 This is meant to conjure up the vision of some late night cable TV show “we take you behind the scenes of colocation sites, finding the worst web services in existence”, interviewing the people using them, managing them, integrating them, before finally catching up with the developer team on their doorsteps, asking them “why did you produce such a nightmare?” If such a show existed, would you be on it? I am going to tell you how to avoid that, without getting the government to give you a new identity under the Developer Relocation Program. Would I be on it? I would have liked to have been on it before the project was over. This is a photo of me 6000 feet up one of the Cascade peaks on a technical spring mountaineering weekend, and I was getting voicemail about config problems. It would have been nice to have had a new identity then. As it is, I had to feign cellphone coverage failure