1
Make it part of your Agile Delivery
2
3
4
5
Testing is Important – and gives Confidence
6
But are we ready for “The Real” world?
7
Measure Performance during the game
Ball Possession: 40 : 60
Fouls: 0 : 0
Score: 0 : 0
Minute 1 - 5
8
Measure Performance during the game
Minute 6 - 35
Ball Possession: 80 : 20
Fouls: 2 : 12
Score: 0 : 0
9
Deep Dive Analysis
10
Options “To Fix” the situation
11
Not always a happy ending 
Minute 90
Ball Possession: 80 : 20
Fouls: 4 : 25
Score: 3 : 0
12
FRUSTRATED FANS!!
12
13
How does that
relate to
Software?
14
From Deploy to …
Timeline: Deploy -> Promotion/Event -> Problems -> Ops Playbook -> War Room
15
The “War Room” – back then
“Houston, we have a problem”
NASA Mission Control Center, Apollo 13, 1970
16
The “War Room” – NOW
Facebook – December 2012
17
Problem: Unclear End User Problem Descriptions
18
Status Quo: Ops Runbook – System Unresponsive
19
Problem: Unclear Ops Problem Descriptions
20
Status Quo: Ops Runbook – High Resource Usage
21
Lack of data?
22
23
Answers to the right questions
24
What are the real questions?
Individual Users? ALL users?
Is it the APP? Or Delivery Chain?
Code problem? Infrastructure?
One transaction? ALL transactions?
In AppServer? In Virtual Machine?
25
Problem: What Devs would like to have
26
Problem: What Devs would like to have
Top Contributor is related to String handling
99% of that time comes from RegEx Pattern Matching
Page Rendering is the main component
27
It’s like getting this …
28
… when you need to see this!
29
Problem: Attitudes like this don’t help either
Image taken from https://www.scriptrock.com/blog/devops-whats-hype-about/
Shopzilla CIO (in 2010): “… when they get in the war room - the developers and ops teams describe the problem as the enemy, not each other”
30
Problem: Very “expensive” to work on these issues
~80% of problems caused by ~20% patterns
YES we know this
80% Dev Time in Bug Fixing
$60B Defect Costs
BUT
31
TOP PROBLEM PATTERNS
• Taken From Production Environments
32
Top Problem Patterns: Resource Pools
33
Top Problem Patterns: Resource Pools
34
Deployment Mistakes lead to internal Exceptions
35
Deployment Mistakes lead to high logging overhead
36
Production Deployment leads to Log SYNC Issues
37
Long running SQL with Production Data
38
N+1 Query Problem
40
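One way to catch the N+1 pattern before production is a unit test that counts the SQL statements a piece of data-access code executes. The sketch below is only an illustration and is not taken from the talk: it uses Python's built-in sqlite3 module, and the orders/items schema as well as the helper names (count_selects, make_db, load_n_plus_one, load_with_join) are hypothetical.

```python
import sqlite3


def count_selects(conn, action):
    """Run action(conn) and count the SELECT statements SQLite executed."""
    executed = []
    # set_trace_callback reports every statement the SQLite backend runs.
    conn.set_trace_callback(executed.append)
    try:
        action(conn)
    finally:
        conn.set_trace_callback(None)
    return sum(1 for sql in executed if sql.lstrip().upper().startswith("SELECT"))


def make_db(order_count):
    """Build a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE items (order_id INTEGER, name TEXT)")
    for order_id in range(1, order_count + 1):
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        conn.execute("INSERT INTO items VALUES (?, ?)", (order_id, "widget"))
    conn.commit()
    return conn


def load_n_plus_one(conn):
    """One query for the orders, then one more query per order (N+1)."""
    orders = conn.execute("SELECT id FROM orders").fetchall()
    for (order_id,) in orders:
        conn.execute(
            "SELECT name FROM items WHERE order_id = ?", (order_id,)
        ).fetchall()


def load_with_join(conn):
    """Fetch the same data with a single JOIN."""
    conn.execute(
        "SELECT o.id, i.name FROM orders o JOIN items i ON i.order_id = o.id"
    ).fetchall()
```

With 10 orders, the first loader issues 11 SELECT statements (1 + N) while the JOIN needs a single one, so a unit test could assert an upper bound on the statement count for each use case.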
Memory Leaks in Cache Layer with Production Data
Still crashes
Problem fixed!
Fixed Version Deployed
42
BLOATED Web Sites
17! JS Files – 1.7MB in Size
Useless Information!
Might even be a security risk!
43
Missing or incorrectly configured browser caches
62! Resources not cached
49! Resources with short expiration
44
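A check like the one on this slide can be automated. The following is a minimal sketch, assuming the Cache-Control header values of a page's resources have already been collected into a dictionary; the function names, the sample URLs and the one-day threshold are assumptions for illustration. It only looks at Cache-Control and ignores Expires, ETag and CDN-specific headers.

```python
import re


def max_age_seconds(cache_control):
    """Return max-age from a Cache-Control value, or None if not cacheable."""
    if not cache_control:
        return None
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return None
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return None


def audit_caching(resources, min_age=86400):
    """Split resources into 'not cached' and 'short expiration' lists.

    resources maps URL -> Cache-Control header value (or None).
    min_age is an assumed threshold of one day, in seconds.
    """
    not_cached, short_lived = [], []
    for url, cache_control in resources.items():
        age = max_age_seconds(cache_control)
        if age is None or age == 0:
            not_cached.append(url)
        elif age < min_age:
            short_lived.append(url)
    return not_cached, short_lived


# Hypothetical sample data, not measurements from the talk.
SAMPLE = {
    "/app.js": None,
    "/logo.png": "public, max-age=600",
    "/style.css": "public, max-age=604800",
    "/api/user": "no-store",
}
```

Calling audit_caching(SAMPLE) puts /app.js and /api/user into the "not cached" list and /logo.png into the "short expiration" list; a build step could fail when either list grows.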
SLOW or Failing 3rd Party Content
45
Want MORE of these and more details?
http://apmblog.compuware.com
46
Lots of Problems that could have been avoided
• BUT WHY are they still making it to Production?
47
Missing Focus on Performance
48
Different Goals for Dev and Ops
49
Disconnected Teams despite “Shared Responsibility”
50
Solution: DevOps + Performance Focus
51
BEST PRACTICES
52
Culture: Become ONE Team
53
Culture: Testability
54
Automate & Measure … Performance
55
Automate & Measure … Scalability
56
Automate Deployment
57
How? Performance Focus in Test Automation
Test Framework Results                Architectural Data
Build #    Test Case      Status      # SQL   # Excep   CPU
Build 17   testPurchase   OK          12      0         120ms
           testSearch     OK          3       1         68ms
Build 18   testPurchase   FAILED      12      5         60ms
           testSearch     OK          3       1         68ms
Build 19   testPurchase   OK          75      0         230ms
           testSearch     OK          3       1         68ms
Build 20   testPurchase   OK          12      0         120ms
           testSearch     OK          3       1         68ms
We identified a regression
Problem solved
Let’s look behind the scenes
Exceptions probably reason for failed tests
Problem fixed but now we have an architectural regression
Now we have the functional and architectural confidence
58
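The build-over-build comparison on this slide can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not the tooling used in the talk: the numbers are transcribed from the slide, while the BuildResult type, the find_regressions function and the 1.5x growth threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    build: int
    test: str
    passed: bool
    sql_count: int
    exceptions: int
    cpu_ms: int


# Values transcribed from the slide.
RUNS = [
    BuildResult(17, "testPurchase", True, 12, 0, 120),
    BuildResult(17, "testSearch", True, 3, 1, 68),
    BuildResult(18, "testPurchase", False, 12, 5, 60),
    BuildResult(18, "testSearch", True, 3, 1, 68),
    BuildResult(19, "testPurchase", True, 75, 0, 230),
    BuildResult(19, "testSearch", True, 3, 1, 68),
    BuildResult(20, "testPurchase", True, 12, 0, 120),
    BuildResult(20, "testSearch", True, 3, 1, 68),
]

METRICS = ("sql_count", "exceptions", "cpu_ms")


def find_regressions(runs, baseline_build, max_growth=1.5):
    """Compare each run with the same test in the baseline build.

    A metric is flagged when it exceeds max_growth times its baseline
    value (an assumed threshold); with a baseline of zero, any value
    above zero is flagged.
    """
    baseline = {r.test: r for r in runs if r.build == baseline_build}
    findings = []
    for run in runs:
        base = baseline.get(run.test)
        if base is None or run.build == baseline_build:
            continue
        for metric in METRICS:
            before, after = getattr(base, metric), getattr(run, metric)
            if after > before * max_growth:
                findings.append((run.build, run.test, metric, before, after))
    return findings
```

With Build 17 as the baseline, this flags the exceptions of testPurchase in Build 18 and the SQL count and CPU time of testPurchase in Build 19, even though the functional test status of Build 19 is all green.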
How? Performance Focus in Test Automation
Analyzing All Unit / Performance Tests
Analyze Perf Metrics
Identify Regressions
59
How? Performance Focus in Test Automation
Cross Impact of KPIs
60
How? Performance Focus in Test Automation
Embed your Architectural Results in Jenkins
61
Share Tools
62
Share Results
63
Getting control over your weekend again …
Enjoy a beer with friends?
Instead of pizza and soda with your colleagues?
64
65
YOU HAVE TIME FOR THE REAL …
66
DevOps Automation In-Action
• Automate Load Test Analysis and Regression Detection
67
DevOps Automation In-Action
• Automate Load Test Analysis and Regression Detection
68
DevOps: Actionable Data to Ops
• Input for Capacity and Deployment Planning
Number of Requests on the App Server we will need to handle
Might need to tune GC Settings to reduce GC Overhead
CPU is going to be tight with these machines – also impacted by GC Activity
Input on Thread Pool Configuration
Memory Usage for expected load still provides enough “headroom”
69
IF WE DO ALL THAT
80% Dev Time for Bug Fixing
$60B Costs by Defects
70
Want MORE of these and more details?
http://apmblog.compuware.com
71
© 2011 Compuware Corporation — All Rights Reserved
Simply Smarter

StarWest 2013 Performance is not an afterthought – make it a part of your Agile Delivery

Editor's Notes

  • #3: Who knows what that is? It’s the FIFA World Cup Trophy.
  • #4: Teams are currently competing in the qualifiers to play in Brazil 2014.
  • #5: This is “my” Austrian national soccer team. Their GOAL is to qualify for Brazil 2014. After many failed attempts in the past, we hired a new coach whose goal is to form a new team that PERFORMS well enough to qualify.
  • #6: To get there, the team played many test games, which gave them a lot of confidence because they played against teams that were “easier” to beat. After these tests we even started the qualification with some wins against teams that we expected to beat. So, at the end of these “test and easy qualification games”, we thought: “ALL GOOD – THE ROAD IS OPEN FOR 2014 – NOT ONLY WILL WE QUALIFY BUT WE ALSO BELIEVE WE HAVE SUCH A STRONG TEAM THAT WILL ALSO DO WELL AT THE WORLD CUP”.
  • #7: Then reality kicked in when we met our first “real competitor”: it was the first qualification game against a team whose quality is at the level we have to expect at the World Cup. The opposing team was Germany, and based on these images you can see how the game went.
  • #8: The coach is responsible for watching the game and seeing how things are going. As in other sports, soccer has a couple of Key Performance Indicators such as Ball Possession, Fouls and the actual score. The first 5 minutes actually didn’t look too bad.
  • #9: After the first 5 minutes the game changes, with Germany taking over the game in their typical way. The KPIs make this very clear. The coach is responsible for reacting based on these values and on how the game goes.
  • #10: The coach should use more data for a detailed analysis of what is going wrong in the game.
  • #11: One of his options is to substitute players, or even to change tactics. Does this succeed based on the KPIs that we have seen before?
  • #12: Well – not always. Just replacing players, putting in some who are faster at chasing the ball, doesn’t always help.
  • #15: Story:
      • New Build Deployed on Thursday Evening
      • Everything runs smoothly on Friday Daytime
      • An Ad Campaign hits the Air Friday Night
      • The site crashes under load -> ALERTS GO OFF
      • Restarting Server -> SERVER DOESN’T START
      • Adding more Servers -> PROBLEM REMAINS
      • Calling in the “App Experts” and Pizza Delivery!
  • #18: They get:
      • Ops’ problem description: “App Server crashed”, “Out of file handles”
      • Users’ problem description: “It is slow”, “It crashed”
  • #20: They get:
      • High CPU, Memory or Bandwidth Issues
      • Log files: GBs of log files with 99.9% “useless” information
  • #23: There is lots of data, but does a high CPU utilization really mean that this machine has a problem and needs to be restarted? What could be the problem if your user experience tool tells you that people have bad response times? But what do we do with all this disconnected data?
  • #26: They need:
      • Application data: Executed Transactions, Load, CPU, Memory, Disk usage, ...
      • Impacted transactions with context information: User Actions, Call stack, Thread Overview, Method Parameters, SQL Calls, Invoked Service Calls
      • Involved Application Components: Web Server, App Servers, Database
      • Impact of service calls: Performance, Availability, Response Codes
      • Error Details: HTTP Errors, Exceptions, warning/severe log messages
  • #30: Well – I guess there is not much more to say about this. The attitude between these teams doesn’t help in solving issues any faster.
  • #31: We all know this statistic in one form or another, so it is clear that the problems handled in War Rooms are VERY EXPENSIVE. BUT these problems are typically not detected earlier because the focus of engineering is on building new features instead of on performance and scalable architecture. What’s interesting, though, is that many of these problems could easily be found earlier on. LET’S have a look at the common problems that we constantly run into …
  • #32: Depending on the audience, you may want to show or hide some of the following slides.
  • #33: Resource Pool Exhaustion
      • Misconfiguration or failed deployment, e.g. default config from dev
      • Actual resource leak -> can be identified with Unit/Integration Tests
  • #34: Resource Pool Exhaustion (same as before – just a different pool)
      • Using the same deployment tools in Test and Ops can prevent this
      • Testing with real load can detect that
  • #35: Deployment issues leading to heavy logging, resulting in high I/O and CPU
      • Using the same deployment tools in Test and Ops can prevent this
      • Analyzing log output per component in Dev prevents this problem
  • #36: Deployment issues leading to heavy logging, resulting in high I/O and CPU
      • Using the same deployment tools in Test and Ops can prevent this
      • Analyzing log output per component in Dev prevents this problem
  • #38: Too many and too slow database queries
      • Dev and Test need to have a “production-like” database – updates on a “sample database” won’t show slow updates
      • Access patterns such as N+1 can be identified with Unit Tests
  • #39: Too many and too slow database queries
      • Dev and Test need to have a “production-like” database – updates on a “sample database” won’t show slow updates
      • Access patterns such as N+1 can be identified with Unit Tests
  • #40: Too much data requested from the database
      • Dev and Test need to have a “production-like” database – otherwise these problem patterns can only be found in prod
      • Educate developers on “the power of SQL” instead of loading everything into memory and performing filters/aggregations/… in the app
  • #41: Memory Leaks: too much data in the cache
      • Can be found in test with “production-like” data sets and tests that do not only test the same “search” query -> get feedback from Prod
      • Educate developers on memory and cache strategies
  • #42: Syncing issues caused by deadlocks
      • Can be found with small-scale performance unit tests by developers
      • Educate developers on synchronization/multi-threading strategies
  • #43: Not following WPO (Web Performance Optimization) rules
      • Non-optimized content, e.g. compression, merging, …
      • Educate developers and automate WPO checks
  • #44: Not leveraging browser-side caching
      • Misconfigured CDNs or missing cache settings -> automate cache configuration deployment
      • Educate developers; educate testers to do “real life” testing (CDN, …)
  • #45: Slow or failing 3rd party content
      • Impacts page load time; Ops is required to monitor 3rd party services
      • Educate devs to optimize loading; educate test to include 3rd party testing
  • #48: Why is this a problem?
      • Biz pushes features. To deliver more features in a more agile way, development adopted agile development methodologies to deliver more releases with more features in a shorter timeframe.
      • To save costs we outsource. Some companies also organically grew by acquisition, leaving us with dev teams that are distributed across the globe.
      • To be faster we use 3rd party code, as we do not want to re-invent the wheel. However, not every 3rd party component or service is really fit for the requirements we have in our production environment. It may work well on the workstation for a single user but often fails in a larger environment.
      • 3rd Party Services or Content: the average US sports website loads content from 29! domains
      • 3rd Party Components in Application Code: Hibernate, Spring, .NET Enterprise Blocks, …; GWT, ExtJS, jQuery; Amazon Web Services, Google API, …
  • #49: Feature-richness vs. NO CHANGE
  • #50: It is not well communicated what change is ahead. No “integration” of Ops teams in the Agile process.
  • #51: CAMS is taken from the Opscode (creators of Chef) blog: http://www.opscode.com/blog/2010/07/16/what-devops-means-to-me/
      • Culture: People and process first. If you don’t have culture, all automation attempts will be fruitless.
      • Automation: This is one of the places you start once you understand your culture. At this point, the tools can start to stitch together an automation fabric for Devops. Tools for release management, provisioning, configuration management, systems integration, monitoring and control, and orchestration become important pieces in building a Devops fabric.
      • Measurement: If you can’t measure, you can’t improve. A successful Devops implementation will measure everything it can as often as it can … performance metrics, process metrics, and even people metrics.
      • Sharing: Sharing is the loopback in the CAMS cycle. Creating a culture where people share ideas and problems is critical. Jody Mulkey, the CIO at Shopzilla, told me that when they get in the war room, the developers and operations teams describe the problem as the enemy, not each other. Another interesting motivation in the Devops movement is the way sharing Devops success stories helps others. First, it attracts talent, and second, there is a belief that by exposing ideas you can create a great open feedback that in the end helps them improve.
    The change that is required is already well understood in the DevOps movement that’s been going on for years – BUT – it is important to add Performance as a Key Requirement to Culture, Automation, Measurement and Sharing.
      • Culture: PERFORMANCE is a key requirement for everything that is done throughout the delivery chain. We have heard that a lot of the problems that lead to a War Room scenario could be found earlier if there were a focus on Performance and Quality throughout the organization.
      • Automation: Automation is key for DevOps and Agile Development. What needs to change is that performance and architectural problems are automatically detected in the development and delivery process. This can be achieved by focusing automated testing on exactly these problems – whether it is in CI or in the “traditional” test area.
      • Measurement: We can only measure success if we have Key Performance Indicators for each team, e.g. Test Coverage %, Number of Tests Executed, Throughput, Response Time, Number of Deployments, … An additional focus must be on measures that allow us to track performance and architectural issues. This allows us to identify and prevent any performance regressions as soon as they get introduced.
      • Sharing:
  • #58: When we look at the results of the testing framework from build to build, we can easily spot functional regressions. In our example we see that testPurchase fails in Build 18. We notify the developer, the problem gets fixed, and with Build 19 we are back to functional correctness.
    Looking behind the scenes: The problem is that functional testing only verifies the functionality visible to the caller of the tested function. Using dynaTrace we are able to analyze the internals of the tested code. We analyze metrics such as Number of Executed SQL Statements, Number of Exceptions thrown, Time spent on CPU, Memory Consumption, Number of Remoting Calls, Transferred Bytes, …
    In Build 18 we can see a nice correlation of Exceptions to the failed functional test. We can assume that one of these exceptions caused the problem. For a developer it would be very helpful to get exception information, which helps to quickly identify the root cause of the problem and solve it faster.
    In Build 19 the testing framework indicates ALL GREEN. When we look behind the scenes we see a big jump in SQL Statements as well as CPU Usage. What just happened? The developer fixed the functional problem but introduced an architectural regression. This needs to be looked into – otherwise this change will have a negative impact on the application once it is tested under load.
    In Build 20 all these problems are fixed. We are still meeting our functional goals and are back to an acceptable number of SQL Statements, Exceptions, CPU Usage, …
  • #60: Web Architectural Metrics
      • # of JS Files, # of CSS, # of redirects
      • Size of Images
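
Some of the web architectural metrics named in note #60 can be counted directly from a page's HTML and compared against a budget in an automated test. This is a minimal sketch using Python's standard html.parser module; the class and function names and the budget values are assumptions for illustration, and metrics that need network access (number of redirects, image sizes) are left out.

```python
from html.parser import HTMLParser


class PageMetrics(HTMLParser):
    """Count external JS files, stylesheets and images in an HTML page."""

    def __init__(self):
        super().__init__()
        self.js_files = 0
        self.css_files = 0
        self.images = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            self.js_files += 1  # inline scripts have no src and are skipped
        elif tag == "link":
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.css_files += 1
        elif tag == "img":
            self.images += 1


def count_resources(html):
    """Return the counted metrics for one HTML document."""
    parser = PageMetrics()
    parser.feed(html)
    return {
        "js_files": parser.js_files,
        "css_files": parser.css_files,
        "images": parser.images,
    }


def over_budget(html, budget):
    """Return {metric: (measured, allowed)} for every exceeded budget entry."""
    measured = count_resources(html)
    return {
        name: (measured[name], allowed)
        for name, allowed in budget.items()
        if name in measured and measured[name] > allowed
    }
```

A build step could call over_budget with the rendered page and a budget such as {"js_files": 10} and fail the build when the returned dictionary is not empty.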