Top Performance Challenges in
Distributed Architectures
2
Example #1: Building Monitoring for AWS
AWS CloudWatch API: $0.01 / 1000 Calls
Single Fetch: 907 calls, 41 sec, 97 threads ($$$$)
Bulk Fetch: 104 calls, 21 sec, 92 threads ($$)
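The cost difference between the two fetch strategies can be worked out from the numbers on this slide. The sketch below is only illustrative arithmetic: it assumes the 907 and 104 call counts each belong to one monitoring run, and the class and method names are hypothetical. At the quoted price, single fetch issues roughly 8.7 times as many billable calls as bulk fetch.

```java
class CloudWatchCost {
    // Price quoted on the slide: $0.01 per 1,000 API calls.
    static final double PRICE_PER_1000_CALLS = 0.01;

    // Cost in dollars of one monitoring run that issues the given number of calls
    // (assumption: the slide's call counts are per run).
    static double costPerRun(int calls) {
        return calls * PRICE_PER_1000_CALLS / 1000.0;
    }

    public static void main(String[] args) {
        double single = costPerRun(907); // Single Fetch
        double bulk = costPerRun(104);   // Bulk Fetch
        System.out.printf("single=$%.5f bulk=$%.5f ratio=%.1fx%n",
                single, bulk, single / bulk);
    }
}
```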
3
Lesson Learned: When moving to a more distributed architecture …
https://www.dynatrace.com/news/blog/monitoring-aws-fargate-with-dynatrace-testing-it-in-the-field/
4
…you also grow your dependencies …
https://www.dynatrace.com/news/blog/enterprise-cloud-ecs-microservices-and-dynatrace-at-neiman-marcus/
5
… and the potential impact of a failure grows!
https://www.dynatrace.com/news/blog/enterprise-cloud-ecs-microservices-and-dynatrace-at-neiman-marcus/
1 Bad Update = 4 Impacted Services, because of all dependencies
6
Common Anti-Patterns we have seen
1. N+1 call
2. N+1 query
3. Payload flood
4. Granularity
5. Tight Coupling
6. Inefficient Service Flow
7. Timeouts, Retries, Backoff
8. Dependencies
7
N+1 Call Pattern
8
N+1 Call Pattern
Monolithic Code
// Sums the value of every product; each getValue() is a cheap in-process call.
public double getQuote(String type) {
    double quote = 0;
    for (Product product : products) {
        quote += product.getValue();
    }
    return quote;
}
“Works” well within a single process
9
N+1 Call Pattern
Diagram: Quote Service -> Product Service
1 call to Quote Service = 44 calls to Product Service (14 + 17 + 13)
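One common way out of the N+1 call pattern is to offer a bulk operation on the called service, so the loop runs on the remote side. The sketch below is a minimal illustration, not the deck's implementation: `ProductService`, `getValues`, and the value formula are hypothetical, and the "remote" service is simulated in-process with a call counter standing in for network round trips.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteBatching {
    // Hypothetical remote Product Service; each method call stands for one network round trip.
    static class ProductService {
        int remoteCalls = 0;

        double getValue(int productId) {
            remoteCalls++;
            return productId * 1.5; // placeholder value, for illustration only
        }

        List<Double> getValues(List<Integer> productIds) {
            remoteCalls++; // the whole list travels in a single request
            List<Double> values = new ArrayList<>();
            for (int id : productIds) {
                values.add(id * 1.5);
            }
            return values;
        }
    }

    // N+1 style: one remote call per product.
    static double quotePerItem(ProductService service, List<Integer> productIds) {
        double quote = 0;
        for (int id : productIds) {
            quote += service.getValue(id);
        }
        return quote;
    }

    // Batched style: one remote call for all products.
    static double quoteBatched(ProductService service, List<Integer> productIds) {
        double quote = 0;
        for (double value : service.getValues(productIds)) {
            quote += value;
        }
        return quote;
    }
}
```

With 44 product ids, `quotePerItem` records 44 remote calls while `quoteBatched` records one, and both return the same quote.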
10
https://aws.amazon.com/solutions/case-studies/landbay/
11
[Service flow diagram across services X, Y, Z; call counts shown: 1, 74, 24, 22, 22, 24, 24, 24, 24, 1, 1, 1, 1]
Subtotal: 243
12
N+1 Query Pattern
1 call to Quote Service = 87 calls to DB
Diagram: Quote Service, Product Service, Product DB (call counts shown: 1, 87)
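The database variant of the pattern is often addressed by fetching all rows in one statement instead of one statement per item. The sketch below only builds the SQL text for such a query; the table and column names (`product`, `id`, `value`) are hypothetical, and binding the parameters and executing the statement are left out.

```java
import java.util.Collections;
import java.util.List;

class ProductQueries {
    // Builds one parameterized statement covering all ids, instead of one SELECT per id.
    // Table and column names are hypothetical.
    static String selectValuesFor(List<Integer> productIds) {
        if (productIds.isEmpty()) {
            throw new IllegalArgumentException("at least one product id is required");
        }
        // One "?" placeholder per id, e.g. "?, ?, ?" for three ids.
        String placeholders = String.join(", ", Collections.nCopies(productIds.size(), "?"));
        return "SELECT id, value FROM product WHERE id IN (" + placeholders + ")";
    }
}
```

Databases typically cap the number of bind parameters per statement, so very long id lists may need to be split into chunks.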
13
Cascading N+1 Query Pattern
26k Database Calls (call counts shown: 809, 3956, 4347, 4773, 3789, 3915, 4999)
14
Payload Flood
15
Payload Flood: Doc Creation Sequential across services
16
Payload Flood in Numbers: Full DOC sent between Services
18MB
20MB
21MB
17
Refactor: Only send relevant data to specialized services
69MB vs 31.6MB
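The refactoring idea, sending each specialized service only the part of the document it needs, can be sketched as a projection step before the call. This is a minimal illustration under assumptions: the document is modeled as named sections of raw bytes, and the class, method, and section names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class PayloadProjection {
    // Returns a copy of the document containing only the sections a given service needs.
    static Map<String, byte[]> project(Map<String, byte[]> document, Set<String> neededSections) {
        Map<String, byte[]> slim = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : document.entrySet()) {
            if (neededSections.contains(entry.getKey())) {
                slim.put(entry.getKey(), entry.getValue());
            }
        }
        return slim;
    }

    // Total payload size in bytes, ignoring serialization overhead.
    static long sizeOf(Map<String, byte[]> document) {
        long total = 0;
        for (byte[] section : document.values()) {
            total += section.length;
        }
        return total;
    }
}
```

Comparing `sizeOf` before and after `project` gives the Transfer Size saving per call, which is the metric the deck suggests watching for this pattern.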
18
Granularity
19
Granularity: Encryption carved out into separate service
[Service flow diagram: Doc Processor, Doc Transformer, Doc Signer, Doc Encryption, Doc Shipment, Documents; call counts shown: 316, 1, 1, 2, 6, 6, 6, 118]
20
Tight Coupling
21
Tightly coupled! Shall we really distribute/extract?
When “Breaking the Monolith” be aware …
1:1
https://www.dynatrace.com/news/blog/breaking-up-the-monolith-while-migrating-to-the-cloud-in-6-steps/
22
Inefficient Service Flow
drawing parallels to Web Performance Optimization
23
SFPO (Service Flow & Performance Optimization)
has to teach us how to optimize (micro)service
dependencies through Service Flows
24
Especially useful to identify: inefficient 3rd party services, recursive
call chains, N+1 Query Patterns, loading too much data, no data
caching, … -> sounds very familiar to WPO
25
Classical cascading effect of recursive
service calls!
26
Timeouts, Retries & Backoff
Credits go to Adrian Hornsby (@adhorn)
27
Bad Timeout & Retry Settings
From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=31
Diagram: User 1 -> App -> Conn Pool -> DB
Timeout client side = 10s; Timeout backend = default (e.g. 60s)
Calls shown: INSERT (3x), Retry (3x)
ERROR: Failed to get connection from pool
28
Backoff between Retries
From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=33
Diagram: User 1 -> App -> Conn Pool -> DB
INSERT
Timeout client side = 10s; Timeout backend = 10s minus time elapsed
Backoff: Wait 2s, 4s, 8s, then 16s before each successive Retry
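The wait times on this slide follow a doubling schedule, and the backend timeout is derived from whatever is left of the client's budget. The sketch below computes both values without sleeping or calling a database. The class and method names are hypothetical, and the cap on the delay is an added assumption that the slide does not mention.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class Backoff {
    // Delay before retry number `retry` (1-based): base * 2^(retry - 1), capped at maxDelay.
    static Duration delayBeforeRetry(int retry, Duration base, Duration maxDelay) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        long factor = 1L << Math.min(retry - 1, 30); // bound the shift to avoid overflow
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(maxDelay) > 0 ? maxDelay : delay;
    }

    // The full schedule for the given number of retries.
    static List<Duration> schedule(int retries, Duration base, Duration maxDelay) {
        List<Duration> delays = new ArrayList<>();
        for (int retry = 1; retry <= retries; retry++) {
            delays.add(delayBeforeRetry(retry, base, maxDelay));
        }
        return delays;
    }

    // Backend timeout = client timeout minus time already elapsed, never negative.
    static Duration backendTimeout(Duration clientTimeout, Duration elapsed) {
        Duration remaining = clientTimeout.minus(elapsed);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

With a base of 2s and four retries, `schedule` yields 2s, 4s, 8s and 16s, matching the slide.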
29
Simple Exponential Backoff is not enough: Add Jitter
No jitter With jitter
From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=34
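Jitter spreads retries from many clients over time so they do not all hit the backend at the same instant. The slide does not say which jitter variant is meant; the sketch below uses the "full jitter" variant (a uniformly random delay between zero and the exponential delay) as one possible choice. The class name, parameters, and cap are hypothetical.

```java
import java.util.Random;

class JitteredBackoff {
    // "Full jitter": a uniformly random delay in [0, min(cap, base * 2^(retry - 1))] milliseconds.
    static long delayMillis(int retry, long baseMillis, long capMillis, Random random) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        // Double the delay once per previous retry, stopping early once the cap is reached.
        long exponential = baseMillis;
        for (int i = 1; i < retry && exponential < capMillis; i++) {
            exponential *= 2;
        }
        exponential = Math.min(exponential, capMillis);
        // nextDouble() is in [0, 1), so the result lies in [0, exponential].
        return (long) (random.nextDouble() * (exponential + 1));
    }
}
```

Passing the `Random` in makes the behavior reproducible in tests; production code would more likely use a shared or thread-local generator.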
30
Dependencies
31
Look beyond the “Tip of the Iceberg”:
Understanding Dependencies is critical!
32
Who is depending on me? What is the risk of change?
33
Recap - Common Anti-Patterns + Metrics to look at
1. N+1 call: # same Service Invocations per Request
2. N+1 query: # same SQL Invocations per Request
3. Payload flood: Transfer Size!
4. Granularity: # of Service Invocations across End-2-End Transaction
5. Tight Coupling: Ratio between Service Invocations
6. Inefficient Service Flow: # of Involved Services, # of Calls to each Service
7. Timeouts, Retries, Backoff: Pool Utilization, …
8. Dependencies: # of Incoming & Outgoing Dependencies
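Several of the metrics above reduce to counting calls per service within one end-to-end request. The sketch below shows that counting step under assumptions: the trace is represented simply as the list of callee service names, and the class, method names, and threshold are hypothetical rather than taken from any monitoring tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CallCounter {
    // Counts how often each service was invoked within one end-to-end request.
    // Input: the callee service name of every call in the trace (hypothetical format).
    static Map<String, Integer> callsPerService(List<String> calleesInTrace) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String callee : calleesInTrace) {
            counts.merge(callee, 1, Integer::sum);
        }
        return counts;
    }

    // Services called more often than the threshold: a hint at an N+1 call or query pattern.
    static List<String> calledMoreThan(Map<String, Integer> counts, int threshold) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > threshold) {
                suspects.add(entry.getKey());
            }
        }
        return suspects;
    }
}
```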
34
Automated Deployment Validation
https://github.com/keptn/pitometer
https://github.com/keptn, www.keptn.sh
35
Build #17
Build #21
Identify / Optimize Architectural Patterns
Recursive Calls, N+1 Call Pattern, Chatty Interfaces, No Caching Layer …
Automate Architectural Checks into CI/CD/CO!
36
Automate Performance Checks into CI/CD/CO!
How is Performance & Resource Consumption per Service Endpoint?
Build #17 Build #21
37
From Google: “Everything as Code” e.g: Enforce Architectural Rules
38
From Dynatrace: “Performance Signature as Code” evaluated through Jenkins
“Performance Signature” for Build Nov 16
“Performance Signature” for Build Nov 17
“Performance Signature” for every Build
“Multiple Metrics” compared to previous Timeframe
Simple Regression Detection per Metric
https://www.neotys.com/performance-advisory-council/thomas_steinmaurer
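A simple per-metric regression check compares each metric of the current build with the previous timeframe and flags those that got worse by more than a tolerance. The sketch below is one possible reading of that idea, not the Jenkins plugin's logic: it assumes that higher values are worse for every metric, and all names and the tolerance are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RegressionCheck {
    // Returns the names of metrics whose current value exceeds the previous value
    // by more than `tolerance` (a fraction, e.g. 0.10 for 10%).
    // Assumption: higher values are worse for every metric passed in.
    static List<String> regressedMetrics(Map<String, Double> previous,
                                         Map<String, Double> current,
                                         double tolerance) {
        List<String> regressed = new ArrayList<>();
        // Sort by metric name so the result order is deterministic.
        for (Map.Entry<String, Double> entry : new TreeMap<String, Double>(current).entrySet()) {
            Double before = previous.get(entry.getKey());
            if (before != null && entry.getValue() > before * (1.0 + tolerance)) {
                regressed.add(entry.getKey());
            }
        }
        return regressed;
    }
}
```

Metrics without a previous value are skipped rather than flagged, which keeps a newly added metric from failing its first build.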
39
Pitometer (part of @keptnProject): Metrics-based grading of a Deployment!
Pitometer Specfile: Metric Source & Query, Grading Details & Metric Score, Total Scoring Objectives
Source: Allocated Bytes (from Prometheus)
Grader (threshold 2GB): > 2GB: 0 Points; < 2GB: 20 Points
If value: 3GB, Score: 0
Source: Conversion Rate (Dynatrace)
Grader (thresholds 2%, 5%): < 2%: 0 Points; < 5%: 10 Points; > 5%: 20 Points
If value: 3.9%, Score: 10
Total Score: 10
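The grading rules on this slide can be traced with a few lines of code. The sketch below is not Pitometer's API (the class and method names are hypothetical); it only reproduces the thresholds and example values shown above. The slide does not say how values exactly on a threshold are graded, so the boundary handling here is an assumption.

```java
class DeploymentGrader {
    // Allocated Bytes: below 2GB earns 20 points, otherwise 0.
    // Assumption: exactly 2GB is graded as 0 (not specified on the slide).
    static int gradeAllocatedBytes(double gigabytes) {
        return gigabytes < 2.0 ? 20 : 0;
    }

    // Conversion Rate: below 2% earns 0, below 5% earns 10, otherwise 20.
    // Assumption: exactly 5% is graded as 20 (not specified on the slide).
    static int gradeConversionRate(double percent) {
        if (percent < 2.0) {
            return 0;
        }
        if (percent < 5.0) {
            return 10;
        }
        return 20;
    }

    // Total score is the sum of the individual metric scores.
    static int totalScore(double allocatedGigabytes, double conversionRatePercent) {
        return gradeAllocatedBytes(allocatedGigabytes) + gradeConversionRate(conversionRatePercent);
    }
}
```

For the slide's example values, 3GB allocated and a 3.9% conversion rate, `totalScore` returns 0 + 10 = 10.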
40
Pitometer: Run Standalone - https://github.com/keptn/pitometer
Callouts shown: Init, Source, Source, Grader, Run, Result
41
Pitometer in keptn
Autonomous Cloud Control Plane
Environments: Dev, stage, prod
1: push
2: deploy (shadow)
3: test (performance)
4: evaluate (scalability KPIs)
5: promote
6: deploy (blue/green)
7: evaluate (business KPIs)
8: operate (NoOps)
42
Resources
• Keptn & Pitometer
• www.keptn.sh
• github.com/keptn
• github.com/keptn/pitometer
• Performance, Resiliency & Availability Content
• Adrian Hornsby (AWS): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478
• Acacio Cruz (Google): https://www.spreaker.com/user/pureperformance/066-load-shedding-sre-at-google-with-aca
• Thomas Steinmaurer (Dynatrace): https://www.neotys.com/performance-advisory-council/thomas_steinmaurer
Top Performance Problems in Distributed Architectures

  • 1. Top Performance Challenges in Distributed Architectures
  • 2. 2 907Calls 41sec 97threads 104Calls 21sec 92threads AWS CloudWatch API Single Fetch Bulk Fetch $$$$ $$ Example #1: Building Monitoring for AWS $0.01 / 1000 Calls
  • 3. 3 Click to edit Master title style Lesson Learned: When moving to a more distributed architecture … https://www.dynatrace.com/news/blog/monitoring-aws-fargate-with-dynatrace-testing-it-in-the-field/
  • 4. 4 Click to edit Master title style …you also grow your dependencies … https://www.dynatrace.com/news/blog/enterprise-cloud-ecs-microservices-and-dynatrace-at-neiman-marcus/
  • 5. 5 Click to edit Master title style … and the potential impact of a failure grows! https://www.dynatrace.com/news/blog/enterprise-cloud-ecs-microservices-and-dynatrace-at-neiman-marcus/ 1 Bad Update 4 Impacted Services Because of all dependencies
  • 6. 6 Common Anti-Patterns we have seen 1. N+1 call 2. N+1 query 3. Payload flood 4. Granularity 5. Tight Coupling 6. Inefficient Service Flow 7. Timeouts, Retries, Backoff 8. Dependencies
  • 7. 7 N + 1 Call Pattern
  • 8. 8 N+1 Call Pattern Monolithic Code public double getQuote(String type) { double quote=0; for (Product product: products) { quote += product.getValue(); } return quote; } “Works” well within a single process
  • 9. 9 N+1 Call Pattern Product Service Quote Service 1 call to Quote Service = 44 calls to Product Service 1 14 17 13
  • 11. 11 X Y Z Z 1 74 24 22 22 24 24 24 24 1 1 1 1 Subtotal: 243
  • 12. 12 N+1 Query Pattern 1 call to Quote Service = 87 calls to DB Quote Service 1 87 Product Service Product DB
  • 13. 13 Cascading N+1 Query Pattern 26k Database Calls 809 3956 4347 4773 3789 3915 4999
  • 15. Payload Flood: Doc Creation Sequential across services
  • 16. Payload Flood in Numbers: Full DOC sent between Services (18MB, 20MB, 21MB)
  • 17. Refactor: Only send relevant data to specialized services (69MB vs 31.6MB)
  • 19. Granularity: Encryption carved out into separate service. Services shown: Doc Processor, Doc Transformer, Doc Signer, Doc Encryption, Doc Shipment, Documents. Call counts shown: 316, 1, 1, 2, 6, 6, 6, 118
  • 21. Tightly coupled (1:1)! Shall we really distribute/extract? When “Breaking the Monolith”, be aware … https://www.dynatrace.com/news/blog/breaking-up-the-monolith-while-migrating-to-the-cloud-in-6-steps/
  • 22. Inefficient Service Flow: drawing parallels to Web Performance Optimization (WPO)
  • 23. SFPO (Service Flow & Performance Optimization) has to teach us how to optimize (micro)service dependencies through Service Flows
  • 24. Especially useful to identify: inefficient 3rd party services, recursive call chains, N+1 Query Patterns, loading too much data, no data caching, … -> sounds very familiar from WPO
  • 25. Classical cascading effect of recursive service calls!
  • 26. Timeouts, Retries & Backoff. Credits go to Adrian Hornsby (@adhorn)
  • 27. Bad Timeout & Retry Settings. From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=31 Diagram: User 1, App, DB, Conn Pool. Timeout client side = 10s; Timeout backend = default (e.g. 60s). Three INSERTs, each followed by a Retry. ERROR: Failed to get connection from pool
  • 28. Backoff between Retries. From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=33 Diagram: User 1, App, DB, Conn Pool, INSERT. Timeout client side = 10s; Timeout backend = 10s – time elapsed. Backoff: wait 2s before Retry, then 4s, 8s, 16s
  • 29. Simple Exponential Backoff is not enough: Add Jitter (No jitter vs. With jitter). From Adrian Hornsby (@adhorn): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478?slide=34
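The retry schedule on slides 28 and 29 (wait 2s, 4s, 8s, 16s, then add jitter) can be sketched as below. The slides do not say which jitter variant is meant. This sketch uses "full jitter", where the actual wait is a random value between zero and the exponential delay, as one common choice. The class name, the cap parameter and the method signatures are assumptions.

```java
import java.util.Random;

// Sketch of retry delays. Base delay, cap and the "full jitter" variant are assumptions.
final class Backoff {
    /** Plain exponential backoff: base * 2^attempt, capped at maxMillis. attempt starts at 0. */
    static long exponentialDelayMillis(int attempt, long baseMillis, long maxMillis) {
        // Clamp the shift so large attempt numbers do not overflow for realistic base delays.
        int shift = Math.min(attempt, 30);
        return Math.min(maxMillis, baseMillis << shift);
    }

    /** Full jitter: a uniformly random delay between 0 and the exponential delay (inclusive). */
    static long jitteredDelayMillis(int attempt, long baseMillis, long maxMillis, Random random) {
        long ceiling = exponentialDelayMillis(attempt, baseMillis, maxMillis);
        return (long) (random.nextDouble() * (ceiling + 1));
    }
}
```

With a base of 2000 ms, attempts 0 to 3 give the 2s, 4s, 8s and 16s ceilings from the slide. The jitter matters because clients that failed at the same moment would otherwise all retry at the same moment too, hitting the connection pool in synchronized waves.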
  • 31. Look beyond the “Tip of the Iceberg”: Understanding Dependencies is critical!
  • 32. Who is depending on me? What is the risk of change?
  • 33. Recap: Common Anti-Patterns + Metrics to look at. 1. N+1 call: # same Service Invocations per Request; 2. N+1 query: # same SQL Invocations per Request; 3. Payload flood: Transfer Size!; 4. Granularity: # of Service Invocations across End-2-End Transaction; 5. Tight Coupling: Ratio between Service Invocations; 6. Inefficient Service Flow: # of Involved Services, # of Calls to each Service; 7. Timeouts, Retries, Backoff: Pool Utilization, …; 8. Dependencies: # of Incoming & Outgoing Dependencies
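The first recap metric, "# same Service Invocations per Request", can be computed from any per-request list of called services. The sketch below is a minimal illustration, assuming the trace is already available as a list of service names. `CallCounter`, its methods and the threshold parameter are hypothetical and not tied to any particular monitoring tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper operating on one request's trace, given as service names in call order.
final class CallCounter {
    /** Counts how often each service was invoked within one request. */
    static Map<String, Integer> countByService(List<String> calledServices) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String service : calledServices) {
            counts.merge(service, 1, Integer::sum);
        }
        return counts;
    }

    /** Services invoked more than maxCallsPerRequest times: candidates for an N+1 pattern. */
    static List<String> suspects(List<String> calledServices, int maxCallsPerRequest) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : countByService(calledServices).entrySet()) {
            if (entry.getValue() > maxCallsPerRequest) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A check like `suspects(trace, threshold).isEmpty()` is the kind of rule that could be evaluated per build, which is what the deck's later slides on automating architectural checks suggest.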
  • 35. Automate Architectural Checks into CI/CD/CO! Identify / Optimize Architectural Patterns: Recursive Calls, N+1 Call Pattern, Chatty Interfaces, No Caching Layer … (Build #17 vs Build #21)
  • 36. Automate Performance Checks into CI/CD/CO! How is Performance & Resource Consumption per Service Endpoint? (Build #17 vs Build #21)
  • 37. From Google: “Everything as Code”, e.g. Enforce Architectural Rules
  • 38. From Dynatrace: “Performance Signature as Code” evaluated through Jenkins. “Performance Signature” for every Build (e.g. Build Nov 16, Build Nov 17); “Multiple Metrics” compared to previous Timeframe; Simple Regression Detection per Metric. https://www.neotys.com/performance-advisory-council/thomas_steinmaurer
  • 39. Pitometer (part of @keptnProject): Metrics-based grading of a Deployment! The Pitometer Specfile lists the Objectives for the Total Scoring, each with Metric Source & Query and Grading Details & Metric Score. Objective: Allocated Bytes (from Prometheus), threshold 2GB: > 2GB: 0 Points; < 2GB: 20 Points. Objective: Conversion Rate (Dynatrace), thresholds 2% and 5%: < 2%: 0 Points; < 5%: 10 Points; > 5%: 20 Points. Example (Source, Grader): if value 3GB, Score: 0; if value 3.9%, Score: 10. Total Score: 10
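The grading rules on slide 39 can be written out as plain code to check the example numbers. This is not Pitometer's own API. It is a small Java sketch of the slide's thresholds only, with invented class and method names. The slide does not say how a value exactly on a threshold is graded, so the boundary handling here is an assumption.

```java
// Not Pitometer's API: a plain-Java sketch of the grading rules shown on the slide.
final class DeploymentGrader {
    /** Allocated bytes: 20 points below 2 GB, otherwise 0 (exactly 2 GB graded as 0 by assumption). */
    static int gradeAllocatedGb(double allocatedGb) {
        return allocatedGb < 2.0 ? 20 : 0;
    }

    /** Conversion rate: 0 points below 2%, 10 points below 5%, otherwise 20. */
    static int gradeConversionRate(double percent) {
        if (percent < 2.0) {
            return 0;
        }
        if (percent < 5.0) {
            return 10;
        }
        return 20;
    }

    /** Total score is the sum of the per-objective scores. */
    static int totalScore(double allocatedGb, double conversionPercent) {
        return gradeAllocatedGb(allocatedGb) + gradeConversionRate(conversionPercent);
    }
}
```

For the slide's example values, 3 GB allocated and a 3.9% conversion rate, the two objectives score 0 and 10 points, which matches the slide's total score of 10.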
  • 40. Pitometer: Run Standalone - https://github.com/keptn/pitometer Steps shown: Init, Source, Source, Grader, Run, Result
  • 41. Pitometer in keptn (Autonomous Cloud Control Plane). Pipeline across Dev, stage and prod: 1: push; 2: deploy (shadow); 3: test (performance); 4: evaluate (scalability KPIs); 5: promote; 6: deploy (blue/green); 7: evaluate (business KPIs); 8: operate (NoOps)
  • 42. Resources • Keptn & Pitometer • www.keptn.sh • github.com/keptn • github.com/keptn/pitometer • Performance, Resiliency & Availability Content • Adrian Hornsby (AWS): https://speakerdeck.com/adhorn/resiliency-and-availability-design-patterns-3742b5ba-e013-4f50-8512-00a65775f478 • Acacio Cruz (Google): https://www.spreaker.com/user/pureperformance/066-load-shedding-sre-at-google-with-aca • Thomas Steinmaurer (Dynatrace): https://www.neotys.com/performance-advisory-council/thomas_steinmaurer
  • 43. Top Performance Challenges in Distributed Architectures

Editor's Notes

  • #2: https://keptn.sh/ https://github.com/keptn/keptn https://github.com/keptn/pitometer https://twitter.com/grabnerandi https://twitter.com/keptnproject
  • #3: Let's start with some examples …