Cloud Architecture &
Distributed Systems
Trivia
Dr. Michael Menzel
AQA Session @ Dev Team Meeting
Agenda
1. Distribute & Scale
2. Stabilize & Prevent Failure
3. Deployment
4. Failure in Production
5. Scaling the Persistence Layer
Distribute & Scale
“Distribution and Elasticity are king.”
Load Balancers
• Assume balancing over heterogeneous hardware
• Shared hardware with virtualization
• Different load on machines (long requests)
• Vertical scaling
• Don’t keep state! Stay as stateless as possible
• Incorporate health checks and feedback channels
• Allow “Lame Ducks” (= healthy but busy)
• Reserve time to boot (commission/decommission)
Health Checks & Monitoring
• Web services typically offer /health or /ping
• Test inwards to give more precise health score (lame duck)
• Keep the health check cheap so it does not add extra load
• Use monitoring a lot to detect trends and history
• Monitor basics: CPU, Mem, etc.
• Add application-level monitoring
(queued requests, etc.)
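A minimal sketch of a cheap, inward-looking health check that can report a lame duck. The function name, the three status strings, and the queue threshold are illustrative assumptions, not values from the slides.

```python
def health_status(dependencies_ok: bool, queued_requests: int,
                  busy_threshold: int = 100) -> str:
    """Return "ok", "lame_duck", or "unhealthy" from local state only.

    busy_threshold is a hypothetical tuning knob; derive the real value
    from benchmarks of your own service.
    """
    if not dependencies_ok:
        return "unhealthy"
    if queued_requests >= busy_threshold:
        # Healthy but busy: the load balancer should route elsewhere.
        return "lame_duck"
    return "ok"
```

A /health or /ping handler could return this status so the load balancer can treat busy instances differently from broken ones.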
Auto Scaling
• Start with capacity planning to skip initial scaling delay
• Benchmark to find scarce resource of your application
• Monitor ftw & apply rules
• Custom metrics are better than generic
• Test behavior to learn about metrics
• Predict resource requirements (future)
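A scaling rule on a custom metric could look like the sketch below. It assumes queued requests as the metric and uses hypothetical limits; real targets should come from benchmarking the scarce resource of your application.

```python
import math


def desired_instances(queued_requests: int, target_queue_per_instance: int,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Size the fleet so each instance holds at most the target queue length."""
    if target_queue_per_instance <= 0:
        raise ValueError("target_queue_per_instance must be positive")
    needed = math.ceil(queued_requests / target_queue_per_instance)
    # Clamp to the planned capacity range.
    return max(min_instances, min(max_instances, needed))
```

The lower bound reflects the capacity-planning advice above: start with enough instances to skip the initial scaling delay.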
Auto Scaling ctd.
• For best elasticity prepare your VM/docker images to boot quickly
• Test and measure your elasticity!!!
• Stress testing: bursts, volatility
• Performance testing: grow, shrink
• Chaos testing
• Test with “Huge Scales”
Stabilize & Prevent Failure
“Expect failures at all loads. Prevent failures before one cascades.”
Degrade Performance
• Introduce grades for important users (if possible)
• Know whose request is processed
• Process only important users on peak loads
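One way to sketch this kind of degradation, assuming two user grades and a utilisation figure between 0.0 and 1.0. The grade label "important" and the 0.8 peak threshold are hypothetical.

```python
def should_process(user_grade: str, current_load: float,
                   peak_load: float = 0.8) -> bool:
    """Decide whether to serve a request, given who sent it and the load."""
    if current_load < peak_load:
        return True  # normal operation: serve everyone
    # Peak load: only requests from important users are processed.
    return user_grade == "important"
```

This only works if every request carries the identity (and therefore the grade) of its user.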
Request Time Thresholds
• Long-lasting requests are expensive. Example:
“30 sec threshold, 1000 QPS at full load:
if 5% of requests take ≥ 30 sec, you are blocked after 20 sec at the latest”
• Define thresholds and propagate sub-thresholds
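The 20 seconds in the example above can be reproduced with a little arithmetic, under the assumption (not stated on the slide) that the service can hold 1000 requests at the same time: 5% of 1000 requests per second is 50 slow requests per second, and 1000 / 50 = 20.

```python
def seconds_until_blocked(requests_per_second: float, slow_percent: float,
                          concurrent_slots: int) -> float:
    """Time until slow requests alone occupy every slot.

    Valid while the result is shorter than the duration of a slow request,
    so that no slow request has finished yet (20 s < 30 s here).
    """
    slow_per_second = requests_per_second * slow_percent / 100
    return concurrent_slots / slow_per_second


print(seconds_until_blocked(1000, 5, 1000))  # prints 20.0
```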
Example
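import scala.concurrent.duration._

// Whichever future completes first wins: the 30-second overall timeout
// or the web service call with its tighter 10-second sub-threshold.
Future.firstCompletedOf(Seq(
  Promise.timeout(InternalServerError("Oops"), 30.seconds),
  Webservice.call("/fibonacci/next", 10.seconds).map(Ok)
))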
[Diagram: Web Service A, Web Service B, Web Service C, annotated “Request Time Thresholds!”]
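Propagating sub-thresholds along a chain of services can be sketched as a deadline object that each service consults before calling the next one. The class name, the 0.5-second reserve, and the injectable clock are assumptions made for illustration.

```python
import time


class Deadline:
    """Overall time budget for one incoming request."""

    def __init__(self, budget_seconds: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_seconds

    def remaining(self) -> float:
        """Seconds left before the overall threshold is reached."""
        return max(0.0, self._expires_at - self._clock())

    def sub_threshold(self, reserve_seconds: float = 0.5) -> float:
        """Timeout to pass to the next service, keeping a small reserve
        so this service can still answer before its own deadline."""
        return max(0.0, self.remaining() - reserve_seconds)
```

Each service would create a Deadline from the threshold it received and pass sub_threshold() downstream, so the budgets shrink along the chain instead of adding up.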
Anti-Overload: Circuit Breakers & Back-off!
• Back off when web service endpoint does not respond (in time)
• Exponential is famous, but not best!
• A jittered back-off strategy is better!!! 1)
• Use circuit breakers (e.g. https://github.com/Netflix/Hystrix)
1) Source: https://www.awsarchitectureblog.com/2015/03/backoff.html
Exponential Back Off
sleep = min(cap, base * 2 ** attempt)
Random Jitter Back Off
sleep = random_between(0, min(cap, base * 2 ** attempt))
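The two back-off formulas above can be written out as a small runnable sketch. The function names and the optional rng parameter are assumptions; random_between is replaced by Python's random.uniform.

```python
import random


def exponential_backoff(attempt: int, base: float, cap: float) -> float:
    """Capped exponential back-off: every client waits the same time."""
    return min(cap, base * 2 ** attempt)


def jitter_backoff(attempt: int, base: float, cap: float, rng=random) -> float:
    """Random jitter: wait a uniformly random time between 0 and the
    capped exponential value, which spreads out retries of many clients."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))
```

With plain exponential back-off, clients that failed together retry together; the jitter variant breaks up those synchronized retry waves.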
Deployment
“Prevent toil and remain stable!”
Package Deployments
• Prepare a full VM/Docker image (if possible)
• VMs bring their own operating system and only need a virtualization stack
• Docker containers need a Docker environment but boot more quickly
• Keep old versions for rollbacks and tests/comparisons
• If you don’t package:
• Ensure you deploy into a reset environment (mem usage, temp files, etc.)
• Ensure you deploy a bundle that includes all dependencies (Java? Node?)
• Coordinate thoroughly to not interfere with other deployments
Maintain multiple environments
• “The more the merrier”, but costly – find your trade-off!
• Allow many testing environments for different types of tests
• Stress & performance tests
• Integration & regression tests
• Chaos testing & Demos
• Automate the creation of new environments
Canary Deployments
• Canaries allow you to monitor new software versions
• Keep track of which servers have which version
• In monitoring
• In logging
• Activate extra logging and notifications for the canaries
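Tracking versions in logging and activating extra logging for canaries could be sketched with Python's standard logging module. The function name build_logger, the log format, and the version strings in the usage note are hypothetical.

```python
import logging


def build_logger(version: str, is_canary: bool, stream) -> logging.LoggerAdapter:
    """Create a logger that stamps every line with the software version."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s version=%(version)s canary=%(canary)s %(message)s"))
    logger = logging.getLogger(f"app-{version}")
    logger.handlers = [handler]
    logger.propagate = False
    # Canaries log at DEBUG level, all other servers at INFO.
    logger.setLevel(logging.DEBUG if is_canary else logging.INFO)
    return logging.LoggerAdapter(logger, {"version": version, "canary": is_canary})
```

For example, build_logger("2.4.0", True, sys.stderr).debug("cache miss") would emit a line tagged with version=2.4.0 and canary=True, while the same call on a non-canary logger is dropped.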
Load Balancers during Deployment
• Two strategies
1. Same load balancer: add new instances to existing load balancer
2. Extra load balancer: add whole new load balancer and move over eventually
• Same load balancer tips
• Add instance when ready for health checks
• Tag new instances to differentiate versions
• Extra load balancer tips
• Make sure all settings are identical (infrastructure as code!)
• First run both load balancers in parallel, then switch (use DNS or other LB)
Failure in Production
“Goal is to make your pager obsolete.”
Anything can happen!
Countermeasures for Failures
• Install an immediate response channel (pager, SMS)
• Stop the bleeding first! – Symptoms before cause
• Avoid looking for the cause, but prevent further failures
• Shut down parts of the system if necessary
• Declare a coordinator
Document Failures & Solutions
• Document every step and progress of failure resolutions
• Define protocol templates to reduce overhead
• Analyze and replay old protocols
• Write regression tests with your solution
• Tests make sure old bugs don’t sneak back in
• You documented the symptoms of the bug in code
Scaling the Persistence Layer
“Just hard. ‘Nough said.”
CDNs: grab the low-hanging fruit
• CDNs are cheap web serving helpers
• Take load from web servers
• Are quick due to in-mem caching of static content
• Edge location with shorter round-trip = best latency
• Digesting with MD5 hash
8425b886b9a2184c48b34212dfaf103b-index.html
6269a326c6a2184d32b39881baac720c-main.js
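File names like the two above can be produced by prefixing each file name with the MD5 digest of its content. The helper name digest_name is an assumption; MD5 serves here only as a content fingerprint, not as a security measure.

```python
import hashlib


def digest_name(file_name: str, content: bytes) -> str:
    """Prefix the file name with the MD5 hex digest of the file content."""
    return f"{hashlib.md5(content).hexdigest()}-{file_name}"
```

Because the name changes whenever the content changes, the CDN can cache each file for a long time and a new deployment simply references new names.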
ReCAP: CAP Theorem?
• Out of C, A, and P only two can be kept.
Pick your storage systems
• Narrow down by purpose, data structure & features
• ACID vs. BASE
• Basically Available
• Soft state
• Eventually consistent
Complex Queries & Structured
• Key-Value & BigTable
• SQL
Simple Queries & Unstructured
• Blob
• Document
Examples of NoSQL usage
Use multiple stores and even redundant data (if necessary for A)
• Simple JSON-based web service: Document store
• Requests to /profile/{id} load document “profile-{id}”
• Changes are simple and only per document
• Complex, but predictable queries: BigTable store
• Avoid scans!!!
• Create 1 table per query, don’t fear redundant data
• Video and Image service: Blob store (+ CDN)
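The document-store case above can be sketched with an in-memory stand-in. DocumentStore and handle_profile_request are hypothetical names; a real document database would replace the dictionary.

```python
import json


class DocumentStore:
    """In-memory stand-in for a document store (illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, key: str, document: dict) -> None:
        self._docs[key] = json.dumps(document)

    def get(self, key: str):
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)


def handle_profile_request(store: DocumentStore, profile_id: str):
    """Serve /profile/{id} with a single key lookup: no scan, no join."""
    return store.get(f"profile-{profile_id}")
```

Every change touches exactly one document, which is what keeps this kind of service simple to scale.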
Database goes global?
• Writing state is hard to distribute globally (cf. Google Spanner)
• Inconsistencies! (A over C)
• http://research.google.com/archive/spanner.html
• Use distributed replicas & caches for read(?)
• Local caches can drift (remember load balancing!)
• Memcached clusters can help per data center
• Expect eventual consistency with outdated reads
• Sharding & Partitioning (in a global cluster)
• Divide data horizontally on application layer (primary keys)
• Partition/Sharding key design is key
• Be careful with JOINs or scans across partitions/shards!
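Dividing data horizontally by primary key on the application layer could be sketched as below. The function name and the use of SHA-256 are assumptions; any stable hash works.

```python
import hashlib


def shard_for(primary_key: str, shard_count: int) -> int:
    """Map a primary key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is randomised
    per process for strings and could route the same key to different
    shards on different application servers.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Note that plain modulo routing moves most keys when the shard count changes, which is one reason the key design deserves care up front.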
Knowing your storage system(s) is crucial
• Consistency level & consensus protocols?
Paxos, BFT, 2-phase commit, quorum, hashgraph, etc.
• Replication strategies? Backups?
Replication keys, replication factors, rack/data center-awareness
• Performance? Fault-tolerance?
Benchmark (data layouts, configurations), elasticity, chaos/stress tests
saimabibi60507
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 

Cloud Architecture & Distributed Systems Trivia

  • 1. Cloud Architecture & Distributed Systems Trivia Dr. Michael Menzel AQA Session @ Dev Team Meeting
  • 2. Agenda 1. Distribute & Scale 2. Stabilize & Prevent Failure 3. Deployment 4. Failure in Production 5. Scaling the Persistence Layer
  • 4. Distribute & Scale “Distribution and Elasticity are king.”
  • 5. Load Balancers • Assume balancing over heterogeneous hardware • Shared hardware with virtualization • Different load on machines (long requests) • Vertical scaling • Don’t keep state! Be as stateless as possible • Incorporate health checks and feedback channels • Allow “Lame Ducks” (= healthy but busy) • Reserve time to boot (commission/decommission)
  • 6. Health Checks & Monitoring • Web services typically offer /health or /ping • Test inwards to give more precise health score (lame duck) • Don’t make health check too expensive to avoid extra load • Use monitoring a lot to detect trends and history • Monitor basics: CPU, Mem, etc. • Add application-level monitoring (queued requests, etc.)
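The "test inwards" and "lame duck" ideas from the health-check slide can be sketched as follows. This is a minimal illustration, not a prescribed design: the function name `check_health`, the queue limit of 100, and the 20% lame-duck cut-off are all assumptions made for the example.

```python
def check_health(dependencies_ok: bool, queued_requests: int, max_queue: int = 100) -> dict:
    """Cheap health check that looks inwards at the request queue.

    All names and thresholds here are illustrative assumptions.
    """
    if not dependencies_ok:
        # A broken dependency means the instance cannot serve at all.
        return {"status": "unhealthy", "score": 0.0}

    # Share of the queue that is still free, clamped at zero.
    free = max(0.0, 1.0 - queued_requests / max_queue)

    # Healthy but busy: report "lame_duck" so the balancer can back off.
    status = "lame_duck" if free < 0.2 else "healthy"
    return {"status": status, "score": round(free, 2)}
```

A load balancer could keep routing to `healthy` instances, skip `lame_duck` ones for new requests, and replace `unhealthy` ones. The check only reads counters, so it stays cheap.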
  • 7. Auto Scaling • Start with capacity planning to skip initial scaling delay • Benchmark to find scarce resource of your application • Monitor ftw & apply rules • Custom metrics are better than generic • Test behavior to learn about metrics • Predict resource requirements (future)
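A scaling rule on a custom, application-level metric might look like the sketch below. The metric (queued requests), the target per instance, and the instance limits are assumptions for illustration; real values should come from benchmarking the application.

```python
import math


def desired_instances(queued_requests: int,
                      target_queue_per_instance: int,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Rule: keep roughly `target_queue_per_instance` queued requests per instance.

    All parameter names and defaults are hypothetical.
    """
    needed = math.ceil(queued_requests / target_queue_per_instance)
    # Clamp to the planned capacity range.
    return max(min_instances, min(max_instances, needed))
```

For example, `desired_instances(250, 50)` asks for 5 instances, while an empty queue falls back to the configured minimum.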
  • 8. Auto Scaling ctd. • For best elasticity prepare your VM/docker images to boot quickly • Test and measure your elasticity!!! • Stress testing: bursts, volatility • Performance testing: grow, shrink • Chaos testing • Test with “Huge Scales”
  • 9. Stabilize & Prevent Failure “Expect failures at all loads. Prevent failures before one cascades.”
  • 10. Degrade Performance • Introduce grades for important users (if possible) • Know whose request is processed • Process only important users on peak loads
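Degrading by user grade can be sketched as a small admission check. The tier name `"important"`, the load measure (0.0 to 1.0), and the 0.8 peak threshold are assumptions chosen for the example.

```python
def admit(user_tier: str, current_load: float, peak_threshold: float = 0.8) -> bool:
    """Decide whether to process a request, given who sent it and the current load.

    Tier names and the threshold are illustrative assumptions.
    """
    if current_load < peak_threshold:
        return True  # normal load: serve everyone
    # Peak load: only important users are processed.
    return user_tier == "important"
```

This only works if every request carries the identity (or grade) of its user, which is the point of "know whose request is processed".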
  • 11. Request Time Thresholds • Long-lasting requests are expensive, example: “30 sec threshold, 1000 QPS with full load; 5% of requests take ≥ 30 sec, after 20 sec (latest) you are blocked” • Define thresholds and propagate sub-thresholds • Example: Future.firstCompletedOf(Seq( Promise.timeout(InternalServerError("Oops"), 30 second), Webservice.call("/fibonacci/next", 10 second).map(Ok) )) • [Diagram: Web Service A, Web Service B, Web Service C]
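Propagating sub-thresholds means each downstream call gets only part of the caller's remaining time budget. The following is a rough Python sketch of that idea using `asyncio.wait_for`; `Deadline`, `fake_service`, `call_with_deadline`, and the one-third share are hypothetical names and values, not part of the original slides.

```python
import asyncio
import time


class Deadline:
    """Tracks the time budget left for one incoming request."""

    def __init__(self, budget_seconds: float):
        self._expires_at = time.monotonic() + budget_seconds

    def remaining(self) -> float:
        return max(0.0, self._expires_at - time.monotonic())


async def fake_service(delay: float) -> str:
    # Hypothetical downstream call that answers after `delay` seconds.
    await asyncio.sleep(delay)
    return "ok"


async def call_with_deadline(delay: float, deadline: Deadline, share: float = 1 / 3) -> str:
    # Give the downstream call only a share of what is left of our own budget.
    sub_timeout = deadline.remaining() * share
    try:
        return await asyncio.wait_for(fake_service(delay), timeout=sub_timeout)
    except asyncio.TimeoutError:
        # Fail fast instead of blocking a worker for the full threshold.
        return "timeout"
```

With a 30-second budget, a one-third share leaves the downstream service about 10 seconds, which mirrors the 30/10 split in the slide's example.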
  • 13. Anti-Overload: Circuit Breakers & Back-off! • Back off when a web service endpoint does not respond (in time) • Exponential is famous, but not best! • Jitter back-off strategy is better!!! (1) • Use circuit breakers (e.g. https://github.com/Netflix/Hystrix) • Exponential: sleep = min(cap, base * 2 ** attempt) • With jitter: sleep = random_between(0, min(cap, base * 2 ** attempt)) • (1) Source: https://www.awsarchitectureblog.com/2015/03/backoff.html
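The circuit-breaker idea can be illustrated with a small state holder. This is a simplified sketch and not the Hystrix API: the class, its method names, and the default thresholds are assumptions for the example.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after repeated failures, retries after a cool-down."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock          # injectable so the behavior can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True
        # Open circuit: allow a trial request only after the cool-down.
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Callers check `allow_request()` before contacting the endpoint and report the outcome afterwards, so a failing service is not hammered with further requests while it recovers.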
  • 14. Random Jitter Back Off Source: https://www.awsarchitectureblog.com/2015/03/backoff.html
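The two back-off formulas from the slides translate almost directly into Python. In this sketch, `random_between` is replaced by `random.uniform`; the default `base` and `cap` values are assumptions.

```python
import random


def exponential_backoff(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Capped exponential back-off: min(cap, base * 2 ** attempt)."""
    return min(cap, base * 2 ** attempt)


def jitter_backoff(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Random sleep between 0 and the capped exponential value."""
    return random.uniform(0, exponential_backoff(attempt, base, cap))
```

Without jitter, clients that failed at the same moment retry at the same moment again. Randomizing the sleep spreads those retries over the whole interval.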
  • 15. Deployment “Prevent toil and remain stable!”
  • 16. Package Deployments • Prepare a full VM/Docker image (if possible) • VMs bring their own operating system and only need a virtualization stack • Docker containers need a Docker environment but boot quicker • Keep old versions for rollbacks and tests/comparisons • If you don’t package: • Ensure you deploy into a reset environment (mem usage, temp files, etc.) • Ensure you use a bundle with all dependencies (Java? Node?) • Coordinate thoroughly to not interfere with other deployments
  • 17. Maintain multiple environments • “The more the merrier”, but costly – find your trade-off! • Allow many testing environments for different types of tests • Stress & performance tests • Integration & regression tests • Chaos testing & Demos • Automate the creation of new environments
  • 18. Canary Deployments • Canaries allow you to monitor new software versions • Keep track of which servers have which version • In monitoring • In logging • Activate extra logging and notifications for the canaries
  • 19. Load Balancers during Deployment • Two strategies 1. Same load balancer: add new instances to existing load balancer 2. Extra load balancer: add whole new load balancer and move over eventually • Same load balancer tips • Add instance when ready for health checks • Tag new instances to differentiate versions • Extra load balancer tips • Make sure all settings are identical (infrastructure as code!) • First run both load balancers in parallel, then switch (use DNS or other LB)
  • 20. Failure in production “Goal is to make your pager obsolete.” Anything can happen!
  • 21. Countermeasures for Failures • Install an immediate response channel (pager, SMS) • Stop the bleeding first! – Symptoms before cause • Avoid looking for the cause, but prevent further failures • Shut down parts of the system if necessary • Declare a coordinator
  • 22. Document Failures & Solutions • Document every step and progress of failure resolutions • Define protocol templates to reduce overhead • Analyze and replay old protocols • Write regression tests with your solution • Tests make sure old bugs don’t sneak back in • You documented the symptoms of the bug in code
  • 23. Scaling the Persistence Layer “Just hard. ‘Nough said.”
  • 24. CDNs: grab the low-hanging fruit • CDNs are cheap web serving helpers • Take load from web servers • Are quick due to in-mem caching of static content • Edge location with shorter round-trip = best latency • Digest file names with an MD5 hash, e.g. 8425b886b9a2184c48b34212dfaf103b-index.html, 6269a326c6a2184d32b39881baac720c-main.js
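Digested file names like the ones on the CDN slide can be produced by prefixing each asset with the MD5 hash of its content. This is a minimal sketch; the function name `digested_name` and the `<hash>-<name>` layout are assumptions modeled on the slide's examples.

```python
import hashlib


def digested_name(filename: str, content: bytes) -> str:
    """Return '<md5 of content>-<filename>' so changed content gets a new URL."""
    digest = hashlib.md5(content).hexdigest()
    return f"{digest}-{filename}"
```

Because the name changes whenever the content changes, the CDN can cache each file for a long time without serving stale versions.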
  • 25. ReCAP: CAP Theorem? • Out of C (consistency), A (availability), and P (partition tolerance), only two can be kept.
  • 26. Pick your storage systems • Narrow down by purpose, data structure & features • ACID vs. BASE (Basically Available, Soft state, Eventually consistent) • [Diagram: Complex Queries & Structured • Key-Value & BigTable • SQL; Simple Queries & Unstructured • Blob • Document]
  • 27. Examples of NoSQL usage • Use multiple stores and even redundant data (if necessary for A) • Simple JSON-based web service: Document store • Requests to /profile/{id} load document “profile-{id}” • Changes are simple and only per document • Complex, but predictable queries: BigTable store • Avoid scans!!! • Create 1 table per query, don’t fear redundant data • Video and Image service: Blob store (+ CDN)
  • 28. Database goes global? • Writing state is hard to distribute globally (cf. Google Spanner) • Inconsistencies! (A over C) • http://research.google.com/archive/spanner.html • Use distributed replicas & caches for read(?) • Local caches can drift (remember load balancing!) • Memcached clusters can help per data center • Expect eventual consistency with outdated reads • Sharding & Partitioning (in a global cluster) • Divide data horizontally on application layer (primary keys) • Partition/Sharding key design is key • Be careful with JOINs or scans across partitions/shards!
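Dividing data horizontally on the application layer usually comes down to mapping a primary key to a shard. One common approach, sketched here under assumptions (the function name `shard_for`, SHA-256 as the hash, and plain modulo placement are illustrative choices), is:

```python
import hashlib


def shard_for(primary_key: str, shard_count: int) -> int:
    """Map a primary key to a shard index in [0, shard_count)."""
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

A lookup by primary key touches exactly one shard. Any query that is not keyed this way has to visit all shards, which is why JOINs and scans across shards need care. Note that plain modulo placement moves most keys when `shard_count` changes.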
  • 29. Knowing your storage system(s) is crucial • Consistency level & consensus protocols? Paxos, BFT, 2-phase commit, quorum, hashgraph, etc. • Replication strategies? Backups? Replication keys, replication factors, rack/data center-awareness • Performance? Fault-tolerance? Benchmark (data layouts, configurations), elasticity, chaos/stress tests
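For quorum-based replication, one check worth knowing is whether read and write quorums overlap. With N replicas, a write quorum W, and a read quorum R, every read set intersects every write set when R + W > N. A tiny sketch (the function name is an assumption):

```python
def quorums_overlap(replicas: int, read_quorum: int, write_quorum: int) -> bool:
    """True if every read quorum shares at least one replica with every write quorum."""
    return read_quorum + write_quorum > replicas
```

For example, with a replication factor of 3, reading from 2 and writing to 2 overlaps, while reading from 1 and writing to 1 does not and can return outdated data.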