© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Arun Gupta, @arungupta
Principal Open Source Technologist,
Amazon Web Services
Using Chaos to Bring Resiliency to Your Applications in Kubernetes
Failures are a given and everything will eventually fail over time.
https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
https://www.youtube.com/watch?v=zoz0ZjfrQ9s
Amazon 2006
GameDay: Creating Resiliency Through Destruction
Jesse Robbins
Chaos Monkeys
https://github.com/Netflix/SimianArmy
Chaos Engineering
Resilience
Ability of a system to adapt
to changes, failures, and disturbances
Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production
Credit: https://www.flickr.com/photos/loseryouthcrew/8775130600/
https://principlesofchaos.org/
Bad things will happen to your system,
no matter how well designed it is
You cannot afford to ignore it
Break your systems on purpose
Find out their weaknesses and
fix them before they break when least expected
Chaos doesn’t cause problems.
It reveals them.
Where do you inject Chaos?
• Application level
• Host failure
• Resource attacks (CPU, memory, …)
• Network attacks (dependencies, latency, …)
• Region attacks!
Phases of chaos engineering
"Normal" behavior of your system
https://www.elastic.co/blog/timelion-tutorial-from-zero-to-hero
Business metric
https://medium.com/netflix-techblog/sps-the-pulse-of-netflix-streaming-ae4db0e05f8a
Phases of chaos engineering
• a service gives 404 or 503?
• latency increases by 300ms?
• the port is not accessible?
• security group rules changed?
• the database stops?
• excessive number of requests come?
• iptables are wiped out?
Phases of chaos engineering
Pick hypothesis
Scope the experiment
Identify metrics
Notify the organization
Start very small
As close as possible to production
Minimize the blast radius.
Have an emergency STOP!
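The emergency stop can be sketched as a health check that runs before every injection step. This is only an illustration: `run_with_stop`, the `error_rate` metric, and the 5% threshold are hypothetical names and values, not something taken from the slides.

```python
from typing import Callable, Iterable


def run_with_stop(steps: Iterable[Callable[[], None]],
                  healthy: Callable[[], bool]) -> str:
    """Run injection steps in order; abort as soon as the health check fails."""
    for step in steps:
        if not healthy():
            return "aborted"
        step()
    return "completed" if healthy() else "aborted"


# Hypothetical system state: each injection raises the observed error rate.
state = {"error_rate": 0.0}


def inject_small() -> None:
    state["error_rate"] += 0.02


def inject_large() -> None:
    state["error_rate"] += 0.10


def healthy() -> bool:
    return state["error_rate"] < 0.05  # assumed abort threshold


result = run_with_stop([inject_small, inject_large, inject_large], healthy)
print(result)
```

Starting with the smallest step keeps the blast radius low: the second large injection never runs because the check stops the experiment first.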
Start with... canary deployment
[Figure: users split into 99% of users and 1% of users]
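One way to picture the 1% group is a deterministic hash of the user ID, so the same user always lands in the same group. A minimal sketch, assuming string user IDs; `in_experiment` is a hypothetical helper, and in practice the split is usually done by a load balancer or service mesh rather than application code.

```python
import hashlib


def in_experiment(user_id: str, percent: float = 1.0) -> bool:
    """Deterministically place roughly `percent`% of users in the experiment group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100


users = [f"user-{i}" for i in range(100_000)]
exposed = sum(in_experiment(u) for u in users)
share = exposed / len(users)
print(f"{share:.2%} of users are exposed to the experiment")
```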
Phases of chaos engineering
Time to detect?
Time for notification? And escalation?
Time to public notification?
Time for graceful degradation to kick in?
Time for self healing to happen?
Time to recovery—partial and full?
Time to all-clear and stable?
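These questions become measurable once the experiment records a timestamp for each milestone. A minimal sketch, assuming hypothetical milestone names (`injected`, `detected`, `notified`, `recovered`) and made-up timestamps:

```python
from datetime import datetime, timedelta


def recovery_timings(events: dict[str, datetime]) -> dict[str, timedelta]:
    """Return the elapsed time from fault injection to each later milestone."""
    start = events["injected"]
    return {name: ts - start for name, ts in events.items() if name != "injected"}


# Hypothetical timestamps recorded during one experiment run.
t0 = datetime(2018, 3, 10, 14, 42, 38)
events = {
    "injected": t0,
    "detected": t0 + timedelta(seconds=45),
    "notified": t0 + timedelta(minutes=2),
    "recovered": t0 + timedelta(minutes=9),
}
timings = recovery_timings(events)
for name, elapsed in timings.items():
    print(f"time to {name}: {elapsed}")
```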
DON’T blame that one person…
PostMortems—COE (Correction of Errors)
The 5 WHYs
Phases of chaos engineering
Fix
Failure free operations require
experience with failure.
http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
Kubernetes cluster
Reconciles desired and actual state for pods
Distributes pods across AZs
Automatic health-check based restarts
Rolling deployment of a service
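Reconciling desired and actual state is what makes pod-killing experiments survivable. The toy loop below only illustrates the idea; `reconcile` and the pod naming are hypothetical and do not reflect how the Kubernetes controllers are implemented.

```python
def reconcile(desired: int, running: list[str], next_id: int) -> tuple[list[str], int]:
    """Return the pod list after one reconcile pass toward `desired` replicas."""
    pods = list(running)
    while len(pods) < desired:        # too few pods: start replacements
        pods.append(f"pod-{next_id}")
        next_id += 1
    return pods[:desired], next_id    # too many pods: trim the extras


pods, next_id = reconcile(3, [], 0)   # initial rollout of three replicas
pods.remove("pod-1")                  # chaos: one pod is killed
pods, next_id = reconcile(3, pods, next_id)
print(pods)
```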
Kubernetes cluster with Amazon EKS
AWS managed
Customer account
Kubernetes cluster with Amazon EKS
[Figure: kubectl connects to mycluster.eks.amazonaws.com; the cluster spans Availability Zones 1, 2, and 3]
Region and Availability Zones
Control Plane is highly available
Masters and workers run in Auto Scaling groups (ASG)
Master instance type auto-scaling
etcd is highly available and backed up every hour
Chaos in a Kubernetes cluster
[Figure: the same cluster (kubectl, mycluster.eks.amazonaws.com, Availability Zones 1, 2, and 3) with three components marked "x" as failed; callouts: "Health check?" and "Dead node?"]
Istio
Chaos Toolkit
Kube Monkey
PowerfulSeal
Gremlin
Simian Army
Istio
Intelligent routing and load balancing
Resilience across languages and platforms
Fleet-wide policy enforcement
In-depth telemetry
Timeouts
Bounded retries with timeout budget
Concurrent connections limit and request load
Active health checks (periodic)
Passive health checks (circuit breakers)
AZ-aware load balancing with automatic failover
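"Bounded retries with timeout budget" means the retries share one overall deadline instead of each getting a fresh timeout. In a mesh the sidecar proxy handles this; the sketch below only illustrates the idea in application code, and `call_with_budget` and `flaky` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_budget(op: Callable[[float], T], attempts: int, budget_s: float,
                     clock: Callable[[], float] = time.monotonic) -> T:
    """Retry `op` at most `attempts` times without exceeding `budget_s` overall.

    `op` receives the remaining budget as its per-attempt timeout.
    """
    deadline = clock() + budget_s
    last_error = None
    for _ in range(attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            return op(remaining)
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise TimeoutError("retry budget exhausted") from last_error


calls = []


def flaky(timeout: float) -> str:
    """Hypothetical upstream call that fails twice, then succeeds."""
    calls.append(timeout)
    if len(calls) < 3:
        raise ConnectionError("upstream unavailable")
    return "hello"


result = call_with_budget(flaky, attempts=5, budget_s=2.0)
print(result, "after", len(calls), "attempts")
```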
• Timing failures
• Increased network latency
• Overloaded upstream service
• Crashes
• HTTP error codes
• TCP connection failures
Fault injection using Istio—timeout
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: greeting
spec:
  hosts:
  - greeting
  http:
  - fault:
      delay:
        fixedDelay: 10s
        percent: 100
    route:
    - destination:
        host: greeting
        subset: greeting-hello
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: greeting-destination-rule
spec:
  host: greeting
  subsets:
  - name: greeting-hello
    labels:
      greeting: hello
Fault injection using Istio—HTTP abort
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: greeting
spec:
  hosts:
  - greeting
  http:
  - fault:
      abort:
        httpStatus: 500
        percent: 100
    route:
    - destination:
        host: greeting
        subset: greeting-hello
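Both fault types follow the same pattern: for a configured share of requests, either wait before forwarding or answer with an error instead of forwarding. The wrapper below imitates that behavior in plain Python so it can be tried without a mesh. `with_faults` and `InjectedAbort` are hypothetical names; Istio applies these faults in the sidecar proxy, not in application code.

```python
import random
import time


class InjectedAbort(Exception):
    """Raised instead of calling the real handler (stands in for an HTTP error)."""

    def __init__(self, status: int):
        super().__init__(f"injected HTTP {status}")
        self.status = status


def with_faults(handler, *, delay_s=0.0, delay_percent=0,
                abort_status=None, abort_percent=0,
                rng=random.random, sleep=time.sleep):
    """Wrap `handler` so a share of calls is delayed and/or aborted."""
    def wrapped(*args, **kwargs):
        if rng() * 100 < delay_percent:
            sleep(delay_s)
        if abort_status is not None and rng() * 100 < abort_percent:
            raise InjectedAbort(abort_status)
        return handler(*args, **kwargs)
    return wrapped


# Mirror the two slides: a 10s delay on 100% of calls, then HTTP 500 on 100%.
slept = []  # record the requested delay instead of really sleeping
delayed = with_faults(lambda: "hello", delay_s=10, delay_percent=100,
                      sleep=slept.append)
aborted = with_faults(lambda: "hello", abort_status=500, abort_percent=100)

print(delayed(), slept)
try:
    aborted()
except InjectedAbort as exc:
    print("caller sees status", exc.status)
```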
Istio traffic management
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: greeting-virtual-service
spec:
  hosts:
  - greeting
  http:
  - route:
    - destination:
        host: greeting
        subset: greeting-hello
      weight: 75
    - destination:
        host: greeting
        subset: greeting-howdy
      weight: 25
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: greeting-destination-rule
spec:
  host: greeting
  subsets:
  - name: greeting-hello
    labels:
      greeting: hello
  - name: greeting-howdy
    labels:
      greeting: howdy
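The 75/25 weights describe a probabilistic split per request. A small simulation shows what that looks like over many requests; `pick_subset` is a hypothetical helper and says nothing about how the proxy implements weighted routing.

```python
import random
from collections import Counter


def pick_subset(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a subset name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
weights = {"greeting-hello": 75, "greeting-howdy": 25}
counts = Counter(pick_subset(weights, rng) for _ in range(10_000))
print(counts)
```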
Istio circuit breaker
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: greeting-destination-rule
spec:
  host: greeting
  subsets:
  - name: greeting-hello
    labels:
      greeting: hello
    trafficPolicy:
      connectionPool:
        tcp:
          maxConnections: 100
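The rule above caps concurrent connections to the service; work beyond the cap is rejected quickly instead of piling up. The sketch below shows that fail-fast idea with a semaphore. `ConnectionLimiter` is a hypothetical class, and the proxy's real overflow handling is more involved than this.

```python
import threading


class ConnectionLimiter:
    """Reject new work once `max_connections` calls are already in flight."""

    def __init__(self, max_connections: int):
        self._slots = threading.BoundedSemaphore(max_connections)

    def call(self, fn, *args, **kwargs):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("connection limit reached")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


limiter = ConnectionLimiter(max_connections=1)
print(limiter.call(lambda: "ok"))
try:
    # The inner call starts while the outer one still holds the only slot.
    limiter.call(lambda: limiter.call(lambda: "inner"))
except RuntimeError as exc:
    print("rejected:", exc)
```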
https://istio.io/docs/
Chaos Toolkit
Open API for Chaos Engineering
CLI-driven
Experiments declared in JSON/YAML files
Open specification
Extensible: Kubernetes, AWS, Spring, others
Chaos Toolkit follows the principles of chaos
Probe: query a system to observe a behavior
• Check state of a pod with a specific label
• Multiple probes to define steady state
Action: apply real-world events to the system
• Terminate a deployment
• Multiple actions simulate events
Types of probe and action
• Process: Run a binary
• HTTP: Invoke an HTTP endpoint
• Python: Call a Python function to perform richer operations
Chaos Toolkit metadata
{
  "version": "1.0.0",
  "title": "Terminating the greeting service should not impact users",
  "description": "How does the greeting service unavailability impact our users? Do they see an error or does the webapp get slower?",
  "tags": [
    "kubernetes",
    "aws"
  ],
  "configuration": {
    "web_app_url": {
      "type": "env",
      "key": "WEBAPP_URL"
    }
  },
Chaos Toolkit steady state & hypothesis
  "steady-state-hypothesis": {
    "title": "Services are all available and healthy",
    "probes": [
      {
        "type": "probe",
        "name": "alive-and-healthy",
        "tolerance": true,
        "provider": {
          "type": "python",
          "module": "chaosk8s.pod.probes",
          "func": "pods_in_phase",
          "arguments": {
            "label_selector": "app=webapp-pod",
            "phase": "Running",
            "ns": "default"
          }
        }
      },
      {
        "type": "probe",
        "name": "application-must-respond-normally",
        "tolerance": 200,
        "provider": {
          "type": "http",
          "url": "${web_app_url}",
          "timeout": 3
        }
      }
    ]
  },
Chaos Toolkit experiment & verify
  "method": [
    {
      "type": "action",
      "name": "terminate-greeting-service",
      "provider": {
        "type": "python",
        "module": "chaosk8s.pod.actions",
        "func": "terminate_pods",
        "arguments": {
          "label_selector": "app=greeter-pod",
          "ns": "default"
        }
      }
    },
    {
      "type": "probe",
      "name": "fetch-application-logs",
      "provider": {
        "type": "python",
        "module": "chaosk8s.pod.probes",
        "func": "read_pod_logs",
        "arguments": {
          "label_selector": "app=webapp-pod",
          "last": "20s",
          "ns": "default"
        }
      }
    }
  ],
Chaos Toolkit run
$ chaos run experiments/experiment.json
[2018-03-10 14:42:38 INFO] Validating the experiment's syntax
[2018-03-10 14:42:38 INFO] Experiment looks valid
[2018-03-10 14:42:38 INFO] Running experiment: Terminate the greeting service should not impact users
[2018-03-10 14:42:38 INFO] Steady state hypothesis: Services are all available and healthy
[2018-03-10 14:42:38 INFO] Probe: application-should-be-alive-and-healthy
[2018-03-10 14:42:38 INFO] Probe: application-must-respond-normally
[2018-03-10 14:42:39 INFO] Steady state hypothesis is met!
[2018-03-10 14:42:39 INFO] Action: terminate-greeting-service
[2018-03-10 14:42:40 INFO] Probe: fetch-application-logs
[2018-03-10 14:42:41 INFO] Steady state hypothesis: Services are all available and healthy
[2018-03-10 14:42:41 INFO] Probe: application-should-be-alive-and-healthy
[2018-03-10 14:42:42 INFO] Probe: application-must-respond-normally
[2018-03-10 14:42:45 ERROR] => failed: activity took too long to complete
[2018-03-10 14:42:45 CRITICAL] Steady state probe 'application-must-respond-normally' is not in the given tolerance so failing this experiment
[2018-03-10 14:42:45 INFO] Let's rollback...
[2018-03-10 14:42:45 INFO] No declared rollbacks, let's move on.
[2018-03-10 14:42:45 INFO] Experiment ended with status: failed
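The log follows a fixed order: check the steady state, run the method, check the steady state again, then roll back. The toy runner below reproduces that order so the flow can be followed without a cluster. It is a simplified illustration, not the Chaos Toolkit implementation: `run_experiment`, `within_tolerance`, the status strings, and the in-memory `system` dictionary are all hypothetical.

```python
def within_tolerance(value, tolerance) -> bool:
    """Compare a probe result with its declared tolerance (equality only here)."""
    return value == tolerance


def run_experiment(probes, method, rollbacks=()):
    """Steady state, then method, then steady state again, then rollbacks."""
    def steady() -> bool:
        return all(within_tolerance(probe(), tol) for probe, tol in probes)

    if not steady():
        return "aborted"  # do not inject into an already unhealthy system
    for activity in method:
        activity()
    status = "completed" if steady() else "failed"
    for rollback in rollbacks:
        rollback()
    return status


# Hypothetical stand-in for the cluster: the webapp depends on the greeting service.
system = {"greeting_up": True}
probes = [
    (lambda: system["greeting_up"], True),                # pods in Running phase
    (lambda: 200 if system["greeting_up"] else 503, 200)  # HTTP status of the webapp
]
method = [lambda: system.update(greeting_up=False)]       # terminate the service
status = run_experiment(probes, method)
print("Experiment ended with status:", status)
```

As in the log, the experiment fails because the second steady-state check no longer passes once the greeting service is gone.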
https://github.com/chaostoolkit/chaostoolkit/
kube-monkey: an implementation of Netflix’s Chaos Monkey for Kubernetes
Randomly deletes pods in the cluster
Applications opt-in using annotations
Run Kube-Monkey—create configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-monkey-config-map
  namespace: kube-system
data:
  config.toml: |
    [kubemonkey]
    run_hour = 8
    start_hour = 10
    end_hour = 16
    blacklisted_namespaces = ["kube-system"]
    whitelisted_namespaces = [""]
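As far as I understand the configuration, `run_hour` is when the day's schedule is generated, while `start_hour` and `end_hour` bound the window in which pods may be killed. The sketch below picks a random kill time inside that window. `schedule_kill` is a hypothetical function and is not taken from the kube-monkey code.

```python
import random
from datetime import datetime, time, timedelta


def schedule_kill(day: datetime, start_hour: int, end_hour: int,
                  rng: random.Random) -> datetime:
    """Pick a random time on `day` between start_hour and end_hour."""
    window_s = (end_hour - start_hour) * 3600
    start = datetime.combine(day.date(), time(hour=start_hour))
    return start + timedelta(seconds=rng.randrange(window_s))


kill_at = schedule_kill(datetime(2018, 3, 12), start_hour=10, end_hour=16,
                        rng=random.Random(7))
print("pod termination scheduled for", kill_at)
```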
Kube-Monkey application opt-in
apiVersion: apps/v1
kind: Deployment
. . .
  template:
    metadata:
      labels:
        app: greeting
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim-pods
        # Label values must be strings, so numeric values are quoted
        kube-monkey/mtbf: '2'
        kube-monkey/kill-mode: random-max-percent
        kube-monkey/kill-value: '40'
    spec:
      containers:
      - name: greeting
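With `kill-mode: random-max-percent` and `kill-value: 40`, at most 40% of the matching pods are terminated in one run. The sketch below illustrates such a selection; the rounding rule and the uniform choice of percentage are assumptions, and `pick_victims` is a hypothetical function rather than kube-monkey's own logic.

```python
import math
import random


def pick_victims(pods: list[str], max_percent: int, rng: random.Random) -> list[str]:
    """Select a random share of pods, never more than `max_percent` percent."""
    percent = rng.randint(0, max_percent)           # assumed: uniform in 0..max
    count = math.floor(len(pods) * percent / 100)   # assumed: round down
    return rng.sample(pods, count)


pods = [f"greeting-{i}" for i in range(10)]
victims = pick_victims(pods, max_percent=40, rng=random.Random(1))
print(len(victims), "of", len(pods), "pods selected:", victims)
```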
https://github.com/asobti/kube-monkey
Chaos Engineering working group @ CNCF
https://github.com/chaoseng/wg-chaoseng
Chaos Engineering mind map
https://bit.ly/2uKOJMQ
You don’t choose the moment,
the moment chooses you.
You only choose how prepared
you are, when it does.
Thank you!
Ad

More Related Content

What's hot (20)

Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Nils Meder
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Ana Medina
 
Chaos Engineering Kubernetes
Chaos Engineering KubernetesChaos Engineering Kubernetes
Chaos Engineering Kubernetes
Alex Soto
 
Introduction to Chaos Engineering
Introduction to Chaos EngineeringIntroduction to Chaos Engineering
Introduction to Chaos Engineering
Raymond Adrian (Rad) Butalid
 
Embracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixEmbracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at Netflix
Josh Evans
 
Microservices Architecture - Bangkok 2018
Microservices Architecture - Bangkok 2018Microservices Architecture - Bangkok 2018
Microservices Architecture - Bangkok 2018
Araf Karsh Hamid
 
Microservices, DevOps & SRE
Microservices, DevOps & SREMicroservices, DevOps & SRE
Microservices, DevOps & SRE
Araf Karsh Hamid
 
Microservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitectureMicroservice vs. Monolithic Architecture
Microservice vs. Monolithic Architecture
Paul Mooney
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
Claus Ibsen
 
Azure DevOps CI/CD For Beginners
Azure DevOps CI/CD  For BeginnersAzure DevOps CI/CD  For Beginners
Azure DevOps CI/CD For Beginners
Rahul Nath
 
Modern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOpsModern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOps
GlobalLogic Ukraine
 
Azure DevOps
Azure DevOpsAzure DevOps
Azure DevOps
Felipe Artur Feltes
 
Introduction to CICD
Introduction to CICDIntroduction to CICD
Introduction to CICD
Knoldus Inc.
 
Introduction to Docker - 2017
Introduction to Docker - 2017Introduction to Docker - 2017
Introduction to Docker - 2017
Docker, Inc.
 
Azure DevOps
Azure DevOpsAzure DevOps
Azure DevOps
Juan Fabian
 
DevOps 101 - an Introduction to DevOps
DevOps 101  - an Introduction to DevOpsDevOps 101  - an Introduction to DevOps
DevOps 101 - an Introduction to DevOps
Red Gate Software
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)
Abeer R
 
Azure DevOps Presentation
Azure DevOps PresentationAzure DevOps Presentation
Azure DevOps Presentation
InCycleSoftware
 
Getting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeGetting Started with Infrastructure as Code
Getting Started with Infrastructure as Code
WinWire Technologies Inc
 
Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...
Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...
Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...
Simplilearn
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Nils Meder
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Ana Medina
 
Chaos Engineering Kubernetes
Chaos Engineering KubernetesChaos Engineering Kubernetes
Chaos Engineering Kubernetes
Alex Soto
 
Embracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at NetflixEmbracing Failure - Fault Injection and Service Resilience at Netflix
Embracing Failure - Fault Injection and Service Resilience at Netflix
Josh Evans
 
Microservices Architecture - Bangkok 2018
Microservices Architecture - Bangkok 2018Microservices Architecture - Bangkok 2018
Microservices Architecture - Bangkok 2018
Araf Karsh Hamid
 
Microservices, DevOps & SRE
Microservices, DevOps & SREMicroservices, DevOps & SRE
Microservices, DevOps & SRE
Araf Karsh Hamid
 
Microservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitectureMicroservice vs. Monolithic Architecture
Microservice vs. Monolithic Architecture
Paul Mooney
 
Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...Best Practices for Middleware and Integration Architecture Modernization with...
Best Practices for Middleware and Integration Architecture Modernization with...
Claus Ibsen
 
Azure DevOps CI/CD For Beginners
Azure DevOps CI/CD  For BeginnersAzure DevOps CI/CD  For Beginners
Azure DevOps CI/CD For Beginners
Rahul Nath
 
Modern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOpsModern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOps
GlobalLogic Ukraine
 
Introduction to CICD
Introduction to CICDIntroduction to CICD
Introduction to CICD
Knoldus Inc.
 
Introduction to Docker - 2017
Introduction to Docker - 2017Introduction to Docker - 2017
Introduction to Docker - 2017
Docker, Inc.
 
DevOps 101 - an Introduction to DevOps
DevOps 101  - an Introduction to DevOpsDevOps 101  - an Introduction to DevOps
DevOps 101 - an Introduction to DevOps
Red Gate Software
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)
Abeer R
 
Azure DevOps Presentation
Azure DevOps PresentationAzure DevOps Presentation
Azure DevOps Presentation
InCycleSoftware
 
Getting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeGetting Started with Infrastructure as Code
Getting Started with Infrastructure as Code
WinWire Technologies Inc
 
Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...
Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...
Jenkins Pipeline Tutorial | Jenkins Build And Delivery Pipeline | Jenkins Tut...
Simplilearn
 

Similar to Chaos Engineering with Kubernetes (12)

Using chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsUsing chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applications
John Varghese
 
Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)
Yan Cui
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
Adrian Hornsby
 
Keynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedKeynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practiced
AWS User Group Bengaluru
 
Intro to SageMaker
Intro to SageMakerIntro to SageMaker
Intro to SageMaker
Soji Adeshina
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSD
Thomas Delteil
 
Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18
CodeOps Technologies LLP
 
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Web Services Korea
 
IVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML Services
IVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML ServicesIVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML Services
IVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML Services
Amazon Web Services Japan
 
Building Serverless IoT solutions - EPAM SEC 2018 Minsk
Building Serverless IoT solutions - EPAM SEC 2018 MinskBuilding Serverless IoT solutions - EPAM SEC 2018 Minsk
Building Serverless IoT solutions - EPAM SEC 2018 Minsk
Boaz Ziniman
 
AWS Black Belt Online Seminar 2018 AWS Certificate Manager
AWS Black Belt Online Seminar 2018 AWS Certificate ManagerAWS Black Belt Online Seminar 2018 AWS Certificate Manager
AWS Black Belt Online Seminar 2018 AWS Certificate Manager
Amazon Web Services Japan
 
AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018
AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018
AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018
Amazon Web Services Korea
 
Using chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applicationsUsing chaos to bring resiliency to your applications
Using chaos to bring resiliency to your applications
John Varghese
 
Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)Applying principles of chaos engineering to serverless (reinvent DVC305)
Applying principles of chaos engineering to serverless (reinvent DVC305)
Yan Cui
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
Adrian Hornsby
 
Keynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practicedKeynote - Chaos Engineering: Why breaking things should be practiced
Keynote - Chaos Engineering: Why breaking things should be practiced
AWS User Group Bengaluru
 
Intro to Object Detection with SSD
Intro to Object Detection with SSDIntro to Object Detection with SSD
Intro to Object Detection with SSD
Thomas Delteil
 
Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18Microservices for Startups - Donnie Prakoso - AWS - CC18
Microservices for Startups - Donnie Prakoso - AWS - CC18
CodeOps Technologies LLP
 
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Web Services Korea
 
IVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML Services
IVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML ServicesIVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML Services
IVS CTO Night And Day 2018 Winter - [re:Cap] AI & ML Services
Amazon Web Services Japan
 
Building Serverless IoT solutions - EPAM SEC 2018 Minsk
Building Serverless IoT solutions - EPAM SEC 2018 MinskBuilding Serverless IoT solutions - EPAM SEC 2018 Minsk
Building Serverless IoT solutions - EPAM SEC 2018 Minsk
Boaz Ziniman
 
AWS Black Belt Online Seminar 2018 AWS Certificate Manager
AWS Black Belt Online Seminar 2018 AWS Certificate ManagerAWS Black Belt Online Seminar 2018 AWS Certificate Manager
AWS Black Belt Online Seminar 2018 AWS Certificate Manager
Amazon Web Services Japan
 
AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018
AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018
AWS 기반 Microservice 운영을 위한 데브옵스 사례와 Spinnaker 소개::김영욱::AWS Summit Seoul 2018
Amazon Web Services Korea
 
Ad


Chaos Engineering with Kubernetes

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
      Arun Gupta, @arungupta
      Principal Open Source Technologist, Amazon Web Services
      Using Chaos to Bring Resiliency to Your Applications in Kubernetes
  • 2. Failures are a given and everything will eventually fail over time.
      https://www.allthingsdistributed.com/2016/03/10-lessons-from-10-years-of-aws.html
  • 3. Amazon 2006
      GameDay: Creating Resiliency Through Destruction
      Jesse Robbins
      https://www.youtube.com/watch?v=zoz0ZjfrQ9s
  • 4. Chaos Monkeys
      https://github.com/Netflix/SimianArmy
  • 5. Chaos Engineering
  • 6. Resilience
      Ability of a system to adapt to changes, failures, and disturbances
  • 7. Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production
      https://principlesofchaos.org/
      Credit: https://www.flickr.com/photos/loseryouthcrew/8775130600/
  • 8. Bad things will happen to your system, no matter how well designed it is
      You cannot be ignorant of it
  • 9. Break your systems on purpose
      Find out their weaknesses and fix them before they break when least expected
  • 11. Chaos doesn’t cause problems. It reveals them.
  • 12.
      • Application level
      • Host failure
      • Resource attacks (CPU, memory, …)
      • Network attacks (dependencies, latency, …)
      • Region attacks!
  • 13. Where do you inject Chaos?
  • 14. Phases of chaos engineering
  • 15. Phases of chaos engineering
  • 16. “Normal” behavior of your system
      https://www.elastic.co/blog/timelion-tutorial-from-zero-to-hero
  • 17. Business metric
      https://medium.com/netflix-techblog/sps-the-pulse-of-netflix-streaming-ae4db0e05f8a
  • 18. Phases of chaos engineering
  • 19. Phases of chaos engineering
  • 20.
      • a service gives 404 or 503?
      • latency increases by 300ms?
      • the port is not accessible?
      • security group rules changed?
      • the database stops?
      • excessive number of requests come?
      • iptables are wiped out?
  • 21. Phases of chaos engineering
  • 22. Phases of chaos engineering
  • 23.
      • Pick hypothesis
      • Scope the experiment
      • Identify metrics
      • Notify the organization
  • 24.
      • Start very small
      • As close as possible to production
      • Minimize the blast radius
      • Have an emergency STOP!
  • 25. Canary deployment
      Start with...
      Users: 99% users / 1% users
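The 99% / 1% split on the canary slide can be sketched as a deterministic bucketing function. This is only an illustration of the idea, not something taken from the deck: the function name `in_experiment`, the use of SHA-256, and the synthetic user IDs are all assumptions.

```python
import hashlib


def in_experiment(user_id: str, percent: float = 1.0) -> bool:
    """Return True if this user falls into the small experiment cohort.

    Hashing the user ID keeps the decision stable across requests, so the
    same user is always inside or always outside the blast radius.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return bucket < percent * 100  # 1.0 percent -> buckets 0..99


# Synthetic user IDs, invented for this sketch.
users = [f"user-{i}" for i in range(100_000)]
share = sum(in_experiment(u) for u in users) / len(users)
print(f"{share:.2%} of users are routed to the experiment")
```

Growing the experiment is then a matter of raising `percent`, and the emergency stop is setting it to 0.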
  • 26. Phases of chaos engineering
  • 27. Phases of chaos engineering
  • 28.
      • Time to detect?
      • Time for notification? And escalation?
      • Time to public notification?
      • Time for graceful degradation to kick in?
      • Time for self healing to happen?
      • Time to recovery, partial and full?
      • Time to all-clear and stable?
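The timing questions above become measurable once each milestone of an experiment is timestamped. A minimal sketch, assuming a hand-recorded timeline; the event names and every timestamp below are invented for illustration.

```python
from datetime import datetime

# Hypothetical timeline of one experiment; all values are made up.
timeline = {
    "fault_injected": datetime(2018, 3, 10, 14, 42, 39),
    "detected": datetime(2018, 3, 10, 14, 42, 45),
    "on_call_notified": datetime(2018, 3, 10, 14, 43, 30),
    "degraded_mode_active": datetime(2018, 3, 10, 14, 44, 0),
    "fully_recovered": datetime(2018, 3, 10, 14, 51, 15),
}


def seconds_since_fault(timeline, event):
    """Elapsed seconds between the injected fault and the given event."""
    return (timeline[event] - timeline["fault_injected"]).total_seconds()


# One number per question on the slide, all measured from the injection.
report = {
    event: seconds_since_fault(timeline, event)
    for event in timeline
    if event != "fault_injected"
}
for event, seconds in report.items():
    print(f"time to {event}: {seconds:.0f}s")
```

Comparing these numbers between runs shows whether a fix actually shortened detection or recovery.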
  • 29. DON’T blame that one person…
  • 30. PostMortems: COE (Correction of Errors)
      The 5 WHYs
  • 31. Phases of chaos engineering
  • 32. Phases of chaos engineering
  • 33. Fix
  • 34. Failure free operations require experience with failure.
      http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
  • 35. Kubernetes cluster
  • 36.
      • Reconciles desired and actual state for pods
      • Distributes pods across AZs
      • Automatic health-check based restarts
      • Rolling deployment of a service
  • 37. Kubernetes cluster with Amazon EKS
      Diagram labels: AWS managed, Customer account
  • 38. Kubernetes cluster with Amazon EKS
      Diagram labels: Kubectl, mycluster.eks.amazonaws.com, Availability Zone 1, Availability Zone 2, Availability Zone 3
  • 39.
      • Region and Availability Zones
      • Control Plane is highly available
      • Master and Workers are configured in ASG
      • Master instance type auto-scaling
      • Etcd is HA and backed up every hour
  • 40. Chaos in a Kubernetes cluster
      Diagram labels: Kubectl, mycluster.eks.amazonaws.com, Availability Zone 1, Availability Zone 2, Availability Zone 3
      Failures marked with x: Health check? Dead node?
  • 41.
      • Istio
      • Chaos Toolkit
      • Kube Monkey
      • PowerfulSeal
      • Gremlin
      • Simian Army
  • 42. Istio
      • Intelligent routing and load balancing
      • Resilience across languages and platforms
      • Fleet-wide policy enforcement
      • In-depth telemetry
  • 43.
      • Timeouts
      • Bounded retries with timeout budget
      • Concurrent connections limit and request load
      • Active health checks (periodic)
      • Passive health checks (circuit breakers)
      • AZ-aware load balancing with automatic failover
  • 44.
      • Timing failures
          • Increased network latency
          • Overloaded upstream service
      • Crashes
          • HTTP error codes
          • TCP connection failures
  • 45. Fault injection using Istio: timeout

        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        metadata:
          name: greeting
        spec:
          hosts:
          - greeting
          http:
          - fault:
              delay:
                fixedDelay: 10s
                percent: 100
            route:
            - destination:
                host: greeting
                subset: greeting-hello
        ---
        apiVersion: networking.istio.io/v1alpha3
        kind: DestinationRule
        metadata:
          name: greeting-destination-rule
        spec:
          host: greeting
          subsets:
          - name: greeting-hello
            labels:
              greeting: hello

  • 46. Fault injection using Istio: HTTP abort

        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        metadata:
          name: greeting
        spec:
          hosts:
          - greeting
          http:
          - fault:
              abort:
                httpStatus: 500
                percent: 100
            route:
            - destination:
                host: greeting
                subset: greeting-hello
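What the `delay` and `abort` faults above do to a request can be imitated in plain Python, which is handy for trying the idea in a unit test before touching the mesh. This is a rough sketch of the semantics only; `with_faults` and `InjectedAbort` are hypothetical names, and none of this is Istio code.

```python
import random
import time


class InjectedAbort(Exception):
    """Raised in place of a real response, like an injected HTTP error."""

    def __init__(self, status):
        super().__init__(f"injected abort with HTTP status {status}")
        self.status = status


def with_faults(call, delay_s=0.0, delay_percent=0.0,
                abort_status=None, abort_percent=0.0,
                rng=None, sleep=time.sleep):
    """Wrap `call` so a share of invocations is delayed and/or aborted."""
    rng = rng or random.Random()

    def wrapped(*args, **kwargs):
        # random() is in [0, 1), so 100 percent always fires and 0 never does.
        if rng.random() < delay_percent / 100:
            sleep(delay_s)
        if abort_status is not None and rng.random() < abort_percent / 100:
            raise InjectedAbort(abort_status)
        return call(*args, **kwargs)

    return wrapped


# Mirror the two slides: a 10s delay on every call, then a 500 on every call.
# The recorder replaces time.sleep so the demo does not actually wait.
recorded_delays = []
slow_greeting = with_faults(lambda: "hello", delay_s=10, delay_percent=100,
                            sleep=recorded_delays.append)
print(slow_greeting(), recorded_delays)

broken_greeting = with_faults(lambda: "hello", abort_status=500,
                              abort_percent=100)
try:
    broken_greeting()
except InjectedAbort as err:
    print(err)
```

Lowering the percentages is the in-process analogue of shrinking the blast radius.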
  • 47. Istio traffic management

        apiVersion: networking.istio.io/v1alpha3
        kind: VirtualService
        metadata:
          name: greeting-virtual-service
        spec:
          hosts:
          - greeting
          http:
          - route:
            - destination:
                host: greeting
                subset: greeting-hello
              weight: 75
            - destination:
                host: greeting
                subset: greeting-howdy
              weight: 25
        ---
        apiVersion: networking.istio.io/v1alpha3
        kind: DestinationRule
        metadata:
          name: greeting-destination-rule
        spec:
          host: greeting
          subsets:
          - name: greeting-hello
            labels:
              greeting: hello
          - name: greeting-howdy
            labels:
              greeting: howdy
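The 75/25 weights in the VirtualService above amount to a weighted random choice per request. A small sketch of that behavior, assuming nothing about how the proxy implements it; `pick_subset` and the fixed seed are illustrative choices.

```python
import random
from collections import Counter

# Subset names and weights copied from the VirtualService on the slide.
ROUTES = {"greeting-hello": 75, "greeting-howdy": 25}


def pick_subset(routes, rng):
    """Choose a destination subset with probability proportional to its weight."""
    names = list(routes)
    weights = [routes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded only so repeated runs give the same counts
counts = Counter(pick_subset(ROUTES, rng) for _ in range(10_000))
for name, hits in counts.most_common():
    print(f"{name}: {hits / 10_000:.1%}")
```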
  • 48. Istio circuit breaker

        apiVersion: networking.istio.io/v1alpha3
        kind: DestinationRule
        metadata:
          name: greeting-destination-rule
        spec:
          host: greeting
          subsets:
          - name: greeting-hello
            labels:
              greeting: hello
            trafficPolicy:
              connectionPool:
                tcp:
                  maxConnections: 100
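The effect of `maxConnections: 100` can be pictured as a pool of 100 slots where extra callers are turned away at once instead of queuing. The sketch below illustrates that fail-fast idea only; `ConnectionLimiter` is a hypothetical class, not an Istio or Envoy API.

```python
import threading


class ConnectionLimiter:
    """Admit at most `max_connections` concurrent connections, reject the rest."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_acquire(self) -> bool:
        # Non-blocking: a full pool rejects immediately rather than waiting.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


limiter = ConnectionLimiter(max_connections=100)
accepted = sum(limiter.try_acquire() for _ in range(120))
print(f"accepted {accepted} of 120 connection attempts")

limiter.release()  # one connection closes, freeing a slot
print("next attempt accepted:", limiter.try_acquire())
```

Rejecting early keeps an overloaded upstream from dragging its callers down with it, which is the behavior a chaos experiment should confirm.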
  • 49. https://istio.io/docs/
  • 50. Chaos Toolkit
      Open API for Chaos Engineering
  • 51.
      • CLI-driven
      • Experiments declared in JSON/YAML files
      • Open specification
      • Extensible: Kubernetes, AWS, Spring, others
  • 52. Chaos Toolkit follows the principles of chaos
  • 53. query a system to observe a behavior
      • Check state of a pod with a specific label
      • Multiple probes to define steady state
      real-world events
      • Terminate a deployment
      • Multiple actions simulate events
      Types of probe and method
      • Process: Run a binary
      • HTTP: Invoke an HTTP endpoint
      • Python: Call a Python function to perform richer operations
  • 54. Chaos Toolkit metadata

        {
          "version": "1.0.0",
          "title": "Terminating the greeting service should not impact users",
          "description": "How does the greeting service unavailability impact our users? Do they see an error or does the webapp get slower?",
          "tags": [
            "kubernetes",
            "aws"
          ],
          "configuration": {
            "web_app_url": {
              "type": "env",
              "key": "WEBAPP_URL"
            }
          },

  • 55. Chaos Toolkit steady state & hypothesis

          "steady-state-hypothesis": {
            "title": "Services are all available and healthy",
            "probes": [
              {
                "type": "probe",
                "name": "alive-and-healthy",
                "tolerance": true,
                "provider": {
                  "type": "python",
                  "module": "chaosk8s.pod.probes",
                  "func": "pods_in_phase",
                  "arguments": {
                    "label_selector": "app=webapp-pod",
                    "phase": "Running",
                    "ns": "default"
                  }
                }
              },
              {
                "type": "probe",
                "name": "application-must-respond-normally",
                "tolerance": 200,
                "provider": {
                  "type": "http",
                  "url": "${web_app_url}",
                  "timeout": 3
                }
              }
            ]
          },

  • 56. Chaos Toolkit experiment & verify

          "method": [
            {
              "type": "action",
              "name": "terminate-greeting-service",
              "provider": {
                "type": "python",
                "module": "chaosk8s.pod.actions",
                "func": "terminate_pods",
                "arguments": {
                  "label_selector": "app=greeter-pod",
                  "ns": "default"
                }
              }
            },
            {
              "type": "probe",
              "name": "fetch-application-logs",
              "provider": {
                "type": "python",
                "module": "chaosk8s.pod.probes",
                "func": "read_pod_logs",
                "arguments": {
                  "label_selector": "app=webapp-pod",
                  "last": "20s",
                  "ns": "default"
                }
              }
            }
          ],

  • 57. Chaos Toolkit run

        $ chaos run experiments/experiment.json
        [2018-03-10 14:42:38 INFO] Validating the experiment's syntax
        [2018-03-10 14:42:38 INFO] Experiment looks valid
        [2018-03-10 14:42:38 INFO] Running experiment: Terminate the greeting service should not impact users
        [2018-03-10 14:42:38 INFO] Steady state hypothesis: Services are all available and healthy
        [2018-03-10 14:42:38 INFO] Probe: application-should-be-alive-and-healthy
        [2018-03-10 14:42:38 INFO] Probe: application-must-respond-normally
        [2018-03-10 14:42:39 INFO] Steady state hypothesis is met!
        [2018-03-10 14:42:39 INFO] Action: terminate-greeting-service
        [2018-03-10 14:42:40 INFO] Probe: fetch-application-logs
        [2018-03-10 14:42:41 INFO] Steady state hypothesis: Services are all available and healthy
        [2018-03-10 14:42:41 INFO] Probe: application-should-be-alive-and-healthy
        [2018-03-10 14:42:42 INFO] Probe: application-must-respond-normally
        [2018-03-10 14:42:45 ERROR] => failed: activity took too long to complete
        [2018-03-10 14:42:45 CRITICAL] Steady state probe 'application-must-respond-normally' is not in the given tolerance so failing this experiment
        [2018-03-10 14:42:45 INFO] Let's rollback...
        [2018-03-10 14:42:45 INFO] No declared rollbacks, let's move on.
        [2018-03-10 14:42:45 INFO] Experiment ended with status: failed
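The run log above follows a fixed order: check the steady state, apply the method, check the steady state again, then roll back. The sketch below reproduces that order against a toy in-memory system. It is not the Chaos Toolkit implementation; `run_experiment`, the toy `state` dictionary, and the status strings are illustrative assumptions.

```python
def run_experiment(probes, method, rollbacks=()):
    """Run one experiment and return an illustrative status string.

    probes: list of (name, callable, tolerance) describing the steady state
    method: list of callables that inject the failure
    rollbacks: callables run afterwards, whether or not the hypothesis held
    """
    def first_failing_probe():
        for name, probe, tolerance in probes:
            try:
                if probe() != tolerance:
                    return name
            except Exception:
                return name  # a probe that errors counts as out of tolerance
        return None

    if first_failing_probe() is not None:
        return "steady state not met, nothing injected"
    for activity in method:
        activity()
    failed = first_failing_probe()
    for rollback in rollbacks:
        rollback()
    return "failed" if failed else "completed"


# Toy system: the web app calls the greeting service and has no fallback.
state = {"greeting_up": True}


def web_app_status():
    if not state["greeting_up"]:
        raise TimeoutError("activity took too long to complete")
    return 200


def terminate_greeting():
    state["greeting_up"] = False


status = run_experiment(
    probes=[("application-must-respond-normally", web_app_status, 200)],
    method=[terminate_greeting],
)
print(status)
```

As in the log, the second steady-state check fails once the greeting service is gone, so the hypothesis "should not impact users" is disproved and the weakness is known before a real outage exposes it.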
  • 58. https://github.com/chaostoolkit/chaostoolkit/
  • 59. kube-monkey: an implementation of Netflix’s Chaos Monkey for Kubernetes. It randomly deletes pods in the cluster. Applications opt in using labels.
  • 60. Run Kube-Monkey: create configuration
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-monkey-config-map
        namespace: kube-system
      data:
        config.toml: |
          [kubemonkey]
          run_hour = 8
          start_hour = 10
          end_hour = 16
          blacklisted_namespaces = ["kube-system"]
          whitelisted_namespaces = [""]
  • 61. Kube-Monkey application opt-in (label values must be strings, so the numeric ones are quoted)
      apiVersion: apps/v1
      kind: Deployment
      . . .
        template:
          metadata:
            labels:
              app: greeting
              kube-monkey/enabled: enabled
              kube-monkey/identifier: monkey-victim-pods
              kube-monkey/mtbf: '2'
              kube-monkey/kill-mode: random-max-percent
              kube-monkey/kill-value: '40'
          spec:
            containers:
            - name: greeting
  • 62. https://github.com/asobti/kube-monkey
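The kube-monkey slides combine a time window (`start_hour = 10`, `end_hour = 16`) with a kill policy (`random-max-percent` with a kill value of 40). A rough sketch of what those settings imply follows. Two assumptions are mine rather than the slides': the window is treated as `start_hour <= hour < end_hour`, and the pod count is rounded down. kube-monkey's own boundary handling and rounding may differ. Both function names are hypothetical.

```python
import math
import random


def in_kill_window(hour, start_hour=10, end_hour=16):
    """True if `hour` (0-23) falls inside the window from the slide's config.

    Assumption: the window is start_hour <= hour < end_hour.
    """
    return start_hour <= hour < end_hour


def pods_to_kill(pod_count, max_percent=40, rng=random):
    """Pick a random percentage in [0, max_percent] and turn it into a pod count.

    Assumption: the result is rounded down; kube-monkey's rounding may differ.
    """
    percent = rng.randint(0, max_percent)
    return math.floor(pod_count * percent / 100)


if __name__ == "__main__":
    for hour in (8, 10, 15, 16):
        print(f"hour {hour}: in kill window = {in_kill_window(hour)}")
    # With 10 replicas and a 40% ceiling, at most 4 pods are selected.
    print("pods selected:", pods_to_kill(10))
```

Under these assumptions a deployment with 10 replicas never loses more than 4 pods in one run, which is the kind of blast-radius limit worth confirming before opting an application in.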
  • 63. Chaos Engineering working group @ CNCF https://github.com/chaoseng/wg-chaoseng
  • 64. Chaos Engineering mind map https://bit.ly/2uKOJMQ
  • 65. You don’t choose the moment, the moment chooses you. You only choose how prepared you are when it does.
  • 66. Thank you!