SlideShare a Scribd company logo
Chaos Engineering
Injecting failure for building
resilience in systems
Nice to meet you
YURY NIÑO
Software Engineer and Chaos
Engineer Advocate.
Loves building software applications, solving
resilience issues and teaching. Passionate about
reading, writing and cycling.
Agenda
● Resilience vs Reliability
● Why the world needs Resilience and
Reliability?
● Chaos Engineering
● Principles of Chaos
● Chaos in Practice
● Game Days
How many of you
Have encountered a crash of
your systems on production?
A recognition for ...
This talk is dedicated to
the #SystemAdministrators well
caffeinated, who get woken up in the
middle of the night when “things go
bump”.
#EngineeringTeam #DigitalFactory
@jnhernandz @
What is a
Resilient System?
A resilient system can maintain an acceptable level
of service in the face of failure.
A resilient system can weather the storm such a
large scale natural disaster or a controlled chaos
engineering.
Tammy Bütow Principal SRE at Gremlin
https://securethegrid.com
A distributed system on production needs to be
resilient in order to be reliable and this is precisely
a target that we Software Engineers, Systems
Engineers, Site Reliability Engineers and Chaos
Engineers always aim.
Mine :)
Why the world needs
Resilient Systems?
Because ...
We are surrounded by
distributed systems.
When we read the news in our
cellphones, send an email or buy our
lunch ...
We do not tolerate that
they fail!
Chaos Engineering: Injecting Failure for Building Resilience in Systems
February 28th, 2017 will be remembered
● Simple Storage Service (S3) went down in US-EAST.
● Outage lasted about 4 hrs.
● > 100.000 websites across the world were impacted.
Me :(
The World is Chaotic!
● Distributed systems contains moving
parts.
● Many things can go wrong.
○ Hard disks can fail.
○ The network can go down.
○ Customer traffic can overload.
How many of you know
What is Chaos
Engineering?
Chaos Engineering
It is the discipline of experimenting in
production on a distributed system in
order to reveal their weakness and to
build confidence in their resilience
capability.
https://principlesofchaos.org/
Chaos Engineering
It is deliberately inducing stress or
fault into software and/or hardware as
a way of learning/verifying things
about systems.
https://www.gremlin.com
Chaos Engineering is about
● Simulating the failure of a datacenter.
● Injecting latency between services.
● Randomly causing exceptions.
● Changing time travel.
● Emulating I/O errors.
http://principlesofchaos.org/
2008
Chaos Engineering
began at Netflix
2010
Chaos Monkey was
launched
2018
A lot of resources for
Chaos Engineering.
2014
Role of Chaos
Engineer was created.
History of Chaos Engineering
Kolton Andrus
Chaos in Practice
Principles of
Chaos
https://principlesofchaos.org/
1. Steady Stead
Chaos Engineering: Injecting Failure for Building Resilience in Systems
2. Hypothesis:
Circuit
Breaker
builds
Resilience
2. Hypothesis:
Circuit
Breaker
builds
Resilience
4. Run the Experiment
Application
Name Finer Observability DataDog
Hypothesis Circuit Breaker works
Environment My Home Results
Duration 5 - 10 seconds
Load 1 request
Actions
4. Run the Experiment
Application
Name Finer Observability DataDog
Hypothesis Circuit Breaker works
Facing latencies > 5 seconds between
dashboard_api and smart_api to open
the circuit.
Environment My Home Results
Duration 20 milliseconds
Load 1 request
Issue #4356
Configure the proper hystrix parameters
according the results.
Implement a fallback.
Actions
Game Days
Game Day: Roles
Master of Disaster First on-call Team
https://www.pinterest.es/pin/824299538021645731/
Game Days can Transform our Teams
Even though Game Days are not real! they
make Engineers gain confidence.
Since we, Engineers are experiencing the failure as part
of our job, we should start designing for failure.
Me :)
The best time to learn about fire
is when you’re on fire.
—Jen Hammond, New Relic engineering manager
How to begin ...
https://chaosengineering.slack.com
https://github.com/dastergon/awesome-chaos
-engineering
https://www.infoq.com/chaos-engineering
@yurynino
Ad

More Related Content

What's hot (20)

Technical Debt
Technical DebtTechnical Debt
Technical Debt
Gary Short
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Nils Meder
 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-Knolx
Knoldus Inc.
 
Shift Left Security - The What, Why and How
Shift Left Security - The What, Why and HowShift Left Security - The What, Why and How
Shift Left Security - The What, Why and How
DevOps.com
 
[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)
[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)
[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)
Konveyor Community
 
Perfomatix - NodeJS Coding Standards
Perfomatix - NodeJS Coding StandardsPerfomatix - NodeJS Coding Standards
Perfomatix - NodeJS Coding Standards
Perfomatix Solutions
 
An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos Engineering
Gremlin
 
Introduction to Resilience4j
Introduction to Resilience4jIntroduction to Resilience4j
Introduction to Resilience4j
Knoldus Inc.
 
SRE-iously! Reliability!
SRE-iously! Reliability!SRE-iously! Reliability!
SRE-iously! Reliability!
New Relic
 
Observability in the world of microservices
Observability in the world of microservicesObservability in the world of microservices
Observability in the world of microservices
Chandresh Pancholi
 
Observability at Scale
Observability at Scale Observability at Scale
Observability at Scale
Knoldus Inc.
 
Engineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at NetflixEngineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at Netflix
Dianne Marsh
 
The 12 Agile Principles
The 12 Agile PrinciplesThe 12 Agile Principles
The 12 Agile Principles
Agile201
 
SAFe® - scaled agile framework in practice
SAFe® - scaled agile framework in practiceSAFe® - scaled agile framework in practice
SAFe® - scaled agile framework in practice
Intland Software GmbH
 
Chaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformChaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin Platform
Anshul Patel
 
10 Essential SAFe(tm) patterns you should focus on when scaling Agile
10 Essential SAFe(tm) patterns you should focus on when scaling Agile10 Essential SAFe(tm) patterns you should focus on when scaling Agile
10 Essential SAFe(tm) patterns you should focus on when scaling Agile
Yuval Yeret
 
10 steps to a successsful enterprise agile transformation global scrum 2018
10 steps to a successsful enterprise agile transformation   global scrum 201810 steps to a successsful enterprise agile transformation   global scrum 2018
10 steps to a successsful enterprise agile transformation global scrum 2018
Agile Velocity
 
The State of DevSecOps
The State of DevSecOpsThe State of DevSecOps
The State of DevSecOps
DevOps Indonesia
 
Feature flag launchdarkly
Feature flag launchdarklyFeature flag launchdarkly
Feature flag launchdarkly
Sandeep Soni
 
Cyclomatic and cognitive complexity
Cyclomatic and cognitive complexityCyclomatic and cognitive complexity
Cyclomatic and cognitive complexity
Heemeng Foo
 
Technical Debt
Technical DebtTechnical Debt
Technical Debt
Gary Short
 
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering HamburgPrinciples Of Chaos Engineering - Chaos Engineering Hamburg
Principles Of Chaos Engineering - Chaos Engineering Hamburg
Nils Meder
 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-Knolx
Knoldus Inc.
 
Shift Left Security - The What, Why and How
Shift Left Security - The What, Why and HowShift Left Security - The What, Why and How
Shift Left Security - The What, Why and How
DevOps.com
 
[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)
[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)
[Konveyor] introduction to cloud native chaos engineering with litmus chaos (1)
Konveyor Community
 
Perfomatix - NodeJS Coding Standards
Perfomatix - NodeJS Coding StandardsPerfomatix - NodeJS Coding Standards
Perfomatix - NodeJS Coding Standards
Perfomatix Solutions
 
An Introduction to Chaos Engineering
An Introduction to Chaos EngineeringAn Introduction to Chaos Engineering
An Introduction to Chaos Engineering
Gremlin
 
Introduction to Resilience4j
Introduction to Resilience4jIntroduction to Resilience4j
Introduction to Resilience4j
Knoldus Inc.
 
SRE-iously! Reliability!
SRE-iously! Reliability!SRE-iously! Reliability!
SRE-iously! Reliability!
New Relic
 
Observability in the world of microservices
Observability in the world of microservicesObservability in the world of microservices
Observability in the world of microservices
Chandresh Pancholi
 
Observability at Scale
Observability at Scale Observability at Scale
Observability at Scale
Knoldus Inc.
 
Engineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at NetflixEngineering Velocity: Shifting the Curve at Netflix
Engineering Velocity: Shifting the Curve at Netflix
Dianne Marsh
 
The 12 Agile Principles
The 12 Agile PrinciplesThe 12 Agile Principles
The 12 Agile Principles
Agile201
 
SAFe® - scaled agile framework in practice
SAFe® - scaled agile framework in practiceSAFe® - scaled agile framework in practice
SAFe® - scaled agile framework in practice
Intland Software GmbH
 
Chaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin PlatformChaos Engineering with Gremlin Platform
Chaos Engineering with Gremlin Platform
Anshul Patel
 
10 Essential SAFe(tm) patterns you should focus on when scaling Agile
10 Essential SAFe(tm) patterns you should focus on when scaling Agile10 Essential SAFe(tm) patterns you should focus on when scaling Agile
10 Essential SAFe(tm) patterns you should focus on when scaling Agile
Yuval Yeret
 
10 steps to a successsful enterprise agile transformation global scrum 2018
10 steps to a successsful enterprise agile transformation   global scrum 201810 steps to a successsful enterprise agile transformation   global scrum 2018
10 steps to a successsful enterprise agile transformation global scrum 2018
Agile Velocity
 
Feature flag launchdarkly
Feature flag launchdarklyFeature flag launchdarkly
Feature flag launchdarkly
Sandeep Soni
 
Cyclomatic and cognitive complexity
Cyclomatic and cognitive complexityCyclomatic and cognitive complexity
Cyclomatic and cognitive complexity
Heemeng Foo
 

Similar to Chaos Engineering: Injecting Failure for Building Resilience in Systems (20)

Chaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsChaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient Systems
C4Media
 
Embracing Failure - AzureDay Rome
Embracing Failure - AzureDay RomeEmbracing Failure - AzureDay Rome
Embracing Failure - AzureDay Rome
Alberto Acerbis
 
Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...
Alex Cachia
 
JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...
JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...
JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...
PROIDEA
 
Architectural Patterns of Resilient Distributed Systems
 Architectural Patterns of Resilient Distributed Systems Architectural Patterns of Resilient Distributed Systems
Architectural Patterns of Resilient Distributed Systems
Ines Sombra
 
Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018
Dinis Cruz
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018
Christophe Rochefolle
 
Chaos Engineering to Establish Software Reliability
Chaos Engineering to Establish Software ReliabilityChaos Engineering to Establish Software Reliability
Chaos Engineering to Establish Software Reliability
GleecusTechlabs1
 
Stability anti patterns in cloud-native applications
Stability anti patterns in cloud-native applicationsStability anti patterns in cloud-native applications
Stability anti patterns in cloud-native applications
Ana-Maria Mihalceanu
 
Chaos is a ladder !
Chaos is a ladder !Chaos is a ladder !
Chaos is a ladder !
Haggai Philip Zagury
 
Availability in a cloud native world v1.6 (Feb 2019)
Availability in a cloud native world v1.6 (Feb 2019)Availability in a cloud native world v1.6 (Feb 2019)
Availability in a cloud native world v1.6 (Feb 2019)
removed_414e600f33c7539c2e1b596a774aaebd
 
Containers and Why They Matter
Containers and Why They MatterContainers and Why They Matter
Containers and Why They Matter
Ray Lukas
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Ana Medina
 
Designing Cloud Backup to reduce DR downtime for IT Professionals
Designing Cloud Backup to reduce DR downtime for IT ProfessionalsDesigning Cloud Backup to reduce DR downtime for IT Professionals
Designing Cloud Backup to reduce DR downtime for IT Professionals
Storage Switzerland
 
CS5032 Lecture 2: Failure
CS5032 Lecture 2: FailureCS5032 Lecture 2: Failure
CS5032 Lecture 2: Failure
John Rooksby
 
Unleash The Monkeys
Unleash The MonkeysUnleash The Monkeys
Unleash The Monkeys
Jacob Duijzer
 
ppt_rs.jpg
ppt_rs.jpgppt_rs.jpg
ppt_rs.jpg
webhostingguy
 
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Jorge Cardoso
 
Webinar_DevOps_Nov10_D2
Webinar_DevOps_Nov10_D2Webinar_DevOps_Nov10_D2
Webinar_DevOps_Nov10_D2
Phil Christensen
 
Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)
Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)
Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)
Tim Kirby
 
Chaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient SystemsChaos Engineering: Why the World Needs More Resilient Systems
Chaos Engineering: Why the World Needs More Resilient Systems
C4Media
 
Embracing Failure - AzureDay Rome
Embracing Failure - AzureDay RomeEmbracing Failure - AzureDay Rome
Embracing Failure - AzureDay Rome
Alberto Acerbis
 
Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...Chaos Engineering – why we should all practice breaking things on purpose by ...
Chaos Engineering – why we should all practice breaking things on purpose by ...
Alex Cachia
 
JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...
JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...
JDD 2016 - Jedrzej Dabrowa - Distributed System Fault Injection Testing With ...
PROIDEA
 
Architectural Patterns of Resilient Distributed Systems
 Architectural Patterns of Resilient Distributed Systems Architectural Patterns of Resilient Distributed Systems
Architectural Patterns of Resilient Distributed Systems
Ines Sombra
 
Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018Using security to drive chaos engineering - April 2018
Using security to drive chaos engineering - April 2018
Dinis Cruz
 
From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018From Duke of DevOps to Queen of Chaos - Api days 2018
From Duke of DevOps to Queen of Chaos - Api days 2018
Christophe Rochefolle
 
Chaos Engineering to Establish Software Reliability
Chaos Engineering to Establish Software ReliabilityChaos Engineering to Establish Software Reliability
Chaos Engineering to Establish Software Reliability
GleecusTechlabs1
 
Stability anti patterns in cloud-native applications
Stability anti patterns in cloud-native applicationsStability anti patterns in cloud-native applications
Stability anti patterns in cloud-native applications
Ana-Maria Mihalceanu
 
Containers and Why They Matter
Containers and Why They MatterContainers and Why They Matter
Containers and Why They Matter
Ray Lukas
 
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Chaos Engineering with Kubernetes - Berlin / Hamburg Chaos Engineering Meetup...
Ana Medina
 
Designing Cloud Backup to reduce DR downtime for IT Professionals
Designing Cloud Backup to reduce DR downtime for IT ProfessionalsDesigning Cloud Backup to reduce DR downtime for IT Professionals
Designing Cloud Backup to reduce DR downtime for IT Professionals
Storage Switzerland
 
CS5032 Lecture 2: Failure
CS5032 Lecture 2: FailureCS5032 Lecture 2: Failure
CS5032 Lecture 2: Failure
John Rooksby
 
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Cloud Operations and Analytics: Improving Distributed Systems Reliability usi...
Jorge Cardoso
 
Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)
Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)
Migrating_to_Cloud-Native_App_Architectures_Pivotal (2)
Tim Kirby
 
Ad

Recently uploaded (20)

WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)WinRAR Crack for Windows (100% Working 2025)
WinRAR Crack for Windows (100% Working 2025)
sh607827
 
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025Why Orangescrum Is a Game Changer for Construction Companies in 2025
Why Orangescrum Is a Game Changer for Construction Companies in 2025
Orangescrum
 
Top 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docxTop 10 Client Portal Software Solutions for 2025.docx
Top 10 Client Portal Software Solutions for 2025.docx
Portli
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Not So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java WebinarNot So Common Memory Leaks in Java Webinar
Not So Common Memory Leaks in Java Webinar
Tier1 app
 
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdfMicrosoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
Microsoft AI Nonprofit Use Cases and Live Demo_2025.04.30.pdf
TechSoup
 
Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025Adobe Master Collection CC Crack Advance Version 2025
Adobe Master Collection CC Crack Advance Version 2025
kashifyounis067
 
Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025Adobe Lightroom Classic Crack FREE Latest link 2025
Adobe Lightroom Classic Crack FREE Latest link 2025
kashifyounis067
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Lionel Briand
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Solidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license codeSolidworks Crack 2025 latest new + license code
Solidworks Crack 2025 latest new + license code
aneelaramzan63
 
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
How to Batch Export Lotus Notes NSF Emails to Outlook PST Easily?
steaveroggers
 
Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)Who Watches the Watchmen (SciFiDevCon 2025)
Who Watches the Watchmen (SciFiDevCon 2025)
Allon Mureinik
 
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRYLEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
LEARN SEO AND INCREASE YOUR KNOWLDGE IN SOFTWARE INDUSTRY
NidaFarooq10
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025PDF Reader Pro Crack Latest Version FREE Download 2025
PDF Reader Pro Crack Latest Version FREE Download 2025
mu394968
 
Ad

Chaos Engineering: Injecting Failure for Building Resilience in Systems

  • 1. Chaos Engineering Injecting failure for building resilience in systems
  • 2. Nice to meet you YURY NIÑO Software Engineer and Chaos Engineer Advocate. Loves building software applications, solving resilience issues and teaching. Passionate about reading, writing and cycling.
  • 3. Agenda ● Resilience vs Reliability ● Why the world needs Resilience and Reliability? ● Chaos Engineering ● Principles of Chaos ● Chaos in Practice ● Game Days
  • 4. How many of you Have encountered a crash of your systems on production?
  • 5. A recognition for ... This talk is dedicated to the #SystemAdministrators well caffeinated, who get woken up in the middle of the night when “things go bump”. #EngineeringTeam #DigitalFactory @jnhernandz @
  • 7. A resilient system can maintain an acceptable level of service in the face of failure. A resilient system can weather the storm such a large scale natural disaster or a controlled chaos engineering. Tammy Bütow Principal SRE at Gremlin
  • 9. A distributed system on production needs to be resilient in order to be reliable and this is precisely a target that we Software Engineers, Systems Engineers, Site Reliability Engineers and Chaos Engineers always aim. Mine :)
  • 10. Why the world needs Resilient Systems?
  • 11. Because ... We are surrounded by distributed systems. When we read the news in our cellphones, send an email or buy our lunch ... We do not tolerate that they fail!
  • 13. February 28th, 2017 will be remembered ● Simple Storage Service (S3) went down in US-EAST. ● Outage lasted about 4 hrs. ● > 100.000 websites across the world were impacted.
  • 14. Me :(
  • 15. The World is Chaotic! ● Distributed systems contains moving parts. ● Many things can go wrong. ○ Hard disks can fail. ○ The network can go down. ○ Customer traffic can overload.
  • 16. How many of you know What is Chaos Engineering?
  • 17. Chaos Engineering It is the discipline of experimenting in production on a distributed system in order to reveal their weakness and to build confidence in their resilience capability. https://principlesofchaos.org/
  • 18. Chaos Engineering It is deliberately inducing stress or fault into software and/or hardware as a way of learning/verifying things about systems. https://www.gremlin.com
  • 19. Chaos Engineering is about ● Simulating the failure of a datacenter. ● Injecting latency between services. ● Randomly causing exceptions. ● Changing time travel. ● Emulating I/O errors. http://principlesofchaos.org/
  • 20. 2008 Chaos Engineering began at Netflix 2010 Chaos Monkey was launched 2018 A lot of resources for Chaos Engineering. 2014 Role of Chaos Engineer was created. History of Chaos Engineering Kolton Andrus
  • 27. 4. Run the Experiment Application Name Finer Observability DataDog Hypothesis Circuit Breaker works Environment My Home Results Duration 5 - 10 seconds Load 1 request Actions
  • 28. 4. Run the Experiment Application Name Finer Observability DataDog Hypothesis Circuit Breaker works Facing latencies > 5 seconds between dashboard_api and smart_api to open the circuit. Environment My Home Results Duration 20 milliseconds Load 1 request Issue #4356 Configure the proper hystrix parameters according the results. Implement a fallback. Actions
  • 30. Game Day: Roles Master of Disaster First on-call Team https://www.pinterest.es/pin/824299538021645731/
  • 31. Game Days can Transform our Teams Even though Game Days are not real! they make Engineers gain confidence.
  • 32. Since we, Engineers are experiencing the failure as part of our job, we should start designing for failure. Me :) The best time to learn about fire is when you’re on fire. —Jen Hammond, New Relic engineering manager
  • 33. How to begin ... https://chaosengineering.slack.com https://github.com/dastergon/awesome-chaos -engineering https://www.infoq.com/chaos-engineering @yurynino