SlideShare a Scribd company logo
SRE in Apiary
CZJUG 21.5.2018
Ladislav Prskavec
@abtris
1
What is SRE?
2
"What happens when a software engineer is tasked with what used to
be called operations."
» Ben Treynor Sloss, Vice President, Google Engineering,
founder of Google SRE
3
SRE implement DevOps
— Google Cloud
4
Apiary in numbers
» Apiary users: 336,786 Apiary API projects: 440,178
» Apiary engineers: 19
» Apiary platform engineers: 10 + 4
» Apiary SREs: 4
» App deploys: 15 (per week)
» Parsing service invocations [1 day]: ~200k
» CI build: ~19 min (8 parallel workers)
5
How we started with
SRE team
6
2014 - 2 people
software developer and ops guy
7
2015-2017 - 3 people
software developers
8
2018 - 4 people
2 seniors, 2 juniors
9
Culture
Process > Tools
10
No team separation
» bounder context, but ...
» Shared ownership of platform - shared responsibility
» Shared tooling (debug, deploy, monitor)
» Shared codebase
» Brainstorm
» Motivation for good design (monitoring, future debugging)
11
Things break
» They do - better be ready
» Knowing when there's problem (logs, metrics, alerting)
» Having someone there - being oncall
» Responding (mitigation, resolution)
» Learning from it (postmortems)
12
Measure everything
» No gut feeling when we have the data (app metrics, runtime
metrics)
» Both production and non-production systems (e.g. our CI test
time)
» Thresholds, automated alerting
» Visualize the data (oncall dashboard, happiness dashboard)
13
14
15
Gradual changes
» Delivery vs deploy
» Continuous Integration / Continuous Delivery (CI/CD)
Automated testing within CI
» Testing environments (similar to production)
» Short iterations, fast rollbacks
» No-downtime deploy & immutable
» Rolling delivery
16
Tooling & automation
» oncall logistics
» schedules
» escalations
» alerting
» conflicts
» documentation
» runbooks
» internal processes
» domain dictionary
17
Reason 1. Decreasing changes of errors
» Source and great post: http://www.devops.ch/2017/05/10/devops-explained/
18
Reason 2: Eliminating toil, work that is:
» Repetitive
» Automatable
» Doesn't provide enduring value
» Scales linearly with service
» Compounds significantly and surprisingly
19
Reason 3: Focusing on creative
engineering work that:
» Improves reliability
» Improves performance & stability of systems
» Ensures scalability
» Reduces toil
» Is fun: improves morale, speeds up progress, allows skill
development
20
Incidents
Types:
» Low-priority incident
» High-priority incident
» Security incident
Both production and non-production systems
21
Being oncall
» Shared among developers (roles, not individuals, increase bus
factor) Responsible for the platform
» Safety net - you know who to call
» Runbooks - you know what to do
» Early alerting - proactively investigate
22
Incident response
If critical: Incident commander role Separate roles, if necessary:
» outbound and inbound communication
» root cause analysis
» issue mitigation
Tracking time (incident ack expiration) and keeping track Tooling
(alerts, paging, postmortem reminders)
23
Postmortems
» Root cause
» Lessons learned
» Actionable items
» Prevent future issues
» Create runbooks
» Blameless
» Generated reminders
24
Incident reviews
» Weekly, team lead sync
» Reviewing past incidents - types, occurrence, actionability
» Discuss improvements
» Incident fatigue prevention
25
Summary
26
Summary
» Culture is more important than process!
» Start early and work on improvements!
» Product owner for SRE work is useful role!
27
References
» SRE vs. DevOps: competing standards or close friends?
» SRE Weekly
» Awesome Site Reliability Engineering
Books
» SRE book
» Seeking SRE
28
Ad

More Related Content

What's hot (20)

DevOps & SRE at Google Scale
DevOps & SRE at Google ScaleDevOps & SRE at Google Scale
DevOps & SRE at Google Scale
Kaushik Bhattacharya
 
DevOps as-a-Service (DaaS) value
DevOps as-a-Service (DaaS) valueDevOps as-a-Service (DaaS) value
DevOps as-a-Service (DaaS) value
Marc Hornbeek
 
DOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOps
DOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOpsDOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOps
DOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOps
Gene Kim
 
Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...
Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...
Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...
Splunk
 
You got a couple Microservices, now what? - Adding SRE to DevOps
You got a couple Microservices, now what?  - Adding SRE to DevOpsYou got a couple Microservices, now what?  - Adding SRE to DevOps
You got a couple Microservices, now what? - Adding SRE to DevOps
Gonzalo Maldonado
 
Continuous testing webinar 041017 slideshare
Continuous testing webinar 041017 slideshareContinuous testing webinar 041017 slideshare
Continuous testing webinar 041017 slideshare
QualiQuali
 
What's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE ParisWhat's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE Paris
Clément Michaud
 
SRE in Enterprise - Local Journey DevopsDays Galway
SRE in Enterprise - Local Journey  DevopsDays GalwaySRE in Enterprise - Local Journey  DevopsDays Galway
SRE in Enterprise - Local Journey DevopsDays Galway
Kevin Connaughton
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Mike Villiger
 
SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)
Hussain Mansoor
 
DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...
DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...
DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...
Edureka!
 
Roles and Responsibilities of a DevOps Engineer
Roles and Responsibilities of a DevOps EngineerRoles and Responsibilities of a DevOps Engineer
Roles and Responsibilities of a DevOps Engineer
ZaranTech LLC
 
Bjorn Rabenstein. SRE, DevOps, Google, and you
Bjorn Rabenstein. SRE, DevOps, Google, and youBjorn Rabenstein. SRE, DevOps, Google, and you
Bjorn Rabenstein. SRE, DevOps, Google, and you
IT Arena
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container Monitoring
Derek Chen
 
DevOps Evolution - The Next Generation ?
DevOps Evolution - The Next Generation ?DevOps Evolution - The Next Generation ?
DevOps Evolution - The Next Generation ?
Marc Hornbeek
 
2017 DevSecOps Survey
2017 DevSecOps Survey2017 DevSecOps Survey
2017 DevSecOps Survey
Sonatype
 
How to Build the Right Automation
How to Build the Right AutomationHow to Build the Right Automation
How to Build the Right Automation
Jules Pierre-Louis
 
QA in DevOps: Transformation thru Automation via Jenkins
QA in DevOps:  Transformation thru Automation via JenkinsQA in DevOps:  Transformation thru Automation via Jenkins
QA in DevOps: Transformation thru Automation via Jenkins
Tatyana Kravtsov
 
Continuous Delivery Distilled
Continuous Delivery DistilledContinuous Delivery Distilled
Continuous Delivery Distilled
Matt Callanan
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
Michael Kehoe
 
DevOps as-a-Service (DaaS) value
DevOps as-a-Service (DaaS) valueDevOps as-a-Service (DaaS) value
DevOps as-a-Service (DaaS) value
Marc Hornbeek
 
DOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOps
DOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOpsDOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOps
DOES SFO 2016 - Scott Willson - Top 10 Ways to Fail at DevOps
Gene Kim
 
Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...
Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...
Data-Driven DevOps: Improve Velocity and Quality of Software Delivery with Me...
Splunk
 
You got a couple Microservices, now what? - Adding SRE to DevOps
You got a couple Microservices, now what?  - Adding SRE to DevOpsYou got a couple Microservices, now what?  - Adding SRE to DevOps
You got a couple Microservices, now what? - Adding SRE to DevOps
Gonzalo Maldonado
 
Continuous testing webinar 041017 slideshare
Continuous testing webinar 041017 slideshareContinuous testing webinar 041017 slideshare
Continuous testing webinar 041017 slideshare
QualiQuali
 
What's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE ParisWhat's an SRE at Criteo - Meetup SRE Paris
What's an SRE at Criteo - Meetup SRE Paris
Clément Michaud
 
SRE in Enterprise - Local Journey DevopsDays Galway
SRE in Enterprise - Local Journey  DevopsDays GalwaySRE in Enterprise - Local Journey  DevopsDays Galway
SRE in Enterprise - Local Journey DevopsDays Galway
Kevin Connaughton
 
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Performance Metrics Driven CI/CD - Introduction to Continuous Innovation and ...
Mike Villiger
 
SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)
Hussain Mansoor
 
DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...
DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...
DevOps Testing | Continuous Testing In DevOps | DevOps Tutorial | DevOps Trai...
Edureka!
 
Roles and Responsibilities of a DevOps Engineer
Roles and Responsibilities of a DevOps EngineerRoles and Responsibilities of a DevOps Engineer
Roles and Responsibilities of a DevOps Engineer
ZaranTech LLC
 
Bjorn Rabenstein. SRE, DevOps, Google, and you
Bjorn Rabenstein. SRE, DevOps, Google, and youBjorn Rabenstein. SRE, DevOps, Google, and you
Bjorn Rabenstein. SRE, DevOps, Google, and you
IT Arena
 
The Art of Container Monitoring
The Art of Container MonitoringThe Art of Container Monitoring
The Art of Container Monitoring
Derek Chen
 
DevOps Evolution - The Next Generation ?
DevOps Evolution - The Next Generation ?DevOps Evolution - The Next Generation ?
DevOps Evolution - The Next Generation ?
Marc Hornbeek
 
2017 DevSecOps Survey
2017 DevSecOps Survey2017 DevSecOps Survey
2017 DevSecOps Survey
Sonatype
 
How to Build the Right Automation
How to Build the Right AutomationHow to Build the Right Automation
How to Build the Right Automation
Jules Pierre-Louis
 
QA in DevOps: Transformation thru Automation via Jenkins
QA in DevOps:  Transformation thru Automation via JenkinsQA in DevOps:  Transformation thru Automation via Jenkins
QA in DevOps: Transformation thru Automation via Jenkins
Tatyana Kravtsov
 
Continuous Delivery Distilled
Continuous Delivery DistilledContinuous Delivery Distilled
Continuous Delivery Distilled
Matt Callanan
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
Michael Kehoe
 

Similar to SRE in Apiary (20)

Software engineering the genesis
Software engineering  the genesisSoftware engineering  the genesis
Software engineering the genesis
Pawel Szulc
 
Create Your Future with z Systems Cloud
Create Your Future with z Systems CloudCreate Your Future with z Systems Cloud
Create Your Future with z Systems Cloud
CA Technologies
 
Keynote at-icpc-2020
Keynote at-icpc-2020Keynote at-icpc-2020
Keynote at-icpc-2020
Ralf Laemmel
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
Tao Xie
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your System
Giovanni Asproni
 
Please, Please, PLEASE Defend Your Mobile Apps!
Please, Please, PLEASE Defend Your Mobile Apps!Please, Please, PLEASE Defend Your Mobile Apps!
Please, Please, PLEASE Defend Your Mobile Apps!
Jerod Brennen
 
DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...
DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...
DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...
Gene Kim
 
Jun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINAL
Jun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINALJun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINAL
Jun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINAL
Alex Tarra
 
Illogical engineers
Illogical engineersIllogical engineers
Illogical engineers
Pawel Szulc
 
Illogical engineers
Illogical engineersIllogical engineers
Illogical engineers
Pawel Szulc
 
Microservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native AppsMicroservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native Apps
Araf Karsh Hamid
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Tom Mens
 
Agile and Agile methods: what is the most important to understand to succeed
Agile and Agile methods: what is the most important to understand to succeedAgile and Agile methods: what is the most important to understand to succeed
Agile and Agile methods: what is the most important to understand to succeed
Vaidas Adomauskas
 
Right-sized Architecture: Integrity for Emerging Designs
Right-sized Architecture: Integrity for Emerging DesignsRight-sized Architecture: Integrity for Emerging Designs
Right-sized Architecture: Integrity for Emerging Designs
TechWell
 
Cytoscape CI Chapter 2
Cytoscape CI Chapter 2Cytoscape CI Chapter 2
Cytoscape CI Chapter 2
bdemchak
 
.NET Fest 2019. Леонид Молотиевский. DotNet Core in production
.NET Fest 2019. Леонид Молотиевский. DotNet Core in production.NET Fest 2019. Леонид Молотиевский. DotNet Core in production
.NET Fest 2019. Леонид Молотиевский. DotNet Core in production
NETFest
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software Datasets
Tao Xie
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
Tao Xie
 
DevOps – what is it? Why? Is it real? How to do it?
DevOps – what is it? Why? Is it real? How to do it?DevOps – what is it? Why? Is it real? How to do it?
DevOps – what is it? Why? Is it real? How to do it?
Sailaja Tennati
 
Software engineering the genesis
Software engineering  the genesisSoftware engineering  the genesis
Software engineering the genesis
Pawel Szulc
 
Create Your Future with z Systems Cloud
Create Your Future with z Systems CloudCreate Your Future with z Systems Cloud
Create Your Future with z Systems Cloud
CA Technologies
 
Keynote at-icpc-2020
Keynote at-icpc-2020Keynote at-icpc-2020
Keynote at-icpc-2020
Ralf Laemmel
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
Tao Xie
 
Creating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your SystemCreating An Incremental Architecture For Your System
Creating An Incremental Architecture For Your System
Giovanni Asproni
 
Please, Please, PLEASE Defend Your Mobile Apps!
Please, Please, PLEASE Defend Your Mobile Apps!Please, Please, PLEASE Defend Your Mobile Apps!
Please, Please, PLEASE Defend Your Mobile Apps!
Jerod Brennen
 
DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...
DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...
DOES16 San Francisco - Susanna Brown & Ben Chan - DevOps in the Midst of an A...
Gene Kim
 
Jun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINAL
Jun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINALJun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINAL
Jun 08 - PMWT Featured Paper -Tarabykin - XP PAPER - FINAL
Alex Tarra
 
Illogical engineers
Illogical engineersIllogical engineers
Illogical engineers
Pawel Szulc
 
Illogical engineers
Illogical engineersIllogical engineers
Illogical engineers
Pawel Szulc
 
Microservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native AppsMicroservices Architecture - Cloud Native Apps
Microservices Architecture - Cloud Native Apps
Araf Karsh Hamid
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
Tao Xie
 
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Tom Mens
 
Agile and Agile methods: what is the most important to understand to succeed
Agile and Agile methods: what is the most important to understand to succeedAgile and Agile methods: what is the most important to understand to succeed
Agile and Agile methods: what is the most important to understand to succeed
Vaidas Adomauskas
 
Right-sized Architecture: Integrity for Emerging Designs
Right-sized Architecture: Integrity for Emerging DesignsRight-sized Architecture: Integrity for Emerging Designs
Right-sized Architecture: Integrity for Emerging Designs
TechWell
 
Cytoscape CI Chapter 2
Cytoscape CI Chapter 2Cytoscape CI Chapter 2
Cytoscape CI Chapter 2
bdemchak
 
.NET Fest 2019. Леонид Молотиевский. DotNet Core in production
.NET Fest 2019. Леонид Молотиевский. DotNet Core in production.NET Fest 2019. Леонид Молотиевский. DotNet Core in production
.NET Fest 2019. Леонид Молотиевский. DotNet Core in production
NETFest
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software Datasets
Tao Xie
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
Tao Xie
 
DevOps – what is it? Why? Is it real? How to do it?
DevOps – what is it? Why? Is it real? How to do it?DevOps – what is it? Why? Is it real? How to do it?
DevOps – what is it? Why? Is it real? How to do it?
Sailaja Tennati
 
Ad

More from Ladislav Prskavec (20)

Modern Web Architecture<br>based on JS, API and Markup
Modern Web Architecture<br>based on JS, API and MarkupModern Web Architecture<br>based on JS, API and Markup
Modern Web Architecture<br>based on JS, API and Markup
Ladislav Prskavec
 
How you can kill Wordpress!
How you can kill Wordpress!How you can kill Wordpress!
How you can kill Wordpress!
Ladislav Prskavec
 
SRE in Startup
SRE in StartupSRE in Startup
SRE in Startup
Ladislav Prskavec
 
CI and CD
CI and CDCI and CD
CI and CD
Ladislav Prskavec
 
Datascript: Serverless Architetecture
Datascript: Serverless ArchitetectureDatascript: Serverless Architetecture
Datascript: Serverless Architetecture
Ladislav Prskavec
 
Serverless Architecture
Serverless ArchitectureServerless Architecture
Serverless Architecture
Ladislav Prskavec
 
CI and CD
CI and CDCI and CD
CI and CD
Ladislav Prskavec
 
PragueJS meetups 30th anniversary
PragueJS meetups 30th anniversaryPragueJS meetups 30th anniversary
PragueJS meetups 30th anniversary
Ladislav Prskavec
 
How to easy deploy app into any cloud
How to easy deploy app into any cloudHow to easy deploy app into any cloud
How to easy deploy app into any cloud
Ladislav Prskavec
 
Docker - modern platform for developement and operations
Docker - modern platform for developement and operationsDocker - modern platform for developement and operations
Docker - modern platform for developement and operations
Ladislav Prskavec
 
GDGSCL - Docker a jeho provoz v Heroku a AWS
GDGSCL - Docker a jeho provoz v Heroku a AWSGDGSCL - Docker a jeho provoz v Heroku a AWS
GDGSCL - Docker a jeho provoz v Heroku a AWS
Ladislav Prskavec
 
AWS Elastic Container Service
AWS Elastic Container ServiceAWS Elastic Container Service
AWS Elastic Container Service
Ladislav Prskavec
 
Comparison nodejs frameworks using Polls API
Comparison nodejs frameworks using Polls APIComparison nodejs frameworks using Polls API
Comparison nodejs frameworks using Polls API
Ladislav Prskavec
 
Docker Elastic Beanstalk
Docker Elastic BeanstalkDocker Elastic Beanstalk
Docker Elastic Beanstalk
Ladislav Prskavec
 
Docker včera, dnes a zítra
Docker včera, dnes a zítraDocker včera, dnes a zítra
Docker včera, dnes a zítra
Ladislav Prskavec
 
Tessel is a microcontroller that runs JavaScript.
Tessel is a microcontroller that runs JavaScript.Tessel is a microcontroller that runs JavaScript.
Tessel is a microcontroller that runs JavaScript.
Ladislav Prskavec
 
Docker.io
Docker.ioDocker.io
Docker.io
Ladislav Prskavec
 
Docker.io
Docker.ioDocker.io
Docker.io
Ladislav Prskavec
 
AngularJS
AngularJSAngularJS
AngularJS
Ladislav Prskavec
 
Firebase and AngularJS
Firebase and AngularJSFirebase and AngularJS
Firebase and AngularJS
Ladislav Prskavec
 
Modern Web Architecture<br>based on JS, API and Markup
Modern Web Architecture<br>based on JS, API and MarkupModern Web Architecture<br>based on JS, API and Markup
Modern Web Architecture<br>based on JS, API and Markup
Ladislav Prskavec
 
Datascript: Serverless Architetecture
Datascript: Serverless ArchitetectureDatascript: Serverless Architetecture
Datascript: Serverless Architetecture
Ladislav Prskavec
 
PragueJS meetups 30th anniversary
PragueJS meetups 30th anniversaryPragueJS meetups 30th anniversary
PragueJS meetups 30th anniversary
Ladislav Prskavec
 
How to easy deploy app into any cloud
How to easy deploy app into any cloudHow to easy deploy app into any cloud
How to easy deploy app into any cloud
Ladislav Prskavec
 
Docker - modern platform for developement and operations
Docker - modern platform for developement and operationsDocker - modern platform for developement and operations
Docker - modern platform for developement and operations
Ladislav Prskavec
 
GDGSCL - Docker a jeho provoz v Heroku a AWS
GDGSCL - Docker a jeho provoz v Heroku a AWSGDGSCL - Docker a jeho provoz v Heroku a AWS
GDGSCL - Docker a jeho provoz v Heroku a AWS
Ladislav Prskavec
 
AWS Elastic Container Service
AWS Elastic Container ServiceAWS Elastic Container Service
AWS Elastic Container Service
Ladislav Prskavec
 
Comparison nodejs frameworks using Polls API
Comparison nodejs frameworks using Polls APIComparison nodejs frameworks using Polls API
Comparison nodejs frameworks using Polls API
Ladislav Prskavec
 
Docker včera, dnes a zítra
Docker včera, dnes a zítraDocker včera, dnes a zítra
Docker včera, dnes a zítra
Ladislav Prskavec
 
Tessel is a microcontroller that runs JavaScript.
Tessel is a microcontroller that runs JavaScript.Tessel is a microcontroller that runs JavaScript.
Tessel is a microcontroller that runs JavaScript.
Ladislav Prskavec
 
Ad

Recently uploaded (20)

AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 

SRE in Apiary

  • 1. SRE in Apiary CZJUG 21.5.2018 Ladislav Prskavec @abtris 1
  • 3. "What happens when a software engineer is tasked with what used to be called operations." » Ben Treynor Sloss, Vice President, Google Engineering, founder of Google SRE 3
  • 4. SRE implement DevOps — Google Cloud 4
  • 5. Apiary in numbers » Apiary users: 336,786 Apiary API projects: 440,178 » Apiary engineers: 19 » Apiary platform engineers: 10 + 4 » Apiary SREs: 4 » App deploys: 15 (per week) » Parsing service invocations [1 day]: ~200k » CI build: ~19 min (8 parallel workers) 5
  • 6. How we started with SRE team 6
  • 7. 2014 - 2 people software developer and ops guy 7
  • 8. 2015-2017 - 3 people software developers 8
  • 9. 2018 - 4 people 2 seniors, 2 juniors 9
  • 11. No team separation » bounder context, but ... » Shared ownership of platform - shared responsibility » Shared tooling (debug, deploy, monitor) » Shared codebase » Brainstorm » Motivation for good design (monitoring, future debugging) 11
  • 12. Things break » They do - better be ready » Knowing when there's problem (logs, metrics, alerting) » Having someone there - being oncall » Responding (mitigation, resolution) » Learning from it (postmortems) 12
  • 13. Measure everything » No gut feeling when we have the data (app metrics, runtime metrics) » Both production and non-production systems (e.g. our CI test time) » Thresholds, automated alerting » Visualize the data (oncall dashboard, happiness dashboard) 13
  • 14. 14
  • 15. 15
  • 16. Gradual changes » Delivery vs deploy » Continuous Integration / Continuous Delivery (CI/CD) Automated testing within CI » Testing environments (similar to production) » Short iterations, fast rollbacks » No-downtime deploy & immutable » Rolling delivery 16
  • 17. Tooling & automation » oncall logistics » schedules » escalations » alerting » conflicts » documentation » runbooks » internal processes » domain dictionary 17
  • 18. Reason 1. Decreasing changes of errors » Source and great post: http://www.devops.ch/2017/05/10/devops-explained/ 18
  • 19. Reason 2: Eliminating toil, work that is: » Repetitive » Automatable » Doesn't provide enduring value » Scales linearly with service » Compounds significantly and surprisingly 19
  • 20. Reason 3: Focusing on creative engineering work that: » Improves reliability » Improves performance & stability of systems » Ensures scalability » Reduces toil » Is fun: improves morale, speeds up progress, allows skill development 20
  • 21. Incidents Types: » Low-priority incident » High-priority incident » Security incident Both production and non-production systems 21
  • 22. Being oncall » Shared among developers (roles, not individuals, increase bus factor) Responsible for the platform » Safety net - you know who to call » Runbooks - you know what to do » Early alerting - proactively investigate 22
  • 23. Incident response If critical: Incident commander role Separate roles, if necessary: » outbound and inbound communication » root cause analysis » issue mitigation Tracking time (incident ack expiration) and keeping track Tooling (alerts, paging, postmortem reminders) 23
  • 24. Postmortems » Root cause » Lessons learned » Actionable items » Prevent future issues » Create runbooks » Blameless » Generated reminders 24
  • 25. Incident reviews » Weekly, team lead sync » Reviewing past incidents - types, occurrence, actionability » Discuss improvements » Incident fatigue prevention 25
  • 27. Summary » Culture is more important than process! » Start early and work on improvements! » Product owner for SRE work is useful role! 27
  • 28. References » SRE vs. DevOps: competing standards or close friends? » SRE Weekly » Awesome Site Reliability Engineering Books » SRE book » Seeking SRE 28