Confidential do not distribute
SRE and GitOps for Building Robust Kubernetes Platforms
Chris Lavery - Senior Site Reliability Engineer
Webinar Platform - FAQs
Using Zoom
• You are in listen only mode
• This webinar is being recorded
• Q&A session will follow the presentation, please use the Q&A panel to submit questions
• Hit escape to exit full screen
• Please introduce yourself in the chat.
Technical Issues - please visit Zoom Help
https://support.zoom.us/hc/en-us/articles/206175806-Top-Questions
Weaveworks: the GitOps company
Weaveworks is backed by solid investors
Weaveworks created the GitOps methodology and tooling to solve our own Kubernetes management, scalability, and reliability requirements
Weaveworks is a key partner with all the major infrastructure and Kubernetes vendors
Weaveworks is deeply committed to the Open Source Community
Chris Lavery
Weaveworks Site Reliability Engineer
Belfast, Northern Ireland
Chris Lavery is a Senior Site Reliability Engineer at Weaveworks (currently on secondment to Deutsche Telekom), where he champions continuous improvement through DevOps/GitOps practices, collaborating with technical and non-technical stakeholders to achieve organisational goals effectively.
Chris has experience with high-performance computing and modern data center architectures, and is familiar with a range of use cases and verticals (Telecoms, Fintech, Gaming).
Outside of work Chris enjoys cycling, music and a never-ending list of DIY tasks.
https://www.linkedin.com/in/christopherlavery
https://twitter.com/mrchrislavery
github.com/fire-ant
Introducing Site Reliability Engineering (SRE)
A (basic) definition:
▪ Operations focused (Production Infrastructure, On-call work)
▪ Often embedded within Development teams to facilitate better operational outcomes
▪ Applying Software Engineering Principles to Operational Problems.
Introducing Site Reliability Engineering (SRE)
Why SRE? What has changed?
▪ DevOps has advanced the culture of collaboration between development and operations
▪ Cloud computing has standardized/commodified aspects of software businesses, i.e. infrastructure as a service
▪ Consumers/End Users have transitioned to digital services
Data Driven Decisions
▪ Modern Distributed Systems have more moving parts
▪ These systems may be a hybrid of managed cloud resources/services and bespoke custom logic
▪ Modern Application Architectures can also be distributed in nature
▪ Increasing Cardinality (more things per thing) can provide better granularity/visibility but at the cost of complexity
Data Driven Decisions
Observability:
▪ System/Service Metrics (Infrastructure)
▪ Logging (and Events)
▪ Tracing (and Spans)
APM (Application Performance Monitoring)
SLIs and SLOs: Dashboard Gauges, Minimum Indicators and Penalty points
▪ SLIs - Service Level Indicators
▪ A measurement derived from one or more underlying metrics to support an SLO
▪ For example: a yield SLI (fraction of successful requests) can inform an availability SLA based on an SLO of 97.5%
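As a minimal sketch of the yield SLI example above, the following Python computes the fraction of successful requests and compares it with the 97.5% SLO. The function name and the request counts are hypothetical, chosen only for illustration.

```python
def yield_sli(successful_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that succeeded (the yield SLI)."""
    if total_requests == 0:
        # Assumption: with no traffic nothing failed, so treat the SLI as met.
        return 1.0
    return successful_requests / total_requests


SLO_TARGET = 0.975  # the 97.5% objective used in the slide's example

# Hypothetical counts for one measurement window.
sli = yield_sli(successful_requests=9_800, total_requests=10_000)
slo_met = sli >= SLO_TARGET
print(f"yield SLI = {sli:.3%}, SLO met: {slo_met}")
```

In practice the two counts would come from service metrics rather than literals; the comparison against the objective stays the same.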
SLIs and SLOs: Dashboard Gauges, Minimum Indicators and Penalty points
▪ SLOs - Service Level Objectives
▪ The defined threshold that a service should ideally operate above or within, as per the terms of an SLA
SLIs and SLOs: Dashboard Gauges, Minimum Indicators and Penalty points
▪ SLAs - Service Level Agreements
▪ An agreement contracted with a customer to set expectations and convey minimum levels of service
SLIs and SLOs: Dashboard Gauges, Minimum Indicators and Penalty points
▪ Reliable enough, but no more reliable than it needs to be.
▪ Use your error budget wisely.
▪ Use it to take risks.
▪ An error budget is 1 minus the SLO of the service
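The "1 minus the SLO" definition can be sketched in a few lines of Python. This is an illustration under stated assumptions: the budget is counted in failed requests over a window, and the helper names and sample numbers are hypothetical.

```python
def error_budget(slo: float) -> float:
    """Error budget as a fraction of requests: 1 minus the SLO."""
    return 1.0 - slo


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent in this window.

    A negative result means the budget has been overspent.
    """
    allowed_failures = error_budget(slo) * total_requests
    if allowed_failures == 0:
        # Assumption: a 100% SLO (or an empty window) leaves no budget to spend.
        return 0.0
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 97.5% SLO, 100,000 requests, 1,000 failures.
remaining = budget_remaining(0.975, 100_000, 1_000)
print(f"error budget remaining: {remaining:.0%}")
```

A team could use the remaining fraction as the signal for how much risk to take: plenty of budget left favours shipping, an exhausted budget favours stabilising.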
Uptime
Traditional Focus is on Uptime
▪ If you were down for 1 second per day (about 6 minutes per year), you would exceed the downtime allowed by a five-nines (99.999%) SLA.
▪ This level of uptime is expensive and not often needed.
▪ Set your uptime target to a reasonable level and no higher.
Uptime Target    Yearly allowed downtime
99%              3d 15h 39m 29s
99.9%            8h 45m 56s
99.99%           52m 35s
99.999%          5m 15s
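The figures in the table can be reproduced with a short calculation. The sketch below assumes an average year of 365.2425 days and truncates to whole seconds, which appears to match the values shown; the function name is hypothetical.

```python
from datetime import timedelta

# Assumption: average Gregorian year length, in seconds.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def allowed_downtime(uptime_target_percent: float) -> timedelta:
    """Yearly downtime permitted by an uptime target, truncated to whole seconds."""
    unavailable_fraction = 1.0 - uptime_target_percent / 100.0
    return timedelta(seconds=int(unavailable_fraction * SECONDS_PER_YEAR))


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime(target)}")

# One second of downtime per day, accumulated over a 365-day year.
one_second_per_day = timedelta(seconds=365)
exceeds_five_nines = one_second_per_day > allowed_downtime(99.999)
```

The last comparison checks the slide's claim: 365 seconds per year is more than the roughly 315 seconds that five nines allows.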
Service Levels Summary
Service Level Agreements = An agreement with a customer to provide a level of service, with penalties applied if it is not met.
Service Level Objectives = A promise that must be achieved, e.g. uptime, response time
Service Level Indicators = The metric that corresponds to meeting an objective, e.g. measured uptime, latency, error rates etc.
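DORA: DevOps Research and Assessment
▪ Low Lead Time (minutes are better than hours, which are better than days)
▪ High Deployment Frequency (deploying every few minutes is better than every few hours, which is better than every few days)
▪ Change failure rate
▪ Time to restore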
DORA: DevOps Research and Assessment
▪ Lead Time + Deployment Frequency = Throughput
▪ Change failure rate + Time to restore = Stability
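To make the four measures concrete, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` record, its field names and the sample data are hypothetical; real figures would come from a CI/CD system and an incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_summary(deployments: list, window_days: int) -> dict:
    """Summarise throughput and stability for a non-empty list of deployments."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Throughput
        "lead_time_hours": mean(t.total_seconds() for t in lead_times) / 3600,
        "deploys_per_day": len(deployments) / window_days,
        # Stability
        "change_failure_rate": len(failures) / len(deployments),
        "time_to_restore_hours": (
            mean(t.total_seconds() for t in restores) / 3600 if restores else 0.0
        ),
    }


t0 = datetime(2024, 1, 1, 9, 0)
h = timedelta(hours=1)
history = [
    Deployment(t0, t0 + 1 * h),
    Deployment(t0 + 2 * h, t0 + 5 * h, failed=True, restored_at=t0 + 5.5 * h),
    Deployment(t0 + 24 * h, t0 + 26 * h),
    Deployment(t0 + 30 * h, t0 + 32 * h),
]
summary = dora_summary(history, window_days=2)
print(summary)
```

The first two keys correspond to throughput and the last two to stability, mirroring the grouping on the slide.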
GitOps SRE
The Principles of GitOps
▪ The entire system is described declaratively
▪ The canonical desired system state is versioned in git
▪ Approved changes can be automatically applied to the system
▪ Software agents ensure correctness and perform actions on divergence in a closed loop
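The closed loop described in these principles can be illustrated with a toy reconciler. This is a sketch only: plain dictionaries stand in for the desired state read from git and the state observed in the cluster, and the function names are hypothetical rather than taken from any GitOps tool.

```python
def diff_state(desired: dict, actual: dict) -> dict:
    """Return the changes needed to move `actual` to `desired`.

    A value of None marks a resource that should be deleted.
    """
    changes = {}
    for name, spec in desired.items():
        if actual.get(name) != spec:
            changes[name] = spec      # create or update
    for name in actual:
        if name not in desired:
            changes[name] = None      # prune what git does not declare
    return changes


def reconcile(desired: dict, actual: dict) -> dict:
    """Run one pass of the loop and return the corrected state."""
    corrected = dict(actual)
    for name, spec in diff_state(desired, actual).items():
        if spec is None:
            corrected.pop(name)
        else:
            corrected[name] = spec
    return corrected


# Hypothetical desired state, as it might be declared in a git repository.
desired = {"web": {"image": "web:1.2", "replicas": 3}}
# Hypothetical drifted cluster state: an old image and an undeclared pod.
drifted = {
    "web": {"image": "web:1.1", "replicas": 3},
    "debug-pod": {"image": "busybox"},
}

converged = reconcile(desired, drifted)
print(converged == desired)
```

A real agent would repeat this pass on an interval or on change events, so any later divergence is corrected the same way.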
Weave GitOps
Continuous delivery and operations for Kubernetes
Embrace Risk
Traditional approaches focus on availability measures such as uptime, but ignore the velocity of deploying new features.
An SRE balances both availability and the velocity of features.
So risk is an acceptable part of the system: it should not be avoided, but it should be managed.
Progressive Delivery: Overview
● Progressive delivery is the practice of limiting the audience for your code changes or new feature releases
● It keeps the exposure area to a minimum in case a release causes problems
● Progressive delivery can be implemented through a number of strategies, from A/B testing to canary, Blue/Green, Rolling/Immutable upgrades or feature flag management
● Make Deployments boring!
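One common way to limit the audience for a change is to place each user in a stable bucket and expose the new version only to buckets below a rollout percentage. The sketch below assumes a string user ID and a hash-based bucketing scheme; the function name is hypothetical and this is one possible approach, not the method used by any particular tool.

```python
import hashlib


def in_rollout_audience(user_id: str, rollout_percent: int) -> bool:
    """Return True if this user should see the new version.

    Hashing the ID gives each user a stable bucket from 0 to 99, so the same
    user stays in (or out of) the audience as long as the percentage is unchanged.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


users = [f"user-{n}" for n in range(1_000)]
exposed_at_10 = {u for u in users if in_rollout_audience(u, 10)}
exposed_at_50 = {u for u in users if in_rollout_audience(u, 50)}
print(len(exposed_at_10), len(exposed_at_50))
```

Because the bucket for a user never changes, raising the percentage only adds users to the audience; nobody who already has the new version loses it.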
Progressive Delivery: SRE orientation
● Leverage the data emitted from an instrumented and observable system to reason about whether to proceed or roll back
● Configure the application infrastructure (service meshes, key metrics), SLA (overall downtime or service degradation) and automated operations (rollback/revert strategy)
● Foster closer collaboration and enable higher velocity through early feedback and deeper insight
● A transferable approach which can be standardised with components and configurations within an organisation
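The proceed-or-roll-back decision can be sketched as a stepwise canary analysis: each healthy check shifts more traffic to the new version, and repeated unhealthy checks trigger a rollback. Every name, threshold and sample value below is a hypothetical assumption for illustration; this is not the API of Flagger or any other tool.

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    success_rate: float     # fraction of successful requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile latency in milliseconds


def is_healthy(metrics: CanaryMetrics,
               min_success_rate: float = 0.99,
               max_p99_latency_ms: float = 500.0) -> bool:
    """Compare one metrics sample with the (assumed) thresholds."""
    return (metrics.success_rate >= min_success_rate
            and metrics.p99_latency_ms <= max_p99_latency_ms)


def run_canary(checks, step_weight=10, max_weight=50, max_failed_checks=2):
    """Walk through metric samples and return (decision, canary traffic weight)."""
    weight = 0
    failed = 0
    for metrics in checks:
        if is_healthy(metrics):
            weight += step_weight            # shift more traffic to the canary
            if weight >= max_weight:
                return "promote", weight
        else:
            failed += 1
            if failed >= max_failed_checks:
                return "rollback", 0         # send all traffic back to stable
    return "in_progress", weight


good = CanaryMetrics(success_rate=0.999, p99_latency_ms=120.0)
bad = CanaryMetrics(success_rate=0.90, p99_latency_ms=900.0)

print(run_canary([good] * 5))
print(run_canary([good, bad, bad]))
```

Tolerating one failed check before rolling back is a design choice that avoids reacting to a single noisy sample; a stricter service could set `max_failed_checks` to 1.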
Progressive Delivery: Flagger
● Cloud Native, open source progressive delivery Operator
● Broad support for all popular service meshes, Ingresses and the K8s Gateway API
● Designed with GitOps methodologies in mind and a complementary component to FluxCD
● Commercial UI integration in Weave GitOps
● Adopters include CNCF projects right through to Enterprise users
Progressive Delivery: How it works
Progressive Delivery: Monitoring
Progressive Delivery: Visualisation
Progressive Delivery: Takeaways
● Progressive delivery is the practice of limiting the audience for your code changes or new feature releases
● It keeps the exposure area to a minimum in case a release causes problems
● Progressive delivery can be implemented through a number of strategies, from A/B testing to canary, Blue/Green or feature flag management
Progressive Delivery: Business Value
● Deployment Frequency is higher with GitOps & Progressive Delivery
● Lead times are shorter with Progressive Delivery
● Change Failure Rate is lower with Progressive Delivery
● Mean Time to Recovery is shorter with GitOps
More information
Whitepaper: Progressive Delivery with GitOps
https://bit.ly/3K8oZwU
Learn about Weave GitOps Assured
www.weave.works/product/gitops/
Learn more about Weave GitOps Enterprise
www.weave.works/enterprise and a 5 min demo
https://youtu.be/aqJaHNCz2lM
Request a personal demo
www.weave.works/contact
Thank You
Join our Community
https://slack.weave.works
Contact Us
sales@weave.works
Our products & services
www.weave.works
Progressive Delivery: Deployment Strategies
Strategy        Positive                                  Negative
Rolling         Zero Downtime                             Longer Deployments
Canary          Low Risk/Early Feedback                   Complex Monitoring/Testing
Blue/Green      Zero Downtime + Fast Rollback             Cost (2 active instances)
Feature Flags   Controlled Release + test in Production   Increased complexity in app management
Immutable       Predictable and Simple process            Increased cost, Data persistence
