SRE & Kubernetes
February, 2022
Hello!
Afkham Azeez
VP & Deputy CTO - Cloud
azeez@wso2.com
Off-roading, camping, birding & nature enthusiast
Amateur radio operator - 4S7AZE
/afkham_azeez /afkhamazeez
Software development in 2020 and beyond…
A paradigm shift
● Major changes to how software is designed & built are taking place
● Businesses have realized that they have to build digital experiences
● Building a ‘Digitally-driven Business’ takes time and significant engineering
effort
Cloud-native software engineering
Building for the Cloud, on the Cloud
● Start building your product on the cloud
⦿ Have your dev environment on the cloud
● Run multiple environments on the cloud
⦿ dev, test, staging, prod
● Leverage cloud services and APIs
⦿ Don’t run everything yourself
● Containers & Kubernetes are game changers
With great power comes great complexity!
What is Kubernetes?
● A cluster operating system
● A collection of control loops
https://buttondown.email/nelhage/archive/two-reasons-kubernetes-is-so-complex/
IaC
● The process of managing and provisioning computer data centers through
machine-readable definition files, rather than physical hardware configuration or
interactive configuration tools
● Everything is code
⦿ Cluster creation
⦿ Creating workloads
⦿ System configuration
⦿ Security
⦿ etc.
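The minimal sketch below shows "creating workloads" as code: a manifest that declares a Deployment. The name `web` and the image `nginx:1.25` are hypothetical placeholders.

Because Kubernetes is a collection of control loops, applying this file (for example with `kubectl apply -f`) only records the desired state. Kubernetes controllers then keep reconciling the cluster until three matching Pods are running.

```yaml
# Hypothetical workload declared as code; the name and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 3              # desired state; controllers reconcile the cluster toward it
  selector:
    matchLabels:
      app: web             # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
```

If a Pod is deleted or a node fails, nobody has to re-run the file. The control loop notices fewer than three Pods and creates replacements.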
Site Reliability Engineering
● SRE is an approach taken to solve IT Operations challenges using Software
Engineering principles.
● SREs use software as a tool to manage cloud systems, diagnose problems, and
automate tasks.
● A key role of SRE is to find the right balance between releasing new features
and ensuring they are reliable:
⦿ Dev teams want to deploy as many features as possible, as soon as possible
⦿ SRE tries to facilitate the dev team’s goals while ensuring reliability
● What is reliability?
⦿ Minimizing the impact on end users by minimizing outages
What do SREs do?
● Define compliance standards & processes
● Write cluster/system setup code
● Define build pipelines & help dev teams set up pipelines
● Set up monitoring and alerting (as code)
● Plan backup and recovery
● Plan DR strategy
● Threat modeling & security scanning
● Incident management
● Chaos engineering
● Root cause analysis
● Perform routine tasks
● Cost analysis & optimization
Core Concepts & Methodologies
CI/CD and GitOps
● Git repos as the single, central source of truth for the current cluster
configuration
● Use standard git practices
⦿ fork -> branch -> change -> build -> send PR -> CI -> review -> merge -> CD
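The talk does not name a GitOps tool. As one illustration, assume Argo CD handles the CD step. An Application resource like the one below ties a namespace in the cluster to a path in a Git repo. The repository URL, path and names are hypothetical.

```yaml
# Hypothetical Argo CD Application; repo URL, path and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web
  namespace: argocd            # namespace where Argo CD itself is installed
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/cluster-config.git
    targetRevision: main       # the branch that reviewed PRs are merged into
    path: apps/web             # directory holding this app's manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: web
  syncPolicy:
    automated:
      prune: true              # delete resources that were removed from Git
      selfHeal: true           # revert manual changes made directly in the cluster
```

With automated sync, the "merge -> CD" step needs no manual `kubectl`: once a PR lands on `main`, Argo CD applies the change. It also keeps the cluster matching the repo afterwards.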
Logging
Diagram: a Kubernetes cluster in which an omsagent runs on each node (Node 1, Node 2, Node 3), plus an omsagent-rs replica on Node 3, shown alongside Log Analytics and Data Explorer.
https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-log-query
Logging + analytics + alerting
Log publishing -> Analytics -> Issue or anomaly detection -> Alerting -> Incident management
Observability, Monitoring & Alerting
● Observability vs monitoring - monitoring is what you do after a system is
observable
● System level monitoring
⦿ Cluster, pod, node health
⦿ System level services/APIs health - includes errors & latencies
⦿ System logs
⦿ Intrusion detection
⦿ DoS
● Application level monitoring
⦿ Application level services/APIs health - includes errors & latencies
⦿ Internal application level observability
⦿ Application logs
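The sketch below shows alerting defined as code. It assumes Prometheus collects the metrics; the talk does not specify a monitoring stack.

The rule fires when more than 1% of a service's HTTP requests fail, measured over 5-minute windows, for at least 10 minutes. The metric `http_requests_total`, its `code` and `service` labels, and the thresholds are illustrative assumptions about how the service is instrumented.

```yaml
# Prometheus rule file (referenced from rule_files in prometheus.yml).
# Metric and label names are assumptions about the service's instrumentation.
groups:
  - name: api-availability
    rules:
      - alert: HighErrorRate
        # Fraction of requests answered with a 5xx status over the last 5 minutes
        expr: |
          sum by (service) (rate(http_requests_total{code=~"5.."}[5m]))
            /
          sum by (service) (rate(http_requests_total[5m]))
            > 0.01
        for: 10m                 # condition must hold for 10 minutes before firing
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.service }} error rate above 1% for 10 minutes"
```

Keeping rules like this in Git lets them go through the same review-and-merge flow as the rest of the cluster configuration.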
Incident Management
Incident: an unplanned interruption to, or reduction in the quality of, an IT service
Normal incident management process
Major incident management process
SLI, SLO, SLA
● SLI (service level indicator)
⦿ Metrics used to measure the level of service provided to end users (e.g., availability,
latency, throughput)
● SLO (service level objective)
⦿ Target levels of service, measured by SLIs
⦿ Typically expressed as a percentage over a period of time
⦿ Help you find the right balance between product innovation and reliability
● SLA (service level agreement)
⦿ Contractual agreements that outline the level of service end users can expect
⦿ If these promises are not met, there can be significant consequences for the provider,
which are often financial in nature
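Here is a worked example with illustrative numbers (not from the talk). Take a request-based availability SLI and an SLO of 99.9% over a 30-day window. Suppose the service handled 1,000,000 valid requests and 999,500 of them succeeded:

```latex
\[
\text{SLI} = \frac{\text{good requests}}{\text{valid requests}}
           = \frac{999{,}500}{1{,}000{,}000} = 99.95\%
\]
\[
\text{SLI} = 99.95\% \;\ge\; \text{SLO} = 99.9\%
\quad\Rightarrow\quad \text{the SLO is met for this window}
\]
```

An SLA would typically promise a looser target than the internal SLO. That leaves a margin before contractual consequences apply.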
Error Budget
● Error budget = 1-SLO
● The acceptable level of unreliability for a service before it falls out of compliance
with its SLO
● A measure of the risk you can take to:
⦿ ship new features
⦿ stop services for maintenance
⦿ make routine improvements
⦿ absorb network and infrastructure outages
⦿ absorb unforeseen circumstances
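As an illustration with assumed numbers, take a 99.9% availability SLO over a 30-day window. The error budget can be expressed as time or as failed requests (assuming 1,000,000 requests in the window):

```latex
\[
\text{Error budget} = 1 - \text{SLO} = 1 - 0.999 = 0.001
\]
\[
0.001 \times (30 \times 24 \times 60\ \text{min}) = 0.001 \times 43{,}200\ \text{min} = 43.2\ \text{min}
\]
\[
0.001 \times 1{,}000{,}000\ \text{requests} = 1{,}000\ \text{failed requests}
\]
```

Every item in the list above draws on the same 43.2 minutes. A common error-budget policy is to slow down or pause risky releases once the budget is spent, until reliability recovers.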
Toil & Toil Budget
● Toil
⦿ The kind of work tied to running a production service that tends to be manual, repetitive,
automatable, tactical, devoid of enduring value, and that scales linearly as a service
grows.
⦿ The SRE discipline focuses on reducing toil as much as possible.
● Toil budget
⦿ A measure of acceptable toil
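The talk does not give a number for the toil budget. One widely cited guideline, from Google's SRE book, is to keep toil below 50% of each SRE's time. With an assumed 40-hour week, 14 hours of toil would stay within that budget:

```latex
\[
\text{toil fraction} = \frac{\text{hours spent on toil}}{\text{total working hours}}
                     = \frac{14}{40} = 35\% \;<\; 50\%
\]
```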
Cron jobs
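apiVersion: batch/v1
kind: CronJob
metadata:
  name: expenserpt
spec:
  # Cron fields: minute hour day-of-month month day-of-week
  # "0 0 1 * *" runs the job at 00:00 on the first day of every month
  schedule: "0 0 1 * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: expenserpt
            imagePullPolicy: IfNotPresent
          # Restart the container in place if the report run fails
          restartPolicy: OnFailure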
Cost management
● Use tools provided by cloud platforms
● Set proper cost thresholds
● Resource audit & cost analysis reports
● Set up a cost management team & weekly reviews
● Kubecost
⦿ Provides real-time cost visibility and
insights for teams using Kubernetes
⦿ Helps to continuously reduce cloud
costs
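Cloud cost thresholds act on the bill after the fact. A complementary guardrail inside the cluster (an assumption; the talk does not prescribe it) is a ResourceQuota. It caps how much CPU, memory and storage a team's namespace can request. The namespace `team-a` and the limits are hypothetical.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a          # hypothetical team namespace
spec:
  hard:
    requests.cpu: "20"       # total CPU that all Pods in the namespace may request
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    persistentvolumeclaims: "10"
    requests.storage: 500Gi  # total storage requested across all PVCs
```

Once a quota covers CPU and memory, Pods in that namespace must declare requests and limits, or inherit defaults from a LimitRange. Otherwise they are rejected. This also makes per-team usage easier for tools such as Kubecost to attribute.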
Anti-fragility
● Improve resilience using fire drills, chaos engineering (e.g., Chaos Monkey), security and automation
● Use Kubernetes liveness & readiness probes for health checks
● Manage sensitive data with Kubernetes Secrets and CSI-based secret stores (e.g., the Secrets Store CSI Driver)
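The minimal sketch below covers the last two bullets. It uses hypothetical names (`api`, `db-credentials`, `/healthz`, `/ready`, port 8080).

The Pod reads a password from a Kubernetes Secret and declares liveness and readiness probes. If the liveness probe keeps failing, the kubelet restarts the container. If the readiness probe fails, the Pod is removed from Service endpoints but not restarted.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  password: change-me            # placeholder; keep real values out of Git
---
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0    # hypothetical image
      ports:
        - containerPort: 8080
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      livenessProbe:             # restart the container if this keeps failing
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:            # stop routing traffic to the Pod while this fails
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```

A plain Secret manifest should not be committed to a GitOps repo as-is. With a CSI-based secret store, the value can instead be mounted from an external vault at runtime.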
Security
● Threat modeling using methodologies such as STRIDE
● Scan code repos using tools such as Checkov
● Security specialists - DevSecOps
● Security Operations Center (SOC)
● Kubernetes
⦿ Service Accounts, roles & role bindings
⦿ Network Policies
⦿ Cluster and namespace level isolation
⦿ mTLS enforcement via service meshes
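This sketch covers the first two Kubernetes items. It assumes a namespace `shop` and an app labelled `app: api` (both hypothetical).

It creates a ServiceAccount bound to a Role that can only read ConfigMaps. It also adds a NetworkPolicy that admits ingress to the API Pods only from Pods labelled `app: frontend`. NetworkPolicies are enforced only if the cluster's network plugin supports them.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api
  namespace: shop
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: shop
rules:
  - apiGroups: [""]              # "" is the core API group
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-configmap-reader
  namespace: shop
subjects:
  - kind: ServiceAccount
    name: api
    namespace: shop
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: api                   # the policy applies to these Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # only frontend Pods in the same namespace may connect
      ports:
        - protocol: TCP
          port: 8080
```

Once a NetworkPolicy selects a Pod for ingress, any ingress traffic not explicitly allowed is denied. Scoping the Role to a single namespace keeps the ServiceAccount's permissions at least privilege.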
Business Continuity & Disaster Recovery
● Business continuity planning (BCP) is the process of creating a system of prevention and recovery
from potential threats to a company
● What is a disaster?
⦿ An unforeseen event that could potentially put the organization at risk by interfering
with operations
● Ideally, every function of the company should have a BC plan, and these should be
combined into a single corporate BC plan
Adopting SRE
A way of structuring teams
How can your organization adopt SRE?
● Start small & evolve
● Analyze existing team structures/processes and see how they can be adapted
● Recruiting experienced SREs can be hard
⦿ Dev2SRE program
● On-the-job training
● Certifications are important
⦿ CKAD, CKA, CKS
⦿ Cloud platform certifications - Azure, AWS, GCP etc.
⦿ “Well architected” programs
● Maintain a central knowledge base - document everything
● Define standards, conventions & best practices and ensure that those are
followed
● Define and continuously improve processes
● Work closely with development teams. Engage with all stakeholders.
● Obtain standards certifications/reports: SOC 2, ISO 27001, HIPAA, HITRUST, etc.
TL;DR
● Kubernetes & even app development are just the tip of the iceberg in your
organization’s overall SRE & cloud native story
● Establishing the SRE discipline is essential for running seamless
operations
● Start small, adapt & evolve
Question Time!
wso2.com
Thanks!
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 

SRE & Kubernetes

  • 3. Off-roading, camping, birding & nature enthusiast
    Amateur radio operator (4S7AZE)
    /afkham_azeez /afkhamazeez
  • 4. Software development in 2020 and beyond: A paradigm shift
    ● Major changes to how software is designed and built are taking place
    ● Businesses have realized that they have to build digital experiences
    ● Building a ‘Digitally-driven Business’ takes time and significant engineering effort
  • 5. Cloud-native software engineering: Building for the Cloud, on the Cloud
    ● Start building your product on the cloud
      ⦿ Have your dev environment on the cloud
    ● Run multiple environments on the cloud
      ⦿ dev, test, staging, prod
    ● Leverage cloud services and APIs
      ⦿ Don’t run everything yourself
    ● Containers & Kubernetes are game changers
  • 6. With great power comes great complexity!
  • 7. What is Kubernetes?
    ● A cluster operating system
    ● A collection of control loops
    https://buttondown.email/nelhage/archive/two-reasons-kubernetes-is-so-complex/
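The "collection of control loops" idea can be illustrated with a toy reconcile loop: a controller repeatedly observes the actual state, compares it with the desired state, and takes one step to close the gap. This is a minimal in-memory sketch, assuming a hypothetical `Cluster` object with a desired replica count; it is not Kubernetes' own controller code.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy in-memory cluster state (hypothetical, for illustration only)."""
    desired_replicas: int
    pods: list[str] = field(default_factory=list)
    next_id: int = 0


def reconcile(cluster: Cluster) -> str:
    """One pass of a control loop: observe, diff, and act toward the desired state."""
    actual = len(cluster.pods)
    if actual < cluster.desired_replicas:
        name = f"pod-{cluster.next_id}"
        cluster.next_id += 1
        cluster.pods.append(name)
        return f"created {name}"
    if actual > cluster.desired_replicas:
        name = cluster.pods.pop()
        return f"deleted {name}"
    return "in sync"


def run_until_converged(cluster: Cluster, max_iterations: int = 100) -> list[str]:
    """Repeat reconcile() until nothing changes (a real controller never stops watching)."""
    actions = []
    for _ in range(max_iterations):
        action = reconcile(cluster)
        if action == "in sync":
            break
        actions.append(action)
    return actions
```

Starting from `Cluster(desired_replicas=3)`, the loop creates `pod-0`, `pod-1` and `pod-2`; if `pod-1` later disappears, the next run creates `pod-3` to restore the count. Nothing tells the loop *why* a pod vanished; it only converges on the declared state, which is what makes the system self-healing.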
  • 8. IaC (Infrastructure as Code)
    ● The process of managing and provisioning computer data centers through machine-readable definition files, rather than through physical hardware configuration or interactive configuration tools
    ● Everything is code
      ⦿ Cluster creation
      ⦿ Creating workloads
      ⦿ System configuration
      ⦿ Security
      ⦿ etc.
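The "everything is code" point can be sketched by generating a workload definition from a function instead of hand-editing it. This is a minimal illustration, assuming a Python script that emits a Kubernetes Deployment as JSON (`kubectl apply -f` accepts JSON as well as YAML manifests); the helper name `deployment_manifest`, the app name `web` and the image `example/web:1.0` are hypothetical.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build an apps/v1 Deployment as plain data (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def render(name: str, image: str, replicas: int) -> str:
    """Serialize deterministically so diffs in code review stay small."""
    return json.dumps(deployment_manifest(name, image, replicas), indent=2, sort_keys=True)
```

Writing `render("web", "example/web:1.0", 2)` to `web.json` and running `kubectl apply -f web.json` would create the workload. Because the same inputs always produce the same output, the generated file can be committed, reviewed and versioned like any other code.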
  • 9. Site Reliability Engineering
    ● SRE is an approach to solving IT operations challenges using software engineering principles
    ● SREs use software as a tool to manage cloud systems, diagnose problems, and automate tasks
    ● A key role of SRE is to find the right balance between releasing new features and ensuring they are reliable:
      ⦿ Dev teams want to deploy as many features as possible, as soon as possible
      ⦿ SRE tries to facilitate the dev teams’ goals while ensuring reliability
    ● What is reliability?
      ⦿ Minimizing the impact on end users by minimizing outages
  • 10. What do SREs do?
    ● Define compliance standards & processes
    ● Write cluster/system setup code
    ● Define build pipelines & help dev teams set up pipelines
    ● Set up monitoring and alerting (as code)
    ● Plan backup and recovery
    ● Plan the disaster recovery (DR) strategy
    ● Threat modeling & security scanning
    ● Incident management
    ● Chaos engineering
    ● Root cause analysis
    ● Perform routine tasks
    ● Cost analysis & optimization
  • 11. Core Concepts & Methodologies
  • 12. CI/CD and GitOps
    ● Git repos are the single, central source of truth for the current cluster configuration
    ● Use standard Git practices
      ⦿ fork -> branch -> change -> build -> send PR -> CI -> review -> merge -> CD
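With Git as the source of truth, a GitOps sync agent essentially compares the desired state committed to the repo with the live state in the cluster and reports the drift. The sketch below is a simplification, assuming both states are already loaded into dicts keyed by a hypothetical `(kind, name)` pair; real tools also normalize manifests and ignore fields the API server fills in.

```python
Key = tuple[str, str]  # (kind, name), a simplified resource identity


def diff_state(desired: dict[Key, dict], live: dict[Key, dict]) -> dict[str, list[Key]]:
    """Compare desired resources (from Git) with live ones (from the cluster).

    Returns what a sync would create, update in place, or prune.
    """
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        "prune": sorted(k for k in live if k not in desired),
    }
```

For example, if Git declares a `web` Deployment with 3 replicas and a `web` Service, while the cluster runs the Deployment with 2 replicas plus a leftover `old` ConfigMap, the diff reports the Service to create, the Deployment to update and the ConfigMap to prune. Merging a PR changes `desired`; the CD step then applies exactly this diff.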
  • 15. Logging + analytics + alerting
    Log publishing -> Analytics -> Issue or anomaly detection -> Alerting -> Incident management
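The detection and alerting stages of this pipeline can be sketched as a sliding-window error-rate check over incoming log lines. This is a minimal illustration, assuming a hypothetical log format `<unix_seconds> <LEVEL> <message>` and made-up thresholds; a production setup would normally rely on a log analytics or metrics platform rather than hand-rolled code.

```python
from collections import deque


class ErrorRateAlerter:
    """Sliding-window error-rate detector over log lines (hypothetical format).

    Each line is assumed to look like: "<unix_seconds> <LEVEL> <message>".
    """

    def __init__(self, window_seconds: int = 60, threshold: float = 0.05, min_events: int = 20):
        self.window_seconds = window_seconds
        self.threshold = threshold    # alert when the error ratio exceeds this
        self.min_events = min_events  # avoid alerting on tiny samples
        self.events = deque()         # (timestamp, is_error) pairs, oldest first

    def ingest(self, line: str) -> bool:
        """Record one log line and return True if an alert should fire."""
        ts_text, level = line.split(" ", 2)[:2]
        ts = int(ts_text)
        self.events.append((ts, level == "ERROR"))
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= ts - self.window_seconds:
            self.events.popleft()
        errors = sum(1 for _, is_error in self.events if is_error)
        total = len(self.events)
        return total >= self.min_events and errors / total > self.threshold
```

A `True` result is where the alerting stage would page someone or open an incident. The `min_events` guard is a design choice to keep a single error in a quiet minute from triggering a page.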
  • 16. Observability, Monitoring & Alerting
    ● Observability vs. monitoring: monitoring is what you do once a system is observable
    ● System-level monitoring
      ⦿ Cluster, pod, and node health
      ⦿ Health of system-level services/APIs, including errors & latencies
      ⦿ System logs
      ⦿ Intrusion detection
      ⦿ Denial-of-service (DoS) detection
    ● Application-level monitoring
      ⦿ Health of application-level services/APIs, including errors & latencies
      ⦿ Internal application-level observability
      ⦿ Application logs
  • 17. Incident Management
    ● Incident: an unplanned interruption to, or reduction in the quality of, an IT service
  • 20. SLI, SLO, SLA
    ● SLI (Service Level Indicator)
      ⦿ Metrics used to measure the level of service provided to end users (e.g., availability, latency, throughput)
    ● SLO (Service Level Objective)
      ⦿ Target levels of service, measured by SLIs
      ⦿ Typically expressed as a percentage over a period of time
      ⦿ Help you find the right balance between product innovation and reliability
    ● SLA (Service Level Agreement)
      ⦿ Contractual agreements that outline the level of service end users can expect
      ⦿ If these promises are not met, there can be significant consequences for the provider, often financial in nature
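To make the SLI/SLO relationship concrete, here is a small sketch that computes two request-based SLIs and checks them against SLO targets. It assumes a hypothetical `Request` record and a hypothetical definition of "good" (any non-5xx response for availability, anything under a latency threshold for latency); each service has to choose its own definitions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the user
    latency_ms: float  # server-side latency in milliseconds


def availability_sli(requests: list[Request]) -> float:
    """Fraction of requests that did not fail with a 5xx error."""
    if not requests:
        raise ValueError("no requests in the measurement window")
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float) -> float:
    """Fraction of requests served faster than threshold_ms."""
    if not requests:
        raise ValueError("no requests in the measurement window")
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests)


def meets_slo(sli: float, slo: float) -> bool:
    """An SLO is met when the measured SLI is at or above the target."""
    return sli >= slo
```

With 1,000 requests of which 2 return 503, availability is 0.998: that misses a 99.9% availability SLO but would meet a 99% one. An SLA would typically be set looser than the internal SLO, so the SLO is breached (and acted on) before contractual penalties apply.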
  • 21. Error Budget
    ● Error budget = 1 - SLO
    ● The acceptable level of unreliability for a service before it falls out of compliance with its SLO
    ● A measure of the unreliability you can afford to spend on:
      ⦿ getting new features in
      ⦿ stopping services for maintenance
      ⦿ routine improvements
      ⦿ network and infrastructure outages
      ⦿ unforeseen circumstances
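The formula "error budget = 1 - SLO" turns into concrete numbers quickly. The sketch below computes the downtime a time-based budget allows and how much of a request-based budget is left; the 30-day window and the request counts in the usage note are illustrative assumptions.

```python
def error_budget(slo: float) -> float:
    """Error budget = 1 - SLO, as a fraction of time or of requests."""
    if not 0.0 < slo < 1.0:
        raise ValueError("SLO must be between 0 and 1 (a 100% SLO leaves no budget)")
    return 1.0 - slo


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full downtime the budget permits over a time-based window."""
    return error_budget(slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based budget still unspent (negative means overspent)."""
    allowed_failures = error_budget(slo) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

For a 99.9% SLO the budget is 0.1%, which over 30 days is about 43.2 minutes of downtime. Measured per request, 1,000,000 requests allow about 1,000 failures; after 250 failures roughly 75% of the budget remains, while 1,500 failures would overspend it, which is a common signal to slow feature releases in favor of reliability work.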
  • 22. Toil & Toil Budget
    ● Toil
      ⦿ The kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows
      ⦿ The SRE discipline focuses on reducing toil as much as possible
    ● Toil budget
      ⦿ A measure of acceptable toil
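  • 23. Cron jobs
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: expenserpt
spec:
  # minute hour day-of-month month day-of-week: 00:00 on the 1st of every month
  schedule: "0 0 1 * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: expenserpt
            imagePullPolicy: IfNotPresent
          # Restart the container in place if it exits with an error
          restartPolicy: OnFailure
```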
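The CronJob's `schedule: "0 0 1 * *"` uses the five cron fields minute, hour, day-of-month, month and day-of-week, so it fires at midnight on the first day of each month. The sketch below shows how such an expression is matched and how the next run time can be found. It is a deliberate simplification supporting only `*` and single integers (no ranges, lists, steps such as `*/5`, or `7` for Sunday) and is not the parser Kubernetes uses.

```python
from datetime import datetime, timedelta


def _field_matches(field: str, value: int) -> bool:
    # Only "*" and plain integers are supported in this sketch.
    return field == "*" or int(field) == value


def cron_matches(schedule: str, dt: datetime) -> bool:
    """Check dt against "minute hour day-of-month month day-of-week" (0 = Sunday)."""
    minute, hour, dom, month, dow = schedule.split()
    cron_dow = (dt.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = _field_matches(dom, dt.day)
    dow_ok = _field_matches(dow, cron_dow)
    # Standard cron rule: when both day fields are restricted, either one may match.
    day_ok = (dom_ok or dow_ok) if (dom != "*" and dow != "*") else (dom_ok and dow_ok)
    return (
        _field_matches(minute, dt.minute)
        and _field_matches(hour, dt.hour)
        and _field_matches(month, dt.month)
        and day_ok
    )


def next_run(schedule: str, after: datetime, max_days: int = 366) -> datetime:
    """Brute-force search for the first matching minute strictly after `after`."""
    dt = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = after + timedelta(days=max_days)
    while dt <= limit:
        if cron_matches(schedule, dt):
            return dt
        dt += timedelta(minutes=1)
    raise ValueError("no matching time found")
```

Checked from 15 February 2022 at 10:30, the next run of `"0 0 1 * *"` is 1 March 2022 at 00:00. Scripting schedule checks like this is one way an SRE team can catch a mistyped expression in review before it reaches the cluster.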
  • 24. Cost management
    ● Use the tools provided by cloud platforms
    ● Set proper cost thresholds
    ● Resource audits & cost analysis reports
    ● Set up a cost management team & weekly reviews
    ● Kubecost
      ⦿ Provides real-time cost visibility and insights for teams using Kubernetes
      ⦿ Helps continuously reduce cloud costs
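"Set proper cost thresholds" usually means alerting at fixed fractions of a budget, ideally on forecast spend as well as actual spend. A minimal sketch, assuming hypothetical thresholds of 50%, 80% and 100% and a naive linear forecast; cloud platform budget tools and Kubecost offer their own, more complete versions of this.

```python
def crossed_thresholds(
    spend: float, budget: float, thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)
) -> list[float]:
    """Return the budget fractions that the given spend has reached."""
    return [t for t in thresholds if spend >= t * budget]


def forecast_month_end(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Naive linear forecast: assume the average daily spend so far continues."""
    if day_of_month < 1:
        raise ValueError("day_of_month must be at least 1")
    return spend_to_date / day_of_month * days_in_month
```

With a 1,000 monthly budget and 850 spent, the 50% and 80% thresholds have fired. Spending 600 by day 15 of a 30-day month forecasts 1,200, which already crosses the 100% threshold: alerting on the forecast gives the weekly cost review time to act before the budget is actually exceeded.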
  • 25. Anti-fragility
    ● Improve resilience using fire drills, Chaos Monkey, security, and automation
    ● Use Kubernetes liveness & readiness probes for health checks
    ● Manage sensitive data with Kubernetes Secrets and secret-store CSI drivers
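Liveness and readiness probes answer different questions: a failing liveness probe makes the kubelet restart the container, while a failing readiness probe removes the pod from Service endpoints until it passes again. For HTTP probes, the kubelet treats any status from 200 up to (but not including) 400 as success. The sketch below shows an app exposing both checks with Python's standard `http.server`; the paths `/healthz` and `/readyz`, the port and the `STATE` flag are hypothetical conventions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical app state; a real app would flip this after warm-up
# (for example, once caches are loaded and downstream connections are open).
STATE = {"ready": False}


def probe_status(path: str, state: dict) -> int:
    """Map a probe path to an HTTP status code.

    /healthz (liveness): the process is up and able to respond at all.
    /readyz (readiness): the app can accept traffic right now.
    """
    if path == "/healthz":
        return 200
    if path == "/readyz":
        return 200 if state["ready"] else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status = probe_status(self.path, STATE)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok\n" if status == 200 else b"not ok\n")


def serve(port: int = 8080) -> None:
    """Blocking server; the port must match the probes' httpGet port."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling `serve()` starts the endpoints; the pod spec's `livenessProbe` and `readinessProbe` would then use `httpGet` with paths `/healthz` and `/readyz` on port 8080. Keeping the liveness check trivial is deliberate: if it also checked dependencies, a database outage could make Kubernetes restart every healthy container.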
  • 26. Security
    ● Threat modeling using methodologies such as STRIDE
    ● Scan code repos using tools such as Checkov
    ● Security specialists (DevSecOps)
    ● Security Operations Center (SOC)
    ● Kubernetes
      ⦿ Service accounts, roles & role bindings
      ⦿ Network policies
      ⦿ Cluster- and namespace-level isolation
      ⦿ mTLS enforcement via service meshes
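Roles and role bindings are easier to reason about once you know that Kubernetes RBAC is purely additive: there are no deny rules, so a request is allowed only if some rule in a role bound to the subject (for example, a service account) grants it. The toy evaluator below illustrates that logic for the `apiGroups`, `resources` and `verbs` fields; it ignores namespaces, `resourceNames`, subresources and non-resource URLs, and the roles in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Simplified shape of a Role rule."""
    api_groups: tuple[str, ...]  # "" is the core API group (pods, secrets, ...)
    resources: tuple[str, ...]
    verbs: tuple[str, ...]


def _matches(allowed: tuple[str, ...], value: str) -> bool:
    return "*" in allowed or value in allowed


def is_allowed(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """Allow if ANY rule grants the request; everything else is denied by default."""
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )
```

A "pod-reader" role granting `get`, `list` and `watch` on core `pods` allows listing pods but not deleting them or reading secrets. Binding a second role that allows all verbs on `apps` `deployments` only adds permissions; it can never take any away, which is why least privilege has to be designed into each role.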
  • 27. Business Continuity & Disaster Recovery
    ● Business continuity planning (BCP) is the process of creating a system of prevention and recovery from potential threats to a company
    ● What is a disaster?
      ⦿ An unforeseen event that could put the organization at risk by interfering with its operations
    ● Ideally, there should be BC plans for all functions of the company, amalgamated into a single corporate BC plan
  • 29. A way of structuring teams
  • 30. How can your organization adopt SRE?
    ● Start small & evolve
    ● Analyze existing team structures and processes, and see how they can be adapted
    ● Recruiting experienced SREs can be hard
      ⦿ Dev2SRE program
    ● On-the-job training
    ● Certifications are important
      ⦿ CKAD, CKA, CKS
      ⦿ Cloud platform certifications: Azure, AWS, GCP, etc.
      ⦿ “Well-architected” programs
    ● Maintain a central knowledge base; document everything
    ● Define standards, conventions & best practices, and ensure that they are followed
    ● Define and continuously improve processes
    ● Work closely with development teams and engage with all stakeholders
    ● Obtain standards certifications/reports: SOC 2, ISO 27001, HIPAA, HITRUST, etc.
  • 31. TL;DR
    ● Kubernetes, and even app development, are just the tip of the iceberg in your organization’s overall SRE & cloud-native story
    ● Establishing the SRE discipline is essential for running seamless operations
    ● Start small, adapt & evolve