© 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019
Kubernetes - 7 lessons learned from 7 data centers in 7 months
Kubernetes for Developers Meetup – May 13, 2019
Mike Tougeron – Senior Site Reliability Engineer @ Adobe
$ whoami && id | grep Adobe
 Mike Tougeron
 Senior Site Reliability Engineer @ Adobe
 Twitter: @mtougeron
 Started using Kubernetes in 2015
Agenda
 Quick Introduction to Adobe Advertising Cloud’s Kubernetes Infrastructure
 Lesson 1: Communication, Teamwork & Training
 Lesson 2: Code to production pipelines
 Lesson 3: The ABCs of Production apps
 Lesson 4: Multi-cloud challenges
 Lesson 5: Knowing your application
 Lesson 6: Metrics based monitoring
 Lesson 7: Take a deep breath
High Traffic: 350 billion requests a day
Latency: <50ms @ 95th percentile
Huge Datasets: Billions of objects to store
Adobe Advertising Cloud’s Kubernetes Overview
 ~225 worker nodes; growing to ~300 in May/June
 6 OpenStack data centers across 4 regions
 Running on VMs
 No persistent storage
 No autoscaling; “fixed” footprint
 Smaller but growing
 3 AWS clusters in us-east-1
 Running on m5d.12xlarge EC2 instances
 EBS volumes for persistent storage
 Uses cluster-autoscaler
 Autoscaling events many times per hour
 Prometheus for monitoring
 Dozens of Machine Learning workloads in AWS
 Reason for frequent autoscaling events
 Cluster updates done via new image and rolling update of existing nodes
 Updates are deployed approximately every 4-6 weeks
Lesson 1: Communication, Teamwork & Training
Communication: Reaching large, distributed teams
Teamwork: Who’s responsible for what?
Abstraction vs Experts
 Need understanding of core resources but also need easy onboarding
 Pair programming training sessions
 Remove need for boilerplate
 Don’t duplicate efforts by avoiding abstraction
 Don’t abstract to the point where you’re not using Kubernetes
 kubectl should *not* be your entrypoint
Lesson 2: Code to production pipelines
[Pipeline diagram; recoverable labels: Dev, Pull Request, master, Unit testing, merge, Deploy bot, Production, Integration testing]
Insert your steps here!
Tools to help build application resources
 Helm (templating and/or tiller)
 Kustomize
 Kapitan
 and more…
 We use a combination of Helm templating for infrastructure/3rd-party and Kustomize for application teams
$> helm template --name opa --namespace opa \
     --values ./values/globals.yaml \
     --values ./values/mgmt/cluster.yaml \
     --values ./values/mgmt/adcloud-opa/values.yaml \
     --output-dir ../../../cloud/opa/mgmt charts/adcloud-opa
versus
$> ./build.py --chart adcloud-opa --cluster mgmt
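The contents of build.py are not shown on the slide, so the following is only a minimal sketch of how such a wrapper might expand a chart and cluster name into the long helm template command above. The function name, the "adcloud-" prefix rule for deriving the release name, and the directory layout are assumptions inferred from the one example command.

```python
def helm_template_args(chart: str, cluster: str, prefix: str = "adcloud-") -> list:
    """Expand a chart and cluster name into a full `helm template` argument list."""
    # Assumption: release name and namespace are the chart name minus its prefix.
    release = chart[len(prefix):] if chart.startswith(prefix) else chart

    # Assumption: values files are layered global -> cluster -> chart-in-cluster,
    # so that later files override earlier ones.
    values_files = [
        "./values/globals.yaml",
        f"./values/{cluster}/cluster.yaml",
        f"./values/{cluster}/{chart}/values.yaml",
    ]

    args = ["helm", "template", "--name", release, "--namespace", release]
    for path in values_files:
        args += ["--values", path]
    args += ["--output-dir", f"../../../cloud/{release}/{cluster}", f"charts/{chart}"]
    return args
```

Joining the result of helm_template_args("adcloud-opa", "mgmt") with spaces reproduces the long command, which is the point of the wrapper: the conventions live in one place instead of in every engineer's shell history.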
Lesson 3: The ABCs of Production
 HorizontalPodAutoscaler
 PodDisruptionBudget
 "DevOps"
 Cluster Upgrades
HorizontalPodAutoscaler
 Easily scale on CPU or Memory usage
 Also able to scale on custom metrics like http_requests from Ingress resources
 Don’t set replicas in your Deployment; re-applying the manifest would reset the count the HPA has chosen
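As a rough illustration of how the HorizontalPodAutoscaler arrives at a replica count, here is a sketch of the scaling rule described in the Kubernetes documentation: desired = ceil(current * currentMetric / targetMetric), with no change while the ratio stays within a tolerance (0.1 by default). The function name is hypothetical, and the sketch ignores min/max replica bounds, stabilization windows, and Pods that are not ready.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale in proportion to how far the metric is from target."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, 4 replicas averaging 200m CPU against a 100m target would be scaled to 8, while the same replicas at 105m would be left at 4.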
PodDisruptionBudget
 Not the same thing as a Deployment strategy
 Helps prevent taking down so many Pods that the application is overwhelmed
 Can be set as minAvailable or maxUnavailable, as a number or a percentage
 Good for helping keep quorum
 Doesn’t apply to manual deletions; it only limits evictions such as node drains
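To make the minAvailable/maxUnavailable arithmetic concrete, here is a small sketch of how many voluntary evictions a budget would permit at a given moment. The function names are hypothetical; the rounding (percentages round up) follows the Kubernetes documentation, and the rest is a simplification of what the disruption controller tracks.

```python
def resolve(value, total):
    """Turn an int or a percentage string such as "50%" into a Pod count, rounding up."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (total * percent + 99) // 100  # integer ceiling of total * percent / 100
    return int(value)


def allowed_disruptions(expected, healthy, min_available=None, max_unavailable=None):
    """Number of Pods that may be evicted right now without violating the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = resolve(min_available, expected)
    else:
        desired_healthy = max(0, expected - resolve(max_unavailable, expected))
    return max(0, healthy - desired_healthy)
```

With 7 expected Pods and minAvailable of "50%", 4 Pods must stay up, so at most 3 can be evicted at once. For a 3-member quorum service with minAvailable 2 and one member already unhealthy, no eviction is allowed.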
DevOps
 Expertise/specialists
 But empowerment & speed
 Things get lost in shuffle
 Everyone can do everything; aka don’t forget your guardrails
# Reject a new Ingress whose host is already claimed by an Ingress in another namespace.
deny[msg] {
    input.request.kind.kind = "Ingress"
    input.request.operation = "CREATE"
    host = input.request.object.spec.rules[_].host
    ingress = ingresses[other_ns][other_ingress]
    other_ns != input.request.namespace
    ingress.spec.rules[_].host = host
    msg = sprintf("invalid ingress host %q (conflicts with %v/%v)", [host, other_ns, other_ingress])
}

# Default the ingress class to nginx-internal when the Ingress does not set one.
# isCreateOrUpdate, hasAnnotation and makeAnnotationPatch are helpers defined elsewhere in the policy.
patch[patchCode] {
    isCreateOrUpdate
    input.request.kind.kind == "Ingress"
    not hasAnnotation(input.request.object, "kubernetes.io/ingress.class")
    patchCode = makeAnnotationPatch("add", "kubernetes.io/ingress.class", "nginx-internal", "")
}
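For readers less familiar with Rego, the two guardrails above can be sketched in Python. This is an illustrative restatement, not the policy the team runs: the function names and the shape of the ingresses mapping (namespace, then name, then Ingress object) are assumptions, and the second function emits a plain JSON Patch rather than calling the policy's own helper.

```python
def host_conflicts(new_ingress: dict, namespace: str, ingresses: dict) -> list:
    """Return one message per host already used by an Ingress in another namespace."""
    new_hosts = {rule.get("host") for rule in new_ingress["spec"].get("rules", [])}
    new_hosts.discard(None)  # rules without a host cannot conflict by name
    messages = []
    for other_ns, by_name in ingresses.items():
        if other_ns == namespace:
            continue  # reusing a host inside the same namespace is allowed
        for other_name, other in by_name.items():
            for rule in other["spec"].get("rules", []):
                host = rule.get("host")
                if host in new_hosts:
                    messages.append(
                        f'invalid ingress host "{host}" (conflicts with {other_ns}/{other_name})'
                    )
    return messages


def default_class_patch(ingress: dict, default_class: str = "nginx-internal") -> list:
    """Return JSON Patch operations that add the ingress class annotation if missing."""
    key = "kubernetes.io/ingress.class"
    metadata = ingress.get("metadata", {})
    annotations = metadata.get("annotations")
    if annotations is None:
        # No annotations map yet: the whole map has to be added in one operation.
        return [{"op": "add", "path": "/metadata/annotations", "value": {key: default_class}}]
    if key in annotations:
        return []
    # JSON Pointer (RFC 6901) escapes "~" as "~0" and "/" as "~1" inside a key.
    escaped = key.replace("~", "~0").replace("/", "~1")
    return [{"op": "add", "path": f"/metadata/annotations/{escaped}", "value": default_class}]
```

The first function mirrors the deny rule (a conflict exists only across namespaces), and the second mirrors the patch rule (mutate only when the annotation is absent).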
Cluster Upgrades - Blue/Green or Canary?
 Who really has the hardware to run a 2nd full Kubernetes cluster in their datacenter?
 Public cloud is easier, but you still have cost considerations
 Are the application team(s) able to handle deploying to a 2nd mirrored cluster?
 Does it make more sense to run N workers of a different version/config for a period of time?
 Do you have the visibility into the cluster to know how one performs vs the other?
Lesson 4: Multi-Cloud Challenges
Multiple code-bases but consistent infrastructure
 Packer – Shared modular code base, different builders
 Terraform – Separate but closely aligned code bases
 Puppet – Same code base
 Helm – Same modular code base
 Leverage templating to build the same deployments for different (and future) clouds
 Re-use, re-use, re-use!
 Lab environments in all clouds
 OSSIA for HV/rack metadata for region/zone
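One way to picture "same templates, different clouds" is layered values: shared defaults first, then per-cloud or per-cluster overrides. The sketch below assumes a Helm-like merge in which later layers win and nested maps are merged key by key. The function name and the example keys are hypothetical and do not come from the talk.

```python
def merge_values(*layers: dict) -> dict:
    """Deep-merge values layers; later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are maps: merge them instead of replacing wholesale.
                merged[key] = merge_values(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical layers: shared defaults plus an AWS-specific override.
shared = {"replicas": 2, "ingress": {"class": "nginx-internal", "tls": True}}
aws = {"ingress": {"class": "alb"}, "storage": "ebs"}
aws_values = merge_values(shared, aws)
```

Only the keys that actually differ per cloud need to appear in the override layer, which keeps the shared layer the single source of truth for everything else.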
Lesson 5: Knowing your applications
 Seems like an obvious statement but it’s easy to forget to think about
 Kubernetes brings advantages, but not all the ones that bare metal and virtual machines bring out of the box
 Think about how your app actually functions
 Service Discovery
 Persistent Storage
 Shared Storage (e.g. replication, sharding, etc)
 Scheduling / Restarting
 Networking Ingress / Egress
 Think about how your app is going to handle the way Kubernetes does things
https://imgur.com/gallery/B4D7Lf1
Elasticsearch as Deployment (What We Did)
https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
Oops…yeah Touge, I think something is wrong…
Elasticsearch as StatefulSet (What We Should Have Done)
https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
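A large part of why a StatefulSet suits a datastore such as Elasticsearch is that each replica keeps a stable identity across restarts. The sketch below lists the names Kubernetes derives for a StatefulSet's Pods, their per-Pod DNS records, and their PersistentVolumeClaims. The function name and the example arguments are hypothetical, and cluster.local is assumed as the cluster domain (the default).

```python
def statefulset_identities(name: str, service: str, namespace: str,
                           replicas: int, claim_template: str) -> list:
    """List the stable Pod name, DNS name, and PVC name for each StatefulSet replica."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"  # Pods are named <statefulset>-<ordinal>
        identities.append({
            "pod": pod,
            # Per-Pod record published through the governing headless Service.
            "dns": f"{pod}.{service}.{namespace}.svc.cluster.local",
            # Each Pod gets its own claim: <volumeClaimTemplate>-<pod>.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities
```

A Deployment, by contrast, gives its Pods generated names and no per-Pod volume claim, so a rescheduled Pod has no stable name or per-replica disk to return to.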
Lesson 6: Metrics-Based Monitoring
Lesson 7: Take a deep breath
 Same team so we all learn & fix together
 Experience has been enlightening & engineers have had fun
 Teams already onboarded are moving faster than before
 Dev cycle to production is faster as we integrate more automated testing
Thanks!
Slides: https://touge.me/k8s-7lessons-meetup
Mike Tougeron
Email: tougeron@adobe.com
Twitter: @mtougeron
Images from https://stock.adobe.com
Ad

More Related Content

Similar to Kubernetes for Developers - 7 lessons learned from 7 data centers in 7 months (meetup) (20)

Cloud native java workshop
Cloud native java workshopCloud native java workshop
Cloud native java workshop
Jamie Coleman
 
MicroShed Testing
MicroShed TestingMicroShed Testing
MicroShed Testing
Andrew Guibert
 
Journey to Cloud: Fast Track to Azure
Journey to Cloud: Fast Track to AzureJourney to Cloud: Fast Track to Azure
Journey to Cloud: Fast Track to Azure
Fausto Pasqualetti
 
Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...
Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...
Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...
Capgemini
 
So you want to provision a test environment...
So you want to provision a test environment...So you want to provision a test environment...
So you want to provision a test environment...
DevOps.com
 
Azure fundamentals
Azure fundamentalsAzure fundamentals
Azure fundamentals
Alexandre BERGERE
 
Azure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for Azure
Azure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for AzureAzure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for Azure
Azure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for Azure
azuredayit
 
A Toolchain for Lean Architecture at American Airlines
A Toolchain for Lean Architecture at American AirlinesA Toolchain for Lean Architecture at American Airlines
A Toolchain for Lean Architecture at American Airlines
Shahir Daya
 
Multi cloud costs how to leverage insight and avoid overspending
Multi cloud costs  how to leverage insight and avoid overspendingMulti cloud costs  how to leverage insight and avoid overspending
Multi cloud costs how to leverage insight and avoid overspending
Appvia
 
Mobile cloud2020
Mobile cloud2020Mobile cloud2020
Mobile cloud2020
Arif A.
 
React Native App Development in 2023-Tips to Practice.pdf
React Native App Development in 2023-Tips to Practice.pdfReact Native App Development in 2023-Tips to Practice.pdf
React Native App Development in 2023-Tips to Practice.pdf
Techugo
 
Writing Applications at Cloud Scale
Writing Applications at Cloud ScaleWriting Applications at Cloud Scale
Writing Applications at Cloud Scale
Matt Ryan
 
Ensure the integration of Microservices with Consumer Driven Contracts
Ensure the integration of Microservices with Consumer Driven ContractsEnsure the integration of Microservices with Consumer Driven Contracts
Ensure the integration of Microservices with Consumer Driven Contracts
Ingo Griebsch
 
Azure
AzureAzure
Azure
Janu Jahnavi
 
Azure
AzureAzure
Azure
Janu Jahnavi
 
Emerging Cloud Migration Approaches
Emerging Cloud Migration ApproachesEmerging Cloud Migration Approaches
Emerging Cloud Migration Approaches
Arvind Viswanathan
 
Flutter App Performance Optimization_ Tips and Techniques.pdf
Flutter App Performance Optimization_ Tips and Techniques.pdfFlutter App Performance Optimization_ Tips and Techniques.pdf
Flutter App Performance Optimization_ Tips and Techniques.pdf
DianApps Technologies
 
How IBM is helping developers win the race to innovate with next-gen cloud se...
How IBM is helping developers win the race to innovate with next-gen cloud se...How IBM is helping developers win the race to innovate with next-gen cloud se...
How IBM is helping developers win the race to innovate with next-gen cloud se...
Michael Elder
 
Arif's PhD Defense (Title: Efficient Cloud Application Deployment in Distrib...
Arif's PhD Defense (Title:  Efficient Cloud Application Deployment in Distrib...Arif's PhD Defense (Title:  Efficient Cloud Application Deployment in Distrib...
Arif's PhD Defense (Title: Efficient Cloud Application Deployment in Distrib...
Arif A.
 
CI/CD Best Practices for Your DevOps Journey
CI/CD Best  Practices for Your DevOps JourneyCI/CD Best  Practices for Your DevOps Journey
CI/CD Best Practices for Your DevOps Journey
DevOps.com
 
Cloud native java workshop
Cloud native java workshopCloud native java workshop
Cloud native java workshop
Jamie Coleman
 
Journey to Cloud: Fast Track to Azure
Journey to Cloud: Fast Track to AzureJourney to Cloud: Fast Track to Azure
Journey to Cloud: Fast Track to Azure
Fausto Pasqualetti
 
Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...
Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...
Enabling and accelerating multi-tenancy with Capgemini Digital Cloud Platform...
Capgemini
 
So you want to provision a test environment...
So you want to provision a test environment...So you want to provision a test environment...
So you want to provision a test environment...
DevOps.com
 
Azure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for Azure
Azure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for AzureAzure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for Azure
Azure Day Rome Reloaded 2019 - Cloud Journey – FastTrack for Azure
azuredayit
 
A Toolchain for Lean Architecture at American Airlines
A Toolchain for Lean Architecture at American AirlinesA Toolchain for Lean Architecture at American Airlines
A Toolchain for Lean Architecture at American Airlines
Shahir Daya
 
Multi cloud costs how to leverage insight and avoid overspending
Multi cloud costs  how to leverage insight and avoid overspendingMulti cloud costs  how to leverage insight and avoid overspending
Multi cloud costs how to leverage insight and avoid overspending
Appvia
 
Mobile cloud2020
Mobile cloud2020Mobile cloud2020
Mobile cloud2020
Arif A.
 
React Native App Development in 2023-Tips to Practice.pdf
React Native App Development in 2023-Tips to Practice.pdfReact Native App Development in 2023-Tips to Practice.pdf
React Native App Development in 2023-Tips to Practice.pdf
Techugo
 
Writing Applications at Cloud Scale
Writing Applications at Cloud ScaleWriting Applications at Cloud Scale
Writing Applications at Cloud Scale
Matt Ryan
 
Ensure the integration of Microservices with Consumer Driven Contracts
Ensure the integration of Microservices with Consumer Driven ContractsEnsure the integration of Microservices with Consumer Driven Contracts
Ensure the integration of Microservices with Consumer Driven Contracts
Ingo Griebsch
 
Emerging Cloud Migration Approaches
Emerging Cloud Migration ApproachesEmerging Cloud Migration Approaches
Emerging Cloud Migration Approaches
Arvind Viswanathan
 
Flutter App Performance Optimization_ Tips and Techniques.pdf
Flutter App Performance Optimization_ Tips and Techniques.pdfFlutter App Performance Optimization_ Tips and Techniques.pdf
Flutter App Performance Optimization_ Tips and Techniques.pdf
DianApps Technologies
 
How IBM is helping developers win the race to innovate with next-gen cloud se...
How IBM is helping developers win the race to innovate with next-gen cloud se...How IBM is helping developers win the race to innovate with next-gen cloud se...
How IBM is helping developers win the race to innovate with next-gen cloud se...
Michael Elder
 
Arif's PhD Defense (Title: Efficient Cloud Application Deployment in Distrib...
Arif's PhD Defense (Title:  Efficient Cloud Application Deployment in Distrib...Arif's PhD Defense (Title:  Efficient Cloud Application Deployment in Distrib...
Arif's PhD Defense (Title: Efficient Cloud Application Deployment in Distrib...
Arif A.
 
CI/CD Best Practices for Your DevOps Journey
CI/CD Best  Practices for Your DevOps JourneyCI/CD Best  Practices for Your DevOps Journey
CI/CD Best Practices for Your DevOps Journey
DevOps.com
 

Recently uploaded (20)

HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Ad

Kubernetes for Developers - 7 lessons learned from 7 data centers in 7 months (meetup)

  • 1. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Kubernetes - 7 lessons learned from 7 data centers in 7 months Kubernetes for Developers Meetup – May 13, 2019 Mike Tougeron – Senior Site Reliability Engineer @ Adobe
  • 2. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 $ whoami && id | grep Adobe  Mike Tougeron  Senior Site Reliability Engineer @ Adobe  Twitter: @mtougeron  Started using Kubernetes in 2015
  • 3. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Agenda  Quick Introduction to Adobe Advertising Cloud’s Kubernetes Infrastructure  Lesson 1: Communication, Teamwork & Training  Lesson 2: Code to production pipelines  Lesson 3: The ABCs of Production apps  Lesson 4: Multi-cloud challenges  Lesson 5: Knowing your application  Lesson 6: Metrics based monitoring  Lesson 7: Take a deep breath
  • 4. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 High Traffic 350 billion requests a day Latency <50ms @ 95th percentile Huge Datasets Billions of objects to store
  • 5. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Adobe Advertising Cloud’s Kubernetes Overview  ~225 worker nodes; growing to ~300 in May/June  6 OpenStack data centers across 4 regions  Running on VMs  No persistent storage  No autoscaling; “fixed” footprint  Smaller but growing  3 AWS clusters in us-east-1  Running on m5d.12xlarge ec2 instances  EBS volumes for persistent storage  Uses cluster-autoscaler  Autoscaling events many times per hour  Prometheus for monitoring  Dozens of Machine Learning workloads in AWS  Reason for frequent autoscaling events  Cluster updates done via new Image and rolling update of existing nodes  Updates are deployed approx every 4-6 weeks
  • 6. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019
  • 7. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 1: Communication, Teamwork & Training
  • 8. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Communication: Reaching large, distributed teams
  • 9. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Teamwork: Who’s responsible for what?
  • 10. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Abstraction vs Experts  Need understanding of core resources but also need easy onboarding  Pair programming training sessions  Remove need for boiler plate  Don’t duplicate efforts by avoiding abstraction  Don’t abstract to the point where you’re not using Kubernetes  kubectl should *not* be your entrypoint
  • 11. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 2: Code to production pipelines De v Pull Request maste r Unit testin g merge Deplo y bot Production Integration testing Insert your steps here!
  • 12. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Tools to help build application resources  Helm (templating and/or tiller)  Kustomize  Kapitan  and more…  We use a combination of Helm templating for infrastructure/3rd-party and Kustomize for application teams
  • 13. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 $> helm template --name opa --namespace opa --values ./values/globals.yaml --values ./values/mgmt/cluster.yaml --values ./values/mgmt/adcloud- opa/values.yaml --output-dir ../../../cloud/opa/mgmt charts/adcloud-opa versus $> ./build.py --chart adcloud-opa --cluster mgmt
  • 14. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 3: The ABCs of Production  HorizontalPodAutoscaler  PodDisruptionBudget  "DevOps"  Cluster Upgrades
  • 15. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 HorizontalPodAutoscaler  Easily scale on CPU or Memory usage  Also able to scale on custom metrics like http_requests from Ingress resources  Don’t set replicas in your Deployment
  • 16. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 PodDisruptionBudget  Not the same thing as a Deployment strategy  Helps prevent taking down so many Pods that the application is overwhelmed  Can set by minAvailable or maxUnavailable by number or percentage  Good for helping keep quorum  Doesn’t apply to manual deletions
  • 17. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 DevOps  Expertise/specialists  But empowerment & speed  Things get lost in shuffle  Everyone can do everything; aka don’t forget your guardrails
  • 18. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 deny[msg] { input.request.kind.kind = "Ingress" input.request.operation = "CREATE" host = input.request.object.spec.rules[_].host ingress = ingresses[other_ns][other_ingress] other_ns != input.request.namespace ingress.spec.rules[_].host = host msg = sprintf("invalid ingress host %q (conflicts with %v/%v)", [host, other_ns, other_ingress]) } patch[patchCode] { isCreateOrUpdate input.request.kind.kind == "Ingress" not hasAnnotation(input.request.object, "kubernetes.io/ingress.class") patchCode = makeAnnotationPatch("add", "kubernetes.io/ingress.class", "nginx- internal", "") }
  • 19. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Cluster Upgrades - Blue/Green or Canary?  Who really has the hardware to run a 2nd full Kubernetes cluster in their datacenter?  Public cloud is easier, but you still have cost considerations  Are the application team(s) able to handle deploying to a 2nd mirrored cluster?  Does it make more sense to run N workers of a different version/config for a period of time?  Do you have the visibility into the cluster to know how one performs vs the other?
  • 20. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Lesson 4: Multi-Cloud Challenges
  • 21. © 2019 Adobe. Kubernetes for Developers Meetup – May 13, 2019 Multiple code-bases but consistent infrastructure  Packer – Shared modular code base, different builders  Terraform – Separate but closely aligned code bases  Puppet – Same code base  Helm – Same modular code base  Leverage templating to build the same deployments for different (and future) clouds  Re-use, re-use, re-use!  Lab environments in all clouds  OSSIA for HV/rack metadata for region/zone
  • 22. Lesson 5: Knowing your applications
      • Seems like an obvious statement but it’s easy to forget to think about
      • Kubernetes brings advantages, but not all the ones that bare metal and virtual machines bring out of the box
      • Think about how your app actually functions
          • Service Discovery
          • Persistent Storage
          • Shared Storage (e.g. replication, sharding, etc.)
          • Scheduling / Restarting
          • Networking Ingress / Egress
      • Think about how your app is going to handle the way Kubernetes does things
      • Image: https://imgur.com/gallery/B4D7Lf1
  • 23. Elasticsearch as Deployment (What We Did). Source: https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
  • 24. Oops…yeah Touge, I think something is wrong…
  • 25. Elasticsearch as StatefulSet (What We Should Have Done). Source: https://www.slideshare.net/JoergHenning/elasticsearch-on-kubernetes (slightly modified)
  • 26. Lesson 6: Metrics-Based Monitoring
  • 27. Lesson 7: Take a deep breath
      • Same team so we all learn & fix together
      • Experience has been enlightening & engineers have had fun
      • Teams already onboarded are moving faster than before
      • Dev cycle to production is faster as we integrate more automated testing
  • 28. Thanks!
      • Slides: https://touge.me/k8s-7lessons-meetup
      • Mike Tougeron
      • Email: [email protected]
      • Twitter: @mtougeron
      • Images from https://stock.adobe.com
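
The `deny` rule on slide 18 rejects a new Ingress when its host is already claimed by an Ingress in a different namespace. As a rough illustration of that check only (not the policy engine itself), the same logic can be sketched in Python. The `IngressIndex` shape and the function name `find_host_conflict` are assumptions made for this sketch; in the Rego rule the equivalent data comes from `ingresses[other_ns][other_ingress]`.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory index: namespace -> Ingress name -> list of hosts.
# This stands in for the `ingresses[other_ns][other_ingress]` data in the Rego rule.
IngressIndex = Dict[str, Dict[str, List[str]]]


def find_host_conflict(
    namespace: str, hosts: List[str], existing: IngressIndex
) -> Optional[str]:
    """Return a denial message if any host is used in another namespace, else None."""
    for host in hosts:
        for other_ns, ingresses_by_name in existing.items():
            # Ingresses in the requesting namespace are not treated as conflicts.
            if other_ns == namespace:
                continue
            for other_ingress, other_hosts in ingresses_by_name.items():
                if host in other_hosts:
                    return (
                        f'invalid ingress host "{host}" '
                        f"(conflicts with {other_ns}/{other_ingress})"
                    )
    return None


if __name__ == "__main__":
    existing = {"team-a": {"web": ["shop.example.com"]}}
    # A second team tries to claim the same host from its own namespace.
    print(find_host_conflict("team-b", ["shop.example.com"], existing))
    # The owning namespace may reuse its own host.
    print(find_host_conflict("team-a", ["shop.example.com"], existing))
```

The namespaces, Ingress name, and hostname in the usage lines are made up for the example.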

Editor's Notes

  • #5: Adobe Advertising Cloud allows you to manage video, display, and search advertising across traditional TV and digital formats.
  • #7: ./deploy-ami.py master --context aws-lab
  • #9: Repeat, repeat, repeat. There's always a medium that someone doesn't read, even if they are supposed to. Shout it from the mountaintop. Still drives me nuts.
  • #10: Deploybot deploys YAML after it is committed to git. Team A wrote the app, Team X had a failure: who gets the alerts? Assumptions were made by all parties involved. Same type of problem with the Registry server. It all boils down to lack of communication.
  • #11: Don’t have a good answer for everyone. Balance is key to success.
  • #12: Crucial to success. A slow pipeline slows down adoption and creates friction. An easy pipeline creates the “that’s it?” question far too often :)
  • #20: We chose canary: app teams are not far enough along to support cross-cluster load balancing (LB).
  • #21: Most data warehousing and analytics processing happens in AWS. Bidding and ad serving then happen via one of our six OpenStack regions throughout the world. This allows us the best of both worlds: burstable compute and storage when we need it, and the cheap, fast, low-latency compute that the majority of our workload needs.
  • #22: We re-used much of the AWS code, and adapted it to be modular based on the target cloud. Consistency across clusters and clouds. Write once, target. OSSIA: Open Stack Simple Inventory API. Written in-house by Mykola Moglyenko. Allows us to tag pods by their physical location in the cage, and make decisions that evenly spread out workloads. Adobe will be open-sourcing this tool this spring.
  • #23: Does a fixed hostname make a difference? For example, ZooKeeper. How does the app/service save its state? In memory or on disk? What about cluster data? Is it sharded? Replicated? How well does it handle rescheduling? How do other applications or teams access the app/service?
  • #24: How many people have run an Elasticsearch cluster, or at least know about Elasticsearch? We followed a blog post to set it up in K8s. Not a bad thing! We just didn’t think in a Kubernetes way. It looked like this. This lived in our AWS cluster, where our ML jobs were causing a lot of autoscaling up and down. Fair amount of volatility. When we first deployed it, it worked! Then we upgraded our nodes, which meant draining and replacing them one at a time. Lots of app rescheduling. Lots of autoscaler activity.
  • #25: While deploying new worker images to our nodes, we noticed this happening to Elasticsearch. Everything was suddenly in CrashLoopBackOff (CLBO). Unassigned primary and replica shards. When we got things back up, we found we had lost 7% of our data (this was in dev).
  • #26: Converted the es-master Deployment to a StatefulSet. Makes sure that master nodes are gracefully removed and re-added, without impacting quorum. Adjusted cluster deployment scripts: respect the pod disruption budget for longer timeouts; pre-cordon nodes; increase the size of the cluster before draining nodes. Disabled the cluster-autoscaler (so the cluster will stay inflated).
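
To make the "respect the pod disruption budget" and quorum points concrete, here is a small sketch of the arithmetic behind a PodDisruptionBudget: how many Pods may be evicted at once for a given `minAvailable` or `maxUnavailable`. The function names are invented for this example, the replica counts are assumptions (the talk does not say how many master Pods were run), and the round-up of percentages reflects my understanding of the documented Kubernetes behaviour rather than anything stated in the slides.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]  # e.g. 2 or "50%"


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an absolute number or a percentage string against `total`.

    Assumption: percentages are rounded up to the next whole Pod.
    """
    if isinstance(value, str):
        if not value.endswith("%"):
            raise ValueError(f"expected a percentage like '50%', got {value!r}")
        percent = int(value[:-1])
        return (percent * total + 99) // 100  # integer ceiling division
    return value


def disruptions_allowed(
    expected: int,
    healthy: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Number of Pods that may be evicted right now without violating the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = scaled_value(min_available, expected)
    else:
        desired_healthy = max(expected - scaled_value(max_unavailable, expected), 0)
    return max(healthy - desired_healthy, 0)


if __name__ == "__main__":
    # Hypothetical 3-Pod master set that needs 2 Pods for quorum.
    print(disruptions_allowed(expected=3, healthy=3, min_available=2))
    # One master is already down: no further evictions are allowed.
    print(disruptions_allowed(expected=3, healthy=2, min_available=2))
```

With a budget like this, a drain that evicts Pods through the eviction API has to wait until a replacement Pod is healthy before removing the next one, which is why longer drain timeouts go hand in hand with it.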