Unclouding Container Challenges
Apr 21st, 2021
Harpratap Singh Layal
Cloud Platform Department
Rakuten Group, Inc.
Background – Compute platforms
Bare metal as a Service (BMaaS)
• 16 core, 32 GB, 1 Gbps
• 16 core, 32 GB, 1 Gbps
• 32 core, 128 GB, 10 Gbps
• 16 core, 64 GB, 10 Gbps
Container as a Service (CaaS)
• Cluster X: App 1, App 2, App 2
• Cluster Y: App 3, App 2, App 4
Background – What is CaaS?
Platform options, placed on a spectrum between opinionated (less flexibility) and developer control & responsibility:
• PaaS (Heroku, 12-factor apps)
• Managed K8s control plane (GKE, EKS, AKS; full customization)
• Simple container scheduler (Fargate, Cloud Run)
• Only expose selected K8s APIs (CaaS)
CaaS provides default container networking, CI/CD, monitoring and security for stateless and stateful apps, cron and GPU workloads.
Challenge #1 : Communication Cost
Doing it the traditional way:
1. Communication lag – it takes too long to gather requirements from developers
2. XY problem – users ask about their attempted solution, so we have no idea what the real problem is
3. Validation and policy injection are done manually
Solution: Create an opinionated Internal Developer Platform and form an API-based contract with users
Philosophy:
• When you have APIs and their documentation, users rarely need to communicate with you
• It is easier to explicitly define what you provide and what you don't
• Standardization = less reinventing the wheel, fewer pets, easier to propagate tech culture
Implementation:
• In CaaS we use K8s APIs to expose features to users. Custom Resource Definitions (CRDs) and Operators fit us well.
• Admission control webhooks, PodSecurityPolicy and NetworkPolicy for validation and policy injection
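As an illustration of how an admission control webhook can replace manual validation, here is a minimal sketch of the decision logic such a webhook might run. It is not the deck's actual implementation: the policy (a required `team` label, no privileged containers) and the function name `review` are assumptions made up for this example. Only the `AdmissionReview` request/response envelope follows the Kubernetes `admission.k8s.io/v1` format.

```python
# Hypothetical policy for illustration: every Pod needs a "team" label
# and must not run privileged containers.
REQUIRED_LABEL = "team"


def review(admission_review: dict) -> dict:
    """Build an AdmissionReview response that allows or denies the Pod in the request."""
    request = admission_review["request"]
    pod = request["object"]
    problems = []

    labels = pod.get("metadata", {}).get("labels") or {}
    if REQUIRED_LABEL not in labels:
        problems.append(f"missing required label '{REQUIRED_LABEL}'")

    for container in pod.get("spec", {}).get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged"):
            problems.append(f"container '{container.get('name')}' must not be privileged")

    # The response must echo the request uid; the message explains a denial to the user.
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"message": "; ".join(problems)}

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

In a real deployment this function would sit behind an HTTPS endpoint registered through a ValidatingWebhookConfiguration. The point of the sketch is that the user gets the rejection reason straight from the API, without a conversation with the platform team.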
Jiange: validation without human communication
(Diagram: Jiange, K8s API, etcd)
Challenge #2: Day 2 Ops
Day 1 Ops:
• Provisioning
  • Step 1
  • Step 2
  • Step 3… N
• Procedural – easy to automate
Day 2 Ops:
• Maintenance
  • Not always the same
• Improvements – need to keep an eye on various components
  • Metrics
  • Logs
  • Traces
Solution: Infrastructure as Data instead of Infrastructure as Code
IaC – run scripts one by one (Script for X, Script for Y, Script for Z)
IaD – store the state as data and reconcile until the state is achieved
(Diagram: data store, infra control loop with "reconcile spec" and "reconcile status", infra)
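The "store the state as data and reconcile" idea can be sketched as a small control loop. This is a toy model under stated assumptions, not one of the CaaS controllers: `Resource`, `reconcile`, `run_control_loop` and the single `replicas` field are hypothetical names chosen for the example, and the "infrastructure" is just a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    spec: dict                                   # desired state, stored as data
    status: dict = field(default_factory=dict)   # last observed state


def reconcile(resource: Resource, infra: dict) -> bool:
    """One pass: observe, make one correction toward spec, record status.

    Returns True once the observed state matches the desired state.
    """
    desired = resource.spec["replicas"]
    actual = infra.get("replicas", 0)
    if actual < desired:
        infra["replicas"] = actual + 1   # stand-in for "create one instance"
    elif actual > desired:
        infra["replicas"] = actual - 1   # stand-in for "delete one instance"
    resource.status = {"replicas": infra.get("replicas", 0)}
    return resource.status["replicas"] == desired


def run_control_loop(resource: Resource, infra: dict, max_passes: int = 100) -> int:
    """Reconcile repeatedly until converged; return the number of passes used."""
    for passes in range(1, max_passes + 1):
        if reconcile(resource, infra):
            return passes
    raise RuntimeError("did not converge within max_passes")
```

Unlike a script that runs once, the same loop also handles Day 2 drift: if the infrastructure changes behind the controller's back, the next passes move it back toward the stored spec.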
In CaaS we have written controllers based on the same approach:
• Klone – binary (written in Go) that provisions master nodes and system components based on Git configs
• Node operator – creates worker nodes
• Namespace operator – creates user namespaces with correct permissions, good defaults, Jenkins repositories, Harbor projects, etc. when a user onboards
• Gateway controller – creates Istio ingress gateways
• Wildcard instant domain controller – instantly creates simple domains for testing out services
• Cloud controller manager – creates load balancers
• Endpoints controller – creates container-native load balancers
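(Diagram: traffic flows from the Internet through the Load Balancer to the K8s cluster nodes; the Cloud Controller Manager takes the node list from the K8s API and configures the Load Balancer.)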
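The load balancer flow on this slide (node list from the K8s API, Cloud Controller Manager, load balancer) can be sketched as a diff between the nodes that should receive traffic and the backends currently registered. The data shapes and the names `ready_node_names` and `plan_backend_sync` are assumptions for illustration; a real cloud controller manager talks to the Kubernetes API and a load balancer API instead of plain lists.

```python
def ready_node_names(nodes):
    """Pick the nodes that should receive traffic (here: those marked ready)."""
    return {node["name"] for node in nodes if node.get("ready")}


def plan_backend_sync(nodes, current_backends):
    """Return which backends to add to and remove from the load balancer."""
    desired = ready_node_names(nodes)
    current = set(current_backends)
    return {
        "add": sorted(desired - current),
        "remove": sorted(current - desired),
    }
```

Running this plan whenever the node list changes keeps the load balancer in step with the cluster without anyone editing backend pools by hand.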
Challenge #3: Container networks
• Kubernetes network != Host Network
• Pods are not first class citizens (not flat network)
• Pods are ephemeral
• Fair Load balancing does not happen when using NodePorts
• Additional hops (through iptables rules on the K8s node)
• Source IP is not preserved
• Network is difficult to use
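A small worked example shows why node-level balancing can be unfair. It assumes the load balancer spreads requests evenly across nodes and each node forwards only to its own local pods (as with `externalTrafficPolicy: Local` on a Service), and that nodes without pods receive no traffic. The function names are hypothetical.

```python
from fractions import Fraction


def shares_when_balancing_across_nodes(pods_per_node):
    """Traffic share of each pod when requests are split evenly per node."""
    nodes_with_pods = [count for count in pods_per_node if count > 0]
    node_share = Fraction(1, len(nodes_with_pods))
    # Every pod on a node gets an equal slice of that node's share.
    return [node_share / count for count in nodes_with_pods for _ in range(count)]


def shares_when_balancing_across_pods(pods_per_node):
    """Traffic share of each pod when requests are split evenly per pod."""
    total = sum(pods_per_node)
    return [Fraction(1, total)] * total
```

With one pod on the first node and three on the second (`[1, 3]`), node-level balancing gives the lone pod half of all traffic and each of the other three one sixth, while pod-level balancing gives every pod one quarter.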
Solution: No one size fits all; provide all solutions with good defaults and let users choose
 | Shared Gateway + Auto Assigned Domain | Dedicated Gateway + Custom Domain
Domain | Auto assigned | Any domain
Performance | Not isolated | Isolated
Maintenance (for users) | Zero | High
Customization | Low | Fully customizable
Cost | Low | High
Solution: Container Native Load balancing
 | Legacy Load Balancer | Container Native Load Balancer
Number of hops | 2 | 1
IP preservation | Remote IP lost | Remote IP preserved
Load balancing | Across nodes | Across containers
Health checks | Only for nodes | Application-level health checks
Future Challenges:
• Multicluster CaaS
  • Network
  • Deployments
• IPv4 not enough (need IPv6 and/or VPCs)
• Stateful apps
  • Local persistence
  • Remote persistence
• GPU
• SR-IOV
• CPU pinning
• Single Data proxy