Scheduling a Fuller House:
Container Management
Sharma Podila, Andrew Spyker - Senior Software Engineers
About Netflix
● 81.5M members
● 2000+ employees (1400 tech)
● 190+ countries
● > 100M hours watched per day
● > ⅓ NA internet download traffic
● 500+ Microservices
● Many 10’s of thousands of VM’s
● 3 regions across the world
Agenda
● Why containers at Netflix?
● What did we build and what did we learn?
● What are our current and future workloads?
Why a 2nd edition of virtualization?
● Given our resilient cloud native, CI/CD devops enabled,
elastically scalable virtual machine based architecture,
did we really need containers?
Motivating factors for containers
● Simpler management of compute resources
● Simpler deployment packaging artifacts for compute jobs
● Need for a consistent local developer environment
Simpler compute, Management & Packaging
Batch/stream processing jobs
● Here are the files to run my process
● I need m cores, n disk, and o memory
● Please just run it for me!
Service style jobs (VM’s)
● Use tested/secure base AMI
● Bake an AMI
● Define launch config
● Choose t-shirt sized instance
● Canary & red/black ASG’s
Consistent developer experience
● Many years focused on
○ Build, bake / cloud deploy / operational experience
○ Not as much time focused on developer experience
● New Netflix local developer experience based on Docker
● Has had a benefit in both directions
○ Cloud like local development environment
○ Easier operational debugging of cloud workloads
What about resource optimization?
● Not absolutely required and easier to get wins at larger
scale across larger virtual machine fleet
● However, potential benefits to
○ Elastic resource pool for scaling batch & adhoc jobs
○ Reliable smaller instance sizes for NodeJS
○ Cross Netflix resource optimizations
■ Trough usage, instance type migration
Agenda
● Why containers at Netflix?
● What did we build and what did we learn?
● What are our current and future workloads?
Lesson: Support containers by leveraging
existing Netflix IaaS focused cloud platform
[Diagram, three panels]
● Existing - VM’s: App on VMs in EC2 with the AWS AutoScaler, using Cloud Platform (metrics, IPC, health), Atlas, Eureka, Edda, VPC
● Titus - Containers: App and Batch in Containers on VMs in EC2 with Titus Job Control, using Cloud Platform (metrics, IPC, health), Atlas, Eureka, Edda, VPC
● Netflix Cloud Infrastructure (VM’s + Containers): both panels combined on EC2 and VPC, with Atlas, Eureka, and Edda
Why - Single consistent cloud platform
Lesson: Buy vs. Build, Why build our own?
● Looking across other container management solutions
○ Mesos, Kubernetes, and Swarm
● Proven solutions are focused on the datacenter
● Newer solutions are
○ Working to abstract datacenter and cloud
○ Delivering more than a cluster manager
■ PaaS, Service discovery, IPC
■ Continuous deployment
■ Metrics
○ Not yet at our level of scale
● Not appropriate for Netflix
“Project Titus” (Firehose peek)
[Architecture diagram. Components shown:]
● Titus UI, Rhea, Titus API
● Titus Master: Job Management & Scheduler, Fenzo
● Cassandra, Zookeeper, S3, Docker Registry
● Mesos Master, EC2 Autoscaling API
● Titus Agent: mesos agent, Titus executor, docker, metrics agent, logging agent, zfs, Pod & VPC net drivers, AWS container metadata proxy
● Containers running on each agent
● Integration: CI/CD, Amazon VM’s
Is that all?
Container Execution
Lesson: What you lose with Docker on EC2
● Networking: VPC
● Security: Security Groups, IAM Roles
● Context: Instance Metadata, User Data / Env Context
● Operational Visibility: Metrics, Health checking
● Resource Isolation: Networking, Local Storage
MULTI-TENANT
Lesson: Making Containers Act Like VM’s
● Built: EC2 Metadata Proxy
○ Provide overridden scheduled IAM role, instance id
○ Proxy other values
● Provided: Environmental Context
○ Titus specific job and task info
○ ASG app, stack, sequence, other EC2 standard
● Why? Now:
○ Service discovery registration works
○ Amazon service SDK based applications work
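The metadata proxy idea above can be sketched as a small lookup: answer a few metadata paths with values chosen by the scheduler, and pass everything else through to the real service. This is a minimal sketch, not the Titus implementation. The paths follow the public EC2 instance metadata layout; the function names, the task id, the role name, and the stand-in upstream are all hypothetical.

```python
from typing import Callable, Dict


def build_overrides(task_id: str, iam_role: str) -> Dict[str, str]:
    """Per-container values that replace what the host VM would report (illustrative)."""
    return {
        "/latest/meta-data/instance-id": task_id,
        "/latest/meta-data/iam/security-credentials/": iam_role,
    }


def resolve(path: str, overrides: Dict[str, str], upstream: Callable[[str], str]) -> str:
    """Return the overridden value if one exists, otherwise proxy to the upstream service."""
    if path in overrides:
        return overrides[path]
    return upstream(path)


def fake_upstream(path: str) -> str:
    """Stand-in for a request to the real endpoint at 169.254.169.254."""
    return "proxied:" + path


overrides = build_overrides("titus-task-123", "myAppRole")
instance_id = resolve("/latest/meta-data/instance-id", overrides, fake_upstream)
role = resolve("/latest/meta-data/iam/security-credentials/", overrides, fake_upstream)
az = resolve("/latest/meta-data/placement/availability-zone", overrides, fake_upstream)
```

Because the container sees its own identity and role at the usual address, SDK-based applications and service discovery registration need no container-specific code path.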
Lesson: Networking will continue to evolve
● Started with batch
○ Started with “bridge” with port mapping
○ Added “host” with port resource mapping (for performance?)
○ Continue to use “bridge” without port mapping
● Service style apps added
○ Added “nfvpc” VPC IP/container with libnetwork plugin
○ Removed Host (no value over VPC IP/container)
○ Changed “nfvpc” VPC IP/container
■ Pod based with custom executor (no plugin)
○ Added security groups to “nfvpc”
Plumbing VPC Networking into Docker
[Diagram: one EC2 VM with three interfaces]
● eth0 / eni0 (SG=Titus Agent), eth1 / eni1 (SecGrp=X), eth2 / eni2 (SG=Y)
● Task 0: no IP needed, attached to docker0 (*)
● Tasks 1 to 3: each a pod root with veth<id> and app, using IP 1, IP 2, IP 3 and security groups X and Y
● Linux Policy Based Routing between pods and ENIs
● IPTables NAT (*) to the EC2 Metadata Proxy at 169.254.169.254
Lesson: Secure Multi-tenancy is Hard
Common to VM’s and tiered security needed
● Protect the reduced host IAM role, Allow containers to have specific IAM roles
● Needed to support same security groups in container networking as VM’s
User namespacing
● Docker 1.10 - Introduced User Namespaces
● Didn’t work w/ shared networking NS
● Docker 1.11 - Fixed shared networking NS’s
● But, namespacing is per daemon
● Not per container, as hoped
● Waiting on Linux
● Considering mass chmod / ZFS clones
Operational Visibility Evolution
● What is “node” - containers on VM’s
● Soft limits / bursting a good thing?
○ Until percent util and outliers are considered
● System level metrics
○ Currently - hand coded cgroup scraping
○ Considering Intel Snap replacement
● Pollers - Metrics, Health, Discovery
○ Created Edda common “server group” view
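Hand-coded cgroup scraping and the "percent util" concern can be illustrated with a short sketch. It assumes the cgroup v1 `cpuacct.stat` format ("user N" and "system N" in clock ticks) and a tick rate of 100 per second; the deck does not say which files are scraped, and the function names and sample numbers are hypothetical.

```python
def parse_cpuacct_stat(text):
    """Parse 'user N' / 'system N' lines (cgroup v1 cpuacct.stat) into a dict of ticks."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def cpu_percent_of_allocation(prev, curr, interval_s, allocated_cpus, ticks_per_s=100):
    """CPU used between two samples, as a percentage of the container's allocation."""
    used_ticks = (curr["user"] + curr["system"]) - (prev["user"] + prev["system"])
    cpu_seconds = used_ticks / ticks_per_s
    return 100.0 * cpu_seconds / (interval_s * allocated_cpus)


# Two samples taken 2 seconds apart for a container allocated 2 CPUs.
before = parse_cpuacct_stat("user 1000\nsystem 200\n")
after = parse_cpuacct_stat("user 1500\nsystem 300\n")
util = cpu_percent_of_allocation(before, after, interval_s=2.0, allocated_cpus=2)
```

With soft limits, a bursting container can report more than 100% of its allocation (150% in this sample), which is why dashboards and outlier detection built for VM's need rethinking.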
Future Execution Focus
● Better Isolation (agents, networking, block I/O, etc.)
● Exposing our implementation of “Pod”’s to users
● Better resiliency (DNS dependencies reduced)
Job Management and Resource Scheduling
Lesson: Complexity in scheduling
● Resilience
○ Balance instances across EC2 zones,
instances within a zone
● Security
○ Two level resource for ENIs
● Placement optimization
○ Resource affinity
○ Task locality
○ Bin packing (Auto Scaling)
Lesson: Keep resource scheduling extensible
Fenzo - Extensible Scheduling Library
Features:
● Heterogeneous resources & tasks
● Autoscaling of mesos cluster
○ Multiple instance types
● Plugins based scheduling objectives
○ Bin packing, etc.
● Plugins based constraints evaluator
○ Resource affinity, task locality, etc.
● Scheduling actions visibility
https://github.com/Netflix/Fenzo
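The plugin idea can be sketched as two kinds of functions: hard constraints that filter hosts, and a fitness function that ranks the hosts that remain. The sketch below is written in Python for brevity and does not use Fenzo's Java API. All class and function names, host names, and zone names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    name: str
    zone: str
    total_cpus: float
    used_cpus: float


@dataclass
class Task:
    name: str
    cpus: float


Constraint = Callable[[Task, Host], bool]   # hard constraint plugin
Fitness = Callable[[Task, Host], float]     # scheduling objective plugin, higher is better


def fits(task: Task, host: Host) -> bool:
    return host.used_cpus + task.cpus <= host.total_cpus


def cpu_bin_packing(task: Task, host: Host) -> float:
    """Prefer the host that would be fullest after placement."""
    return (host.used_cpus + task.cpus) / host.total_cpus


def in_zone(zone: str) -> Constraint:
    return lambda task, host: host.zone == zone


def pick_host(task: Task, hosts: List[Host],
              constraints: List[Constraint], fitness: Fitness) -> Optional[Host]:
    candidates = [h for h in hosts
                  if fits(task, h) and all(c(task, h) for c in constraints)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: fitness(task, h))


hosts = [
    Host("a", "us-east-1a", 32, 24),
    Host("b", "us-east-1b", 32, 4),
    Host("c", "us-east-1a", 32, 30),
]
task = Task("batch-job", 4)
packed = pick_host(task, hosts, [], cpu_bin_packing)
zoned = pick_host(task, hosts, [in_zone("us-east-1b")], cpu_bin_packing)
```

Here `packed` is host "a" (28 of 32 CPUs used after placement, and "c" lacks room), while the zone constraint forces `zoned` onto host "b". Keeping objectives and constraints as separate plugins lets each be swapped without touching the placement loop.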
Cluster Autoscaling Challenge
[Diagram: two alternative placements of tasks across Host 1, Host 2, Host 3, and Host 4, compared side by side]
For long running stateful services
Resources assigned in Titus
● CPU, memory, disk capacity
● Per container AWS EC2 Security groups, IP, and
network bandwidth via custom driver
● Abstracting out EC2 instance types
Security groups and their resources
A two level resource per EC2 Instance: N ENIs, each with M IPs
● ENI 0: Assigned Security Group: SG1; Used IPs Count: 2 of 7
● ENI 1: Assigned Security Group: SG1,SG2; Used IPs Count: 1 of 7
● ENI 2: Assigned Security Group: SG3; Used IPs Count: 7 of 7
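The two-level resource can be sketched as follows, using the three ENIs from the slide. The matching rule (a task may share an ENI only when its security group set is identical to the ENI's, otherwise it needs an unassigned ENI) is an assumption for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass
class Eni:
    index: int
    security_groups: Optional[FrozenSet[str]]  # None means not yet assigned
    used_ips: int
    max_ips: int = 7


def allocate_ip(enis: List[Eni], wanted: Iterable[str]) -> Optional[Eni]:
    """Level 1: find an ENI with matching security groups. Level 2: take a free IP on it."""
    wanted_set = frozenset(wanted)
    for eni in enis:
        if eni.security_groups == wanted_set and eni.used_ips < eni.max_ips:
            eni.used_ips += 1
            return eni
    # No matching ENI with room: claim an unassigned ENI if one exists.
    for eni in enis:
        if eni.security_groups is None and eni.used_ips < eni.max_ips:
            eni.security_groups = wanted_set
            eni.used_ips += 1
            return eni
    return None  # this instance cannot host the task


enis = [
    Eni(0, frozenset({"SG1"}), used_ips=2),
    Eni(1, frozenset({"SG1", "SG2"}), used_ips=1),
    Eni(2, frozenset({"SG3"}), used_ips=7),
]
first = allocate_ip(enis, ["SG1"])           # shares ENI 0
second = allocate_ip(enis, ["SG2", "SG1"])   # shares ENI 1
third = allocate_ip(enis, ["SG3"])           # ENI 2 is full, no free ENI left
```

In this sample a task needing SG3 cannot be placed on the instance even though ENI 0 and ENI 1 still have free IPs, which is what makes this a scheduling concern rather than a simple counter.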
Lesson: Scheduling Vs. Job Management
Scheduling resources to tasks is common.
Lifecycle management is not.
Task scheduling concerns
● Assign resources to tasks
● Cluster wide optimizations
○ Bin packing
○ Global constraints, like SLAs
● Task preferences and constraints
○ Locality with other tasks
○ Resource affinity
Job manager concerns
● Managing task/instance counts
● Creating metadata, defining constraints
● Lifecycle management
○ Replace failed task executions
● Handle failures
○ Rate limit requeuing & relaunching
○ Time out tasks in transitionary states
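Two of the failure-handling concerns above, rate limiting relaunches and timing out tasks stuck in transitionary states, can be sketched in a few lines. This is a minimal sketch under assumed policies (exponential backoff with a cap, fixed per-state time limits). The state names, limits, and function names are hypothetical and are not taken from Titus.

```python
def relaunch_delay_s(failures, base_s=1.0, cap_s=60.0):
    """Seconds to wait before relaunching a task that has failed `failures` times."""
    if failures <= 0:
        return 0.0
    return min(cap_s, base_s * 2 ** (failures - 1))


# Hypothetical limits for how long a task may sit in a transitionary state.
TRANSITION_LIMITS_S = {"LAUNCHING": 300.0, "STARTING": 600.0}


def is_stuck(state, entered_at_s, now_s, limits=TRANSITION_LIMITS_S):
    """True when a task has stayed in a transitionary state past its limit."""
    limit = limits.get(state)
    return limit is not None and now_s - entered_at_s > limit


delays = [relaunch_delay_s(n) for n in (0, 1, 2, 3, 10)]
stuck = is_stuck("LAUNCHING", entered_at_s=0.0, now_s=301.0)
running_ok = is_stuck("RUNNING", entered_at_s=0.0, now_s=1_000_000.0)
```

The backoff keeps a crash-looping job from flooding the scheduler queue, and the timeout lets the job manager replace a task whose agent never reported a final state.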
Future Job Management & Scheduling Focus
● More resources to track: GPUs
● Automatic resource affinity with heterogeneous instances
● SLAs
○ Latencies for services
○ Throughput for batch
○ Task preemptions
Things we didn’t cover in this talk
● Overall integration
○ Chaos, continuous delivery, performance insight
● Container Execution
○ Logging (live log access & S3 log rotation)
○ Liveness and health checking
○ Isolation (disk usage, networking, block I/O)
○ Image registry (metrics, security scanning)
● Scheduling
○ Autoscaling heterogeneous pools
○ Host-task fitness criteria
● API
○ Extensibility, polymorphic, SLA and job/container ownership
Agenda
● Why containers at Netflix?
● What did we build and what did we learn?
● What are our current and future workloads?
Current Titus Production Usage
● Autoscaling
○ 100’s of r3.8xl’s
○ Each 32 vCPU, 244G
● Peak
○ Thousands of cores
○ Tens of TB’s memory
● Thousands of containers/day
○ ~ 100 different images
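The peak figures follow from the instance shape. A quick check, assuming a fleet of exactly 100 instances (the low end of "100’s", chosen only for illustration):

```python
instances = 100              # assumed fleet size, low end of "100's"
vcpus_per_instance = 32      # r3.8xl, from the slide
mem_gb_per_instance = 244    # r3.8xl, from the slide

total_vcpus = instances * vcpus_per_instance
total_mem_tb = instances * mem_gb_per_instance / 1000

print(f"{total_vcpus} vCPUs, {total_mem_tb:.1f} TB")  # prints "3200 vCPUs, 24.4 TB"
```

Even at this assumed lower bound the pool reaches thousands of cores and tens of TB of memory.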
Workloads, Past
● Most current usage is batch
○ Algorithm training, adhoc reporting jobs
● Sampling:
○ Training of “sims” and A/B test models
○ Open Connect Device/IX reporting
○ Web security scanning and analysis
○ Social media analytics updates
Workloads, Now
● Spent last five months adding service style support
● First line of fire customer requests already received
● Larger scale shadow and trickle traffic throughout 2Q
● First service style apps
○ Finer grained instances - NodeJS
○ Docker provided local developer experience
Workloads, Coming
● Media Encoding
○ Thousands of VM’s
○ VM based resource scheduling
○ Considering containers to have faster start-up
○ Internal spot-market - trough borrowing
● SPaaS
○ 10’s of thousands of containers
○ Stream Processing as a Service
○ Convert scheduling systems to Titus
Questions?
Other Netflix QCon Talks
Title - Time - Speaker(s)
● The Netflix API Platform for Server-Side Scripting - Monday 10:35 - Katharina Probst
● Scheduling A Fuller House: Container Mgmt @ Netflix - Tuesday 10:35 - Andrew Spyker & Sharma Podila
● Chaos Kong - Endowing Netflix with Antifragility - Tuesday 11:50 - Luke Kosewski
● The Evolution of the JavaScript - Wednesday 4:10 - Jafar Husain
● Async Programming in JS: The End of the Loop - Friday 9:00 - Jafar Husain
CS80A Foothill College Open Source Talk
aspyker
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
aspyker
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
aspyker
 
Netflix Cloud Platform and Open Source
Netflix Cloud Platform and Open SourceNetflix Cloud Platform and Open Source
Netflix Cloud Platform and Open Source
aspyker
 
NetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker TalkNetflixOSS and ZeroToDocker Talk
NetflixOSS and ZeroToDocker Talk
aspyker
 

Netflix Container Scheduling and Execution - QCon New York 2016

  • 1. Scheduling a Fuller House: Container Management Sharma Podila, Andrew Spyker - Senior Software Engineers
  • 2. About Netflix ● 81.5M members ● 2000+ employees (1400 tech) ● 190+ countries ● > 100M hours watched per day ● > ⅓ of NA internet download traffic ● 500+ Microservices ● Many 10’s of thousands of VM’s ● 3 regions across the world
  • 3. Agenda ● Why containers at Netflix? ● What did we build and what did we learn? ● What are our current and future workloads?
  • 4. Why a 2nd edition of virtualization? ● Given our resilient cloud native, CI/CD devops enabled, elastically scalable virtual machine based architecture, did we really need containers?
  • 5. Motivating factors for containers ● Simpler management of compute resources ● Simpler deployment packaging artifacts for compute jobs ● Need for a consistent local developer environment
  • 6. Simpler compute, Management & Packaging. Batch/stream processing jobs: ● Here are the files to run my process ● I need m cores, n disk, and o memory ● Please just run it for me! Service style jobs (VM’s): ● Use tested/secure base AMI ● Bake an AMI ● Define launch config ● Choose t-shirt sized instance ● Canary & red/black ASG’s
  • 7. Consistent developer experience ● Many years focused on ○ Build, bake / cloud deploy / operational experience ○ Not as much time focused on developer experience ● New Netflix local developer experience based on Docker ● Has had a benefit in both directions ○ Cloud like local development environment ○ Easier operational debugging of cloud workloads
  • 8. What about resource optimization? ● Not absolutely required and easier to get wins at larger scale across larger virtual machine fleet ● However, potential benefits to ○ Elastic resource pool for scaling batch & adhoc jobs ○ Reliable smaller instance sizes for NodeJS ○ Cross Netflix resource optimizations ■ Trough usage, instance type migration
  • 9. Agenda ● Why containers at Netflix? ● What did we build and what did we learn? ● What are our current and future workloads?
  • 10. Lesson: Support containers by leveraging existing Netflix IaaS focused cloud platform [Diagram with two panels. Existing - VM’s: EC2, AWS AutoScaler, VMs, App, Cloud Platform (metrics, IPC, health), Atlas, Eureka, Edda, VPC. Titus - Containers: EC2, Titus Job Control, Containers, Batch Containers, App, Cloud Platform (metrics, IPC, health), Atlas, Eureka, Edda, VPC.]
  • 11. Why - Single consistent cloud platform [Diagram: Netflix Cloud Infrastructure (VM’s + Containers), showing EC2, VPC, AWS AutoScaler, VMs, Titus Job Control, Containers, Batch Containers, App, Cloud Platform (metrics, IPC, health), Atlas, Eureka, Edda.]
  • 12. Lesson: Buy vs. Build, Why build our own? ● Looking across other container management solutions ○ Mesos, Kubernetes, and Swarm ● Proven solutions are focused on the datacenter ● Newer solutions are ○ Working to abstract datacenter and cloud ○ Delivering more than cluster manager ■ PaaS, Service discovery, IPC ■ Continuous deployment ■ Metrics ○ Not yet at our level of scale ● Not appropriate for Netflix
  • 13. “Project Titus” (Firehose peek) [Diagram: Titus UI, Rhea, Titus API, Cassandra, Titus Master (Job Management & Scheduler, Fenzo), S3, Zookeeper, Docker Registry, EC2 Autoscaling API, Mesos Master, Integration, CI/CD, and Amazon VM’s running the Titus Agent (Titus executor, metrics agent, logging agent, zfs, mesos agent, docker, Pod & VPC net drivers, AWS container metadata proxy, containers).]
  • 15. Container Execution [Diagram: Titus architecture, same components as the slide 13 diagram.]
  • 16. Lesson: What you lose with Docker on EC2 ● Networking: VPC ● Security: Security Groups, IAM Roles ● Context: Instance Metadata, User Data / Env Context ● Operational Visibility: Metrics, Health checking ● Resource Isolation: Networking, Local Storage MULTI-TENANT
  • 17. Lesson: Making Containers Act Like VM’s ● Built: EC2 Metadata Proxy ○ Provide overridden scheduled IAM role, instance id ○ Proxy other values ● Provided: Environmental Context ○ Titus specific job and task info ○ ASG app, stack, sequence, other EC2 standard ● Why? Now: ○ Service discovery registration works ○ Amazon service SDK based applications work
  • 18. Lesson: Networking will continue to evolve ● Started with batch ○ Started with “bridge” with port mapping ○ Added “host” with port resource mapping (for performance?) ○ Continue to use “bridge” without port mapping ● Service style apps added ○ Added “nfvpc” VPC IP/container with libnetwork plugin ○ Removed Host (no value over VPC IP/container) ○ Changed “nfvpc” VPC IP/container ■ Pod based with custom executor (no plugin) ○ Added security groups to “nfvpc”
  • 19. Plumbing VPC Networking into Docker [Diagram labels: EC2 VM with eth0/eni0 (SG=Titus Agent), eth1/eni1 (SecGrp=X), eth2/eni2 (SG=Y); docker0; Task 0 to Task 3, one marked “No IP Needed”; IP 1 to IP 3; pods (pod root, veth<id>, app) tagged SecGrp X or SecGrp Y; Linux Policy Based Routing; IPTables NAT; EC2 Metadata Proxy at 169.254.169.254.]
  • 20. Lesson: Secure Multi-tenancy is Hard. Common to VM’s and tiered security needed: ● Protect the reduced host IAM role, Allow containers to have specific IAM roles ● Needed to support same security groups in container networking as VM’s. User namespacing: ● Docker 1.10 - Introduced User Namespaces ● Didn’t work w/ shared networking NS ● Docker 1.11 - Fixed shared networking NS’s ● But, namespacing is per daemon ● Not per container, as hoped ● Waiting on Linux ● Considering mass chmod / ZFS clones
  • 21. Operational Visibility Evolution ● What is “node” - containers on VM’s ● Soft limits / bursting a good thing? ○ Until percent util and outliers are considered ● System level metrics ○ Currently - hand coded cgroup scraping ○ Considering Intel Snap replacement ● Pollers - Metrics, Health, Discovery ○ Created Edda common “server group” view
  • 22. Future Execution Focus ● Better Isolation (agents, networking, block I/O, etc.) ● Exposing our implementation of “Pod”’s to users ● Better resiliency (DNS dependencies reduced)
  • 23. Job Management and Resource Scheduling [Diagram: Titus architecture, same components as the slide 13 diagram.]
  • 24. Lesson: Complexity in scheduling ● Resilience ○ Balance instances across EC2 zones, instances within a zone ● Security ○ Two level resource for ENIs ● Placement optimization ○ Resource affinity ○ Task locality ○ Bin packing (Auto Scaling)
  • 25. Lesson: Keep resource scheduling extensible. Fenzo - Extensible Scheduling Library. Features: ● Heterogeneous resources & tasks ● Autoscaling of mesos cluster ○ Multiple instance types ● Plugins based scheduling objectives ○ Bin packing, etc. ● Plugins based constraints evaluator ○ Resource affinity, task locality, etc. ● Scheduling actions visibility https://github.com/Netflix/Fenzo
  • 26. Cluster Autoscaling Challenge. For long running stateful services [Diagram: two arrangements of Host 1 to Host 4 compared side by side (“vs.”).]
  • 27. Resources assigned in Titus ● CPU, memory, disk capacity ● Per container AWS EC2 Security groups, IP, and network bandwidth via custom driver ● Abstracting out EC2 instance types
  • 28. Security groups and their resources. A two level resource per EC2 Instance: N ENIs, each with M IPs ● ENI 0: Assigned Security Group: SG1, Used IPs Count: 2 of 7 ● ENI 1: Assigned Security Group: SG1,SG2, Used IPs Count: 1 of 7 ● ENI 2: Assigned Security Group: SG3, Used IPs Count: 7 of 7
  • 29. Lesson: Scheduling Vs. Job Management. Scheduling resources to tasks is common. Lifecycle management is not.
  • 30. Lesson: Scheduling Vs. Job Management. Task scheduling concerns: ● Assign resources to tasks ● Cluster wide optimizations ○ Bin packing ○ Global constraints, like SLAs ● Task preferences and constraints ○ Locality with other tasks ○ Resource affinity. Job manager concerns: ● Managing task/instance counts ● Creating metadata, defining constraints ● Lifecycle management ○ Replace failed task executions ● Handle failures ○ Rate limit requeuing & relaunching ○ Time out tasks in transitionary states
  • 31. Future Job Management & Scheduling Focus ● More resources to track: GPUs ● Automatic resource affinity with heterogeneous instances ● SLAs ○ Latencies for services ○ Throughput for batch ○ Task preemptions
  • 32. Things we didn’t cover in this talk ● Overall integration ○ Chaos, continuous delivery, performance insight ● Container Execution ○ Logging (live log access & S3 log rotation) ○ Liveness and health checking ○ Isolation (disk usage, networking, block I/O) ○ Image registry (metrics, security scanning) ● Scheduling ○ Autoscaling heterogeneous pools ○ Host-task fitness criteria ● API ○ Extensibility, polymorphic, SLA and job/container ownership
  • 33. Agenda ● Why containers at Netflix? ● What did we build and what did we learn? ● What are our current and future workloads?
  • 34. Current Titus Production Usage ● Autoscaling ○ 100’s of r3.8xl’s ○ Each 32 vCPU, 244G ● Peak ○ Thousands of cores ○ Tens of TB’s memory ● Thousands of containers/day ○ ~ 100 different images
  • 35. Workloads, Past ● Most current usage is batch ○ Algorithm training, adhoc reporting jobs ● Sampling: ○ Training of “sims” and A/B test models ○ Open Connect Device/IX reporting ○ Web security scanning and analysis ○ Social media analytics updates
  • 36. Workloads, Now ● Spent last five months adding service style support ● First line of fire customer requests already received ● Larger scale shadow and trickle traffic throughout 2Q ● First service style apps ○ Finer grained instances - NodeJS ○ Docker provided local developer experience
  • 37. Workloads, Coming ● Media Encoding ○ Thousands of VM’s ○ VM based resource scheduling ○ Considering containers to have faster start-up ○ Internal spot-market - trough borrowing ● SPaaS ○ 10’s of thousands of containers ○ Stream Processing as a Service ○ Convert scheduling systems to Titus
  • 39. Other Netflix QCon Talks (Title, Time, Speaker(s)) ● The Netflix API Platform for Server-Side Scripting (Monday 10:35, Katharina Probst) ● Scheduling A Fuller House: Container Mgmt @ Netflix (Tuesday 10:35, Andrew Spyker & Sharma Podila) ● Chaos Kong - Endowing Netflix with Antifragility (Tuesday 11:50, Luke Kosewski) ● The Evolution of the JavaScript (Wednesday 4:10, Jafar Husain) ● Async Programming in JS: The End of the Loop (Friday 9:00, Jafar Husain)
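
Slide 17 describes an EC2 metadata proxy that overrides a few values (IAM role, instance id) per container and proxies everything else. A minimal sketch of that routing decision follows. It is an illustration only, not the Titus implementation: the function names, the fake upstream, and all the values are assumptions, and real credential vending is left out. Only the two metadata paths are standard EC2 paths.

```python
from typing import Callable, Dict

# Standard EC2 instance metadata paths; every value used below is made up.
INSTANCE_ID_PATH = "/latest/meta-data/instance-id"
IAM_ROLE_PATH = "/latest/meta-data/iam/security-credentials/"
ZONE_PATH = "/latest/meta-data/placement/availability-zone"


def make_metadata_handler(
    overrides: Dict[str, str],
    upstream: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a per-container handler (hypothetical helper).

    Paths listed in `overrides` are answered with the container's own
    values; every other path is passed through to the host's metadata.
    """

    def handle(path: str) -> str:
        if path in overrides:
            return overrides[path]
        return upstream(path)

    return handle


def fake_host_metadata(path: str) -> str:
    """Stand-in for a request to the host's real metadata service."""
    host_values = {
        INSTANCE_ID_PATH: "i-host000",
        IAM_ROLE_PATH: "HostAgentRole",
        ZONE_PATH: "us-east-1a",
    }
    return host_values[path]


# One handler per container: the container sees its own identity and role,
# while values such as the availability zone still come from the host.
handler = make_metadata_handler(
    {INSTANCE_ID_PATH: "titus-task-123", IAM_ROLE_PATH: "MyAppRole"},
    fake_host_metadata,
)
```

With this shape, service discovery clients and AWS SDKs inside the container can keep calling the usual metadata address and receive container-scoped answers, which is the effect the slide lists under “Why? Now”.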
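
Slides 24 and 25 describe scheduling built from pluggable objectives (bin packing) and pluggable constraints (resource affinity, task locality). The sketch below shows the general idea in a few lines. It is written in Python for brevity and does not use or reproduce the Fenzo API: `Host`, `Task`, `place`, and the CPU-only fitness score are all hypothetical names and simplifications.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    name: str
    zone: str
    total_cpus: float
    used_cpus: float = 0.0


@dataclass
class Task:
    name: str
    cpus: float


# A fitness plugin scores a (task, host) pair; higher is better.
Fitness = Callable[[Task, Host], float]
# A constraint plugin accepts or rejects a (task, host) pair.
Constraint = Callable[[Task, Host], bool]


def cpu_bin_packing(task: Task, host: Host) -> float:
    """Prefer the host that would end up most full (CPU only)."""
    return (host.used_cpus + task.cpus) / host.total_cpus


def fits(task: Task, host: Host) -> bool:
    """Hard constraint: the host must have enough free CPU."""
    return host.used_cpus + task.cpus <= host.total_cpus


def place(
    task: Task,
    hosts: List[Host],
    fitness: Fitness,
    constraints: List[Constraint],
) -> Optional[Host]:
    """Pick the best-scoring host that passes every constraint."""
    candidates = [h for h in hosts if all(c(task, h) for c in constraints)]
    if not candidates:
        return None  # nothing fits; a real system might trigger scale-up
    best = max(candidates, key=lambda h: fitness(task, h))
    best.used_cpus += task.cpus
    return best
```

Because the objective and the constraints are plain functions passed in by the caller, a zone-balancing constraint or a different packing score can be swapped in without touching `place`. Packing tasks onto already busy hosts also leaves idle hosts empty, which is what makes scaling the cluster down practical.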
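
Slide 28 models security groups as a two level resource: each EC2 instance has N ENIs, each ENI carries one assigned set of security groups and up to M IPs. A small allocator sketch makes the consequence concrete. The class and function names and the "reuse a matching ENI first" policy are assumptions for illustration, not the Titus scheduler's actual logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Eni:
    """One ENI: a pool of IPs that all share one security group set."""
    max_ips: int
    security_groups: Optional[FrozenSet[str]] = None  # None = unassigned
    used_ips: int = 0

    def has_free_ip(self) -> bool:
        return self.used_ips < self.max_ips


def assign_ip(enis: List[Eni], groups: FrozenSet[str]) -> Optional[int]:
    """Return the index of the ENI that took the task, or None.

    Pass 1 reuses an ENI already assigned exactly these groups.
    Pass 2 claims an unassigned ENI for these groups.
    """
    for i, eni in enumerate(enis):
        if eni.security_groups == groups and eni.has_free_ip():
            eni.used_ips += 1
            return i
    for i, eni in enumerate(enis):
        if eni.security_groups is None and eni.has_free_ip():
            eni.security_groups = groups
            eni.used_ips += 1
            return i
    return None


def example_instance() -> List[Eni]:
    """The instance state shown on slide 28."""
    return [
        Eni(7, frozenset({"SG1"}), 2),
        Eni(7, frozenset({"SG1", "SG2"}), 1),
        Eni(7, frozenset({"SG3"}), 7),
    ]
```

On the slide's example instance, a task needing SG1 lands on ENI 0, while a task needing SG3 cannot be placed: ENI 2 has used 7 of 7 IPs and no unassigned ENI remains. That is why the scheduler has to track both levels rather than a single per-host IP count.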