Running Containers at Scale at Netflix. An update on the usage of containers at Netflix. Technical discussions on new features and concepts we've added across container scheduling and execution.
A basic introductory slide set on Kubernetes: what Kubernetes does, what it does not do, which terms are used (Containers, Pods, Services, Replica Sets, Deployments, etc.), and how basic interaction with a Kubernetes cluster is done.
Kubernetes Webinar - Using ConfigMaps & Secrets Janakiram MSV
Many applications require configuration using some combination of configuration files, command line arguments, and environment variables. ConfigMaps in Kubernetes provide mechanisms to inject containers with configuration data while keeping them portable. Secrets decouple sensitive content from the pods using a volume plug-in. This webinar will discuss the use cases and scenarios for using ConfigMaps and Secrets.
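As a rough illustration of the consuming side, the sketch below shows how a containerized process might read one setting that is exposed either as an environment variable or as a file in a mounted volume. The names `APP_LOG_LEVEL` and `log_level`, and the temporary directory standing in for the mount point, are hypothetical and not taken from the webinar.

```python
import os
import tempfile
from pathlib import Path


def read_setting(env_name, file_path, default):
    """Return a setting from the environment, else from a mounted file, else a default."""
    value = os.environ.get(env_name)
    if value is not None:
        return value
    path = Path(file_path)
    if path.is_file():
        # Keys projected from a ConfigMap or Secret volume appear as plain files.
        return path.read_text().strip()
    return default


# Simulate a mounted volume with a temporary directory (hypothetical layout).
mount_dir = tempfile.mkdtemp()
Path(mount_dir, "log_level").write_text("debug\n")

# No environment variable set: the value comes from the file.
os.environ.pop("APP_LOG_LEVEL", None)
from_file = read_setting("APP_LOG_LEVEL", Path(mount_dir, "log_level"), "info")

# Environment variable set: it takes precedence over the file.
os.environ["APP_LOG_LEVEL"] = "warning"
from_env = read_setting("APP_LOG_LEVEL", Path(mount_dir, "log_level"), "info")

# Neither source present: the default is used.
os.environ.pop("APP_UNSET_SETTING", None)
missing = read_setting("APP_UNSET_SETTING", Path(mount_dir, "absent"), "info")

print(from_file, from_env, missing)
```

Because the application only looks at its environment and file system, the same image can run unchanged with different configuration, which is the portability point the abstract makes.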
Cluster API is a Kubernetes sub-project that provides declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters on any infrastructure. It works by having core Cluster API components along with plugins for different bootstrap, control-plane and infrastructure providers such as OpenStack, AWS, and GCP. The presentation discusses Cluster API integration with OpenStack, considerations for using it in production including separate internal and public connections and reusing OpenStack networking, and proposes a time-saving deployment model leveraging various Cluster API and Gardener projects.
Kubernetes is an open source container orchestration system that automates the deployment, maintenance, and scaling of containerized applications. It groups related containers into logical units called pods and handles scheduling pods onto nodes in a compute cluster while ensuring their desired state is maintained. Kubernetes uses concepts like labels and pods to organize containers that make up an application for easy management and discovery.
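The way labels group the pods of an application can be sketched in a few lines. This is a simplified, hypothetical model of equality-based label selection; the pod names and label values are made up for illustration and no Kubernetes client library is involved.

```python
def matches(selector, labels):
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(pods, selector):
    """Return the names of the pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(selector, pod["labels"])]


# Hypothetical pods belonging to one application.
pods = [
    {"name": "web-1", "labels": {"app": "shop", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "shop", "tier": "frontend"}},
    {"name": "db-1", "labels": {"app": "shop", "tier": "backend"}},
]

frontend = select_pods(pods, {"app": "shop", "tier": "frontend"})
whole_app = select_pods(pods, {"app": "shop"})
print(frontend, whole_app)
```

A narrower selector picks out one tier, while a broader one covers the whole application, which is what makes labels useful for management and discovery.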
Re:invent 2016 Container Scheduling, Execution and AWS Integration aspyker
This document summarizes a presentation about Netflix's use of containers and the Titus container management platform. It discusses:
1. Why Netflix uses containers to increase innovation velocity for tasks like media encoding and software development. Containers allow for faster iteration and simpler deployment.
2. How Titus was developed to manage containers at Netflix's scale of over 100,000 VMs and 500+ microservices, since existing solutions were not suitable. Titus integrates with AWS for resources like VPC networking and EC2 instances.
3. How Titus supports both batch jobs and long-running services, with challenges like networking, autoscaling, and upgrades that services introduce beyond batch. Collaboration with Amazon on ECS
This document provides an overview of Ingress in Kubernetes, including:
1) It describes the different types of Kubernetes services - ClusterIP, NodePort, LoadBalancer, ExternalName, and Headless - and examples of using each type.
2) It explains that Ingress resources define routing rules to services, and Ingress controllers watch for Ingress resources and update rules to satisfy conditions.
3) Ingress allows for name-based and path-based routing to services, and controllers provide a default backend for requests not handled by Ingress rules.
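The name-based and path-based routing described above, together with the default backend, can be sketched as a small lookup function. This is a simplified model under stated assumptions: hosts are compared exactly, paths are matched by plain string prefix with the longest prefix winning, and all host and service names are hypothetical. Real Ingress controllers apply more detailed matching rules.

```python
def route(rules, default_backend, host, path):
    """Pick a backend by host, then by the longest matching path prefix."""
    best = None
    for rule in rules:
        if rule["host"] != host:
            continue
        for prefix, service in rule["paths"]:
            if path.startswith(prefix):
                if best is None or len(prefix) > len(best[0]):
                    best = (prefix, service)
    # Requests that no rule handles fall through to the default backend.
    return best[1] if best else default_backend


# Hypothetical routing rules for two hosts.
rules = [
    {"host": "shop.example.com", "paths": [("/", "web"), ("/api", "api")]},
    {"host": "blog.example.com", "paths": [("/", "blog")]},
]

api_hit = route(rules, "default-backend", "shop.example.com", "/api/orders")
web_hit = route(rules, "default-backend", "shop.example.com", "/cart")
fallback = route(rules, "default-backend", "unknown.example.com", "/")
print(api_hit, web_hit, fallback)
```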
Hands-On Introduction to Kubernetes at LISA17 Ryan Jarvinen
This document provides an agenda and instructions for a hands-on introduction to Kubernetes tutorial. The tutorial will cover Kubernetes basics like pods, services, deployments and replica sets. It includes steps for setting up a local Kubernetes environment using Minikube and demonstrates features like rolling updates, rollbacks and self-healing. Attendees will learn how to develop container-based applications locally with Kubernetes and deploy changes to preview them before promoting to production.
An in-depth overview of Kubernetes and its various components.
NOTE: This is a fixed version of a previous presentation (a draft was uploaded with some errors)
Prometheus has become the de facto monitoring system for cloud native applications, with systems like Kubernetes and etcd natively exposing Prometheus metrics. In this talk Tom will explore all the moving parts of a working Prometheus-on-Kubernetes monitoring system, including kube-state-metrics, node-exporter, cAdvisor and Grafana. You will learn about the various methods for getting to a working setup: the manual approach, using CoreOS's Prometheus Operator, or using the Prometheus Ksonnet Mixin. Tom will also share some tips and tricks for getting the most out of your Prometheus monitoring, including the common pitfalls and what you should be alerting on.
This document provides an overview of Kubernetes, a container orchestration system. It begins with background on Docker containers and orchestration tools prior to Kubernetes. It then covers key Kubernetes concepts including pods, labels, replication controllers, and services. Pods are the basic deployable unit in Kubernetes, while replication controllers ensure a specified number of pods are running. Services provide discovery and load balancing for pods. The document demonstrates how Kubernetes can be used to scale, upgrade, and rollback deployments through replication controllers and services.
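The idea that a replication controller keeps a specified number of pods running can be sketched as a single reconciliation pass. This is a toy model rather than Kubernetes code: the function name `reconcile`, the pod naming scheme, and the simulated crash are all assumptions made for illustration.

```python
def reconcile(desired, running, next_id):
    """Return the pod list after one pass, plus the next unused pod id."""
    pods = list(running)
    while len(pods) < desired:
        # Too few pods: start replacements until the count matches.
        pods.append(f"pod-{next_id}")
        next_id += 1
    while len(pods) > desired:
        # Too many pods: stop the most recently listed ones.
        pods.pop()
    return pods, next_id


# Start from nothing with three desired replicas.
pods, next_id = reconcile(3, [], 0)

# Simulate a crash of one pod, then let the controller heal the set.
pods.remove("pod-1")
healed, next_id = reconcile(3, pods, next_id)

# Scaling down is the same pass with a smaller desired count.
scaled_down, next_id = reconcile(1, healed, next_id)
print(healed, scaled_down)
```

Scaling, self-healing, and rolling back all reduce to changing the desired state and letting the same pass converge on it.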
Red Hat OpenStack - Open Cloud Infrastructure Alex Baretto
This document provides an overview of Red Hat OpenStack. It discusses market dynamics driving adoption of cloud infrastructure, describes Red Hat's leadership and contributions to the OpenStack community, reviews the core OpenStack components, and demonstrates how an instance is launched across multiple OpenStack services. Red Hat brings enterprise-grade support, stability, and lifecycle management to OpenStack through Red Hat OpenStack.
This document summarizes a presentation about deploying applications on Kubernetes with GitOps. The presentation covers GitOps workflows and tools like FluxCD and ArgoCD for managing Helm charts from Git repositories. It also discusses integrating continuous integration pipelines with ArgoCD and provides best practices for areas like secret management, scaling, and microservices. The presenter concludes by taking questions and inviting interested parties to join their company.
Everything You Need To Know About Persistent Storage in Kubernetes The {code} Team
This document discusses Kubernetes persistent storage options for stateful applications. It covers common use cases that require persistence like databases, messaging systems, and content management systems. It then describes the Kubernetes persistent volume (PV), persistent volume claim (PVC), and storage class objects that are used to provision and consume persistent storage. Finally, it compares Deployments with StatefulSets and covers other volume types like emptyDir and hostPath, as well as DaemonSets, and their use cases.
The Power of GitOps with Flux & GitOps Toolkit Weaveworks
GitOps Days Community Special
Watch the video here: https://youtu.be/0v5bjysXTL8
New to GitOps or been a long-time Flux user?
We'll walk you through the benefits of GitOps and then demo it in action with a sneak peek into the next gen Flux and GitOps Toolkit!
* Automation!
* Visibility!
* Reconciliation!
* Powerful use of Prometheus and Grafana!
* GitOps for Helm!
For Flux users, Flux v1 is decoupled into Flux v2 and GitOps Toolkit. We'll demo how this decoupling gives you more control over how you can do GitOps and with fewer steps!
Join Leigh Capili and Tamao Nakahara as they show you GitOps in action with Flux and GitOps Toolkit.
Note to our Flux community that Flux v2 and the GitOps Toolkit are in development and Flux v1 is in maintenance mode. These talks and upcoming guides will give you the most up-to-date info and steps to migrate once we reach feature parity and start the migration process. We are dedicated to the smoothest experience possible for our Flux community, so please join us if you'd like early access and to give us feedback for the migration process.
We are really excited by the improvements and want to take this opportunity to show you what the GitOps Toolkit is all about, walk you through the guides and get your feedback!
For more info, see https://toolkit.fluxcd.io/.
Here's our latest blog post on Flux v2 and GitOps Toolkit updates: https://www.weave.works/blog/the-road-to-flux-v2-october-update
This document discusses how VXLAN works on Linux in 3 parts: (1) it explains the basic mechanism of VXLAN including packet encapsulation and ARP resolution, (2) it describes how OpenStack Neutron implements VXLAN using the OVS plugin and ML2 l2population driver, and (3) it discusses the Flannel implementation of VTEP using Linux kernel extensions and an etcd key-value store.
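The encapsulation step can be made concrete with the 8-byte VXLAN header: a flags byte with the "VNI valid" bit set, 24 reserved bits, a 24-bit VXLAN Network Identifier (VNI), and 8 more reserved bits. The sketch below only builds and parses that header in front of an inner Ethernet frame. The outer UDP, IP and Ethernet headers that a real VTEP adds are left out, and the function names and sample frame bytes are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_encapsulate(vni, inner_frame):
    """Prefix an inner Ethernet frame with an 8-byte VXLAN header."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: flags in the top byte, then 24 reserved bits.
    # Second 32-bit word: VNI in the top 24 bits, then 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet):
    """Return (vni, inner_frame) from a VXLAN payload."""
    word0, word1 = struct.unpack("!II", packet[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8, packet[8:]


# Hypothetical inner frame: broadcast destination, made-up source MAC, ARP EtherType.
inner = bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001") + b"\x08\x06"
packet = vxlan_encapsulate(42, inner)
vni, frame = vxlan_decapsulate(packet)
print(packet[:8].hex(), vni, frame == inner)
```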
KFServing - Serverless Model Inferencing Animesh Singh
Deep dive into KFServing: a serverless model inferencing platform built on top of Knative and Istio. Part of the Kubeflow project, and deployed in production across organizations.
These are the slides for a talk/workshop delivered to the Cloud Native Wales user group (@CloudNativeWal) on 2019-01-10.
In these slides, we go over some principles of GitOps and a hands-on session applying them to manage a microservice.
You can find out more about GitOps online: https://www.weave.works/technologies/gitops/
Kubernetes has two simple but powerful network concepts: every Pod is connected to the same network, and Services let you talk to a Pod by name. Bryan will take you through how these concepts are implemented - Pod Networks via the Container Network Interface (CNI), Service Discovery via kube-dns and Service virtual IPs, then on to how Services are exposed to the rest of the world.
Slides of talk given at London Study of Enterprise Agile Meetup in June 2019.
We go over GitOps and how it affects delivery speed in software development and release.
This presentation covers how the app deployment model evolved from bare-metal servers to the Kubernetes world.
In addition to theoretical information, you will find URLs for free Katacoda workshops where you can practice and understand the details of each topic.
Christian Kniep from Docker Inc. gave this talk at the Stanford HPC Conference.
"This talk will recap the history of and what constitutes Linux Containers, before laying out how the technology is employed by various engines and what problems these engines have to solve. Afterward, Christian will elaborate on why the advent of standards for images and runtimes moved the discussion from building and distributing containers to orchestrating containerized applications at scale. In conclusion, attendees will get an update on what problems still hinder the adoption of containers for distributed high performance workloads and how Docker is addressing these issues."
Christian Kniep is a Technical Account Manager at Docker, Inc. With a 10-year journey rooted in the HPC parts of the German automotive industry, Christian Kniep started out supporting CAE applications and VR installations. When told at a conference that HPC cannot learn anything from the emerging Cloud and BigData companies, he became curious and went on to lead the containerization effort of the cloud stack at Playstation Now. Christian joined Docker Inc in 2017 to help push adoption forward and be part of the innovation instead of an external bystander. During the day he helps Docker customers in the EMEA region to fully utilize the power of containers; at night he likes to explore new emerging trends by containerizing them first and seeking applications in the nebulous world of DevOps.
Watch the video: https://wp.me/p3RLHQ-i4X
Learn more: http://docker.com and http://hpcadvisorycouncil.com
Sign up for our insideHPC Newsletter: http://insidehpc.com
Microservices Architectures: Become a Unicorn like Netflix, Twitter and Hailo gjuljo
Full day workshop about Microservices Architectures, from the basics to advanced topics like Service Discovery, Load Balancing, Fault Tolerance and Centralized Logging.
Many technologies are involved, like Spring Cloud Netflix, Docker, Cloud Foundry and ELK.
A separate deck describes all the lab exercises.
In this session, we will discuss the architecture of a Kubernetes cluster. We will go through all the master and worker components of a Kubernetes cluster. We will also discuss the basic terminology of a Kubernetes cluster, such as Pods, Deployments, and Services. We will also cover networking inside Kubernetes. In the end, we will discuss the options available for setting up a Kubernetes cluster.
Modern cloud-native applications are incredibly complex systems. Keeping the systems healthy and meeting SLAs for our customers is crucial for long-term success. In this session, we will dive into the three pillars of observability - metrics, logs, tracing - the foundation of successful troubleshooting in distributed systems. You'll learn the gotchas and pitfalls of rolling out the OpenTelemetry stack on Kubernetes to effectively collect all your signals without worrying about vendor lock-in. Additionally, we will replace parts of the Prometheus stack to scrape metrics with the OpenTelemetry collector and operator.
Integrating microservices with apache camel on kubernetes Claus Ibsen
Apache Camel has fundamentally changed the way Java developers build system-to-system integrations by using enterprise integration patterns (EIP) with modern microservice architectures. In this session, we’ll show you best practices with Camel and EIPs, in the world of Spring Boot microservices running on Kubernetes. We'll also discuss practices for building truly cloud-native distributed and fault-tolerant microservices and we’ll introduce the upcoming Camel 3.0 release, which includes serverless capabilities via Camel K. This talk is a mix of slides and live demos.
Getting Started: Intro to Telegraf - July 2021 InfluxData
In this training webinar, Samantha Wang will walk you through the basics of Telegraf. Telegraf is the open source server agent which is used to collect metrics from your stacks, sensors and systems. It is InfluxDB’s native data collector that supports nearly 300 inputs and outputs. Learn how to send data from a variety of systems, apps, databases and services in the appropriate format to InfluxDB. Discover tips and tricks on how to write your own plugins. The know-how learned here can be applied to a multitude of use cases and sectors. This one-hour session will include the training and time for live Q&A.
Join this training as Samantha Wang dives into:
Types of Telegraf plugins (i.e. input, output, aggregator and processor)
Specific plugins including Execd input plugins and the Starlark processor plugin
How to install and start using Telegraf
NetflixOSS Meetup S6E1 - Titus & Containers aspyker
Come hear about our container management platform, Titus. Titus launches over 2 million containers per week for service and batch workloads. Come to learn what applications are powered by Titus and what value developers are getting from containers. Also, we will cover some of the unique aspects of Titus in reliability, control plane, scheduling, and container runtime technologies. We will also cover our integrations with Netflix systems such as Spinnaker as well as Amazon concepts such as VPC and IAM.
https://www.meetup.com/Netflix-Open-Source-Platform/events/247776324/
QConSF18 - Disenchantment: Netflix Titus, its Feisty Team, and Daemons aspyker
Disenchantment is a Netflix show following the medieval misadventures of a hard-drinking princess, her feisty elf, and her personal demon. In this talk, we will follow the story of Netflix’s container management platform, Titus, which powers critical aspects of the Netflix business (video encoding & streaming, big data, recommendations & machine learning, and other workloads). We’ll cover the challenges of growing Titus from tens to thousands of workloads. We’ll talk about our feisty team’s work across container runtimes, scheduling & control plane, and cloud infrastructure integration. We’ll talk about the demons we’ve found on this journey covering operability, security, reliability and performance.
Netflix Container Scheduling and Execution - QCon New York 2016 aspyker
Scheduling a Fuller House: Container Management At Netflix
Customers from all over the world streamed forty-two billion hours of Netflix content last year. Various Netflix batch jobs and an increasing number of service applications use containers for their processing. In this talk Netflix will present a deep dive on the motivations and the technology powering container deployment on top of the AWS EC2 service. The talk will cover our approach to cloud resource management and scheduling with the open source Fenzo library, along with details on the Docker execution engine as a part of project Titus. The talk will also share some of the results so far and lessons learned, and end with a brief look at the developer experience for containers.
This document discusses Netflix's use of containers and the Titus container management platform. It provides the following key points:
1. Titus provides container scheduling, resource management, and execution capabilities. It helps Netflix achieve consistent developer experiences, faster innovation, and simpler deployments.
2. Titus has scaled to support over 1,000,000 daily container launches across multiple AWS regions. It integrates containers into Netflix's infrastructure and provides features like VPC networking and AWS integration.
3. Looking ahead, Netflix aims to further improve Titus' performance, operations, reliability, and scheduling capabilities to support even larger container workloads. Deeper security isolation and integration with services like ALB are also goals.
The document outlines the agenda for Season 3 Episode 1 of the Netflix OSS podcast, which includes lightning talks on 8 new projects including Atlas, Prana, Raigad, Genie 2, Inviso, Dynomite, Nicobar, and MSL. Representatives from Netflix, IBM Watson, Nike Digital, and Pivotal then each provide a 3-5 minute presentation on their featured project. The presentations describe the motivation, features and benefits of each project for observability, integration with the Netflix ecosystem, automation of Elasticsearch deployments, job scheduling, dynamic scripting for Java, message security, and developing microservices
Agenda:
What is Software Defined Storage?
What is Ceph?
What is Rook?
Storage for Kubernetes
Storage Classes
Storage on Kubernetes
Operator Pattern
Custom Resource Definition
Rook Operator
Rook architecture
Ceph on Kubernetes with Rook
Demo
Rook Framework for Storage solutions
How to Get Involved?
Herding Kats - Netflix’s Journey to Kubernetes Public aspyker
An update from Netflix Compute's container management platform, Titus, covering the work to move from Mesos to Kubernetes. Lessons learned, next steps, and challenges.
This document provides an overview and summary of OpenShift v3 and containers. It discusses how OpenShift v3 uses Docker containers and Kubernetes for orchestration instead of the previous "Gears" system. It also summarizes the key architectural changes in OpenShift v3, including using immutable Docker images, separating development and operations, and abstracting operational complexity.
The Icehouse release of OpenStack focused on improving the user experience and operational capabilities. It included stability enhancements and bug fixes for core projects like Nova, Neutron, Glance, Cinder, and Swift. New features were added for many services, such as scheduler improvements in Nova, policy-based storage in Swift, and alarming capabilities in Ceilometer. The release also incubated several new projects, including Sahara, Barbican, Marconi, and continued development of TripleO, Ironic, and other underlying projects.
Andrew Spyker
Senior Software Engineer for Netflix
Find more by Andrew Spyker: http://www.slideshare.net/aspyker
All Things Open
October 26-27, 2016
Raleigh, North Carolina
Netflix and Containers: Not A Stranger Thing (aspyker)
Customers from all over the world streamed forty-two billion hours of Netflix content last year. The Netflix streaming service had been powered by the Amazon cloud with virtual machines for over five years, blazing a trail for similar architectures. In the last year, it invested in containers for batch-style jobs and service-style applications. Andrew Spyker will explain the potential containers have to help Netflix create a more productive development experience while simultaneously deepening its control over resource management. Join Andrew to see why Netflix is moving forward with containers, how it can leverage its existing operational machinery, and how it’s running containers with a similar guarantee of high availability as current Netflix infrastructure provides.
[KubeCon EU 2021] Introduction and Deep Dive Into Containerd (Akihiro Suda)
Join containerd maintainers and reviewers in a combined introduction and deep dive session. They will give an overview of containerd and its recent updates, and discuss how it is being used by Kubernetes, Docker and other container-based systems. A brief introduction to its architecture and service design will be included. The talk will also deep dive into how to leverage containerd by extending and customizing it for your use case with low-level plugins like remote snapshotters, as well as by implementing your own containerd client. Upcoming features and recent discussions in the containerd community will also be covered.
- - -
https://kccnceu2021.sched.com/event/iE6v/introduction-and-deep-dive-into-containerd-kohei-tokunaga-akihiro-suda-ntt-corporation?iframe=no
Introduction and Deep Dive Into Containerd (Kohei Tokunaga)
Talked at KubeCon + CloudNativeCon Europe 2021 Virtual about containerd (May 5, 2021).
https://kccnceu2021.sched.com/event/iE6v
Introduction to containers, k8s, Microservices & Cloud Native (Terry Wang)
Slides built to upskill and enable internal team and/or partners on foundational infra skills to work in a containerized world.
Topics covered
- Container / Containerization
- Docker
- k8s / container orchestration
- Microservices
- Service Mesh / Serverless
- Cloud Native (apps & infra)
- Relationship between Kubernetes and Runtime Fabric
Audiences: MuleSoft internal technical team, partners, Runtime Fabric users.
Monitoring kubernetes across data center and cloud (Datadog)
This document summarizes a presentation about monitoring Kubernetes clusters across data centers and cloud platforms using Datadog. It discusses how Kubernetes provides container-centric infrastructure and flexibility for hybrid cloud deployments. It also describes how monitoring works in Google Container Engine using cAdvisor, Heapster, and Stackdriver. Finally, it discusses how Datadog and Tectonic can be used to extend Kubernetes monitoring capabilities for enterprises.
Disenchantment: Netflix Titus, Its Feisty Team, and Daemons (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Gmuwlg.
Andrew Spyker talks about Netflix's feisty team’s work across container runtimes, scheduling & control plane, and cloud infrastructure integration. He also talks about the demons they’ve found on this journey covering operability, security, reliability and performance. Filmed at qconsf.com.
Andrew Spyker worked to mature the technology base of Netflix Container Cloud (Project Titus) within the development team. Recently, he moved into a product management role collaborating with supporting Netflix infrastructure dependencies as well as supporting new container cloud usage scenarios including user on-boarding, feature prioritization/delivery and relationship management.
The journey to container adoption in enterprise (Igor Moochnick)
This document discusses the journey to container adoption in enterprises. It begins by describing traditional monolithic architectures and then discusses how container technologies like Docker and Mesos enable new paradigms like microservices that emphasize speed, agility and loose coupling. It covers challenges around deployment, testing, monitoring and failure handling with containers and discusses emerging tools and approaches to address these challenges. Finally, it considers future directions like Kubernetes and stream processing architectures.
Season 7 Episode 1 - Tools for Data Scientists (aspyker)
Metaflow (Ville Tuulos)
Data scientists at Netflix are expected to develop and operate large machine learning workflows autonomously. However, we do not expect that all our scientists are deeply experienced with distributed systems and data engineering. Metaflow was created to make it delightfully easy to build and operate ML workflows in the cloud using idiomatic Python and off-the-shelf ML libraries, covering the whole lifecycle of an ML project from prototype to production.
Polynote (Jeremy Smith)
Polynote is a new notebook tool we created from scratch to address some of the pain points we've run into while using Scala in machine-learning notebooks at Netflix. It provides essential code editing features other tools lack like interactive auto-completes, support for mixing multiple languages and sharing data between them within a single notebook, and encourages reproducible notebooks with its immutable data model.
Papermill (Matthew Seal)
Nteract is an open source organization under which there are several libraries and applications that Netflix and many other companies and individuals contribute to. One of these libraries is Papermill, a library used to programmatically parameterize and execute Jupyter Notebooks. Papermill provides a CLI and Python interface that we'll explore during the session to see how it can be used and what value it adds. Using this pattern we'll also briefly talk about how we've integrated papermill at Netflix and how it interfaces with other Jupyter and nteract services.
CMP376 - Another Week, Another Million Containers on Amazon EC2 (aspyker)
Netflix’s container management platform, Titus, powers critical aspects of the Netflix business, including video streaming, recommendations, machine learning, big data, content encoding, studio technology, internal engineering tools, and other Netflix workloads. Titus offers a convenient model for managing compute resources, enables developers to maintain just their application artifacts, and provides a consistent developer experience from a developer’s laptop to production by leveraging Netflix container-focused engineering tools.
In this episode, we will focus on continuous delivery and how Netflix uses Spinnaker and Kayenta to safely deliver changes to the cloud and beyond. Kayenta is a platform for Automated Canary Analysis (ACA). It is used by Spinnaker to enable automated canary deployments. We will also discuss how Spinnaker is used at Netflix to deploy targets beyond cloud VMs and containers --- batch jobs, CDNs, fast properties and Open Connect appliances.
Slides for SRECon 2018 talk on https://medium.com/@awspyker/why-as-a-netflix-infrastructure-manager-am-i-on-call-bdc551ac01fe
Netflix has over 109 million members and uses over 500 microservices running on 100,000 virtual machines across 3 regions to stream over 100 million hours of content per day. Netflix open sources many of its cloud projects to improve engineering, recruit talent, and align with industry standards. Some of Netflix's notable open source projects include Chaos Monkey for testing high availability, Spinnaker for continuous delivery, and Security Monkey for monitoring security policies. While Netflix's cloud architecture and security practices were discussed, areas like big data, data persistence, UI engineering, personalization algorithms, and studio applications were not covered.
Topics:
• RepoKid
Netflix’s Open-source Strategy to Rightsizing Cloud Permissions at Scale
• BetterTLS
A test suite for HTTPS clients implementing verification of the Name Constraints certificate extension
• Authorization at Netflix
Netflix’s architecture for implementing Authorization at scale
• Open Policy Agent
An open source, general-purpose policy engine that enables unified, context-aware policy enforcement across the entire stack. (www.openpolicyagent.org)
• Introducing PADME (Policy Access Decision Management Engine)
Modern policy management for distributed heterogeneous systems. (www.padme.io)
Demo Stations:
• Stethoscope
Personalized, user-focused recommendations for employee information security.
• HubCommander
Slack bot for GitHub organization management -- and other things too!
• Open Policy Agent
An open source, general-purpose policy engine that enables unified, context-aware policy enforcement across the entire stack.
Series of Unfortunate Netflix Container Events - QConNYC17 (aspyker)
Project Titus is Netflix's container runtime on top of Amazon EC2. Titus powers algorithm research through massively parallel model training, media encoding, data research notebooks, ad hoc reporting, NodeJS UI services, stream processing and general micro-services. As an update from last year's talk, we will focus on the lessons learned operating one of the largest container runtimes on a public cloud. We'll cover the migration we've seen of applications and frameworks from VMs to containers. We will cover the operational issues with containers that only showed up after we reached the large scale (1000s of container hosts, 100s of thousands of containers launched weekly) we are currently supporting. We'll touch on the unique features we have added to help both batch and microservices run across a variety of runtimes (Java, R, NodeJS, Python, etc.) and how higher level frameworks have taken advantage of Titus's scheduling capabilities.
In this episode, we will focus on open sourcing how we run Netflix's open source program. Netflix has been using and contributing to open source for several years. Over the years, Netflix has released over one hundred Netflix Open Source (aka NetflixOSS) libraries, servers, and technologies. Netflix engineers benefit by accepting contributions and gathering feedback with key collaborators around the world. Users of NetflixOSS from many industries benefit from our solutions including Big Data, Build and Delivery Tools, Runtime Services and Libraries, Data Persistence, Insight, Reliability and Performance, Security and User Interface. With such a large and mature open source program, Netflix has worked on approaches and tools that help manage and improve the NetflixOSS source offerings and communities. Netflix has taken a different approach to building support for open source as compared to other Internet scale companies. Come to this session to learn about the unique approaches Netflix has taken to both distribute and automate the responsibilities of building a world-class open source program.
Netflix Open Source: Building a Distributed and Automated Open Source Program (aspyker)
Netflix has been using and contributing to open source for several years. Over the years, Netflix has released over one hundred Netflix Open Source (aka NetflixOSS) libraries, servers, and technologies. Netflix engineers benefit by accepting contributions and gathering feedback with key collaborators around the world. Users of NetflixOSS from many industries benefit from our solutions including Big Data, Build and Delivery Tools, Runtime Services and Libraries, Data Persistence, Insight, Reliability and Performance, Security and User Interface. With such a large and mature open source program, Netflix has worked on approaches and tools that help manage and improve the NetflixOSS source offerings and communities. Netflix has taken a different approach to building support for open source as compared to other Internet scale companies. Come to this session to learn about the unique approaches Netflix has taken to both distribute and automate the responsibilities of building a world-class open source program.
Netflix uses containers to run both batch jobs and services. For batch jobs, containers simplify resource management and allow jobs like model training and media encoding to easily share resources. Services are more complex to run in containers due to challenges like constant resizing, statefulness, and networking. Netflix addresses these challenges through solutions like a VPC networking driver and reusing existing infrastructure services for containers. Looking ahead, Netflix aims to run more containers at larger scale for areas like developer experience, continuous integration, and internal resource optimization.
Netflix Open Source Meetup Season 4 Episode 3 (aspyker)
In this episode, we will focus on security in the cloud at scale. We’ll have Netflix speakers discussing existing and upcoming security-related OSS releases, and we’ll also have external speakers from organizations that are using and contributing to Netflix security OSS.
First, Patrick Kelley from Netflix’s Security Operations team will speak about RepoMan, an upcoming OSS release designed to right-size AWS permissions. Then, Wes Miaw from Netflix’s Security Engineering team will discuss MSL (Message Security Layer).
We have two external speakers for this event - Chris Dorros from OpenDNS/Cisco will talk about his use of and contributions to Lemur, and Ryan Lane from Lyft will talk about their use of BLESS.
After the talks, we’ll have OSS authors at demo stations to answer questions and provide demos of Netflix security OSS, including Lemur, MSL, and Security Monkey.
Netflix Open Source Meetup Season 4 Episode 2 (aspyker)
In this episode, we will take a close look at 2 different approaches to high-throughput/low-latency data stores, developed by Netflix.
The first, EVCache, is a battle-tested distributed memcached-backed data store, optimized for the cloud. You will also hear about the road ahead for EVCache as it evolves into an L1/L2 cache over RAM and SSDs.
The second, Dynomite, is a framework to make any non-distributed data-store, distributed. Netflix's first implementation of Dynomite is based on Redis.
Come learn about the products' features and hear from Thomson and Reuters, Diego Pacheco from Ilegra and other third party speakers, internal and external to Netflix, on how these products fit in their stack and roadmap.
Netflix Container Runtime - Titus - for Container Camp 2016 (aspyker)
This document summarizes Netflix's Titus container cloud platform. It discusses Titus' high-level architecture including job management, elastic resource management and optimization, container execution, and integration capabilities. It also provides details on the Titus user interface, underlying technologies like Docker and Mesos, and current metrics like autoscaling hundreds of large EC2 instances and supporting thousands of containers per day across tens of terabytes of memory.
Netflix Open Source Meetup Season 4 Episode 1 (aspyker)
This document summarizes Netflix's efforts to evolve their open source projects. It discusses establishing clear ownership and lifecycles for projects (active, retired, experimental). It also describes a new dashboard called the Netflix OSS Tracker to monitor project health metrics. The rest of the document demonstrates the Spinnaker continuous delivery platform that Netflix has open sourced and discusses Google's involvement in contributing to and adopting Spinnaker.
Netflix is a large streaming company with over 75 million members and 42.5 billion hours watched in 2015. The company has thousands of microservices and many tens of thousands of virtual machines across 3 regions worldwide. Netflix open sources much of its cloud platform technologies to get feedback, collaborate with others, and improve proven open source projects for its scale and availability. Open sourcing also helps with recruiting and retention by allowing candidates and engineers to work on the same projects they could at Netflix. Netflix's open source offerings like Spring Cloud and container technologies are widely used both publicly and internally at other large companies.
Triangle Devops Meetup covering Netflix open source, cloud architecture, and what Andrew did in his first year working as a senior software engineer in the cloud platform group.
A presentation on the Netflix Cloud Architecture and NetflixOSS open source. For the All Things Open 2015 conference in Raleigh 2015/10/19. #ATO2015 #NetflixOSS
Andrew Spyker presented on Netflix's cloud platform and open source projects. Some key points included:
- Netflix has migrated from monolithic architectures to microservices and continuous delivery enabled by their open source libraries and services.
- Their platform focuses on elasticity, high availability through automation, and operational visibility.
- Netflix uses technologies like Eureka, Ribbon, Hystrix, and Servo to enable scalability, resilience, and monitoring across their distributed systems.
- They contribute over 50 open source projects to help others adopt their cloud-native approaches and are working on data and UI related projects.
Andrew Spyker presented on the Netflix Cloud Platform and ZeroToDocker project. The following key points were discussed:
- ZeroToDocker provides Docker images of Netflix OSS projects like Eureka, Zuul and Asgard to more easily evaluate the technologies. However, the images are not intended for direct production use.
- A demo showed running a microservices application and supporting Netflix OSS services like Eureka and Zuul using Docker containers on a single machine.
- While Docker aids development and evaluation, additional tooling is needed to operationalize containers at production scale across multiple hosts for tasks like networking, security, logging and scheduling. Competing ecosystems are emerging to address these needs.
3. Netflix’s Container Management Platform
Titus
Scheduling
● Service & batch jobs
● Resource management
Container Execution
● Docker/AWS Integration
● Netflix Infra Support
Service
Job and Fleet Management
Resource Management & Optimization
Container Execution
Integration
Batch
4. Growing set of container use cases
● 1000+ Applications
● Netflix API, NodeJS Backend UI Scripts
● Machine Learning (GPUs) for personalization
● Encoding and Content use cases
● Netflix Studio use cases
● CDN tracking and planning
● Massively parallel CI system
● Data Pipeline routing and SPaaS
● Big Data platform use cases
Timeline
● Batch: Q4 15
● Basic Service: 1Q 16
● Production Service: 4Q 16
● Customer Facing Service: 2Q 17
shadow
5. High Level Titus Architecture
[Architecture diagram; recoverable labels follow]
● Batch/Workflow Systems, Service, CI/CD
● Titus Control Plane: API, Scheduling, Job Lifecycle Control
● Cassandra, Fenzo, EC2 Autoscaling, Mesos, Docker Registry
● Titus Agents on AWS Virtual Machines: Mesos agent, Docker, Titus System Services, User Containers
6. Q1 2018 Container Usage
Common
● Jobs launched: 176K jobs / day
● Different applications: 1K+ different images
● Regional isolated Titus stacks: 7
Services
● Single app cluster size: 5K (real), 12K containers (benchmark)
● Agents managed: 16K VMs
Batch
● Containers launched: 430K / day
● Agents autoscaled: 350K VMs / month
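To put the daily figures above in per-second terms, here is a small arithmetic sketch. It assumes launches are spread evenly over the day, which real traffic will not be, so the results are averages only: roughly 2 jobs and 5 batch container launches per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count_per_day: float) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return count_per_day / SECONDS_PER_DAY


jobs_per_second = per_second(176_000)              # from "176K jobs / day"
batch_containers_per_second = per_second(430_000)  # from "430K / day"

print(f"{jobs_per_second:.1f} jobs/s, "
      f"{batch_containers_per_second:.1f} batch containers/s")
```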
7. Leveraging existing Netflix and AWS Infrastructure
Single consistent cloud environment between VMs and containers
[Diagram; recoverable labels follow]
● VMs (EC2, AWS AutoScaler): Service App on Cloud Platform (metrics, IPC, health)
● Containers (Titus Job Control): Service App and Batch App, each on Cloud Platform (metrics, IPC, health)
● VPC, Atlas, Eureka, Edda
8. Most Native AWS Container Platform
IP per container
● VPC IP, ENI and security group
● Optimized to share ENIs
● ENI pre-attaching, opportunistic batching of IPs (bursty deploys)
IAM Roles and Metadata Endpoint per container
● Container view of 169.254.169.254
Cryptographic identity per container
● Using Amazon instance identity document
Service job container autoscaling
● Using native AWS CloudWatch and Autoscaling policies and engine
Application Load Balancing (ALB)
10. Scheduling / Placement
Considering the realities of …
● Docker, Linux, Image Pulling, etc.
● Complex resources (ENIs)
● Amazon rate limiting
● Filtering (constraints) and ranking (fitness)
● Different profiles for service | batch, critical | operational, etc.
Trade-offs: Reliability, Provisioning Time, Cost
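The "filtering (constraints) and ranking (fitness)" bullet can be sketched as a two-step placement: drop agents that cannot run the task, then pick the best remaining one by a fitness score. This is a minimal illustration, not the Titus or Fenzo implementation. The `Agent` and `Task` fields, and the idea that batch work bin-packs while services spread, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Agent:
    name: str
    free_cpus: float
    free_memory_mb: int
    free_enis: int


@dataclass(frozen=True)
class Task:
    cpus: float  # assumed > 0
    memory_mb: int
    needs_eni: bool = True


def fits(agent: Agent, task: Task) -> bool:
    """Filtering step: hard constraints the agent must satisfy."""
    return (
        agent.free_cpus >= task.cpus
        and agent.free_memory_mb >= task.memory_mb
        and (not task.needs_eni or agent.free_enis >= 1)
    )


def bin_pack_fitness(agent: Agent, task: Task) -> float:
    """Ranking step: prefer agents the task would fill most (hypothetical batch profile)."""
    return task.cpus / agent.free_cpus


def spread_fitness(agent: Agent, task: Task) -> float:
    """Ranking step: prefer the emptiest agents (hypothetical service profile)."""
    return 1.0 - bin_pack_fitness(agent, task)


def place(
    task: Task,
    agents: List[Agent],
    fitness: Callable[[Agent, Task], float] = bin_pack_fitness,
) -> Optional[Agent]:
    """Filter by constraints, then return the highest-ranked agent, or None."""
    candidates = [a for a in agents if fits(a, task)]
    if not candidates:
        return None
    return max(candidates, key=lambda a: fitness(a, task))


agents = [
    Agent("a", free_cpus=8, free_memory_mb=16000, free_enis=1),
    Agent("b", free_cpus=2, free_memory_mb=16000, free_enis=1),
    Agent("c", free_cpus=1, free_memory_mb=16000, free_enis=1),
]
task = Task(cpus=2, memory_mb=1000)
print(place(task, agents).name)                  # bin-packing choice
print(place(task, agents, spread_fitness).name)  # spreading choice
```

Swapping the fitness function is one way to express different profiles without touching the constraint logic.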
11. Capacity Management
User configures “capacity groups” based on workload type
Critical (RIs)
● Preallocated instances in order to achieve low provisioning time
● Buffer to support temporary extra capacity needs for deployments
Flex (On-Demand)
● Autoscaled instances based on demand
Opportunistic (Spot) - Coming
● Utilize extra instances with the ability to preempt or evict the workload
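A rough sketch of how the Critical and Flex tiers differ in sizing, assuming a hypothetical `CapacityGroup` record: a critical group holds its preallocated instances plus a deployment buffer regardless of current demand, while a flex group is sized from demand. The field names and the sizing rule are illustrative assumptions, not the Titus configuration model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityGroup:
    name: str
    tier: str            # "critical" or "flex" (hypothetical labels)
    guaranteed: int = 0  # preallocated instances (critical tier)
    buffer: int = 0      # extra instances for deployments (critical tier)


def desired_instances(
    group: CapacityGroup, demand_containers: int, containers_per_instance: int
) -> int:
    """Return the instance count the tier would hold for the given demand."""
    if group.tier == "critical":
        # Preallocated: held even when idle, to keep provisioning time low.
        return group.guaranteed + group.buffer
    if group.tier == "flex":
        # Autoscaled: follows demand.
        return math.ceil(demand_containers / containers_per_instance)
    raise ValueError(f"unknown tier: {group.tier}")


critical = CapacityGroup("api", tier="critical", guaranteed=100, buffer=20)
flex = CapacityGroup("batch", tier="flex")
print(desired_instances(critical, demand_containers=0, containers_per_instance=10))
print(desired_instances(flex, demand_containers=45, containers_per_instance=10))
```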
12. Centralized Agent Management
Unified component for tracking agent information. Powers other systems like task migration, canaries, and agent remediations.
[Diagram: Agent Management]
● Inputs: health checks, cluster lifecycle, other signals
● Consumers: other subsystems
For example: Task Migration
● Cluster A: agent state = non-schedulable, drain tasks
● Cluster B: agent state = schedulable
● Agent Management provides cluster state to Task Migration
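The task migration example can be sketched as a drain loop: tasks on agents marked non-schedulable are moved to schedulable agents that still have room. The `Agent` fields and the first-fit choice of target are assumptions for illustration. The slides do not describe the actual migration policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    name: str
    schedulable: bool
    capacity: int  # max tasks this agent can hold (simplification)
    tasks: List[str] = field(default_factory=list)


def drain(agents: List[Agent]) -> Dict[str, str]:
    """Move tasks off non-schedulable agents; return task -> new agent name."""
    moves: Dict[str, str] = {}
    targets = [a for a in agents if a.schedulable]
    for source in agents:
        if source.schedulable:
            continue
        for task in list(source.tasks):
            # First schedulable agent with spare capacity (first-fit).
            target = next((t for t in targets if len(t.tasks) < t.capacity), None)
            if target is None:
                break  # no room anywhere; leave remaining tasks in place
            source.tasks.remove(task)
            target.tasks.append(task)
            moves[task] = target.name
    return moves


cluster_a = Agent("a1", schedulable=False, capacity=2, tasks=["t1", "t2"])
cluster_b = [
    Agent("b1", schedulable=True, capacity=1),
    Agent("b2", schedulable=True, capacity=2, tasks=["t0"]),
]
moves = drain([cluster_a] + cluster_b)
print(moves)
```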
15. Multi-tenant networking is hard
Decided early on we wanted full IP stacks per container
But what about?
● Security group support
● IAM role support
● Network bandwidth isolation
● Leverage VPC
16. Titus Networking
[Diagram: Virtual Machine Host]
● Host ENIs: eth0 (sg=Titus control plane), eth1 (sg=A,B), eth2 (sg=B,C)
● IP addresses shown: IP 1, IP 2, IP 3, IP 4
● Container 1: sg=A,B, IP 2
● Container 2: sg=B,C, IP 3
● Container 3: sg=A,B, IP 4
● Each container has its own eth0 and an eth-md interface to a Metadata service (169.254.169.254)
● Titus executor
● IPVlan, BPF, IFBs to route app traffic
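The ENI sharing shown in the diagram (containers with the same security groups land on the same host interface) can be sketched as a lookup keyed by the security group set. The `EniAllocator` class is hypothetical, and the sketch ignores per-ENI IP address limits, which a real allocator would have to respect.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional


class EniAllocator:
    """Hypothetical helper: one ENI per distinct set of security groups."""

    def __init__(self, eni_names: Iterable[str]) -> None:
        self._free: List[str] = list(eni_names)
        self._by_groups: Dict[FrozenSet[str], str] = {}

    def assign(self, security_groups: Iterable[str]) -> Optional[str]:
        """Reuse the ENI already bound to this group set, else take a free one."""
        key = frozenset(security_groups)
        if key in self._by_groups:
            return self._by_groups[key]
        if not self._free:
            return None  # no ENI left for a new security group combination
        eni = self._free.pop(0)
        self._by_groups[key] = eni
        return eni


alloc = EniAllocator(["eth1", "eth2"])
container_1 = alloc.assign(["A", "B"])
container_2 = alloc.assign(["B", "C"])
container_3 = alloc.assign(["A", "B"])
print(container_1, container_2, container_3)
```

With the security groups from the diagram, containers 1 and 3 share one interface and container 2 gets the other.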
17. Next challenge: Speed limits of EC2 Networking
Largest EC2 challenge: speed of networking reconfiguration
Changes in how we work with EC2 APIs
● Work with Amazon to redefine networking-related API rate limits and buckets
● Pre-attach all networking interfaces
● IPs are asked for in bulk opportunistically
Also, coordination with scheduler
● Prefer instances with containers already in the same security group
For large scale failovers
● Before … hours, after ... minutes
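The "IPs are asked for in bulk" idea can be sketched as a small pool that requests addresses in batches, so that a burst of container launches costs far fewer rate-limited API calls than one call per container. The `IpPool` class and the `request_ips` callback are hypothetical stand-ins. No real EC2 API is called here.

```python
from collections import deque
from typing import Callable, Deque, List


class IpPool:
    """Hypothetical pool that refills itself one batch at a time."""

    def __init__(self, request_ips: Callable[[int], List[str]], batch_size: int = 8) -> None:
        self._request_ips = request_ips  # stand-in for a rate-limited cloud API call
        self._batch_size = batch_size
        self._pool: Deque[str] = deque()

    def acquire(self) -> str:
        """Hand out a pooled IP, requesting a new batch only when the pool is empty."""
        if not self._pool:
            self._pool.extend(self._request_ips(self._batch_size))
        if not self._pool:
            raise RuntimeError("no IP addresses available")
        return self._pool.popleft()

    def release(self, ip: str) -> None:
        """Return an IP to the pool for reuse by a later container."""
        self._pool.append(ip)


calls: List[int] = []


def fake_request(n: int) -> List[str]:
    """Test double that records each call and returns n fresh addresses."""
    start = len(calls) * n
    calls.append(n)
    return [f"10.0.0.{start + i}" for i in range(n)]


pool = IpPool(fake_request, batch_size=4)
ips = [pool.acquire() for _ in range(10)]
print(len(calls), "API calls for", len(ips), "containers")
```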
18. Overcoming failures on each agent
● Detection - health checks
○ Linux subsystems (systemd, filesystems)
○ Docker aspects (runtime health, registry pulls)
○ Titus processes (networking, GPU, security drivers)
○ Mesos aspects (agent, executor)
● Remediation
○ Local reconciliation
○ Docker image cleanup
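A minimal sketch of the detection and remediation loop, assuming each health check is a probe with an optional local fix. The `HealthCheck` structure, the check names, and the single-retry policy are assumptions for illustration, not the Titus agent's actual design.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HealthCheck:
    name: str
    probe: Callable[[], bool]                       # True when healthy
    remediate: Optional[Callable[[], None]] = None  # optional local fix


def evaluate(checks: List[HealthCheck]) -> List[str]:
    """Return names of checks still failing after one remediation attempt."""
    failing: List[str] = []
    for check in checks:
        if check.probe():
            continue
        if check.remediate is not None:
            check.remediate()
            if check.probe():
                continue  # remediation worked
        failing.append(check.name)
    return failing


# Simulated agent state: the image cache has filled the disk.
state = {"disk_full": True}

checks = [
    HealthCheck(
        name="docker-image-disk",
        probe=lambda: not state["disk_full"],
        remediate=lambda: state.update(disk_full=False),  # stands in for image cleanup
    ),
    HealthCheck(name="systemd", probe=lambda: False),     # failing, no local fix
]
still_failing = evaluate(checks)
print(still_failing)
```

An agent with a non-empty result would be a candidate for being marked non-schedulable.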
19. Process Model Evolution
Single process containers
● Worked for some time, until we needed system services
System services
● Telemetry, IAM support, log uploading
● Added as host installed daemons; isolation & multi-tenancy concerns
● Currently injecting system services into containers
Composing system services into containers
● Considered pods; lifecycle and usage complexities limited value
● Considering future of both systemd and docker image composability
20. Resource Isolation
● CPU
○ Started with bursting; was interfering with predictability
○ Resource tiers to avoid interference problems
● Memory
○ Hard limit, OOM kills entire container
● Network
○ Bandwidth throttling
● Disk space
● GPUs
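On Linux, the difference between CPU bursting and a hard cap is commonly expressed with cgroup CPU shares (a relative weight that allows bursting into idle capacity) versus a CFS quota (an absolute ceiling per period). The slides do not say which mechanism Titus uses, so the sketch below only shows the arithmetic, assuming the kernel default period of 100 ms and the common convention of 1024 shares per CPU.

```python
CFS_PERIOD_US = 100_000  # kernel default CFS period in microseconds
SHARES_PER_CPU = 1024    # common convention, not a kernel requirement


def cfs_quota_us(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """Hard cap: CPU time in microseconds the container may use per period."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return round(cpus * period_us)


def cpu_shares(cpus: float) -> int:
    """Relative weight: only matters under contention, so idle CPU can be borrowed."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return round(cpus * SHARES_PER_CPU)


print(cfs_quota_us(2))    # quota for a 2-CPU container
print(cfs_quota_us(0.5))  # quota for half a CPU
print(cpu_shares(1.5))    # weight for a 1.5-CPU container
```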
21. Security Isolation
● Deployed user namespaces
○ Challenging due to shared systems without UID shifting
● Needed ad hoc debugging
○ Titus-ssh for user level access to their container
○ Still required power user access for kernel functions
○ Working to automate through tools like Vector (NetflixOSS)
● Seccomp overhead and complexity is prohibitive
○ Working towards automated policies and BPF driven implementations
22. Open Sourcing
Currently in private open source collaboration with those who want ...
● The NetflixOSS container solution (Spinnaker + Titus + Netflix RPC)
● A unified batch and service Mesos scheduler
● More robust & native AWS container platform
Hope to fully open source in early Q2
● If you want access now, let us know
● Looking for collaborators, feedback