Containerized architectures for deep learning
Antje Barth @anbarth
Me
Data Enthusiast
Technical Evangelist
AI / ML / Deep Learning
Container / Kubernetes
Big Data
#CodeLikeAGirl
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
ML – Helicopter view
How good are your predictions?
• Accuracy
• Optimization
ML – The (enterprise) reality
• Wrangle large datasets
• Unify disparate systems
• Composability
• Manage pipeline complexity
• Improve training/serving consistency
• Improve portability
• Improve model quality
• Manage versions
Diagram: the stages of an enterprise ML workflow (building a model, data ingestion, data analysis, data transform, data validation, data splitting, ad-hoc training, model validation, logging, roll-out, serving, monitoring, distributed training, training at scale, data versioning, HP tuning, experiment tracking, feature store), spread across separate systems (SYSTEM 1, 1.5, 2, 3, 3.5, 4, 5, 6).
The rise of ML pipeline tools & platforms
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
Quick comparison
Apache Airflow is a platform to programmatically author, schedule and monitor workflows.
https://airflow.apache.org/
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
https://www.kubeflow.org/
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines.
https://www.tensorflow.org/tfx
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment.
https://mlflow.org/
How to scale to production?
Composability
Portability
Scalability
Wait a minute…
Virtual Machines are Computers in a Box
Containers are Applications in a Box
Containers? Kubernetes?
Kubernetes is an API and agents
• The Kubernetes API provides containers with scheduling, configuration, networking, and storage.
• The Kubernetes runtime manages the containers.
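As a rough illustration of what "describing" a container to the Kubernetes API looks like, the following Python sketch builds a minimal Pod manifest as a plain dictionary and prints it as JSON, a format the API accepts alongside YAML. The pod name, image, environment variable, port, and volume are hypothetical values chosen for the example; the comments mark which fields relate to scheduling, configuration, networking, and storage.

```python
import json


def build_pod_manifest(name, image):
    """Return a minimal Pod manifest as a dict (all values are illustrative)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "main",
                "image": image,
                # Scheduling: resource requests inform pod placement.
                "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}},
                # Configuration: environment variables injected into the container.
                "env": [{"name": "LOG_LEVEL", "value": "info"}],
                # Networking: the port the container listens on.
                "ports": [{"containerPort": 8080}],
                # Storage: mount the volume declared at pod level below.
                "volumeMounts": [{"name": "scratch", "mountPath": "/data"}],
            }],
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }


manifest = build_pod_manifest("demo-pod", "example.com/demo:1.0")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`; the agents on each node then start and supervise the container.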
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
Machine Learning on Kubernetes
• Kubernetes-native
  • Run wherever k8s runs
  • Move between local – dev – test – prod – cloud
• Use k8s to manage ML tasks
  • CRDs (UDTs) for distributed training
• Adopt k8s patterns
  • Microservices
  • Manage infrastructure declaratively
• Support for multiple ML frameworks
  • TensorFlow, PyTorch, scikit-learn, XGBoost, etc.
Kubernetes ML/DL Landscape
Source: https://twimlai.com/kubernetes-ebook/
https://landscape.lfai.foundation/
https://landscape.cncf.io/
Introducing Kubeflow
Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere.
Kubeflow components
Composability
• Build and deploy re-usable, portable, scalable machine learning workflows based on Docker containers.
• Use the libraries/frameworks of your choice
Example: the Kubeflow "deployer" component lets you deploy a model as a plain TF Serving model server:
https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/deployer
Back to our ML enterprise workflow!
Portability
Containers for Deep Learning
Diagram: a TensorFlow container image bundling Python and its packages (TensorFlow, Keras, Horovod, NumPy, SciPy, scikit-learn, pandas, Open MPI, others) with the CPU library MKL and the GPU libraries cuDNN, cuBLAS, NCCL, and the CUDA toolkit. The image runs on a container runtime, on top of NVIDIA drivers, the host OS, and the infrastructure.
ML environments that are:
Diagram: the same TensorFlow container image is pushed from a development system to a container registry and pulled onto a training cluster. Both the development system and the training cluster provide their own container runtime, NVIDIA drivers, and host OS.
Scalability
• Kubernetes - Autoscaling Jobs
• Describe the job, let Kubernetes take care of the rest
• CPU, RAM, Accelerators
• TF Jobs delete themselves when finished, node pool will auto scale back down
Data Scientist to IT Ops: “Model works great! But I need six nodes.”
Credit: @aronchick
apiVersion: "kubeflow.org/v1alpha1"
kind: "TFJob"
spec:
  replicaSpecs:
    replicas: 6
    CPU: 1
    GPU: 1
    containers: gcr.io/myco/myjob:1.0
Credit: @aronchick
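The manifest on the slide is deliberately simplified. As a hedged sketch of what a fuller job description might look like, the Python snippet below builds a TFJob manifest as a dictionary and prints it as JSON. The field layout (`tfReplicaSpecs`, a `Worker` replica group, a container named `tensorflow`) follows the later `kubeflow.org/v1` API as best understood here, not the `v1alpha1` version on the slide, and should be checked against the operator version actually installed. The job name `myjob` is hypothetical; the image, replica count, and CPU/GPU values are taken from the slide.

```python
import json


def build_tfjob(name, image, workers, cpus=1, gpus=1):
    """Return a TFJob manifest as a dict (assumed kubeflow.org/v1 layout)."""
    container = {
        "name": "tensorflow",
        "image": image,
        # Per-replica limits: one CPU and one NVIDIA GPU by default.
        "resources": {"limits": {"cpu": str(cpus), "nvidia.com/gpu": gpus}},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


job = build_tfjob("myjob", "gcr.io/myco/myjob:1.0", workers=6)
print(json.dumps(job, indent=2))
```

The data scientist only states what the job needs (six workers, one CPU and one GPU each); finding or provisioning nodes that satisfy the request is left to the cluster.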
Diagram: six GPUs are allocated to the job.
Job’s done!
Agenda
• Motivation
• ML pipeline tools and platforms
• Container > Kubernetes > Kubeflow
• Deep Learning Demo
• Conclusion
DEMO “Doppelganger App”
Implementing image similarity search
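The slides do not say how the demo computes similarity. One common approach, shown here only as an assumption, is to turn each image into a feature vector with a neural network and rank stored vectors by cosine similarity to the query. The sketch below uses tiny hand-written vectors and hypothetical file names in place of real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, index, k=1):
    """Return the names of the k stored vectors closest to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical "embeddings"; a real system would compute these with a model.
index = {
    "cat.jpg": [1.0, 0.0, 0.0],
    "dog.jpg": [0.0, 1.0, 0.0],
    "lion.jpg": [0.8, 0.6, 0.0],
}
print(most_similar([1.0, 0.05, 0.0], index, k=2))
```

A linear scan like this is fine for small collections; larger ones typically use an approximate nearest-neighbour index instead.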
Recap: The “Kube”flow
• Deploy Kubernetes & Kubeflow
• Experiment in Jupyter
• Build Docker Image
• Train at Scale
• Build Model Server
• Deploy Model
• Integrate Model into App
• Operate
Diagram: a data scientist works in a Jupyter Notebook and builds two Docker images from Dockerfiles, one for the training job and one for the inference service. Model training runs in "Train Model" pods; model serving runs in pods with the Seldon Core Engine and the Doppelganger model. The pods are spread across Kubernetes worker nodes #1, #2, and #3. An Istio Gateway (traffic routing) exposes the model as a REST API, which is called with curl.
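The REST call at the end of the demo flow could look roughly like the following Python sketch, which only builds the request and does not send it. The URL path and the `{"data": {"ndarray": ...}}` payload follow the Seldon Core v1 REST convention as best understood here; the demo's actual endpoint, namespace, deployment name, and input format are not given in the slides, so every value below is a placeholder.

```python
import json
import urllib.request


def build_prediction_request(gateway, namespace, deployment, features):
    """Build (but do not send) a POST request to a Seldon-style endpoint."""
    url = (
        f"http://{gateway}/seldon/{namespace}/{deployment}"
        "/api/v1.0/predictions"
    )
    # One row of input features wrapped in the assumed payload shape.
    body = json.dumps({"data": {"ndarray": [features]}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_prediction_request(
    "istio-gateway.example.com", "default", "doppelganger", [0.1, 0.2, 0.3]
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a live gateway: urllib.request.urlopen(req)
```

The Istio gateway routes the request to the serving pods, so the client only needs the gateway address, not the address of any individual pod.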
Agenda
• Motivation
• ML pipeline tools and platforms
• Machine Learning on Kubernetes
• Deep Learning Demo
• Conclusion
Conclusion & Take-aways
• Platform matters
• Composability – Portability – Scalability
• Containerized architectures
• Kubernetes + Machine Learning = Kubeflow
• Start building!
https://github.com/antje/doppelganger
More information
• Kubeflow
https://www.kubeflow.org/
https://github.com/kubeflow/kubeflow
• TensorFlow Extended (TFX)
https://www.tensorflow.org/tfx
• The Definitive Guide to Machine Learning Platforms
https://twimlai.com/mlplatforms-ebook/
• Amazon Elastic Kubernetes Service (Amazon EKS)
https://eksworkshop.com
https://github.com/aws-samples/machine-learning-using-k8s
Rate today’s session
Session page on conference website, O’Reilly Events App
Thank you!
antje.official
antje@anbarth
Antje Barth
Containerized architectures for deep learning

  • 1. Containerized architectures for deep learning Antje Barth @anbarth
  • 2. Me Data Enthusiast Technical Evangelist AI / ML / Deep Learning Container / Kubernetes Big Data #CodeLikeAGirl
  • 3. Agenda • Motivation • ML pipeline tools and platforms • Machine Learning on Kubernetes • Deep Learning Demo • Conclusion
  • 6. ML – Helicopter view: How good are your predictions? • Accuracy • Optimization
  • 8. ML – The (enterprise) reality • Wrangle large datasets • Unify disparate systems • Composability • Manage pipeline complexity • Improve training/serving consistency • Improve portability • Improve model quality • Manage versions Building a model Data ingestion Data analysis Data transform Data validation Data splitting Ad-hoc Training Model validation Logging Roll-out Serving Monitoring Distributed Training Training at scale Data Versioning HP Tuning Experiment Tracking Feature Store SYSTEM 1 SYSTEM 2 SYSTEM 3 SYSTEM 4 SYSTEM 5 SYSTEM 6 SYSTEM 3.5 SYSTEM 1.5
  • 10. The rise of ML pipeline tools & platforms
  • 12. Quick comparison • Apache Airflow is a platform to programmatically author, schedule and monitor workflows (https://airflow.apache.org/). • The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable (https://www.kubeflow.org/). • TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines (https://www.tensorflow.org/tfx). • MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment (https://mlflow.org/).
  • 13. How to scale to production? Composability Portability Scalability
  • 16. Containers? Virtual machines are computers in a box; containers are applications in a box.
  • 18. Kubernetes is an API and agents. The Kubernetes API provides containers with scheduling, configuration, networking, and storage. The Kubernetes runtime manages the containers.
  • 20. Machine Learning on Kubernetes • Kubernetes-native • Run wherever k8s runs • Move between local – dev – test – prod – cloud • Use k8s to manage ML tasks • CRDs (UDTs) for distributed training • Adopt k8s patterns • Microservices • Manage infrastructure declaratively • Support for multiple ML frameworks • TensorFlow, PyTorch, scikit-learn, XGBoost, etc.
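
To make "manage infrastructure declaratively" concrete, here is a minimal Python sketch of the reconcile idea behind Kubernetes controllers: compare a declared desired state with the observed state and derive the actions that close the gap. The function name, the action tuples, and the workload names are hypothetical illustrations, not part of any Kubernetes API.

```python
def reconcile(desired, observed):
    """Return actions that move observed replica counts toward the desired ones.

    Both arguments map a workload name to a replica count. This is a toy model
    of a control loop, not a real Kubernetes controller.
    """
    actions = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    # Anything running that is no longer declared gets removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("delete", name, observed[name]))
    return actions


# Hypothetical example: six trainer replicas are declared, two are running,
# and an undeclared job is still around.
actions = reconcile({"trainer": 6}, {"trainer": 2, "old-job": 1})
print(actions)
```

The user only edits the desired state; the loop works out what has to change, which is the pattern the slide refers to.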
  • 22. Introducing Kubeflow Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere.
  • 24. Composability • Build and deploy re-usable, portable, scalable machine learning workflows based on Docker containers. • Use the libraries/frameworks of your choice. Example: the Kubeflow "deployer" component lets you deploy as a plain TF Serving model server: https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/deployer
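
As a rough illustration of composing a workflow from container-based steps, the following Python sketch models each step as a container image plus its dependencies and derives a valid execution order. It deliberately does not use the Kubeflow Pipelines SDK; the `Step` class, the step names, and the image names are assumptions made up for this example. It needs Python 3.9 or later for `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    """One pipeline step: a container image plus the steps it depends on."""
    name: str
    image: str  # hypothetical image reference
    after: list = field(default_factory=list)


def execution_order(steps):
    """Return step names in an order that respects all dependencies."""
    graph = {step.name: set(step.after) for step in steps}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical four-step workflow; each step could use a different framework.
steps = [
    Step("deploy", "example.com/ml/deployer:1.0", after=["train"]),
    Step("train", "example.com/ml/trainer:1.0", after=["validate"]),
    Step("validate", "example.com/ml/validator:1.0", after=["ingest"]),
    Step("ingest", "example.com/ml/ingest:1.0"),
]
order = execution_order(steps)
print(order)
```

Because every step is just an image with declared inputs, a step can be swapped (for example, a different model server in `deploy`) without touching the others.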
  • 25. METADATA SERVING Back to our ML enterprise workflow! Building a model Data ingestion Data analysis Data transform Data validation Data splitting Ad-hoc Training Model validation Logging Roll-out Serving Monitoring Distributed Training Training at scale Data Versioning HP Tuning Experiment Tracking Feature Store
  • 26. Portability: Containers for Deep Learning. Stack: TensorFlow container image on a container runtime, on NVIDIA drivers and host OS, on infrastructure. Container image packages: TensorFlow (CPU: mkl; GPU: cudnn, cublas, NCCL, CUDA toolkit), Keras, horovod, numpy, scipy, scikit-learn, pandas, openmpi, Python, others… ML environments that are:
  • 27. Development system (TensorFlow container image, container runtime, NVIDIA drivers, host OS): push to container registry. Training cluster (same TensorFlow container image, container runtime, NVIDIA drivers, host OS): pull from container registry.
  • 28. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Model works great! But I need six nodes. Data Scientist IT Ops Credit: @aronchick
  • 29. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Data Scientist IT Ops apiVersion: "kubeflow.org/v1alpha1" kind: "TFJob" spec: replicaSpecs: replicas: 6 CPU: 1 GPU: 1 containers: gcr.io/myco/myjob:1.0 Credit: @aronchick
  • 30. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Data Scientist IT Ops GPU GPU GPU GPU GPU GPU Credit: @aronchick
  • 31. Scalability • Kubernetes - Autoscaling Jobs • Describe the job, let Kubernetes take care of the rest • CPU, RAM, Accelerators • TF Jobs delete themselves when finished, node pool will auto scale back down Job’s done! Data Scientist IT Ops Credit: @aronchick
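
The TFJob spec on slide 29 is abbreviated for the slide. As a hedged sketch, the Python below builds a fuller manifest for the same request (six workers, one CPU and one GPU each, image `gcr.io/myco/myjob:1.0`). It assumes the later `kubeflow.org/v1` TFJob layout (`tfReplicaSpecs`, a container named `tensorflow`) rather than the `v1alpha1` shorthand shown in the deck; the job name `myjob` and the restart policy are illustrative choices.

```python
import json


def tfjob_manifest(name, image, replicas=6, cpu="1", gpus=1):
    """Build a TFJob manifest as a plain dict (kubeflow.org/v1 layout assumed)."""
    container = {
        "name": "tensorflow",  # container name the TFJob operator looks for
        "image": image,
        "resources": {"limits": {"cpu": cpu, "nvidia.com/gpu": gpus}},
    }
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": replicas,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }


manifest = tfjob_manifest("myjob", "gcr.io/myco/myjob:1.0")
# kubectl accepts JSON as well as YAML, so the output can be saved and applied.
print(json.dumps(manifest, indent=2))
```

The data scientist describes only what the job needs; scheduling the six GPU workers and scaling the node pool back down afterwards is left to the cluster.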
  • 32. Agenda • Motivation • ML pipeline tools and platforms • Container > Kubernetes > Kubeflow • Deep Learning Demo • Conclusion
  • 35. Recap: The “Kube”flow • Deploy Kubernetes & Kubeflow • Experiment in Jupyter • Build Docker Image • Train at Scale • Build Model Server • Deploy Model • Integrate Model into App • Operate Model Training Model Serving Pod Pod Pod Kubernetes Worker Nodes #1 #2 #3 Jupyter Notebook Seldon Core Engine Seldon Core Engine Doppelganger Model Doppelganger Model Istio Gateway (Traffic Routing) {REST API} curl… Dockerfile Training Job Dockerfile Inference Service Data Scientist Pod Train Model Pod Train Model
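
The recap diagram shows an app calling the served model over REST through the Istio gateway. A minimal Python sketch of such a call is below, assuming the Seldon Core v1 REST protocol (path `/seldon/<namespace>/<deployment>/api/v1.0/predictions`, body `{"data": {"ndarray": ...}}`). The gateway host, namespace, deployment name, and input values are hypothetical, and the demo's actual payload format is not given in the slides. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_prediction_request(gateway, namespace, deployment, rows):
    """Build (but do not send) a POST request for a Seldon Core v1 prediction."""
    url = f"http://{gateway}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical gateway host, namespace, deployment name, and input row.
req = build_prediction_request(
    "istio-gateway.example.com", "kubeflow", "doppelganger", [[0.1, 0.2, 0.3]]
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a reachable gateway and a deployed model.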
  • 37. Conclusion & Take-aways • Platform matters • Composability – Portability – Scalability • Containerized architectures • Kubernetes + Machine Learning = Kubeflow • Start building! https://github.com/antje/doppelganger
  • 38. More information • Kubeflow: https://www.kubeflow.org/ https://github.com/kubeflow/kubeflow • TensorFlow Extended (TFX): https://www.tensorflow.org/tfx • The Definitive Guide to Machine Learning Platforms: https://twimlai.com/mlplatforms-ebook/ • Amazon Elastic Kubernetes Service (Amazon EKS): https://eksworkshop.com https://github.com/aws-samples/machine-learning-using-k8s
  • 39. Session page on conference website O’Reilly Events App Rate today’s session