Building an MLOps Stack for Companies at Reasonable Scale
© All Rights Reserved.
Is MLOps a Luxury reserved for AI-first enterprises?
Is MLOps a Luxury?
Intro
85% of ML projects don't deliver value
11 people to support an end-to-end ML workflow
$95k to deploy & maintain one ML model
Sources:
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture. https://arxiv.org/abs/2205.02302
- phData, "What is the Cost to Deploy and Maintain a Machine Learning Model?" https://www.phdata.io/blog/what-is-the-cost-to-deploy-and-maintain-a-machine-learning-model/
- Operationalizing Machine Learning: An Interview Study. https://arxiv.org/abs/2209.09125
The majority of companies only need MLOps at reasonable scale
Adapted from: MLOps Is a Mess But That’s to be Expected
MLOps
Ship reliable ML faster
- Principles over Technologies
- Conventions over Configurations
Practical MLOps
01 Ship Reliable ML Faster
02 Principles over Technology
03 Conventions over Configurations
04 A Reasonable MLOps Stack
05 Demo: Colab to GCP Endpoint
Machine Learning + Development + Operations
MLOps
Principles over Technology
[Figure: Dev (code), Ops (infra) and ML (data, model) combine into MLOps, which spans data, model and code]
Adapted from ml-ops.org
Technology changes, but good design principles rarely do
MLOps Tooling Landscape
Principles that will stand the test of time
7 MLOps Principles
- Compliance
- Reproducibility
- Versioning
- Testing
- Iterative Development
- Security
- Monitoring
- Automation
- Continuous Deployment
Adapted from ml-ops.org
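The principles above are deliberately tool-agnostic. As one illustration of Reproducibility and Versioning, here is a minimal Python sketch, assuming a run is fully described by a JSON-serialisable config dict: it fingerprints the config and drives all randomness from a seeded local RNG, so rerunning the same config gives the same result. The function names (`config_fingerprint`, `reproducible_run`) and the toy "training" step are hypothetical, not part of any tool in this deck.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Return a stable short hash of a run configuration.

    Keys are sorted so that logically identical configs hash identically.
    """
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def reproducible_run(config: dict) -> dict:
    """Hypothetical toy 'training' step whose randomness is controlled by config['seed']."""
    rng = random.Random(config["seed"])  # local RNG: no hidden global state
    # Stand-in for training: sample a few 'weights' deterministically.
    weights = [round(rng.uniform(-1, 1), 6) for _ in range(config["n_weights"])]
    return {"config_id": config_fingerprint(config), "weights": weights}


if __name__ == "__main__":
    cfg = {"seed": 42, "n_weights": 3, "learning_rate": 0.01}
    first = reproducible_run(cfg)
    # Same config with a different key order must give the same fingerprint and result.
    second = reproducible_run(dict(reversed(list(cfg.items()))))
    assert first == second
    print(first["config_id"], first["weights"])
```

Storing the fingerprint next to each trained artifact is one simple way to tie a model back to the exact config that produced it.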
Deciding on an MLOps stack to ship reliable ML faster
Decision Framework
Conventions over Configurations
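[Flowchart, Tools to Choose: each candidate tool passes through the questions "Fits 80% of my use case?", "Critical operation in business?", "Reversible decision?" and "Expensive?"; depending on the Yes/No answers, the outcome is Run POC, Add to Stack, or Ignore]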
MLOps Stack at Reasonable Scale, Level 1: Foundation
Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
Infra / Compute | Local / Google Colab | AWS | Cloud Native
Source Control | GitHub | GitHub | GitHub
Data Analysis | Notebook on Colab | Notebook on JupyterHub | Notebook on JupyterHub
Testing | Pytest | Pytest | Pytest + Others
* Package manager, containerisation, and CLI are foundational items and are assumed to be present.
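Testing is one of the few rows that stays on Pytest at every team size. As a minimal sketch of what that looks like, here is a hypothetical data-cleaning function (`clean_ages`, not from the slides) with a few tests in pytest's style: plain functions prefixed with `test_` that use bare `assert` statements, which pytest collects and reports on without any special assertion API.

```python
import math


def clean_ages(ages: list) -> list:
    """Hypothetical preprocessing step: drop missing/NaN values and clip ages to [0, 120]."""
    cleaned = []
    for age in ages:
        if age is None or (isinstance(age, float) and math.isnan(age)):
            continue  # drop missing values
        cleaned.append(min(max(float(age), 0.0), 120.0))
    return cleaned


# pytest collects functions whose names start with "test_" and reports failing asserts.
def test_clean_ages_drops_missing_values():
    assert clean_ages([25, None, float("nan"), 40]) == [25.0, 40.0]


def test_clean_ages_clips_out_of_range_values():
    assert clean_ages([-5, 130]) == [0.0, 120.0]


def test_clean_ages_is_idempotent():
    once = clean_ages([-1, 50, 200])
    assert clean_ages(once) == once
```

Saved as, say, `test_preprocessing.py`, running `pytest` in that directory would discover and run all three tests.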
MLOps Stack at Reasonable Scale, Level 2: Basic
Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
Infra / Compute | Local / Google Colab | AWS | Cloud Native
Source Control | GitHub | GitHub | GitHub
Data Ingestion | Reading CSVs | dbt / Snowflake | dbt / Snowflake
Data Analysis | Notebook on Colab | Notebook on JupyterHub | Notebook on JupyterHub
Experimentation (with HP / NII) | Ploomber / Spreadsheets | MLflow + Ray Tune | Kubeflow
Testing | Pytest | Pytest | Pytest + Others
Data Versioning | - | DVC / Pachyderm | DVC / Pachyderm
Pipeline Orchestration | Cron / Bash Scripts | Airflow | Kubeflow Pipelines
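For an individual, the Level 2 table suggests tracking experiments in spreadsheets. A minimal sketch of that idea, using only the standard library: a grid search over hyperparameters that writes one CSV row per run, which any spreadsheet can open. The scoring function `toy_score` is a hypothetical stand-in for training and evaluating a real model, and the grid values are illustrative only.

```python
import csv
import io
import itertools


def toy_score(learning_rate: float, depth: int) -> float:
    """Hypothetical stand-in for 'train and evaluate a model'; peaks at lr=0.1, depth=4."""
    return round(1.0 - abs(learning_rate - 0.1) - 0.05 * abs(depth - 4), 4)


def grid_search_to_csv(grid: dict, out) -> dict:
    """Try every hyperparameter combination, log each run as a CSV row, return the best run."""
    names = sorted(grid)
    writer = csv.DictWriter(out, fieldnames=["run_id", *names, "score"])
    writer.writeheader()
    best = None
    for run_id, values in enumerate(itertools.product(*(grid[n] for n in names))):
        params = dict(zip(names, values))
        row = {"run_id": run_id, **params, "score": toy_score(**params)}
        writer.writerow(row)  # one spreadsheet row per experiment
        if best is None or row["score"] > best["score"]:
            best = row
    return best


if __name__ == "__main__":
    buffer = io.StringIO()  # swap for open("experiments.csv", "w", newline="") to keep a file
    best = grid_search_to_csv({"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 4, 8]}, buffer)
    print(buffer.getvalue())
    print("best run:", best)
```

In the One Team column, MLflow would take over the role of the CSV file: roughly, the `writer.writerow` call becomes `mlflow.log_param` and `mlflow.log_metric` calls inside an `mlflow.start_run()` block.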
MLOps Stack at Reasonable Scale, Level 3: Advanced
Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
Infra / Compute | Local / Google Colab | AWS | Cloud Native
Source Control | GitHub | GitHub | GitHub / GitLab
Data Ingestion | Reading CSVs | dbt / Snowflake | dbt / Snowflake
Data Analysis | Notebook on Colab | Notebook on JupyterHub | Notebook on JupyterHub
Experimentation | Ploomber / Spreadsheets | MLflow + Ray Tune | Kubeflow
Data Versioning | - | DVC / Pachyderm | DVC / Pachyderm
Testing | Pytest | Pytest | Pytest + Others
Pipeline Orchestration | Cron / Bash Scripts | Airflow | Kubeflow
CI/CD | Scripts | GitHub Actions | Jenkins
Model Serving | Hugging Face Spaces | FastAPI | Seldon / KServe
Feature / Model Stores | - | MLflow | Feast + Kubeflow
Monitoring | Console Logs | Grafana + Prometheus | Arize AI
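At the individual level, the Level 3 table settles for console logs as monitoring. A minimal sketch of something worth logging is a data-drift score for an input feature. The slides do not name a drift metric, so this sketch assumes the Population Stability Index (PSI), a common choice, computed over equal-width bins of the reference (training-time) data. The 0.25 alert threshold is a widely quoted rule of thumb, not a value from the slides.

```python
import logging
import math


def psi(reference: list, current: list, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a current sample of one feature.

    Bins are equal-width over the reference range; out-of-range values fall in the edge bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip into [0, bins - 1]
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    reference = [i / 100 for i in range(1000)]  # training-time feature values: 0.0 .. 9.99
    shifted = [v + 3.0 for v in reference]      # production values drifted upward
    for name, sample in [("unchanged", reference), ("shifted", shifted)]:
        score = psi(reference, sample)
        # 0.25 is a commonly quoted rule-of-thumb threshold, not a universal standard.
        level = logging.WARNING if score > 0.25 else logging.INFO
        logging.log(level, "feature drift (%s): PSI=%.3f", name, score)
```

Moving to the One Team column would mean exposing the same score as a metric for Prometheus to collect and charting it in Grafana, rather than only logging it.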
MLOps Stack at Reasonable Scale: Build your Stack
Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
Infra / Compute | | |
Source Control | | |
Data Ingestion | | |
Data Analysis | | |
Experimentation | | |
Data Versioning | | |
Testing | | |
Pipeline Orchestration | | |
CI/CD | | |
Model Serving | | |
Feature / Model Stores | | |
Monitoring | | |
Try: https://mymlops.com/
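Reasonable MLOps
Ship reliable ML faster, without:
- 85% of ML projects failing to deliver value
- 11 people to support an end-to-end ML workflow
- $95k to deploy & maintain one ML model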
References and Resources
Some references used to create this presentation:
• Melio’s cookiecutter-fastapi (forked from arthurhenrique/cookiecutter-fastapi): https://github.com/melio-consulting/cookiecutter-fastapi
• ml-ops.org
• Beyond Jupyter Notebooks: MLOps Environment Setup & First Deployment: https://www.youtube.com/watch?v=4pkzY95Otm4
• MLOps Stack Canvas: https://miro.com/miroverse/mlops-stack-canvas/
• MLOps Is a Mess But That's to be Expected: https://www.mihaileric.com/posts/mlops-is-a-mess/
• MLOps at a Reasonable Scale [The Ultimate Guide]: https://neptune.ai/blog/mlops-at-reasonable-scale
• Metadata Storage and Management: https://mlops.community/learn/metadata-storage-and-management/
Building an MLOps Stack for Companies at Reasonable Scale

  • 1. Building an MLOps Stack for Companies at Reasonable Scale ——————————————————————————————————————
  • 2. © All Rights Reserved. Is MLOps a Luxury reserved for AI-first enterprises? Is MLOps a Luxury? Intro 85% 11ppl 95 $ k ML projects don’t deliver value Machine Learning Operations (MLOps): Overview, Definition, and Architecture . https://arxiv.org/abs/ 2205.02302 to support an end- to-end ML workflow phData “What is the Cost to Deploy and Maintain a Machine Learning Model?" - https://www.phdata.io/blog/what-is-the-cost-to- deploy-and-maintain-a-machine-learning-model/ to deploy & maintain one ML model Operationalizing Machine Learning: An Interview Study - https://arxiv.org/abs/2209.09125
  • 3. © All Rights Reserved. Majority of the companies only need MLOps at reasonable scale Is MLOps a Luxury? Intro Adapted from: MLOps Is a Mess But That’s to be Expected
  • 4. © All Rights Reserved. MLOps Ship reliable ML faster - Principles over Technologies - Conventions over Configurations
  • 5. © All Rights Reserved. 01 Ship Reliable ML Faster 02 Principle over Technology 03 Convention over Configuration 04 A reasonable MLOps stack 05 Collab to GCP endpoint Demo Practical MLOps
  • 6. © All Rights Reserved. Machine Learning + Development + Operations MLOps Principles over Technology Dev Ops Code Infra ML Data Model Data Model Code Adapted from ml-ops.org
  • 7. © All Rights Reserved. Technology changes, but good design principles rarely do MLOps Tooling Landscape Principles over Technology
  • 8. © All Rights Reserved. Principles that will stand the test of time 7 MLOps Principles Principles over Technology Compliance Reproducibility Versioning Testing Iterative Development Security Monitoring Automation Continuous Deployment Adapted from ml-ops.org
  • 9. Conventions over Configurations: a decision framework for choosing the MLOps stack to ship reliable ML faster. [Flowchart: for each tool to choose, ask: Fits 80% of my use case? Critical operation in business? Reversible decision? Expensive? Outcomes: Run POC, Add to Stack, or Ignore.]
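The flowchart's arrows did not survive extraction, so the wiring below is an assumption: one plausible reading in which a tool that doesn't fit 80% of the use case is ignored, a tool that touches a critical, irreversible or expensive decision goes through a proof of concept before joining the stack, and anything else is added directly. The `Tool` fields and the `decide` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical record of the answers to the flowchart questions."""
    name: str
    fits_80_percent: bool    # covers roughly 80% of my use case?
    business_critical: bool  # used in a critical business operation?
    reversible: bool         # can the decision be undone cheaply later?
    expensive: bool          # costly to adopt or run?


def decide(tool: Tool) -> str:
    """Return 'Ignore', 'Run POC' or 'Add to Stack' (assumed wiring)."""
    if not tool.fits_80_percent:
        return "Ignore"
    # Risky adoptions get a proof of concept first; a successful POC
    # then leads to "Add to Stack".
    if tool.business_critical or not tool.reversible or tool.expensive:
        return "Run POC"
    return "Add to Stack"
```

For example, `decide(Tool("Airflow", True, True, True, False))` returns `"Run POC"` because the tool is marked business critical.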
  • 10. MLOps Stack Level 1: Foundation at reasonable scale (DS = data scientists)
      Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
      Infra / Compute | Local / Google Colab | AWS | Cloud Native
      Source Control | GitHub | GitHub | GitHub
      Data Analysis | Notebook on Colab | Notebook on JupyterHub | Notebook on JupyterHub
      Testing | pytest | pytest | pytest + Others
      * Package manager, containerisation and CLI are foundational items and assumed to be present.
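To make the Testing row concrete, here is a minimal pytest-style sketch. The `min_max_scale` feature step and the test names are hypothetical; pytest collects any `test_*` function and reports failing plain `assert` statements, so this file needs no pytest import to be collected.

```python
def min_max_scale(values):
    """Hypothetical feature step: rescale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def test_output_is_in_unit_range():
    scaled = min_max_scale([3.0, 7.0, 5.0])
    assert min(scaled) == 0.0 and max(scaled) == 1.0


def test_known_values():
    # Relative positions are preserved: 5 sits halfway between 3 and 7.
    assert min_max_scale([3.0, 7.0, 5.0]) == [0.0, 1.0, 0.5]


def test_constant_column_raises():
    try:
        min_max_scale([2.0, 2.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a constant column")
```

Run it with `pytest` from the directory containing the file. In a larger suite, `pytest.raises(ValueError)` is the idiomatic way to write the last test.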
  • 11. MLOps Stack Level 2: Basic at reasonable scale
      Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
      Infra / Compute | Local / Google Colab | AWS | Cloud Native
      Source Control | GitHub | GitHub | GitHub
      Data Ingestion | Reading CSVs | dbt / Snowflake | dbt / Snowflake
      Data Analysis | Notebook on Colab | Notebook on JupyterHub | Notebook on JupyterHub
      Experimentation (with HP / NII) | Ploomber / Spreadsheets | MLflow + Ray Tune | Kubeflow
      Testing | pytest | pytest | pytest + Others
      Data Versioning | - | DVC / Pachyderm | DVC / Pachyderm
      Pipeline Orchestration | Cron / Bash Scripts | Airflow | Kubeflow Pipelines
      * Package manager, containerisation and CLI are foundational items and assumed to be present.
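At the Individual level, the Experimentation ("Spreadsheets") and Data Versioning rows can be approximated with the standard library: append each run's parameters and metrics to a CSV file that opens as a spreadsheet, and record a hash of the training data so each run is tied to the exact bytes it used (the same content-hashing idea that tools like DVC build on). The function names, the `param_`/`metric_` column prefixes and the choice of MD5 are assumptions for this sketch.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def dataset_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of the file contents: identical bytes give an identical version id."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_run(log_path: Path, data_path: Path, params: dict, metrics: dict) -> dict:
    """Append one experiment run as a row of a CSV 'spreadsheet'.

    Assumes every run logs the same parameter and metric names, because the
    header is written only once, when the file is first created.
    """
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "data_md5": dataset_fingerprint(data_path),
        **{f"param_{k}": v for k, v in params.items()},
        **{f"metric_{k}": v for k, v in metrics.items()},
    }
    new_file = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Two runs on the same `train.csv` share a `data_md5` value; if the file changes, the hash changes and the spreadsheet shows which runs used which data.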
  • 12. MLOps Stack Level 3: Advanced at reasonable scale
      Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
      Infra / Compute | Local / Google Colab | AWS | Cloud Native
      Source Control | GitHub | GitHub | GitHub / GitLab
      Data Ingestion | Reading CSVs | dbt / Snowflake | dbt / Snowflake
      Data Analysis | Notebook on Colab | Notebook on JupyterHub | Notebook on JupyterHub
      Experimentation | Ploomber / Spreadsheets | MLflow + Ray Tune | Kubeflow
      Data Versioning | - | DVC / Pachyderm | DVC / Pachyderm
      Testing | pytest | pytest | pytest + Others
      Pipeline Orchestration | Cron / Bash Scripts | Airflow | Kubeflow
      CI/CD | Scripts | GitHub Actions | Jenkins
      Model Serving | Hugging Face Spaces | FastAPI | Seldon / KServe
      Feature / Model Stores | - | MLflow | Feast + Kubeflow
      Monitoring | Console Logs | Grafana + Prometheus | Arize AI
      * Package manager, containerisation and CLI are foundational items and assumed to be present.
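The Monitoring row starts with console logs for an individual. A minimal sketch of that idea with Python's `logging` module: log summary statistics for each batch of predictions and emit a warning when the batch mean moves more than a relative tolerance away from a baseline. The function name, the mean-based check and the 10% default tolerance are illustrative assumptions, not a recommended drift metric.

```python
import logging
import statistics

logger = logging.getLogger("model_monitor")


def log_prediction_batch(predictions, baseline_mean, tolerance=0.1):
    """Log batch statistics to the console and flag a shifted mean.

    Returns True when the batch mean is within `tolerance` (relative to
    `baseline_mean`) of the baseline. With a baseline of 0 the relative
    check only accepts an exact match, so pick a non-zero baseline.
    """
    mean = statistics.fmean(predictions)
    logger.info("batch size=%d mean=%.3f", len(predictions), mean)
    ok = abs(mean - baseline_mean) <= tolerance * abs(baseline_mean)
    if not ok:
        logger.warning(
            "prediction mean %.3f drifted from baseline %.3f", mean, baseline_mean
        )
    return ok
```

Call `logging.basicConfig(level=logging.INFO)` once at startup so the info and warning lines reach the console; moving to Grafana + Prometheus later mainly changes where these numbers are sent.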
  • 13. MLOps Stack: Build your stack at reasonable scale
      Component | Individual | One Team (< 5 DS) | Multiple Teams (> 10 DS)
      Rows to fill in: Infra / Compute, Source Control, Data Ingestion, Data Analysis, Experimentation, Data Versioning, Testing, Pipeline Orchestration, CI/CD, Model Serving, Feature / Model Stores, Monitoring
      * Package manager, containerisation and CLI are foundational items and assumed to be present.
      Try: https://mymlops.com/
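  • 14. Reasonable MLOps: ship reliable ML faster without the 85% of ML projects that don't deliver value, the 11 people per end-to-end ML workflow, or the $95k to deploy and maintain one ML model.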
  • 15. References: some references used to create this presentation
      Resources:
      - Melio's cookiecutter-fastapi (forked from arthurhenrique/cookiecutter-fastapi): https://github.com/melio-consulting/cookiecutter-fastapi
      - ml-ops.org
      - Beyond Jupyter Notebooks: MLOps Environment Setup & First Deployment: https://www.youtube.com/watch?v=4pkzY95Otm4
      - MLOps Stack Canvas: https://miro.com/miroverse/mlops-stack-canvas/
      - MLOps Is a Mess But That's to be Expected: https://www.mihaileric.com/posts/mlops-is-a-mess/
      - MLOps at a Reasonable Scale [The Ultimate Guide]: https://neptune.ai/blog/mlops-at-reasonable-scale
      - Metadata Storage and Management: https://mlops.community/learn/metadata-storage-and-management/