Leveraging MLOps and DataOps To Operationalize ML and AI

The document summarizes a presentation on leveraging MLOps and DataOps to operationalize machine learning and AI. It discusses how 50% of ML projects may not be fully deployed by 2021 due to difficulties taking models from development to revenue-generating products and services. The presentation covers ML workflows and pipelines, challenges in deploying ML, and recommendations to address problems deploying models in production environments.

Uploaded by

emre


Gartner Catalyst Conference

12 – 15 August 2019 / San Diego, CA

Don’t Stumble at the Last Mile: Leveraging MLOps and DataOps to Operationalize ML and AI
Sumit Pal

© 2019 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form
without Gartner’s prior written permission. It consists of the opinions of Gartner’s research organization, which should not be construed as statements of fact. While the information contained in this
publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research
may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are
governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or
influence from any third party. For further information, see “Guiding Principles on Independence and Objectivity.”
Don’t Be — Bankrupt in 45 Minutes
Knightmare: A DevOps Cautionary Tale

Knight Capital Group $460M loss in 45 minutes

DevOps, DataOps and MLOps Problem

Systems were NOT set up for the risk they were exposed to.

Processes were inherently prone to error.

The deployment process relied on humans.

By 2021, at least 50% of machine learning projects will not be fully deployed.

Gartner Client Question

Why would 50% of ML and data science solutions not be deployed successfully?

ML is not easy.
The end result: models are built that aren’t being turned into revenue-generating products and services.
• Bootcamps and courses are great for learning how to build and train models.
• They don’t teach how to take them to the next step.

Agenda

• ML Workflow
• ML Pipeline
• Why ML Is Difficult
• ML Missing Pieces
• Problems and Solutions When Deploying ML
• Tools
• Recommendations
• Research Recommendations

ML Workflow

1 Business Understanding
2 Data Understanding
3 Data Preparation
4 Data Modeling
5 Evaluation
6 Deployment

Machine Learning Workflow
Build: Problem Statement, Data Collection, EDA, Data Engineering
Train: Model Training, Model Evaluation, Model Tuning
Deploy: Model Deployment, Model Monitoring
Practices labeled on the diagram: DataOps, MLOps

ML Pipeline
Diagram components:
• Data sources: ERP, Databases, Mainframe, IoT Devices
• Data Ingestion
• Data Storage: Stream Data Processing Platform, Batch Data Warehouse
• Data Processing (Feature Engineering): Processing Engine; Transformation, Normalization, Cleaning and Encoding
• Preprocessing: Sample Selection, Training/Testing Set
• Model Engineering: Experimentation, Testing, Tuning
• Machine Learning Algorithms: Clustering Algorithm, Learning Algorithm
• Execution
• Deployment

Why ML Is Difficult

• “Lack of clear abstraction barriers.”

• Debugging is harder.
• “Non-modularity”: if you change anything, you end up changing everything.
• “Non-stationarity”: the need to account for new data.
• “What is produced is a black box: you can peek in a little bit, we have some idea of what’s going on, but not a complete idea.”
• Reproducibility is extremely difficult.
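
One small part of reproducibility is controlling randomness. The sketch below is illustrative only: `train_once` is a hypothetical stand-in for a training run, not something from the presentation, and fixing a seed does nothing about changes in data, code or environment.

```python
import random


def train_once(seed: int, n_samples: int = 5) -> list:
    """Hypothetical stand-in for a training run that depends on randomness."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    return [rng.random() for _ in range(n_samples)]


# Two runs with the same seed can be compared value by value.
run_a = train_once(seed=42)
run_b = train_once(seed=42)
run_c = train_once(seed=7)

print(run_a == run_b)  # same seed, same sequence of numbers
print(run_a == run_c)  # different seed, different sequence
```

Passing the seed in explicitly, rather than seeding a global generator, makes it easy to record the seed alongside the other run settings.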

Why ML Is Difficult

• Scaling the model-training and serving process. How can we reliably and repeatedly take our models from our laptop to production?
• Keeping track of multiple experiments with different hyperparameters.
• Reproducing the results and retraining models in a predictable manner.
• Keeping track of different models and their performance over time (i.e., model drift).
• Dynamically retraining models with new data and rolling back models.

Software Development and ML Development

Software Development ML Development

ML Missing Pieces
Diagram: the ML pipeline, annotated with the missing pieces:
• Code and Packaging Control
• Data Versioning
• Model Management
• Performance Management
• Feedback Loop
Properties of DS / ML System
Reproducible builds

Ability to run the entire stack locally for development

Local, Continuous Integration/Test (CI/T), Staging and Production environments are identical

Production data (inputs or outputs) is versioned and queryable later on

Trace production data through the system
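
One way to make production inputs and outputs queryable later is to log every prediction with a trace ID. The sketch below is only an illustration using SQLite from the Python standard library; the table layout, function names and the `fraud-model:3` version string are assumptions, not part of the presentation.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a prediction log. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               trace_id TEXT PRIMARY KEY,
               model_version TEXT NOT NULL,
               logged_at TEXT NOT NULL,
               inputs TEXT NOT NULL,
               output TEXT NOT NULL)"""
    )
    return conn


def log_prediction(conn, model_version: str, inputs: dict, output) -> str:
    """Store one prediction and return the trace ID used to find it again."""
    trace_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?, ?)",
        (
            trace_id,
            model_version,
            datetime.now(timezone.utc).isoformat(),
            json.dumps(inputs, sort_keys=True),
            json.dumps(output),
        ),
    )
    conn.commit()
    return trace_id


def find_prediction(conn, trace_id: str):
    """Look up a logged prediction by trace ID, or return None."""
    row = conn.execute(
        "SELECT model_version, inputs, output FROM predictions WHERE trace_id = ?",
        (trace_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "model_version": row[0],
        "inputs": json.loads(row[1]),
        "output": json.loads(row[2]),
    }


conn = open_log()
tid = log_prediction(conn, "fraud-model:3", {"amount": 120.5}, 0.87)
record = find_prediction(conn, tid)
print(record)
```

Storing the model version next to each input and output is what lets a later investigation tie a bad prediction back to the model that produced it.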

Problem 1 — Works on My Machine

How to Solve — Problem 1

Track Code

Track Environment

Packaging

Consistent Environment and Consistent Packaging
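
As a rough illustration of "Track Environment", the sketch below records the Python version, platform and installed packages, then hashes them into a fingerprint that can be compared across machines. The function names are assumptions made for this example; in practice a container image digest or a lock file would play the same role.

```python
import hashlib
import json
import platform
from importlib import metadata


def capture_environment() -> dict:
    """Collect interpreter, platform and installed package versions."""
    packages = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            continue  # skip distributions with unreadable metadata
        name = meta.get("Name")
        if name:
            packages.add(f"{name}=={meta.get('Version', 'unknown')}")
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": sorted(packages),
    }


def fingerprint(env: dict) -> str:
    """Hash a canonical JSON form of the environment description."""
    canonical = json.dumps(env, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


env = capture_environment()
print(env["python"], fingerprint(env)[:12])
```

If the fingerprint recorded at training time differs from the one computed in staging or production, the environments are not identical and "works on my machine" is a live risk.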

Solution 1 — Works on My Machine
Diagram: the ML pipeline with a build chain added:
Git, CI/CD (Run Tests), Dockerize, Docker Registry
Problem 2 — What Happens When a Model Is
Deployed and It Doesn’t Work

Who Tracks Models?

Who Keeps Track of All the Experiments You Run?

Can You Search Your Models?

Can You Reproduce Your Models?

How to Solve — Problem 2

Model storage: a model repository to track, store, index and search models, with model versioning

Collaborative environments for model development, integrated with the model storage

Most recent model: in which environment was the most recent model trained and developed?

Track hyperparameters: how do you keep track of the hyperparameters that were experimented with?
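
The track, store, index, search and version ideas above can be sketched as a tiny in-memory registry. This is a toy under stated assumptions: `ModelRegistry`, `ModelVersion` and their fields are hypothetical names for illustration, and a real setup would use a dedicated model management tool backed by durable storage.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    hyperparameters: dict
    artifact_sha256: str  # content hash of the exported model
    environment: str      # where the model was trained


@dataclass
class ModelRegistry:
    _versions: dict = field(default_factory=dict)  # name -> list of versions

    def register(self, name, artifact: bytes, hyperparameters, environment):
        """Store a new version; version numbers start at 1 per model name."""
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name=name,
            version=len(history) + 1,
            hyperparameters=dict(hyperparameters),
            artifact_sha256=hashlib.sha256(artifact).hexdigest(),
            environment=environment,
        )
        history.append(entry)
        return entry

    def latest(self, name) -> ModelVersion:
        """Return the most recently registered version of a model."""
        return self._versions[name][-1]

    def search(self, **hyperparameters) -> list:
        """Find every version whose hyperparameters match all given values."""
        return [
            m
            for history in self._versions.values()
            for m in history
            if all(m.hyperparameters.get(k) == v for k, v in hyperparameters.items())
        ]


registry = ModelRegistry()
registry.register("churn", b"weights-v1", {"max_depth": 3, "lr": 0.1}, "dev-laptop")
registry.register("churn", b"weights-v2", {"max_depth": 5, "lr": 0.1}, "ci")
print(registry.latest("churn").environment)
print([m.version for m in registry.search(lr=0.1)])
```

Recording the training environment with each version answers the "most recent model" question directly, and searching by hyperparameters answers "which experiments used this setting?".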

Model Variables

Model Has 2 Components
• Model Code + Model Image/Container
• Model Artifacts: Parameters Associated With the Model
Diagram: Model Code, Dockerize, Model Container Image; Hyperparameters

Solution 2: Model Tracking
Diagram: Git, CI/CD (Run Tests), Dockerize, Docker Registry, Model Export
Solution 2 — Model Management - What
Who trained the model

Start and end time of the training job

Full model configuration (features used, hyper-parameter values, etc.)

Reference to training and test data sets

Distribution and relative importance of each feature

Full learned parameters of the model
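
The items listed above map naturally onto a single record stored with each trained model. The following is a minimal sketch of such a record; the field names and every example value are hypothetical and only illustrate the shape of the data.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class TrainingRecord:
    trained_by: str
    started_at: str          # ISO 8601 timestamps
    ended_at: str
    features: list
    hyperparameters: dict
    training_data_ref: str   # reference to a versioned dataset
    test_data_ref: str
    feature_importance: dict
    learned_parameters: dict

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the model."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    def duration_seconds(self) -> float:
        """Length of the training job, derived from start and end times."""
        start = datetime.fromisoformat(self.started_at)
        end = datetime.fromisoformat(self.ended_at)
        return (end - start).total_seconds()


# All values below are made up for illustration.
record = TrainingRecord(
    trained_by="alice",
    started_at="2019-08-12T09:00:00+00:00",
    ended_at="2019-08-12T09:30:00+00:00",
    features=["age", "income"],
    hyperparameters={"max_depth": 3},
    training_data_ref="datasets/train@v1",
    test_data_ref="datasets/test@v1",
    feature_importance={"age": 0.7, "income": 0.3},
    learned_parameters={"intercept": 0.12},
)
print(record.duration_seconds())
print(record.to_json())
```

Keeping the record immutable (`frozen=True`) reflects that it describes a finished training job and should not be edited afterwards.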

Problem 3 — How to Replicate Model Behavior

Have you kept track of the data on which you trained, tested and validated?

Was that data versioned?

Can you reproduce your models?

How to Solve — Problem 3

Need a data versioning system

Version all the data you used for training

Version the training data

Version the validation data and the testing data

Diagram: Git, CI/CD, Dockerize, Docker Registry, Model Export, Data Versioning (Training, Validation, Test)
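
A simple way to picture data versioning is a manifest of content hashes, one per data split. The sketch below assumes the splits are plain files and uses made-up CSV content; the function names are hypothetical, and dedicated data versioning tools handle large files, storage and history far more thoroughly.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(splits: dict) -> dict:
    """Map each split name to its content hash and derive one version ID."""
    files = {name: file_sha256(Path(p)) for name, p in sorted(splits.items())}
    version = hashlib.sha256(
        json.dumps(files, sort_keys=True).encode("utf-8")
    ).hexdigest()[:12]
    return {"version": version, "files": files}


with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    samples = {"training": "a,1\nb,2\n", "validation": "c,3\n", "test": "d,4\n"}
    for split, rows in samples.items():
        path = Path(tmp) / f"{split}.csv"
        path.write_text(rows)
        paths[split] = path

    manifest_v1 = build_manifest(paths)
    manifest_again = build_manifest(paths)

    # The training data changes, so the dataset version must change too.
    paths["training"].write_text("a,1\nb,2\ne,5\n")
    manifest_v2 = build_manifest(paths)

print(manifest_v1["version"], manifest_v2["version"])
```

Storing the manifest version with each trained model makes it possible to say exactly which training, validation and test data a given model saw.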

Model Issues Post Deployment

What happened? Doesn’t run | Doesn’t run “good” enough | Doesn’t run “fast” enough

Root cause? Wrong code/parameters | Data drift | Data pipeline issue / wrong data / wrong schema | Wrong environment | Model drift? | More compute

How do you know if your models are keeping up with data drift?

How do you measure model and data drift?

Do you know that 0.01% error thresholds can result in millions of dollars in lost revenue?
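
The presentation does not name a drift metric. One commonly used option is the Population Stability Index (PSI), which compares how a feature is distributed in training data and in production data: PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the expected and actual bin proportions. The sketch below is a plain-Python illustration with equal-width bins and made-up data; the function name and the small floor used for empty bins are assumptions.

```python
import math


def psi(expected, actual, bins: int = 10) -> float:
    """Population Stability Index between two samples of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        floor = 1e-6  # keeps empty bins from producing log(0)
        return [max(c / len(values), floor) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 100 for i in range(100)]   # made-up training feature
unchanged = list(training)                 # production looks the same
shifted = [v + 0.5 for v in training]      # production has drifted upward

print(psi(training, unchanged))
print(psi(training, shifted))
```

An often-quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but the right threshold depends on the model and the cost of errors, so it should be set deliberately rather than copied.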

• What aspects of the model are important to watch?
• What statistics or metrics identify the quality of the models’ output?
• What is unacceptable output?
• What is the threshold that defines an unacceptable output?
• What is the course of action when the threshold is reached?
• Who is responsible for taking action?

Need an auditing and performance management system

• Define the aspects of the model that are important to watch
• Continuously measure model accuracy and compare it with thresholds
• Define unacceptable thresholds
• Define the actions to be taken when thresholds are reached
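
The thresholds, actions and owners described above can be written down as data and checked on every measurement. This is a minimal sketch; `Threshold`, `evaluate`, the two action functions and the numeric limits are all hypothetical and stand in for whatever alerting or rollback mechanism is actually in place.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Threshold:
    metric: str                           # aspect of the model to watch
    minimum: float                        # below this, output is unacceptable
    action: Callable[[str, float], str]   # what to do when it is reached
    owner: str                            # who is responsible for acting


def alert(metric: str, value: float) -> str:
    return f"alert: {metric}={value:.3f}"


def rollback(metric: str, value: float) -> str:
    return f"rollback: {metric}={value:.3f}"


def evaluate(metrics: dict, thresholds: list) -> list:
    """Compare measured metrics with thresholds and run the matching actions."""
    triggered = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is not None and value < t.minimum:
            triggered.append((t.owner, t.action(t.metric, value)))
    return triggered


# Hypothetical policy: warn below 0.90 accuracy, roll back below 0.80.
thresholds = [
    Threshold("accuracy", 0.90, alert, "ml-team"),
    Threshold("accuracy", 0.80, rollback, "on-call"),
]

print(evaluate({"accuracy": 0.95}, thresholds))
print(evaluate({"accuracy": 0.85}, thresholds))
print(evaluate({"accuracy": 0.75}, thresholds))
```

Keeping the policy as data means the thresholds, actions and owners can be reviewed and versioned like any other configuration.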

Diagram components:
• Data and Feature Engineering Pipeline
• Real-Time Data, SQL
• Feature Engineering Pipeline, Feature Storage, Feature Vector
• Model, REST APIs, Model Servicing
• Docker Registry, Container Registry
• Model Registry, Model Repository, Metadata, Hyperparameters
• Data Drift Monitoring, Model Metrics Monitoring, Continuous Monitoring
• Alerts/Trigger new model re-build/rollback
Orchestrate/Automate Solutions 1 to 4
Diagram components (CI/CT):
• Code Repository, Code Change, Bug Fixes
• Build, Test, Dockerize, Dockerized Model Training Code
• Data Repository, Data, Sampled Data, Validation Dataset
• Model Training, Hyperparameter Tuning, Model Repository
• Dockerized Model Deployment, Generate Model Endpoint
• Data, Model Log, Results, Inference Results
• Continuous Monitoring/Performance Management
• Detect Data Drift, Detect Model Deviation

How much calendar time should it take to deploy a model from staging to production?

How much calendar time should it take to add a new feature to the production model?

How long should end-to-end testing of models take?

New Problems With ML Models
Model Interpretability — Model Governance — GDPR

Model localization

Federated Learning – ML at the edge on devices

Audit Algorithms to ensure Fairness

Tools
Problem 1: Code Version Management/Orchestration
• Git
• BitBucket
• Docker
• Kubernetes
• Google Subpar
• Facebook (XARs)
• TeamCity, Jenkins, GitLab
• d6tStack (Pandas)
• d6tJoin, d6tFlow, d6tPipe

Problem 2: Model Management
• ModelDB
• MLflow
• Sacred
• AWS Systems Manager Parameter Store
• Keras Tuner, BigQueue, MLMD, Arbiter, Aginity, Algorithmia, Anodot, Hydrosphere.io, ParallelM, Neptune, MLPerf, Iterative.ai, Datamo, Google-Lucid, Comet

Problem 3: Data Versioning
• Palantir Foundry
• Databricks Delta Lake
• DVC
• Workflow
• Pachyderm
• Training (Airbnb)
• MLeap
• Quilt
• Immuta
• Git LFS

Problem 4: Model Serving/Perf
• Netflix (Meson)
• Seldon
• PredictionIO
• TensorFlow Serving
• Vertex.AI
• Numericcal
• Datatron
• Hydrosphere.io
• Alteryx Promote
• Oracle GraphPipe

SageMaker, Dataiku, Databricks, Determined AI, Uber (Michelangelo), Airbnb (Bighead), Facebook (FBLearner Flow), TensorFlow Extended, Polyaxon, dotData, Uber (Manifold), Hadoop (Submarine), Domino (Launchpad), MLAutomator, DeepThought, Python (d6tflow)

Practical Recommendations on How to Start …
Think operationalization, not just algorithms and frameworks
Version control code/data/models
Establish CI/CD/CM
Canary releases, automated deployments and testing frameworks
Capture data anomalies early
Automate data validation
Treat data errors with the same rigor as code errors
Continuous training
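
As an illustration of automated data validation, the sketch below checks each incoming row against a small schema before it reaches training or serving. The schema, column names and ranges are invented for this example, and `validate_rows` is a hypothetical helper rather than a specific tool's API.

```python
def validate_rows(rows: list, schema: dict) -> list:
    """Return a list of human-readable problems found in the rows."""
    errors = []
    for index, row in enumerate(rows):
        missing = sorted(set(schema) - set(row))
        if missing:
            errors.append(f"row {index}: missing {missing}")
        for column, (expected_type, low, high) in schema.items():
            if column not in row:
                continue
            value = row[column]
            if not isinstance(value, expected_type):
                errors.append(f"row {index}: {column} has type {type(value).__name__}")
            elif not (low <= value <= high):
                errors.append(f"row {index}: {column}={value} outside [{low}, {high}]")
    return errors


# Hypothetical schema: column -> (type, minimum, maximum).
SCHEMA = {"age": (int, 0, 120), "income": (float, 0.0, 1e7)}

batch = [
    {"age": 30, "income": 52000.0},   # valid
    {"age": -4, "income": 41000.0},   # age out of range
    {"age": 51, "income": "n/a"},     # income has the wrong type
    {"income": 38000.0},              # age is missing
]

problems = validate_rows(batch, SCHEMA)
for problem in problems:
    print(problem)
```

Running a check like this in the pipeline, and failing the run when `problems` is not empty, treats a data error the way a failing unit test treats a code error.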

Notebooks: Bane or Boon?
Notebooks promote bad habits, such as dumping all files into one directory
Notebooks mix up code and output
Notebooks don’t version control well

• A Guidance Framework for Operationalizing Machine Learning for AI, Soyeb Barot (G00366587)
• Building a Framework for Managing Effective Machine Learning Workloads, Sumit Agarwal (G00384678)
• Operationalizing Big Data Workloads, Sumit Pal (G00360371)

For information, please contact your Gartner representative.
