Leveraging MLOps and DataOps To Operationalize ML and AI
© 2019 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form
without Gartner’s prior written permission. It consists of the opinions of Gartner’s research organization, which should not be construed as statements of fact. While the information contained in this
publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research
may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are
governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or
influence from any third party. For further information, see “Guiding Principles on Independence and Objectivity.”
Don’t Be — Bankrupt in 45 Minutes
Knightmare: A DevOps Cautionary Tale
Systems were NOT set up for the risk they were exposed to.
Gartner Client Question
ML is not Easy
End result: models are built that aren’t being turned into revenue-generating products and services
• Bootcamps and courses are great for learning how to build and train models
• They don’t teach how to take those models to the next step
Agenda
• Problems and Solutions When Deploying ML
• Tools
• Recommendations
• Research Recommendations
ML Workflow
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
(Data sits at the center of the cycle.)
Machine Learning Workflow
Build: Problem Statement, Data Collection, EDA, Data Engineering [DataOps]
Train: Model Training, Model Evaluation, Model Tuning [MLOps]
Deploy: Model Deployment, Model Monitoring
ML Pipeline
[Figure: ML pipeline. Labels: ERP, Databases, Data Processing (Feature Engineering), Processing Engine]
Why ML Is Difficult
Software Development and ML Development
ML Missing Pieces
[Figure: ML pipeline with the missing pieces added. Labels: ERP, Databases, Data Processing (Feature Engineering), Clustering Algorithm, Learning Algorithm, Execution, Performance Management, Feedback Loop, Model Management, Data Versioning]
Properties of DS / ML System
Reproducible builds
Problem 1 — Works on My Machine
How to Solve — Problem 1
Track Code
Track Environment
Packaging
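The three items above can be sketched as a small build manifest. This is a minimal illustration, not a method from the deck: the function names and the manifest layout are assumptions, and the Git lookup simply returns None when the code is not inside a repository.

```python
import hashlib
import json
import platform
import subprocess
import sys
from importlib import metadata


def current_git_commit():
    """Track code: return the current Git commit hash, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def capture_environment():
    """Track environment: interpreter, platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name] = dist.version
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def build_manifest():
    """Packaging: bundle code and environment info under one fingerprint."""
    manifest = {
        "git_commit": current_git_commit(),
        "environment": capture_environment(),
    }
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["fingerprint"] = hashlib.sha256(canonical).hexdigest()
    return manifest


manifest = build_manifest()
```

Two machines that produce the same fingerprint ran the same commit with the same package versions. A container image serves the same purpose more thoroughly, because it also pins the operating system layer.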
Solution 1 — Works on My Machine
[Figure: pipeline with code and environment tracking. Labels: Run Tests, Docker Registry, Data Ingestion (ERP, Databases, Mainframe, Batch Data Warehouse, IoT Devices), Data Storage, Data Processing (Feature Engineering): Transformation, Normalization, Cleaning and Encoding, Model Engineering, Machine Algorithms, Execution, Deployment]
Problem 2 — What Happens When a Model Is Deployed and It Doesn’t Work
How to Solve — Problem 2
Model Variables + Model Code + Hyperparameters → Dockerize → Model Container Image
Solution 2 — Model Tracking
[Figure labels: Run Tests, Docker Registry, Model Export]
Solution 2 — Model Management - What
Who trained the model
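A tracked model run might be recorded along the following lines. This is a hedged sketch: the `ModelRun` fields, the helper names, and the example values (users, commit hashes, image tags, metrics) are all hypothetical, chosen to cover who trained the model, from which code and container image, and with which hyperparameters.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelRun:
    """One training run: what was trained, by whom, when, and how."""
    model_name: str
    trained_by: str
    code_version: str      # for example, a Git commit hash
    container_image: str   # for example, a Docker image tag
    hyperparameters: dict
    metrics: dict
    trained_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(run):
    """Serialize a run so it can be appended to a log or sent to a registry."""
    return json.dumps(asdict(run), sort_keys=True)


def best_run(runs, metric):
    """Pick the run with the highest value for the given metric."""
    return max(runs, key=lambda run: run.metrics[metric])


# Hypothetical runs, for illustration only.
runs = [
    ModelRun("churn", "alice", "a1b2c3d", "registry.example/churn:1",
             {"max_depth": 4}, {"auc": 0.81}),
    ModelRun("churn", "bob", "d4e5f6a", "registry.example/churn:2",
             {"max_depth": 8}, {"auc": 0.84}),
]
winner = best_run(runs, "auc")
record = json.loads(to_json_line(winner))
```

When a deployed model misbehaves, a record like this answers the first questions: which code, which image, which hyperparameters, and whom to ask.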
Problem 3 — How to Replicate Model Behavior
How to Solve — Problem 3
Solution 3 — Version Data
[Figure labels: Run Tests, Docker Registry, Model Export, Data Versioning]
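One simple way to version data is to derive the version identifier from the content itself, so that any change to the data yields a new version. The sketch below assumes that approach; the function names and the sample CSV rows are hypothetical.

```python
import hashlib


def fingerprint_chunks(chunks):
    """Hash an iterable of byte chunks into one SHA-256 content fingerprint."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def fingerprint_file(path, chunk_size=1 << 20):
    """Fingerprint a file without loading it into memory all at once."""
    with open(path, "rb") as handle:
        return fingerprint_chunks(iter(lambda: handle.read(chunk_size), b""))


# Hypothetical training data before and after one value was corrected.
original = [b"id,amount\n", b"1,10.0\n", b"2,12.5\n"]
revised = [b"id,amount\n", b"1,10.0\n", b"2,12.6\n"]

version_a = fingerprint_chunks(original)
version_b = fingerprint_chunks(revised)
```

Storing the fingerprint next to each model run ties the model to the exact data it was trained on, which is what replicating its behavior later requires.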
Problem 4 — Drift (Data and Model)
[Figure: root-cause decision flow. Labels: Root Cause?, Wrong Code / Parameters, Wrong Environment, Data Drift, Data Pipeline Issue / Wrong Data / Wrong Schema, Model Drift?, More Compute]
Problem 4 — Drift (Data and Model)
How do you know if your models are keeping up with data drift?
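One way to answer that question is to compare the distribution of a feature in production against its distribution at training time. The deck does not name a drift metric; the sketch below uses the Population Stability Index (PSI) as an illustrative choice. The bin count, the epsilon floor, and the 0.25 alert threshold (a common rule of thumb) are assumptions.

```python
import math


def psi(expected, actual, bins=10, epsilon=1e-6):
    """Population Stability Index between a baseline and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant baseline

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            index = min(max(index, 0), bins - 1)  # clamp out-of-range values
            counts[index] += 1
        # Floor each share at epsilon so the logarithm is always defined.
        return [max(count / len(values), epsilon) for count in counts]

    expected_share = proportions(expected)
    actual_share = proportions(actual)
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_share, actual_share)
    )


DRIFT_THRESHOLD = 0.25  # assumed alert level, not taken from the deck

baseline = [i / 100 for i in range(100)]   # feature values seen in training
shifted = [value + 0.5 for value in baseline]  # same feature, shifted upward

no_drift = psi(baseline, baseline)
drift = psi(baseline, shifted)
needs_retraining = drift > DRIFT_THRESHOLD
```

A check like this, run on a schedule per feature, is the kind of signal that could feed an alert or a retraining trigger.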
How to Solve — Problem 4
Solution 4 — Model Performance Management
[Figure: model performance management architecture. Labels: Real-Time Data, SQL, Data and Feature Engineering Pipeline, Feature Storage, Feature Vector, Feature Engineering Pipeline, Model, REST APIs, Model Servicing, Docker Registry, Container Registry, Model Registry, Metadata Repository, Hyperparameters, Model Metrics, Data Drift Monitoring, Continuous Monitoring, Monitoring, Alerts/Trigger new model re-build/rollback]
Orchestrate/Automate Solutions 1 to 4
CI/CT
Data
Continuous Monitoring/Performance Management
Other Things to Measure
How much calendar time to add a new feature to the production model?
New Problems With ML Models
Model Interpretability — Model Governance — GDPR
Model localization
Tools
Problem 1 — Code Version
• Git
• Bitbucket
• Docker
• Kubernetes
• Google Subpar
• Facebook (XARs)
• TeamCity, Jenkins, GitLab
• d6tStack (Pandas)
• d6tJoin, d6tFlow, d6tPipe

Problem 2 — Model Management
• ModelDB
• MLflow
• Sacred
• AWS Systems Manager Parameter Store
• Keras Tuner, Training (Airbnb)
• BigQueue, MLMD, Arbiter
• Aginity, Algorithmia, Anodot
• Hydrosphere.io, ParallelM
• Neptune, MLPerf, Iterative.ai
• Datamo, Google-Lucid
• Comet

Problem 3 — Data Versioning
• Palantir Foundry
• Databricks Delta Lake
• DVC
• Workflow
• Pachyderm
• MLeap
• Quilt
• Immuta
• Git LFS

Problem 4 — Model Serving/Perf Management/Orchestration
• Netflix (Meson)
• Seldon
• PredictionIO
• TensorFlow Serving
• Vertex.AI
• Numericcal
• Datatron
• Hydrosphere.io
• Alteryx Promote
• Oracle GraphPipe

SageMaker, Dataiku, Databricks, Determined AI, Uber (Michelangelo), Airbnb (Bighead), Facebook (FBLearner Flow),
TensorFlow Extended, Polyaxon, dotData, Uber (Manifold), Hadoop (Submarine), Domino (Launchpad), MLAutomator,
DeepThought, Python (d6tflow)
Practical Recommendations on How to Start …
Think operationalization — don’t just think about algorithms
and frameworks
Version control code/data/models
Establish CI/CD/CM
Canary Releases, automated deployments and testing frameworks
Capture Data Anomalies Early
Automate Data Validation
Treat Data Errors With the Same Rigor as Code
Continuous Training
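Automated data validation, as recommended above, can start as a schema check that runs before any training or scoring step. The following is a minimal sketch under assumed names: the `schema` layout, the column names, and the sample rows are hypothetical.

```python
def validate_rows(rows, schema):
    """Return a list of (row_index, column, problem) tuples for bad records."""
    errors = []
    for index, row in enumerate(rows):
        for column, (expected_type, check) in schema.items():
            if column not in row:
                errors.append((index, column, "missing"))
                continue
            value = row[column]
            if not isinstance(value, expected_type):
                errors.append((index, column, "wrong type"))
            elif check is not None and not check(value):
                errors.append((index, column, "out of range"))
    return errors


# Hypothetical schema: column -> (expected type, optional range check).
schema = {
    "age": (int, lambda value: 0 <= value <= 120),
    "country": (str, None),
}

rows = [
    {"age": 34, "country": "DE"},    # valid
    {"age": -5, "country": "US"},    # out of range
    {"age": "41", "country": "FR"},  # wrong type
    {"country": "JP"},               # missing column
]

errors = validate_rows(rows, schema)
pipeline_may_proceed = not errors
```

Failing the pipeline when `errors` is non-empty gives data errors the same standing as a failing unit test in a CI build.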
Notebooks: Bane or Boon?
Notebooks promote bad habits, such as dumping all files into one directory
Notebooks mix code and output together
Notebooks don’t version-control well
Recommended Gartner Research