CD4ML and the challenges of testing and quality in ML systems
TensorFlow London Meetup, May 2020
Danilo Sato
@dtsato
©ThoughtWorks 2020 - @dtsato
TensorFlow London Meetup - May 28, 2020
7000+ technologists with 43 offices in 14 countries
We help clients become Modern Digital Businesses
DELIVER VALUE, MOVE FAST, THINK BIG
#1 in Agile and Continuous Delivery
100+ books written
CD4ML and the challenges of testing and quality in ML systems
Techniques
7. Continuous delivery for machine learning (CD4ML): TRIAL
https://www.thoughtworks.com/radar
CD4ML isn’t a technology or a tool; it is a practice and a set of principles. Quality is built into software and improvement is always possible. But machine learning systems have unique challenges; unlike deterministic software, it is difficult, or impossible, to understand the behavior of data-driven intelligent systems. This poses a huge challenge when it comes to deploying machine learning systems in accordance with CD principles.
PRODUCTIONIZING ML IS HARD
Production systems should be:
● Reproducible
● Testable
● Auditable
● Continuously Improving
Machine Learning is:
● Non-deterministic
● Hard to test
● Hard to explain
● Hard to improve
HOW DO WE APPLY DECADES OF SOFTWARE DELIVERY EXPERIENCE TO INTELLIGENT SYSTEMS?
MANY SOURCES OF CHANGE
Data + Model + Code
● Data: Schema, Sampling over Time, Volume
● Model: Algorithms, More Training, Experiments
● Code: Business Needs, Bug Fixes, Configuration
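Since data, model, and code each change independently, one way to keep every trained model traceable is to record a fingerprint of all three for each training run. A minimal Python sketch, assuming the training data is available as bytes, the hyperparameters as a JSON-serialisable dict, and the code version as a string such as a Git commit SHA; `fingerprint` and its output field names are hypothetical, not part of any CD4ML tool:

```python
import hashlib
import json


def fingerprint(data: bytes, params: dict, code_version: str) -> dict:
    """Tie one training run to the exact data, model config and code it used.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    data_sha = hashlib.sha256(data).hexdigest()
    # sort_keys=True makes the hash independent of dict insertion order
    params_sha = hashlib.sha256(
        json.dumps(params, sort_keys=True).encode("utf-8")
    ).hexdigest()
    # Combine all three sources of change into a single run identifier
    run_sha = hashlib.sha256(
        f"{data_sha}:{params_sha}:{code_version}".encode("utf-8")
    ).hexdigest()
    return {
        "data_sha256": data_sha,
        "params_sha256": params_sha,
        "code_version": code_version,
        "run_id": run_sha[:12],  # short id for logs and artifact names
    }
```

Two runs share a `run_id` only if the data bytes, the hyperparameters (in any key order), and the code version all match; retraining on a new data sample with identical code yields a new id. Dedicated data and experiment versioning tools go much further; this only illustrates the idea.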
“Continuous Delivery is the ability to get changes of
all types — including new features, configuration
changes, bug fixes and experiments — into
production, or into the hands of users, safely and
quickly in a sustainable way.”
- Jez Humble & Dave Farley
PRINCIPLES OF CONTINUOUS DELIVERY
→ Create a Repeatable, Reliable Process for Releasing Software
→ Automate Almost Everything
→ Build Quality In
→ Work in Small Batches
→ Keep Everything in Source Control
→ Done Means “Released”
→ Improve Continuously
TECHNICAL COMPONENTS OF CD4ML
Implementation requires many tools, technologies, and architecture decisions to fully automate the end-to-end process. This presentation focuses on the testing and quality aspects of CD4ML.
DOING CD4ML IS STILL A HARD PROBLEM
● DISCOVERABLE AND ACCESSIBLE DATA
● REPRODUCIBLE MODEL TRAINING
● EXPERIMENTS TRACKING
● ELASTIC INFRASTRUCTURE
● VERSION CONTROL & ARTIFACTS REPOS
● MODEL SERVING
● MODEL DEPLOYMENT
● TESTING & QUALITY
● MONITORING & OBSERVABILITY
● CD ORCHESTRATION
https://martinfowler.com/articles/cd4ml.html
“CLASSIC” SOFTWARE TEST PYRAMID
UI Tests
Service Tests
Unit Tests
Speed / Cost: tests near the top of the pyramid are slower and more expensive; tests near the bottom are faster and cheaper.
https://martinfowler.com/bliki/TestPyramid.html
AS SOFTWARE BECAME MORE COMPLEX
https://martinfowler.com/articles/microservice-testing
TESTING IN PRODUCTION
https://sookocheff.com/post/architecture/testing-in-production/
Data + Model + Code
??
TESTS FOR DATA
Data Pipeline
Data/Feature Validation
Unit Tests (Transformations, Engineered Features)
- Adherence to schemas
- Features can be used
- Schema versioning and compatibility
- Integration tests against (small) sample input
- Adherence to privacy controls
- On-demand quality checks
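The data checks listed above can run as ordinary unit tests in the data pipeline. A minimal Python sketch, assuming records arrive as dicts; the schema, the PII column names, the `day_of_week` feature, and the helper names are hypothetical examples, not taken from the talk:

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical schema: column name -> required Python type.
SCHEMA: Dict[str, type] = {"store_id": int, "date": str, "units_sold": float}
# Hypothetical privacy control: these columns must never enter the pipeline.
PII_COLUMNS = {"customer_email", "customer_name"}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return schema and privacy violations for one record (empty list = valid)."""
    errors = []
    missing = set(SCHEMA) - set(record)
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
    for column, expected in SCHEMA.items():
        if column in record and not isinstance(record[column], expected):
            errors.append(
                f"{column}: expected {expected.__name__}, "
                f"got {type(record[column]).__name__}"
            )
    leaked = PII_COLUMNS.intersection(record)  # iterating a dict yields its keys
    if leaked:
        errors.append(f"PII columns present: {sorted(leaked)}")
    return errors


def add_day_of_week(record: Dict[str, Any]) -> Dict[str, Any]:
    """Engineered feature: ISO weekday (1 = Monday ... 7 = Sunday) from YYYY-MM-DD."""
    return {**record, "day_of_week": date.fromisoformat(record["date"]).isoweekday()}


def run_pipeline(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Validate every record, then transform; fail fast on the first bad record."""
    for i, record in enumerate(records):
        errors = validate_record(record)
        if errors:
            raise ValueError(f"record {i} failed validation: {errors}")
    return [add_day_of_week(r) for r in records]
```

`validate_record` covers schema adherence and a simple privacy control, `add_day_of_week` is the kind of transformation a unit test pins down (for example, `2020-05-28` maps to 4, a Thursday), and calling `run_pipeline` on a handful of hand-written records is a small-sample integration test.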
TESTS FOR MODEL
- Compare against a simple model
- Numerical stability (behaviour when NaN or infinite values appear)
Unit Tests (Model Specification)
Model Quality
ML Training Pipeline
- Training is reproducible (watch out for sources of non-determinism, e.g. RNG seeds, initialization order)
- Integration test
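The model checks above (beat a simple baseline, handle NaN and infinity, train reproducibly) can be sketched without any ML framework. This minimal Python sketch swaps in a one-feature linear regression trained with plain SGD so it runs anywhere; `train_linear`, `predict`, the synthetic data, and the choice to reject non-finite inputs with `ValueError` are illustrative assumptions. A real TensorFlow model also needs framework-level seeding and may use non-deterministic GPU kernels, which this sketch does not cover.

```python
import math
import random
from typing import List, Tuple

Model = Tuple[float, float]  # (weight, bias)


def make_data(seed: int, n: int = 50) -> Tuple[List[float], List[float]]:
    """Synthetic data y = 2x + 1 + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


def train_linear(xs: List[float], ys: List[float], seed: int,
                 epochs: int = 200, lr: float = 0.05) -> Model:
    """Fit y ~ w*x + b with SGD. The seed fixes both the initial weights and
    the shuffling order, the two sources of randomness in this loop."""
    rng = random.Random(seed)  # local RNG: no dependence on global random state
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


def predict(model: Model, x: float) -> float:
    """Reject NaN/inf explicitly instead of silently returning NaN."""
    if not math.isfinite(x):
        raise ValueError(f"non-finite input: {x!r}")
    w, b = model
    return w * x + b


def mse(preds: List[float], ys: List[float]) -> float:
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def baseline_mse(train_ys: List[float], test_ys: List[float]) -> float:
    """The 'simple model' to beat: always predict the training mean."""
    mean = sum(train_ys) / len(train_ys)
    return mse([mean] * len(test_ys), test_ys)
```

Typical assertions: two calls to `train_linear` with the same data and seed return identical weights; the trained model's error on held-out data is lower than `baseline_mse`; and `predict` raises on `float("nan")` and `float("inf")`.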
Data + Model + Code
TESTING WHERE THEY OVERLAP
Data Pipeline: Data/Feature Validation, Unit Tests (Transformations, Engineered Features)
ML Training Pipeline: Unit Tests (Model Specification), Model Quality
Classic test pyramid: UI Tests, Service Tests, Unit Tests
Where they overlap: Model Performance, Contract Tests, Model Bias and Fairness
- Model evaluation against different validation datasets
- Thresholds for model metrics and execution performance
- Different data slices
- Feature generation is the same for training and serving
- Model contract is adhered to in production
- When the model is exported, test that it still works
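Per-slice thresholds and training/serving contract checks can run automatically before a model is promoted. A minimal Python sketch, assuming examples are dicts with a `label` field and the model is any callable; the 0.8 accuracy threshold, the slice key, and the feature functions used with it are hypothetical:

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]


def accuracy(preds: List[int], labels: List[int]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def failing_slices(examples: List[Example], predict: Callable[[Example], int],
                   slice_key: str, min_accuracy: float = 0.8) -> Dict[Any, float]:
    """Evaluate accuracy per data slice and return the slices below threshold.
    An aggregate metric can pass while one slice (e.g. one region) fails."""
    groups: Dict[Any, List[Example]] = {}
    for ex in examples:
        groups.setdefault(ex[slice_key], []).append(ex)
    failures = {}
    for name, group in groups.items():
        acc = accuracy([predict(ex) for ex in group], [ex["label"] for ex in group])
        if acc < min_accuracy:
            failures[name] = acc
    return failures


def feature_skew(raw_inputs: List[Dict[str, Any]],
                 training_features: Callable[[Dict[str, Any]], Dict[str, Any]],
                 serving_features: Callable[[Dict[str, Any]], Dict[str, Any]]
                 ) -> List[Dict[str, Any]]:
    """Contract test: the training and serving code paths must produce identical
    features for the same raw input. Returns the inputs where they disagree."""
    return [r for r in raw_inputs if training_features(r) != serving_features(r)]
```

For example, with eight correct predictions in a `north` slice and one of two correct in a `south` slice, overall accuracy is 0.9 and would pass a 0.8 threshold, yet `failing_slices` reports `{"south": 0.5}`. Likewise, a serving path that forgets the `.strip()` applied at training time shows up in `feature_skew` for an input such as `" Paris"`.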
The same layers as the previous slide, plus:
End-to-End Tests
Production Monitoring
Exploratory Tests
- Model degradation
- Training/serving skew
- Operational metrics (latency, throughput, resource usage)
- Real impact! (KPIs)
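Training/serving skew and input drift are often monitored by comparing a feature's distribution at serving time with its distribution in the training data. The slides do not prescribe a metric; this minimal Python sketch swaps in the Population Stability Index (PSI), one common choice, computed as the sum over bins of (actual − expected) · ln(actual / expected). The bin edges, the epsilon used for empty bins, and any alert threshold are assumptions; a commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float],
                  eps: float = 1e-6) -> List[float]:
    """Fraction of values in each bin defined by sorted edges (len(edges) + 1 bins).
    Empty bins are floored at eps so the log term stays finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               edges: Sequence[float]) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Identical training and serving samples give a PSI of exactly 0. When serving values all move into the upper half of the training range, the emptied bins dominate the sum, so the absolute value depends on the chosen `eps` and bin edges; in practice, teams compare it against a threshold agreed for each feature.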
“Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product.”
- W. Edwards Deming
QUESTIONS?
WORKSHOPS, PRESENTATIONS & ARTICLES
Workshops:
https://github.com/ThoughtWorksInc/cd4ml-workshop
https://github.com/ThoughtWorksInc/CD4ML-Scenarios
Articles:
https://martinfowler.com/articles/cd4ml.html
https://www.thoughtworks.com/insights/articles/intelligent-enterprise-series-cd4ml
Paper:
“The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction”, Breck et al. (Google)
THANK YOU!
Danilo Sato (dsato@thoughtworks.com)
@dtsato