Some Things I Wish I Had Known
Before Scaling Machine Learning
Solutions
Invector Labs
Today’s session is about differentiating BS from reality…
Agenda
• Myths and realities of machine learning solutions in the real world
• 15 Lessons I learned when building large scale machine learning
systems
• Challenge
• What we learned?
• Solution
The different dimensions of machine intelligence solutions…
We can discuss the theoretical definitions or,
instead, focus on the pragmatic one…
Implementing Machine Learning in the Real World
But the reality is that building machine learning solutions remains brutally difficult
But not just because of the obvious reasons…
Challenges of Machine Learning in the Real
World
• High Technological Barrier
• Limited Talent Availability
• Labeled Datasets Cost
• …
A lifecycle we haven’t seen before…
We are dealing with a new app lifecycle…
Traditional App Lifecycle
• Design
• Implementation
• Deployment
• Management/Monitoring
Machine Learning App Lifecycle
• Experimentation
• Model Creation
• Training
• Testing
• Regularization
• Deployment
• Monitoring
• Optimization
The Ecosystem is Incredibly Crowded
The Aspects of a Machine Learning Solution
that will Drive You Crazy
• Strategy & Processes
• Data Engineering
• Experimentation
• Model Training
• Model Operationalization
• Runtime Execution
• Security
• Lifecycle Management
• Optimization
• …
Lessons learned when building high scale machine learning solutions…
Strategy & Processes…
Lesson #1: Data scientists make horrible engineers…
Challenges
• Data scientists are great at experimentation, not so much at writing high-quality code
• Deep learning frameworks built for experimentation don’t necessarily make great production frameworks, e.g. PyTorch vs. TensorFlow
Some Ideas to Consider
• Divide data science and data engineering teams
Data Science Team
• Write notebooks and experimentation models
Engineering Team
• Refactor or rewrite models for production environments
• Automate training and optimization jobs
DevOps Teams
• Deploy models
• Monitor, retrain, and optimize models
Lesson #2: Neither Agile nor Waterfall Methodologies Work in Machine Learning
Challenges
• Waterfall methods don’t work because you rarely know which machine learning methods are going to work for a specific problem
• Agile methods don’t work because you need very specific requirements
Some Ideas to Consider
• Split the development lifecycle into agile and waterfall iterations
Diagram: Agile, Waterfall, Agile
Data Engineering…
Lesson #3: Feature extraction can become a reusability nightmare…
Challenges
• Different models require the same features from a dataset
• Feature extraction jobs are computationally expensive
• Different teams create proprietary ways to capture and store feature information
Some Ideas to Consider
• Implement a centralized feature store
• Leverage representation learning to extract relevant features from a dataset
• Look for reference architectures, e.g. Uber’s Michelangelo
Diagram: Dataset Preparation Jobs 1 to N feed Representation Learning Tasks, which populate a Feature Store used by Models 1 to N
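The feature store idea can be sketched as a keyed cache that computes each feature once and serves it to every model that asks for it. This is a minimal in-memory illustration and not Michelangelo's design; the names `FeatureStore`, `register`, and `get`, and the example feature, are hypothetical.

```python
from typing import Callable, Dict, Tuple


class FeatureStore:
    """Toy in-memory feature store (hypothetical API, for illustration only)."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Callable[[dict], float]] = {}
        self._cache: Dict[Tuple[str, str], float] = {}
        self.computations = 0  # how many times an extractor actually ran

    def register(self, name: str, extractor: Callable[[dict], float]) -> None:
        """Register one shared definition of a feature."""
        self._extractors[name] = extractor

    def get(self, name: str, entity_id: str, raw: dict) -> float:
        """Return the cached feature value, computing it on first request."""
        key = (name, entity_id)
        if key not in self._cache:
            self._cache[key] = self._extractors[name](raw)
            self.computations += 1
        return self._cache[key]


store = FeatureStore()
store.register("avg_order_value", lambda r: sum(r["orders"]) / len(r["orders"]))

raw_record = {"orders": [10.0, 20.0, 30.0]}
# Two different models request the same feature; it is computed only once.
feature_for_model_1 = store.get("avg_order_value", "customer-42", raw_record)
feature_for_model_n = store.get("avg_order_value", "customer-42", raw_record)
```

A production store would add persistence, versioning, and point-in-time correctness, none of which this sketch attempts.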
Lesson #4: Data labeling is so easy to underestimate
Challenges
• Data experts spend a lot of time labeling datasets
• The logic for data labeling is often not reusable
• Subjective data labeling strategies fail to differentiate between useful and useless features
Some Ideas to Consider
• Implement an automated data labeling strategy
• Generative learning can help to structure more effective labels
• Project Snorkel is one of the leading automated data labeling frameworks in the market
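One way to make labeling logic reusable is to write each heuristic as a small labeling function and combine their votes. The sketch below uses a plain majority vote as a stand-in; Snorkel learns a generative label model instead, and none of this is Snorkel's API. The heuristics and example texts are made up.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None
SPAM, HAM = 1, 0


def lf_contains_link(text: str) -> Optional[int]:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text: str) -> Optional[int]:
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text: str) -> Optional[int]:
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], Optional[int]]] = [
    lf_contains_link, lf_mentions_prize, lf_short_greeting,
]


def majority_label(text: str, lfs=LABELING_FUNCTIONS) -> Optional[int]:
    """Combine labeling-function votes; abstain on no votes or on a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


labels = [majority_label(t) for t in [
    "Claim your prize at http://example.com now",
    "See you tomorrow",
    "The quarterly report is attached for review",
]]
```

Because each heuristic is a named function, the same labeling logic can be reused and audited across datasets.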
Model Experimentation…
Lesson #5: The single machine learning framework fallacy
Challenges
• Enterprises like to standardize on a single machine learning framework
• Different teams have different technology preferences
• Providing a consistent machine learning platform across different machine learning frameworks is no easy task
Some Ideas to Consider
• Optimize for productivity, not consistency
• Enable enough flexibility to leverage different frameworks for experimentation and production
• ONNX is a great solution for intermediate representations
Diagram: Experimentation Framework, Intermediate Representation, Production Framework
Lesson #6: Too much time going from notebooks to production programs
Challenges
• Notebooks are ideal for model experimentation and testing
• Notebooks typically have performance challenges when executed at scale
• Scaling notebook environments can be challenging
• Parametrizing notebook executions is far from trivial
Some Ideas to Consider
• Enable an infrastructure to operationalize data science notebooks
• Use containers for the most complex machine learning workflows
Model Experimentation
• Jupyter, Zeppelin
Scheduling Notebooks
• Papermill
• Netflix’s Meson
Running Complex Workflows
• Docker Containers
• Kubernetes
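Parametrized notebook runs can be sketched as a planner that expands a parameter grid and hands each combination to a runner. Papermill exposes `papermill.execute_notebook(input_path, output_path, parameters=...)` for the execution step; the notebook name `train.ipynb`, the `runs/` output layout, and the parameter names below are assumptions. The runner is injected so the planning logic can be exercised without Papermill installed.

```python
from itertools import product
from typing import Callable, Dict, List


def plan_runs(grid: Dict[str, list]) -> List[dict]:
    """Expand a parameter grid into one dict per notebook execution."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def output_path(params: dict) -> str:
    """Derive a distinct output notebook name for each parameter set."""
    suffix = "_".join(f"{k}-{v}" for k, v in sorted(params.items()))
    return f"runs/train_{suffix}.ipynb"


def run_all(grid: Dict[str, list], runner: Callable[[str, str, dict], None],
            template: str = "train.ipynb") -> List[str]:
    """Execute the template notebook once per parameter combination."""
    outputs = []
    for params in plan_runs(grid):
        out = output_path(params)
        runner(template, out, params)
        outputs.append(out)
    return outputs


def papermill_runner(input_path: str, out_path: str, params: dict) -> None:
    """Runner backed by Papermill; imported lazily so it stays optional."""
    import papermill as pm
    pm.execute_notebook(input_path, out_path, parameters=params)


# Dry run with a recording runner instead of Papermill.
recorded = []
outputs = run_all(
    {"learning_rate": [0.01, 0.1], "epochs": [5]},
    runner=lambda i, o, p: recorded.append((i, o, p)),
)
```

Passing `papermill_runner` as the runner would execute the real notebook, assuming it has a cell tagged for parameters.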
Lesson #7: Model selection can be a machine learning problem
Challenges
• Data scientists make very subjective decisions when it comes to model selection
• The same problem can be solved using different machine learning models
• Very often it is almost impossible to differentiate between similar models
Some Ideas to Consider
• Represent machine learning requirements as a dataset with an objective attribute
• Leverage AutoML-based techniques for model selection
Diagram: Problem Dataset, AutoML, Proposed Models
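At its core, automated model selection scores every candidate on held-out data and keeps the best one. The sketch below does this with k-fold cross-validation over two toy candidates; real AutoML tools also search hyperparameters and architectures, and the candidate names here are illustrative only.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs


def fit_mean(train: Dataset) -> Callable[[float], float]:
    """Baseline: always predict the mean target."""
    m = mean(y for _, y in train)
    return lambda x: m


def fit_linear(train: Dataset) -> Callable[[float], float]:
    """Ordinary least squares for a single feature."""
    mx, my = mean(x for x, _ in train), mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    return lambda x: my + slope * (x - mx)


def mse(model: Callable[[float], float], data: Dataset) -> float:
    return mean((model(x) - y) ** 2 for x, y in data)


def select_model(candidates: Dict[str, Callable], data: Dataset, folds: int = 3):
    """Pick the candidate with the lowest k-fold cross-validated error."""
    scores = {}
    for name, fit in candidates.items():
        errors = []
        for k in range(folds):
            test = data[k::folds]
            train = [p for i, p in enumerate(data) if i % folds != k]
            errors.append(mse(fit(train), test))
        scores[name] = mean(errors)
    return min(scores, key=scores.get), scores


data = [(float(x), 2.0 * x + 1.0) for x in range(12)]
best, scores = select_model({"mean": fit_mean, "linear": fit_linear}, data)
```

The objective metric replaces a subjective choice: whichever candidate generalizes best on the held-out folds wins.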
Machine learning training…
Lesson #8: Training is a continuous task…
Challenges
• The No Free Lunch Theorem
• Trained models can perform poorly against new datasets
• New engineers and DevOps need to understand how to re-train existing models
Some Ideas to Consider
• Automate training jobs
• Orchestrate scheduled execution of training jobs
Diagram: Data Lake and Data Outcomes/Feature Store feed Training Jobs 1 to N
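Automated training needs a rule that decides when a job should run. The sketch below combines a schedule (model age) with a performance check on recent data; the thresholds, model names, and the `ModelStatus` record are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ModelStatus:
    name: str
    last_trained: datetime
    baseline_accuracy: float   # accuracy measured right after training
    current_accuracy: float    # accuracy measured on recent data


def needs_retraining(status: ModelStatus, now: datetime,
                     max_age: timedelta = timedelta(days=7),
                     max_drop: float = 0.05) -> bool:
    """Retrain when the model is stale or has degraded on recent data."""
    too_old = now - status.last_trained > max_age
    degraded = status.baseline_accuracy - status.current_accuracy > max_drop
    return too_old or degraded


def due_jobs(statuses: List[ModelStatus], now: datetime) -> List[str]:
    """Names of the training jobs a scheduler should launch on this tick."""
    return [s.name for s in statuses if needs_retraining(s, now)]


now = datetime(2024, 1, 15)
statuses = [
    ModelStatus("churn", datetime(2024, 1, 14), 0.90, 0.89),    # fresh and healthy
    ModelStatus("fraud", datetime(2024, 1, 1), 0.95, 0.94),     # stale
    ModelStatus("ranking", datetime(2024, 1, 14), 0.80, 0.70),  # degraded
]
jobs = due_jobs(statuses, now)
```

An orchestrator would call `due_jobs` on a timer and submit the returned names to whatever training infrastructure is in place.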
Lesson #9: Training should be incremental…
Challenges
• Training machine learning models can be computationally expensive
• Most machine learning models need to be retrained entirely when new data arrives
• It’s nearly impossible to quantify the impact that new datasets have on the performance of a model
Some Ideas to Consider
• Implement continual learning models
• Consider transfer learning as a fundamental enabler
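Incremental training updates the existing parameters with each new batch instead of retraining from scratch. The sketch below shows this with a one-feature linear model and stochastic gradient descent. It illustrates only the update pattern and does not address harder continual-learning problems such as catastrophic forgetting; the data and learning rate are made up.

```python
from typing import Iterable, Tuple


class OnlineLinearModel:
    """y ~ w * x + b, updated one mini-batch at a time with SGD."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def partial_fit(self, batch: Iterable[Tuple[float, float]]) -> None:
        """Update parameters from new data only; earlier data is not revisited."""
        for x, y in batch:
            error = self.predict(x) - y
            self.w -= self.lr * error * x
            self.b -= self.lr * error


model = OnlineLinearModel()
# Each batch stands in for newly arrived data drawn from y = 3x + 1.
batches = [[(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(10)] for _ in range(300)]
for batch in batches:
    model.partial_fit(batch)
```

The cost of each update depends only on the size of the new batch, not on the total amount of data seen so far.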
Lesson #10: Training a model requires as much coding as creating it…
Challenges
• Data engineers spend a lot of time writing training routines for machine learning models
• Comparing the performance of different models on the same datasets remains tricky
• Changes to a training dataset often imply changes to the training code
Some Ideas to Consider
• Explore a configuration-driven training process
• Uber’s Ludwig is an innovative, no-code framework for training machine learning models
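Configuration-driven training keeps the training code fixed and moves the choices into a declarative config. The sketch below follows that spirit with a tiny trainer registry. It does not use Ludwig's API or config schema; the keys `target`, `model`, `type`, and `params` are a hypothetical layout.

```python
from statistics import mean


def train_mean(rows, target, **_):
    """Baseline: always predict the mean of the target column."""
    m = mean(r[target] for r in rows)
    return lambda row: m


def train_threshold(rows, target, feature, threshold, **_):
    """Predict the mean target on each side of a feature threshold."""
    low = [r[target] for r in rows if r[feature] < threshold]
    high = [r[target] for r in rows if r[feature] >= threshold]
    lo_mean, hi_mean = mean(low), mean(high)
    return lambda row: lo_mean if row[feature] < threshold else hi_mean


TRAINERS = {"mean": train_mean, "threshold": train_threshold}


def train_from_config(config: dict, rows: list):
    """Build a model purely from a declarative config (hypothetical schema)."""
    trainer = TRAINERS[config["model"]["type"]]
    params = config["model"].get("params", {})
    return trainer(rows, target=config["target"], **params)


config = {
    "target": "price",
    "model": {"type": "threshold", "params": {"feature": "size", "threshold": 100}},
}
rows = [
    {"size": 50, "price": 10.0}, {"size": 80, "price": 14.0},
    {"size": 120, "price": 30.0}, {"size": 150, "price": 34.0},
]
model = train_from_config(config, rows)
prediction = model({"size": 130})
```

Switching to a different model or dataset column then means editing the config, not the training routine.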
Executing Machine Learning Models…
Lesson #11: Different models require different execution patterns…
Challenges
• Not all models can be executed via APIs
• Some models take a long time to run
• In some scenarios, different models need to be executed at the same time based on a specific condition
Some Ideas to Consider
• Enable different execution modes based on the client’s requirements
Activation modes
• Scheduled Activation
• Pub-Sub Activation
• On-Demand Activation
Gateways
• Model API Gateway
• Event Gateway
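The three activation modes can be sketched as thin entry points around the same kind of model callable. All names below are hypothetical; a real deployment would put an API gateway, a job scheduler, and a message broker behind these methods.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Model = Callable[[dict], dict]


class ModelRuntime:
    """Routes requests, ticks, and events to registered models (toy sketch)."""

    def __init__(self) -> None:
        self.on_demand: Dict[str, Model] = {}
        self.scheduled: List[Model] = []
        self.subscribers: Dict[str, List[Model]] = defaultdict(list)

    def invoke(self, name: str, payload: dict) -> dict:
        """On-demand activation: a synchronous, API-style call."""
        return self.on_demand[name](payload)

    def tick(self, batch: dict) -> List[dict]:
        """Scheduled activation: run every scheduled model on a batch."""
        return [model(batch) for model in self.scheduled]

    def publish(self, topic: str, event: dict) -> List[dict]:
        """Pub-sub activation: run every model subscribed to the topic."""
        return [model(event) for model in self.subscribers[topic]]


def scorer(payload: dict) -> dict:
    return {"score": len(payload.get("text", ""))}


def batch_report(batch: dict) -> dict:
    return {"rows": len(batch.get("rows", []))}


def alert(event: dict) -> dict:
    return {"alert": event.get("amount", 0) > 1000}


runtime = ModelRuntime()
runtime.on_demand["scorer"] = scorer
runtime.scheduled.append(batch_report)
runtime.subscribers["payments"].append(alert)

api_result = runtime.invoke("scorer", {"text": "hello"})
nightly = runtime.tick({"rows": [1, 2, 3]})
events = runtime.publish("payments", {"amount": 2500})
```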
Lesson #12: Mobile deep learning is more complicated than you think
Challenges
• Centralized cloud deep learning models don’t scale
• On-device deep learning models are hard to distribute and train
• Tons of privacy challenges
Some Ideas to Consider
• Consider using federated learning or similar patterns for mobile-based machine learning
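In federated learning, each device trains locally and shares only model updates; a server then combines them, commonly by averaging weights in proportion to each device's number of local examples (federated averaging). The sketch below covers only that aggregation step. Local training, communication, and secure aggregation are omitted, and the numbers are made up.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Weighted average of client weight vectors by local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# (local weights, number of local training examples) reported by each device
client_updates = [
    ([1.0, 0.0], 10),
    ([3.0, 2.0], 30),
]
global_weights = federated_average(client_updates)
```

The raw training data never leaves the devices; only the weight vectors and sample counts reach the server.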
Machine Learning Operationalization…
Lesson #13: Debugging is a nightmare
Challenges
• The accuracy-interpretability friction
• The unpredictability factor
• Limited toolset
Some Ideas to Consider
• Establish systematic practices to debug machine learning models
• Onboard modeling visualization and interpretability tools
Visualize the Network and its Results
• Use tools like TensorBoard to visualize the structure of neural networks
Compare Training and Test Errors
• High training error is a sign of underfitting
• High test error and low training error is a sign of overfitting
Test with Small Datasets
• Helps to determine whether the error is in the code or in the data
Monitor Activations and Gradient Values
• Monitor the number of activations in hidden units
Interpretability
• Understanding How Nodes are Activated
• Understanding what Hidden Layers Do
• Understanding How Concepts are Formed
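The rule for comparing training and test errors can be written down as a small check that runs after every training job. The thresholds below are hypothetical and would need tuning per problem.

```python
def diagnose(train_error: float, test_error: float,
             high_error: float = 0.20, max_gap: float = 0.10) -> str:
    """Classify a model's fit from its training and test error rates."""
    if train_error > high_error:
        return "underfitting"   # cannot even fit the training data
    if test_error - train_error > max_gap:
        return "overfitting"    # fits training data but fails to generalize
    return "ok"


results = [diagnose(0.35, 0.37), diagnose(0.02, 0.30), diagnose(0.05, 0.08)]
```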
Security…
Lesson #14: Machine learning models are so easy to hack
Challenges
• Most neural networks are vulnerable to adversarial attacks
• Attackers don’t need access to the models but can simply manipulate input datasets
• Most of the time, adversarial attacks go undetected
Some Ideas to Consider
• Test your neural networks for adversarial robustness
• IBM’s Adversarial Robustness Toolbox is one of the leading stacks in neural network security
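A basic robustness test is the fast gradient sign method (FGSM), which nudges every input feature by a small step in the direction that increases the loss. The sketch below applies it to a hand-written logistic regression rather than a neural network, and it does not use the Adversarial Robustness Toolbox API. The weights, input, and epsilon are illustrative values.

```python
import math
from typing import List


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Logistic regression: P(y = 1 | x)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """x' = x + eps * sign(dLoss/dx) for the cross-entropy loss."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]   # gradient of the loss w.r.t. the input
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0], 0.0     # illustrative weights, not a trained model
x, y = [0.3, 0.2], 1        # an input the model assigns to class 1
x_adv = fgsm(w, b, x, y, eps=0.3)

clean_prob = predict_proba(w, b, x)
adv_prob = predict_proba(w, b, x_adv)
```

Here a perturbation of at most 0.3 per feature is enough to push the predicted probability for the true class below 0.5, which is the kind of failure a robustness test is meant to surface.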
Lesson #15: Data privacy is the elephant in the machine learning room
Challenges
• Machine learning models intrinsically build knowledge about private datasets
• Most machine learning techniques require clear access to data which, in many cases, contains sensitive information
• There are no established techniques for evaluating the privacy robustness of machine learning models
Some Ideas to Consider
• Private machine learning is an emerging area of research
• Leverage techniques such as secure multi-party computation or zero-knowledge proofs to obfuscate training datasets
• PySyft is an emerging framework to enable privacy in machine learning models
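Additive secret sharing is one building block of secure multi-party computation: each party splits its private value into random shares, so a sum can be computed without any single party seeing another's input. The sketch below is a toy illustration, not PySyft's API and not production-grade security; the modulus and the example values are assumptions.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # illustrative modulus; an assumption, not a recommendation


def share(value: int, parties: int) -> List[int]:
    """Split a value into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recombine shares into the value they encode."""
    return sum(shares) % MODULUS


# Three data owners each hold a private count and want only the total revealed.
private_values = [120, 75, 310]
all_shares = [share(v, parties=3) for v in private_values]

# Party j receives the j-th share of every value and adds them locally.
partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(3)]
total = reconstruct(partial_sums)
```

Each individual share is a uniformly random number, so no single party learns anything about the other inputs; only the combined partial sums reveal the total.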
Some lesser-known reference architectures that might help…
• DAWN Project from Stanford University
• Michelangelo from Uber
• MLflow from Databricks
• FBLearner from Facebook
• TFX from Google
The challenges go beyond the obvious…
Three Foundational Challenges for the
Mainstream Adoption of Machine Learning
Lowering the Technological Entry Point
• Can mainstream developers embrace machine learning stacks?
Talent Availability
• Can companies and governments nurture local data science
talent?
Data Democratization
• Can rich datasets stop being a privilege of large corporations
and governments?
Some Initiatives to Consider
Lowering the Technological Entry Point
• AutoML, low-code machine learning frameworks
Talent Availability
• Google AI Academy, Coursera, Udacity…
Data Democratization
• Decentralized AI platforms
Summary
• Implementing machine learning solutions in the real world remains
incredibly challenging
• There is a large gap between the advancements in AI research and the
practical viability of those techniques
• Machine learning applications require a new lifecycle different from
traditional software models
• Each aspect of that lifecycle brings a unique set of challenges
• Start small, iterate…
Thanks
jr@invectoriq.com
jr@intotheblock.io
https://medium.com/@jrodthoughts
https://twitter.com/jrdothoughts
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 

Implementing Machine Learning in the Real World

  • 1. Some Things I Wish I Had Known Before Scaling Machine Learning Solutions Invector Labs
  • 3. Agenda • Myths and realities of machine learning solutions in the real world • 15 Lessons I learned when building large scale machine learning systems • Challenge • What we learned? • Solution
  • 5. We can discuss the theoretical definitions or, instead, focus on the pragmatic one…
  • 8. But the reality is that building machine learning solutions remains brutally difficult
  • 9. But not just because of the obvious reasons…
  • 10. Challenges of Machine Learning in the Real World High Technological Barrier Limited Talent Availability Labeled Datasets Cost …
  • 12. We are dealing with a new app lifecycle… Traditional App Lifecycle: Design, Implementation, Deployment, Management/Monitoring. Machine Learning App Lifecycle: Experimentation, Model Creation, Training, Testing, Regularization, Deployment, Monitoring, Optimization.
  • 14. The Aspects of a Machine Learning Solution that will Drive You Crazy: Strategy & Processes, Data Engineering, Experimentation, Model Training, Model Operationalization, Runtime Execution, Security, Lifecycle Management, Optimization, …
  • 15. Lessons learned when building high scale machine learning solutions…
  • 18. Challenges: Data scientists are great at experimentation, not so much at writing high-quality code. Experimentation-oriented deep learning frameworks don’t necessarily make great production frameworks, e.g., PyTorch vs. TensorFlow.
  • 19. Some Ideas to Consider: Divide data science and data engineering teams. Data Science Team: write notebooks and experimentation models. Engineering Team: refactor or rewrite models for production environments; automate training and optimization jobs. DevOps Teams: deploy models; monitor, retrain, and optimize models.
  • 20. Lesson #2: Neither Agile nor Waterfall Methodologies Work in Machine Learning
  • 21. Challenges: Waterfall methods don’t work because you rarely know which machine learning methods are going to work for a specific problem. Agile methods don’t work because you need very specific requirements.
  • 22. Some Ideas to Consider: Split the development lifecycle into agile and waterfall iterations (Agile, Waterfall, Agile).
  • 24. Lesson #3: Feature extraction can become a reusability nightmare…
  • 25. Challenges: Different models require the same features from a dataset. Feature extraction jobs are computationally expensive. Different teams create proprietary ways to capture and store feature information.
  • 26. Some Ideas to Consider: Implement a centralized feature store. Leverage representation learning to extract relevant features from a dataset. Look for reference architectures, e.g., Uber’s Michelangelo. Diagram labels: Dataset Preparation Job1, Job2, JobN; Representation Learning Task1 (three times); Feature Store; Model 1, Model N.
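The centralized feature store idea can be illustrated with a toy in-memory version. This is only a sketch under stated assumptions: the `FeatureStore` class, the `avg_order_value` feature, and the two consuming models are hypothetical names invented for illustration, and a production system such as Michelangelo would add persistence, versioning, and batch computation.

```python
from typing import Callable, Dict, Hashable, Tuple


class FeatureStore:
    """Toy in-memory feature store: compute each feature once, reuse it everywhere."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Callable[[dict], float]] = {}
        self._cache: Dict[Tuple[str, Hashable], float] = {}
        self.compute_count = 0  # how many times an extractor actually ran

    def register(self, name: str, extractor: Callable[[dict], float]) -> None:
        """Register one shared definition of a feature."""
        self._extractors[name] = extractor

    def get(self, name: str, entity_id: Hashable, record: dict) -> float:
        """Return the cached value, running the extractor only on a cache miss."""
        key = (name, entity_id)
        if key not in self._cache:
            self._cache[key] = self._extractors[name](record)
            self.compute_count += 1
        return self._cache[key]


store = FeatureStore()
store.register("avg_order_value", lambda r: sum(r["orders"]) / len(r["orders"]))

customer = {"orders": [20.0, 40.0, 60.0]}
# Two hypothetical models ask for the same feature of the same customer.
churn_input = store.get("avg_order_value", "customer-1", customer)
fraud_input = store.get("avg_order_value", "customer-1", customer)
```

Both models receive the same value, and the expensive extraction step runs once instead of once per team.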
  • 27. Lesson #4: Data labeling is so easy to underestimate
  • 28. Challenges: Data experts spend a lot of time labeling datasets. The logic for data labeling is often not reusable. Subjective data labeling strategies fail to differentiate between useful and useless features.
  • 29. Some Ideas to Consider: Implement an automated data labeling strategy. Generative learning can help to structure more effective labels. Project Snorkel is one of the leading automated data labeling frameworks in the market.
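One way to picture automated, reusable labeling logic is a set of small labeling functions whose votes are combined. The sketch below uses a plain majority vote and does not use Snorkel's API; Snorkel itself fits a label model rather than simply counting votes. The spam/ham task and the three heuristics are assumptions made up for the example.

```python
from collections import Counter
from typing import Callable, List, Optional

SPAM, HAM = 1, 0
ABSTAIN = None  # a labeling function may decline to vote


def lf_contains_link(text: str) -> Optional[int]:
    return SPAM if "http" in text else ABSTAIN


def lf_mentions_prize(text: str) -> Optional[int]:
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text: str) -> Optional[int]:
    return HAM if len(text.split()) <= 3 else ABSTAIN


def weak_label(text: str, lfs: List[Callable[[str], Optional[int]]]) -> Optional[int]:
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not None]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN
    return top


LFS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]
labels = [weak_label(t, LFS) for t in (
    "Claim your prize at http://x.example",
    "hello there",
    "Meeting moved to Thursday afternoon",
)]
```

The labeling functions are ordinary code, so they can be reviewed, versioned, and reused across datasets, which addresses the reusability challenge above.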
  • 31. Lesson #5: The single machine learning framework fallacy
  • 32. Challenges: Enterprises like to standardize on a single machine learning framework. Different teams have different technology preferences. Providing a consistent machine learning platform across different machine learning frameworks is no easy task.
  • 33. Some Ideas to Consider: Optimize for productivity, not consistency. Enable enough flexibility to leverage different frameworks for experimentation and production. ONNX is a great solution for intermediate representations. Diagram labels: Experimentation Framework, Intermediate Representation, Production Framework.
  • 34. Lesson #6: Too much time going from notebooks to production programs
  • 35. Challenges: Notebooks are ideal for model experimentation and testing. Notebooks typically have performance challenges when executed at scale. Scaling notebook environments can be challenging. Parametrizing notebook executions is far from trivial.
  • 36. Some Ideas To Consider: Enable an infrastructure to operationalize data science notebooks. Use containers for the most complex machine learning workflows. Model Experimentation: Jupyter, Zeppelin. Scheduling Notebooks: Papermill, Netflix’s Meson. Running Complex Workflows: Docker Containers, Kubernetes.
  • 37. Lesson #7: Model selection can be a machine learning problem
  • 38. Challenges: Data scientists make very subjective decisions when it comes to model selection. The same problem can be solved using different machine learning models. Very often it is almost impossible to differentiate between similar models.
  • 39. Some Ideas To Consider: Represent machine learning requirements as a dataset with an objective attribute. Leverage AutoML-based techniques for model selection. Diagram labels: Problem Dataset, AutoML, Proposed Models.
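Treating model selection as a data-driven problem can be sketched with cross-validation over a set of candidate models. This is a deliberately tiny stand-in for AutoML, which searches far larger spaces of architectures and hyperparameters; the two candidates, the synthetic dataset, and all function names are assumptions for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Predictor = Callable[[float], float]


def fit_mean(train: List[Point]) -> Predictor:
    """Baseline candidate: always predict the average target."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_line(train: List[Point]) -> Predictor:
    """Least-squares line through the training points."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_error(fit: Callable[[List[Point]], Predictor], data: List[Point], folds: int = 4) -> float:
    """Mean squared error averaged over interleaved k-fold splits."""
    errors = []
    for k in range(folds):
        test = data[k::folds]
        train = [p for i, p in enumerate(data) if i % folds != k]
        model = fit(train)
        errors.append(mean((model(x) - y) ** 2 for x, y in test))
    return mean(errors)


def select_model(candidates: Dict[str, Callable], data: List[Point]):
    """Pick the candidate with the lowest cross-validated error."""
    scores = {name: cv_error(fit, data) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


data = [(float(x), 2.0 * x + 1.0) for x in range(12)]
best, scores = select_model({"mean": fit_mean, "line": fit_line}, data)
```

The choice is made by a measured objective rather than by individual preference, which is the point of the "objective attribute" idea above.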
  • 42. Challenges: The No Free Lunch Theorem. Trained models can perform poorly against new datasets. New engineers and DevOps need to understand how to re-train existing models.
  • 43. Some Ideas to Consider: Automate training jobs. Orchestrate scheduled execution of training jobs. Diagram labels: DataLake, Data Outcomes/Feature Store, Training Job1, Training Job2, Training JobN.
  • 45. Challenges: Training machine learning models can be computationally expensive. Most machine learning models need to be retrained entirely when new data arrives. It’s nearly impossible to quantify the impact that new datasets have on the performance of a model.
  • 46. Some Ideas to Consider: Implement continual learning models. Consider transfer learning as a fundamental enabler.
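A minimal picture of continual learning is a model that is updated in place when a new batch arrives, starting from its current weights instead of retraining on the full history. The sketch below uses per-sample gradient descent on a one-feature linear model; the class name, learning rate, epoch count, and the drifting synthetic data are all assumptions for illustration, and real continual learning also has to deal with forgetting of older data.

```python
from typing import List, Tuple


class OnlineLinearModel:
    """One-feature linear model trained incrementally with per-sample gradient steps."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def partial_fit(self, batch: List[Tuple[float, float]], epochs: int = 200) -> None:
        """Update the current weights using only the new batch (warm start)."""
        for _ in range(epochs):
            for x, y in batch:
                err = self.predict(x) - y
                self.w -= self.lr * err * x
                self.b -= self.lr * err


model = OnlineLinearModel()
model.partial_fit([(-1.0, -2.0), (0.0, 1.0), (1.0, 4.0)])  # data from y = 3x + 1
before = model.predict(0.0)
model.partial_fit([(-1.0, -1.0), (0.0, 2.0), (1.0, 5.0)])  # intercept drifts: y = 3x + 2
after = model.predict(0.0)
```

The second call never sees the first batch, yet the model tracks the drifted relationship, which is the behavior a full retrain would otherwise be needed for.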
  • 47. Lesson #10: Training a model requires as much coding as creating it…
  • 48. Challenges: Data engineers spend a lot of time writing training routines for machine learning models. Comparing the performance of different models on the same datasets remains tricky. Changes to a training dataset often imply changes to the training code.
  • 49. Some Ideas to Consider: Explore a configuration-driven training process. Uber’s Ludwig is an innovative, no-code framework for training machine learning models.
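A configuration-driven training process can be sketched as a registry of trainers selected by a declarative config, so that switching models or target columns changes data rather than code. This is not Ludwig's configuration format; the config keys, trainer names, and the churn dataset below are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List


def train_mean(rows: List[dict], target: str) -> Callable[[dict], float]:
    """Baseline: predict the average of the target column."""
    avg = sum(r[target] for r in rows) / len(rows)
    return lambda row: avg


def train_stump(rows: List[dict], target: str, feature: str) -> Callable[[dict], int]:
    """Decision stump: learn the threshold on one feature with the fewest errors."""
    best_t, best_err = None, None
    for t in sorted({r[feature] for r in rows}):
        err = sum((r[feature] >= t) != bool(r[target]) for r in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda row: int(row[feature] >= best_t)


TRAINERS: Dict[str, Callable] = {"mean_baseline": train_mean, "stump": train_stump}


def train_from_config(config: dict, rows: List[dict]) -> Callable:
    """Look up the trainer named in the config and pass it the configured parameters."""
    trainer = TRAINERS[config["model"]["type"]]
    params = config["model"].get("params", {})
    return trainer(rows, target=config["target"], **params)


rows = [
    {"days_inactive": 2, "churned": 0},
    {"days_inactive": 5, "churned": 0},
    {"days_inactive": 30, "churned": 1},
    {"days_inactive": 45, "churned": 1},
]
stump_config = {"target": "churned", "model": {"type": "stump", "params": {"feature": "days_inactive"}}}
baseline_config = {"target": "churned", "model": {"type": "mean_baseline"}}

stump = train_from_config(stump_config, rows)
baseline = train_from_config(baseline_config, rows)
```

Comparing two models on the same dataset then amounts to running the same entry point with two configs.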
  • 51. Lesson #11: Different models require different execution patterns…
  • 52. Challenges: Not all models can be executed via APIs. Some models take a long time to run. In some scenarios, different models need to be executed at the same time based on a specific condition.
  • 53. Some Ideas to Consider: Enable different execution modes based on the client’s requirements. Diagram labels: Scheduled Activation, Pub-Sub Activation, On-Demand Activation, Models, API Gateway, Event Gateway.
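The three activation modes (on-demand, pub-sub, scheduled) can be sketched as one small runtime that routes requests, events, and clock ticks to registered models. Everything here is an assumption for illustration: the `ModelRuntime` class, the integer "tick" clock standing in for a real scheduler, and the two toy models. A real deployment would put an API gateway and an event broker in front of this logic.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Model = Callable[[Any], Any]


class ModelRuntime:
    """Routes on-demand calls, published events, and scheduler ticks to models."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}
        self._subscribers: Dict[str, List[str]] = defaultdict(list)
        self._schedule: List[Tuple[int, str, Callable[[int], Any]]] = []

    def register(self, name: str, model: Model) -> None:
        self._models[name] = model

    def invoke(self, name: str, payload: Any) -> Any:
        """On-demand activation: a direct, synchronous call."""
        return self._models[name](payload)

    def subscribe(self, topic: str, name: str) -> None:
        self._subscribers[topic].append(name)

    def publish(self, topic: str, event: Any) -> Dict[str, Any]:
        """Pub-sub activation: every model subscribed to the topic runs on the event."""
        return {name: self.invoke(name, event) for name in self._subscribers[topic]}

    def schedule(self, name: str, every_n_ticks: int, payload_factory: Callable[[int], Any]) -> None:
        self._schedule.append((every_n_ticks, name, payload_factory))

    def tick(self, tick: int) -> Dict[str, Any]:
        """Scheduled activation: run the models whose interval divides this tick."""
        return {
            name: self.invoke(name, make_payload(tick))
            for every, name, make_payload in self._schedule
            if tick % every == 0
        }


runtime = ModelRuntime()
runtime.register("fraud_score", lambda tx: min(1.0, tx["amount"] / 1000))
runtime.register("daily_forecast", lambda day: 100 + 5 * day)
runtime.subscribe("transactions", "fraud_score")
runtime.schedule("daily_forecast", every_n_ticks=2, payload_factory=lambda tick: tick)

on_demand = runtime.invoke("fraud_score", {"amount": 250})
event_results = runtime.publish("transactions", {"amount": 5000})
scheduled = [runtime.tick(t) for t in range(1, 5)]
```

The same registered model can be reached through more than one mode, so the execution pattern becomes a deployment decision rather than a property baked into the model code.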
  • 54. Lesson #12: Mobile deep learning is more complicated than you think
  • 55. Challenges: Centralized cloud deep learning models don’t scale. On-device deep learning models are hard to distribute and train. Tons of privacy challenges.
  • 56. Some Ideas to Consider: Consider using federated learning or similar patterns for mobile-based machine learning.
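The core loop of federated learning can be sketched as federated averaging: each device trains locally on its own data, and only the resulting weights are sent back and averaged, weighted by dataset size. The two simulated clients, the one-feature linear model, and the hyperparameters below are assumptions for illustration; a real system adds client sampling, secure aggregation, and communication handling.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, float]  # (slope, intercept)
Dataset = List[Tuple[float, float]]


def local_update(weights: Weights, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train on one device's private data, starting from the global weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(updates: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Average client weights, weighting each client by its number of samples."""
    total = sum(sizes)
    return tuple(sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(2))


def federated_round(global_weights: Weights, clients: List[Dataset]) -> Weights:
    """One round: every client trains locally, the server only sees weights."""
    updates = [local_update(global_weights, data) for data in clients]
    return federated_average(updates, [len(data) for data in clients])


def mse(weights: Weights, data: Dataset) -> float:
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Two simulated devices whose private data follow y = 2x.
clients = [
    [(-1.0, -2.0), (1.0, 2.0)],
    [(0.5, 1.0), (2.0, 4.0), (3.0, 6.0)],
]
weights: Weights = (0.0, 0.0)
all_data = [p for data in clients for p in data]
initial_loss = mse(weights, all_data)
for _ in range(30):
    weights = federated_round(weights, clients)
final_loss = mse(weights, all_data)
```

The raw samples never leave the client lists; `all_data` is pooled here only to measure the loss in this simulation.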
  • 59. Challenges: The accuracy-interpretability friction. The unpredictability factor. Limited toolset.
  • 60. Some Ideas to Consider: Establish systematic practices to debug machine learning models. Onboard modeling visualization and interpretability tools. Visualize the Network and its Results: use tools like TensorBoard to visualize the structure of neural networks. Compare Training and Test Errors: high training error is a sign of underfitting; high test error with low training error is a sign of overfitting. Test with Small Datasets: helps to determine whether the error is in the code or in the data. Monitor Activations and Gradient Values: monitor the number of activations in hidden units. Interpretability: Understanding How Nodes are Activated; Understanding what Hidden Layers Do; Understanding How Concepts are Formed.
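The "compare training and test errors" rule of thumb can be written down as a small diagnostic. The thresholds `high` and `gap` are arbitrary assumptions picked for illustration; sensible values depend on the task and its baseline error.

```python
from typing import Sequence


def error_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that disagree with the labels."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


def diagnose(train_error: float, test_error: float, high: float = 0.2, gap: float = 0.1) -> str:
    """Apply the rule of thumb: high train error means underfitting,
    low train error with much higher test error means overfitting."""
    if train_error >= high:
        return "underfitting"
    if test_error - train_error >= gap:
        return "overfitting"
    return "ok"


train_err = error_rate([1, 0, 1, 1], [1, 0, 1, 1])  # perfect on training data
test_err = error_rate([1, 1, 0, 1], [0, 1, 1, 1])   # half wrong on held-out data
verdict = diagnose(train_err, test_err)
```

Running a check like this after every training job turns an informal debugging habit into a systematic practice.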
  • 63. Challenges: Most neural networks are vulnerable to adversarial attacks. Attackers don’t need access to the models but can simply manipulate input datasets. Most of the time, adversarial attacks go undetected.
  • 64. Some Ideas to Consider: Test your neural networks for adversarial robustness. IBM’s Adversarial Robustness Toolbox is one of the leading stacks in neural network security.
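A basic adversarial robustness test can be sketched with the fast gradient sign method (FGSM), applied here to a tiny logistic regression model rather than a neural network. FGSM is a technique chosen for this illustration and is not named in the slides; the weights, the input, and the perturbation budget `eps` are made-up values, and the code does not use the Adversarial Robustness Toolbox API.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Move each input coordinate by eps in the direction that increases the loss."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]  # gradient of cross-entropy loss w.r.t. the input
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0], 0.0
x, y = [0.4, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.3)

clean_score = predict_proba(w, b, x)          # above 0.5: classified as class 1
adversarial_score = predict_proba(w, b, x_adv)  # below 0.5: prediction flips
```

A small, bounded change to the input is enough to flip the prediction, which is the kind of failure a robustness test suite is meant to surface before deployment.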
  • 65. Lesson #15: Data privacy is the elephant in the machine learning room
  • 66. Challenges: Machine learning models intrinsically build knowledge about private datasets. Most machine learning techniques require clear access to data which, in many cases, contains sensitive information. There are no established techniques for evaluating the privacy robustness of machine learning models.
  • 67. Some Ideas to Consider: Private machine learning is an emerging area of research. Leverage techniques such as secure multi-party computation or zero-knowledge proofs to obfuscate training datasets. PySyft is an emerging framework to enable privacy in machine learning models.
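The idea behind secure multi-party computation can be sketched with additive secret sharing, where several parties compute a sum without any of them revealing its own value. This is a toy, single-process simulation and not PySyft's API; the modulus `PRIME`, the function names, and the three example values are assumptions, and the scheme shown only handles non-negative integers smaller than the modulus.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # all arithmetic is done modulo this prime


def share(value: int, n_parties: int) -> List[int]:
    """Split a value into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    return sum(shares) % PRIME


def secure_sum(private_values: List[int]) -> int:
    """Each party shares its value; party j only ever sees the j-th share of each input."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return reconstruct(partial_sums)


total = secure_sum([12, 30, 8])
```

Any single share is uniformly random on its own, so a party learns the aggregate but nothing about another party's individual input.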
  • 69. DAWN Project from Stanford University; Michelangelo from Uber; MLflow from Databricks; FBLearner from Facebook; TFX from Google
  • 70. The challenges go beyond the obvious…
  • 71. Three Foundational Challenges for the Mainstream Adoption of Machine Learning. Lowering the Technological Entry Point: Can mainstream developers embrace machine learning stacks? Talent Availability: Can companies and governments nurture local data science talent? Data Democratization: Can rich datasets stop being a privilege of large corporations and governments?
  • 72. Some Initiatives to Consider. Lowering the Technological Entry Point: AutoML, low-code machine learning frameworks. Talent Availability: Google AI Academy, Coursera, Udacity… Data Democratization: Decentralized AI platforms.
  • 73. Summary • Implementing machine learning solutions in the real world remains incredibly challenging • There is a large gap between the advancements in AI research and the practical viability of those techniques • Machine learning applications require a new lifecycle different from traditional software models • Each aspect of that lifecycle brings a unique set of challenges • Start small, iterate…