From Science to Engineering, 

Process of a Machine Learning Product
Bruce Kuo

bruce3557@gmail.com
!1
Who Am I?
• Bruce Kuo

• Experience:

• Yahoo software engineer in Data team and Global Search (2014-2017)

• Codementor Data Scientist (2017 - 2019)
!2
Target Audience
• Anyone interested in machine learning product development

• Junior / Mid Level machine learning engineers

• Data scientists / engineers
!3
Goals of This Talk
• Share an overview of a machine learning project

• Share the connection points between business problems and machine learning problems

• Share the engineering side of a machine learning product
!4
Machine Learning Project Overview
!5
[Diagram: a Machine Learning Project has a Science side and an Engineering side]
• Science: Science Levels; Research Steps; Requirements; Define Problem & Objectives; Offline Evaluation; Solution Research
• Engineering: Model Serialization; ML Data Pipeline; Model Serving; Performance Tracking; CI & Monitoring
Science
!6
Two Different Science Levels
• Unknown business problem

• Example: do we need fast recommendations after a user views a product page?

• This is another topic
• Known business problem, unknown solutions

• Example: we have a supply problem in the matching algorithm; how can we improve conversion through recommendation?

• More ML steps here

We focus on known business problems in this talk.
The unknown-problem part is another story…
!7
Where ML Requirements Come From
!8
Data Analysis (PM, Analysts): “We need a recommendation module!”
Where ML Requirements Come From
!9
Qualitative Analysis (User Feedback) (PM, Designer, Sales, Marketing): “We need to improve our tag suggestion module!”
ML Science on Business
• ML science for business problems is like an “experimental science”

• Different datasets call for different algorithms to solve / learn them.

• Designing experiments is important.
!10
ML Problem Steps
• Goal: enhance a specific business metric
!11
Define Problem & Objectives
Solution Research & Experiments
Define Evaluation Metrics
Define the Problem
• What is the business problem?

• News triggering

• Mentor matching

• Which type of ML problem can be used to solve the business problem?

• Classification?

• Recommendation?

• …
!12
Define Objectives
• In the algorithm, we focus on the loss

• 0/1 loss

• Mean Square Error (MSE)

• Mean Absolute Error (MAE)

• Cross Entropy

• …
• In business, we focus on the business goal.

• Interest rate

• Conversion

• CTR

• …
https://cloud.tencent.com/developer/article/1092365
!13
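The contrast between algorithm losses and business metrics can be made concrete with a small sketch. The function names below are illustrative choices, not part of the talk, and CTR is assumed here to mean clicks divided by impressions.

```python
import math

def zero_one_loss(y_true, y_pred):
    """Fraction of labels that were predicted incorrectly."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between target and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def click_through_rate(clicks, impressions):
    """Business metric (assumed definition): clicks per impression."""
    return clicks / impressions if impressions else 0.0
```

The losses are what the optimizer sees during training; a metric such as CTR is what the business sees after launch, which is why the two need to be linked explicitly when defining objectives.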
Design Offline Evaluation
• After defining the problem and objectives, we need to design the offline evaluation.

• Offline evaluation metrics are usually the business goals (CTR, interest rate, …)

• This step yields the first version of the data pipeline design and the online evaluation design.

• It provides confidence before we start integrating the algorithm into the online service.

• Supervised offline evaluation is easy; unsupervised is hard.
!14
Solution Research
!15
• Paper, paper, paper

• Learn how similar problems have been solved and how we can draw ideas from those solutions

• Research areas of machine learning

• For different purposes: classification / regression / clustering …

• Algorithm optimization: which gradient descent variant works better
Solution Research (Cont.)
• In a startup, we usually focus on the high-level parts because of:

• Tuning speed

• Integration - we need to choose a mature implementation for better production usage, e.g., scikit-learn or Keras.

• Feature engineering is very important when we only select among existing algorithms

• Small goals for solution engineering - easy to retrain
!16
Example: Product Recommendation
• Problem: Given a user, we want to recommend products to that user

• Ranking problem or recommendation problem

• Objectives:

• Business Metrics: top-k interest rate

• Loss function: depends on our solution

• Offline Metrics: We evaluate the top-k interest rate as the performance metric after optimization

• Solution research: Matrix Factorization, Factorization Machines, Deep Learning, Learning to Rank…
!17
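An offline check for the product recommendation example could be sketched as follows. The talk does not define "top-k interest rate" precisely, so this assumes it means the share of users for whom at least one of the top-k recommended products appears among the products they showed interest in within held-out data. The function name and data layout are hypothetical.

```python
def top_k_interest_rate(recommendations, interested, k):
    """Share of users with at least one hit in their top-k recommendations.

    recommendations: dict mapping user id -> ranked list of product ids
    interested:      dict mapping user id -> set of product ids from held-out data
    """
    # Only users present in both inputs can be evaluated.
    users = [u for u in recommendations if u in interested]
    if not users:
        return 0.0
    hits = sum(1 for u in users if set(recommendations[u][:k]) & interested[u])
    return hits / len(users)

# Toy held-out data: u1 is interested in the 2nd recommendation, u2 in the 3rd.
recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
held_out = {"u1": {"b"}, "u2": {"f"}}
rate_at_2 = top_k_interest_rate(recs, held_out, k=2)
rate_at_3 = top_k_interest_rate(recs, held_out, k=3)
```

Because the metric is computed on ranked output only, it can compare matrix factorization, factorization machines, or a learning-to-rank model on the same footing, whatever loss each one optimizes.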
Engineering
!18
Why Is Engineering Needed?
• Model results should be used in your products

• How?
!19
From Science to Engineering
First Step: Export Model
[Diagram: Science (Model Training) and Engineering (Data Pipeline, API Serving), connected by Serialization; the whole flow needs CI / Monitoring]
!20
Serialization - Export Model
• Goal: serialize your model into a binary file or a general format so that everyone can use it for prediction.

• Different algorithms need different serialization methods, but each machine learning package offers a common interface for them

• Scikit-learn: https://scikit-learn.org/stable/modules/model_persistence.html

• Keras: https://jovianlin.io/saving-loading-keras-models/

• …

• Lower-level design: http://dmg.org/pmml/products.html
!21
Serialization - Export Model
• Example: how do we serialize a logistic regression model?

• scikit-learn: joblib.dump(model, path)

• From scratch: we need to work from the model equation

• Equation: $\Pr(Y_i = y \mid X_i) = \dfrac{e^{\beta \cdot X_i \cdot y}}{1 + e^{\beta \cdot X_i}}$

• Only save $\beta$, which is a linear weight vector; with it we can compute the prediction function.

• PMML tries to define a serialization interface for each algorithm
!22
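The "from scratch" route for logistic regression can be sketched with the standard library alone. This is a minimal illustration, not the speaker's implementation. It assumes the intercept is folded into the weight vector β, uses JSON as the "general format", and the file name and helper functions are made up for the example.

```python
import json
import math
import os
import tempfile

def save_weights(beta, path):
    """Persist only the weight vector; that is all the model equation needs."""
    with open(path, "w") as f:
        json.dump({"beta": beta}, f)

def load_weights(path):
    """Read the weight vector back from the JSON file."""
    with open(path) as f:
        return json.load(f)["beta"]

def predict_proba(beta, x):
    """Pr(Y = 1 | x) = e^(beta.x) / (1 + e^(beta.x)), written as a sigmoid."""
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained weights, exported by the science side.
beta = [0.5, -1.0, 2.0]
path = os.path.join(tempfile.mkdtemp(), "logreg.json")
save_weights(beta, path)

# The engineering side only needs the file and the equation to predict.
restored = load_weights(path)
probability = predict_proba(restored, [2.0, 1.0, 0.0])
```

A text format like this can be read from any language, whereas a joblib dump is convenient but ties the consumer to Python and to compatible library versions.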
Serialization - Export Latent Features
• We extract hidden vectors to represent users / items

• Extract photo features with an autoencoder

• Extract user features with matrix factorization …

• 2 ways to export latent features

• Save the model, e.g., an autoencoder

• Save the feature vectors, e.g., matrix factorization vectors
!23
Example - Matrix Factorization
!24
picture: https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/
Save by user
Save by product
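Saving matrix factorization output "by user" and "by product" could look like the sketch below. The factor values, key names, and the use of plain dicts as a stand-in for a key-value serving store are all assumptions made for illustration.

```python
import json

# Hypothetical latent vectors produced by a matrix factorization job.
user_factors = {"user_1": [0.9, 0.1], "user_2": [0.2, 0.8]}
product_factors = {"prod_a": [1.0, 0.0], "prod_b": [0.0, 1.0]}

def export_vectors(factors):
    """Serialize each vector under its own key, as a key-value store would hold it."""
    return {key: json.dumps(vec) for key, vec in factors.items()}

def score(store_users, store_products, user_id, product_id):
    """Predicted affinity is the dot product of the two stored vectors."""
    u = json.loads(store_users[user_id])
    p = json.loads(store_products[product_id])
    return sum(a * b for a, b in zip(u, p))

def recommend(store_users, store_products, user_id, k):
    """Rank every stored product for one user and keep the best k."""
    ranked = sorted(
        store_products,
        key=lambda pid: score(store_users, store_products, user_id, pid),
        reverse=True,
    )
    return ranked[:k]

by_user = export_vectors(user_factors)        # "save by user"
by_product = export_vectors(product_factors)  # "save by product"
```

With the vectors stored separately, no model object has to be loaded at serving time; a lookup and a dot product are enough.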
How to Use Model Result?
• The model predicts in the data API

• The model predicts in the data pipeline
!25
Predict in Data API
• The model predicts inside the Data API

• Prepare data in the data pipeline

• Data arrives in the request payload

• Can provide real-time prediction

• Latency is a challenge
[Diagram: Data Warehouse (user data) → DataPipeline → Serving Database (user features) → Data API (Model), which also receives user data in the request]
!26
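Prediction inside the Data API might be sketched like this. The feature names, weights, and the in-memory dict standing in for the serving database are hypothetical; a real service would sit behind a web framework and a real datastore.

```python
import math

# Stand-in for the serving database filled by the data pipeline (hypothetical features).
SERVING_DB = {"user_1": {"sessions_7d": 3.0, "purchases_30d": 1.0}}

# Stand-in for a deserialized logistic regression model (hypothetical weights).
BETA = {"bias": -1.0, "sessions_7d": 0.4, "purchases_30d": 0.8, "cart_size": 0.2}

def predict(features):
    """Score one feature dict with the loaded weights."""
    z = BETA["bias"] + sum(
        BETA[name] * value for name, value in features.items() if name in BETA
    )
    return 1.0 / (1.0 + math.exp(-z))

def handle_request(payload):
    """Merge precomputed features with request-time data, then predict on the spot."""
    features = dict(SERVING_DB.get(payload["user_id"], {}))
    features.update(payload.get("features", {}))  # data carried in the request payload
    return {"user_id": payload["user_id"], "score": predict(features)}

response = handle_request({"user_id": "user_1", "features": {"cart_size": 2.0}})
```

Every request pays for the feature lookup plus the model call, which is where the latency challenge comes from; in exchange, the score reflects data that only exists at request time.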
Predict in Data Pipeline
• The model predicts inside the Data Pipeline

• Predict results in the pipeline and save them to the database

• The backend implements its logic on its own side

• Better API speed

• Lower flexibility
[Diagram components: Data Warehouse (user data) → DataPipeline (Model, predict) → Serving Database → Data API (extract result)]
!27
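For comparison, prediction inside the data pipeline can be sketched as a batch job that writes scores ahead of time. As before, the warehouse rows, weights, and in-memory serving store are illustrative assumptions.

```python
import math

# Stand-in for rows read from the data warehouse (hypothetical columns).
WAREHOUSE = [
    {"user_id": "user_1", "sessions_7d": 3.0, "purchases_30d": 1.0},
    {"user_id": "user_2", "sessions_7d": 0.0, "purchases_30d": 0.0},
]

# Stand-in for a deserialized logistic regression model (hypothetical weights).
BETA = {"bias": -1.0, "sessions_7d": 0.4, "purchases_30d": 0.8}

def predict(row):
    """Score one warehouse row with the loaded weights."""
    z = BETA["bias"] + sum(BETA[name] * row[name] for name in BETA if name != "bias")
    return 1.0 / (1.0 + math.exp(-z))

def run_batch_job(warehouse_rows):
    """Pipeline step: predict for every user and return the rows to store."""
    return {row["user_id"]: predict(row) for row in warehouse_rows}

# The pipeline writes its results to the serving database on a schedule.
SERVING_DB = run_batch_job(WAREHOUSE)

def handle_request(user_id):
    """API step: only read a stored result; unknown users get None."""
    return SERVING_DB.get(user_id)
```

The API call is now a single lookup, which explains the better speed, but a score can only be as fresh as the last pipeline run and cannot use request-time data, which is the lower flexibility noted above.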
Other Concerns in Engineering
• How quickly do we need to provide model results to users?

• How do we handle data changes?

• Online performance tracking

• Monitoring

• CI / CD
These are the factors that shape your pipeline design
!28
Conclusion
• The overview of a machine learning project

• Points between business problems and machine learning problems

• Engineering details in a machine learning project
!29
Q & A

Thanks for
Listening!
!30