From science to engineering, the process to build a machine learning product

From Science to Engineering, Process of a Machine Learning Product

Bruce Kuo

bruce3557@gmail.com

Who Am I?
• Bruce Kuo

• Experience:

• Software engineer at Yahoo, Data team and Global Search (2014-2017)

• Data scientist at Codementor (2017-2019)

Target Audience
• Anyone interested in machine learning product development

• Junior / Mid Level machine learning engineers

• Data scientists / engineers

Goals of This Talk
• Share the overview of a machine learning project

• Share how business problems map to machine learning problems

• Share the engineering side of a machine learning product

Machine Learning Project Overview
[Diagram: a machine learning project has a Science side and an Engineering side]
• Science: science levels; research steps (requirements, define problem & objectives, offline evaluation, solution research)
• Engineering: model serialization, ML data pipeline, model serving, performance tracking, CI & monitoring

Two Different Science Levels
• Unknown business problem

• Example: do we need fast recommendations after a user views a product page?

• This is another topic
• Known business problem, unknown solutions

• Example: our matching algorithm has a supply problem; how can we improve conversion with recommendations?

• More ML steps here

This talk focuses on known business problems.
Unknown business problems are another story…

Where ML Requirements Come From
• Data analysis (PM, analysts): "We need a recommendation module!"

Where ML Requirements Come From
• Qualitative analysis / user feedback (PM, designer, sales, marketing): "We need to improve our tag suggestion module!"

ML Science on Business
• ML science applied to business problems is like "experimental science"

• Different datasets call for different algorithms to solve / learn.

• Designing experiments is important.

ML Problem Steps
• Goal: enhance a specific business metric
• Steps shown on the slide: Define Problem & Objectives; Solution Research & Experiments; Define Evaluation Metrics

Define the Problem
• What is the business problem?

• News triggering

• Mentor matching

• Which type of ML problem can be used to solve the business problem?

• Classification?

• Recommendation?

• …

Define Objectives
• On the algorithm side, we focus on the loss

• 0/1 loss

• Mean Square Error (MSE)

• Mean Absolute Error (MAE)

• Cross Entropy

• …
https://cloud.tencent.com/developer/article/1092365
• On the business side, we focus on the business goal.

• Interest rate

• Conversion

• CTR

• …
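
The loss functions and the CTR business metric listed on this slide can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names are chosen here for readability, cross entropy is shown only in its binary form, and CTR is assumed to mean clicks divided by impressions.

```python
import math


def zero_one_loss(y_true, y_pred):
    """Fraction of predictions that differ from the label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood for labels in {0, 1}."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def click_through_rate(clicks, impressions):
    """Business metric (assumed definition): clicks / impressions."""
    return clicks / impressions if impressions else 0.0
```

The losses are what the algorithm optimizes during training; the business metric is what the product is ultimately judged on, which is why both need to be defined up front.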

Design Offline Evaluation
• After defining the problem and objectives, we need to design the offline evaluation.

• Usually offline evaluation metrics are business goals (CTR, interest rate, …)

• It gives us the first version of the data pipeline design and the online evaluation design.

• It provides confidence before we start integrating the algorithm into the online service.

• Supervised offline evaluation is easy, unsupervised is hard.

Solution Research
• Paper, paper, paper

• Learn how similar problems have been solved and what ideas we can take from those solutions

• Research areas of machine learning

• For different purposes: classification / regression / clustering …

• Algorithm optimization: which gradient descent variant works better

Solution Research (Cont.)
• In a startup, we usually focus on the high-level parts because:

• Tuning speed

• Integration - we need a mature implementation for production use, e.g., scikit-learn or Keras.

• Feature engineering is very important when we only select among existing algorithms

• Small goals on solution engineering - easy to retrain

Example: Product Recommendation
• Problem: Given a user, we want to recommend products to that user

• Ranking problem or recommendation problem

• Objectives:

• Business Metrics: top-k interest rate

• Loss function: depends on our solution

• Offline Metrics: We evaluate top-k interest rate as the performance metric after optimization

• Solution research: Matrix Factorization, Factorization machine, Deep Learning, Learning to
rank…
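
An offline top-k metric of this kind could be computed roughly as follows. The slide does not define "interest rate", so this sketch assumes it means the fraction of users for whom at least one of the top-k recommended products is a product they showed interest in; the function names and the data layout are hypothetical.

```python
def top_k(scores, k):
    """Return the k product ids with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [product_id for product_id, _ in ranked[:k]]


def top_k_interest_rate(recommendations, interested, k):
    """Fraction of users with at least one interesting product in their top k.

    recommendations: {user_id: [product_id, ...]} ranked best first
    interested:      {user_id: {product_id, ...}} taken from held-out data
    """
    if not recommendations:
        return 0.0
    hits = 0
    for user_id, ranked in recommendations.items():
        if set(ranked[:k]) & interested.get(user_id, set()):
            hits += 1
    return hits / len(recommendations)
```

Whatever model is chosen (matrix factorization, factorization machine, learning to rank), it only has to produce a ranked list per user, so the same evaluation can compare all candidate solutions.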

Why Is Engineering Needed?
• Model results should be used in your products

• How?

From Science to Engineering
[Diagram: science vs. engineering. Components shown: model training, data pipeline, serialization, API serving; CI / monitoring is needed.]
• First step: export model

Serialization - Export Model
• Goal: serialize your model into a binary file or a general format that anyone can use for prediction.

• Serialization methods differ between algorithms, but each machine learning package offers one common interface

• Scikit-learn: https://scikit-learn.org/stable/modules/model_persistence.html

• Keras: https://jovianlin.io/saving-loading-keras-models/

• …

• Lower-level design: http://dmg.org/pmml/products.html

Serialization - Export Model
• Example: how to serialize a logistic regression model?

• scikit-learn: joblib.dump(model, path)

• From scratch: we need to implement the model equation

• Equation:

$$\Pr(Y_i = y \mid X_i) = \frac{e^{\beta X_i y}}{1 + e^{\beta X_i}}, \quad y \in \{0, 1\}$$

• Only save $\beta$, the linear weight vector; the prediction function can be computed from it.

• PMML is trying to define a serialization interface for each algorithm
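
The from-scratch route for logistic regression can be sketched as follows: only the weight vector β is written to disk, and prediction re-applies the model equation for y = 1. The JSON layout, file name, and function names are assumptions made for illustration; a scikit-learn model would instead be saved with joblib.dump as noted on the slide.

```python
import json
import math
import os
import tempfile


def save_weights(beta, path):
    """Serialize only the weight vector; it fully defines the model."""
    with open(path, "w") as f:
        json.dump({"beta": beta}, f)


def load_weights(path):
    """Read the weight vector back from disk."""
    with open(path) as f:
        return json.load(f)["beta"]


def predict_proba(beta, x):
    """Pr(Y = 1 | x) = e^(beta . x) / (1 + e^(beta . x))."""
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


# Usage: export the weights, then reload them for prediction.
beta = [0.5, -1.0]  # hypothetical trained weights
path = os.path.join(tempfile.mkdtemp(), "logreg.json")
save_weights(beta, path)
restored = load_weights(path)
probability = predict_proba(restored, [2.0, 1.0])
```

Because the file holds plain numbers, any service in any language can load it and compute the same prediction, which is the portability that PMML aims to standardize.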

Serialization - Export Latent Features
• We extract hidden vectors to represent users / items

• Extract photo features with an autoencoder

• Extract user features with matrix factorization …

• Two ways to export latent features

• Save the model, e.g., an autoencoder

• Save the feature vectors, e.g., matrix factorization vectors

Example - Matrix Factorization
picture: https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/
• User factors: save by user
• Product factors: save by product
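
Saving matrix factorization output by user and by product could look like the sketch below. The vectors, ids, and JSON storage are hypothetical; the only assumption about the model is the usual one for matrix factorization, that a user's score for a product is the dot product of their latent vectors.

```python
import json

# Hypothetical latent vectors learned by matrix factorization.
user_vectors = {"user_1": [0.2, 0.8], "user_2": [0.9, 0.1]}
product_vectors = {"prod_a": [0.1, 0.9], "prod_b": [1.0, 0.0]}

# "Save by user" and "save by product": two separate key-value exports.
saved_users = json.dumps(user_vectors)
saved_products = json.dumps(product_vectors)


def score(user_vec, product_vec):
    """Predicted affinity is the dot product of the two latent vectors."""
    return sum(u * p for u, p in zip(user_vec, product_vec))


def recommend(user_id, users, products, k):
    """Rank all products for one user and return the top k ids."""
    user_vec = users[user_id]
    ranked = sorted(
        products,
        key=lambda pid: score(user_vec, products[pid]),
        reverse=True,
    )
    return ranked[:k]


# A serving component only needs the saved vectors, not the training code.
loaded_users = json.loads(saved_users)
loaded_products = json.loads(saved_products)
top_for_user_1 = recommend("user_1", loaded_users, loaded_products, 1)
```

Keying the export by user id and by product id lets the serving side fetch just the vectors it needs for a request.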

How to Use Model Result?
• The model predicts in the data API

• The model predicts in the data pipeline

Predict in Data API
• The model predicts inside the Data API

• Prepare data in the data pipeline

• Data in the request payload

• Can provide realtime prediction

• Latency is a challenge
[Diagram components: Data Warehouse, Data Pipeline, Serving Database, and a Data API that contains the Model; arrows labeled "user data" and "user features".]
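
Predicting inside the Data API might be sketched like this. Everything here is a stand-in: SERVING_DB is a dict playing the role of the serving database, MODEL_BETA plays the role of weights loaded at API start-up, and the payload field name realtime_feature is hypothetical. The model is assumed to be a logistic regression for simplicity.

```python
import math

# Hypothetical stand-ins for a serving database and deserialized weights.
SERVING_DB = {"user_1": [1.0, 0.5]}  # user features written by the data pipeline
MODEL_BETA = [0.4, -0.8, 1.0]        # last weight applies to the request-time feature


def predict_proba(beta, features):
    """Logistic regression score computed at request time."""
    z = sum(b * x for b, x in zip(beta, features))
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(user_id, payload):
    """Combine precomputed features with payload data, then predict."""
    stored = SERVING_DB.get(user_id)
    if stored is None:
        return {"user_id": user_id, "score": None}
    features = stored + [payload["realtime_feature"]]
    return {"user_id": user_id, "score": predict_proba(MODEL_BETA, features)}
```

The prediction can react to data that only exists at request time, but every feature lookup and the model computation now sit on the request path, which is where the latency challenge comes from.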

Predict in Data Pipeline
• The model predicts inside the Data Pipeline

• Predict results in the pipeline and save them to the database

• The backend implements its logic on its own side

• Better API speed

• Lower flexibility
[Diagram components: Data Warehouse, Data Pipeline with the Model, Serving Database, and Data API; arrows labeled "user data", "predict", and "extract result".]
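
Predicting inside the data pipeline could be sketched as a batch job plus a lookup-only API. The dicts standing in for the data warehouse and serving database, the function names, and the logistic regression model are all assumptions for illustration.

```python
import math


def predict_proba(beta, features):
    """Logistic regression score used by the batch job."""
    z = sum(b * x for b, x in zip(beta, features))
    return 1.0 / (1.0 + math.exp(-z))


def run_pipeline(warehouse, beta, serving_db):
    """Batch step: score every user and save the result to the serving DB."""
    for user_id, features in warehouse.items():
        serving_db[user_id] = predict_proba(beta, features)


def handle_request(user_id, serving_db):
    """API step: a plain lookup, no model code on the request path."""
    return {"user_id": user_id, "score": serving_db.get(user_id)}


# Usage with hypothetical data.
warehouse = {"u1": [0.0, 0.0], "u2": [1.0, 0.0]}
serving_db = {}
run_pipeline(warehouse, [2.0, 1.0], serving_db)
```

The API only reads a stored value, so it is fast, but a user who appears after the last pipeline run has no score until the next run, which is the flexibility cost named on the slide.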

Other Concerns in Engineering
• How quickly do we need to deliver model results to users?

• How to handle data changes?

• Online performance tracking

• Monitoring

• CI / CD
These are the factors that shape your pipeline design

Conclusion
• The overview of a machine learning project

• How business problems map to machine learning problems

• Engineering details in a machine learning project

Q & A

Thanks for Listening!
