Recsys 2016 tutorial: Lessons learned from building real-life recommender systems

Lessons Learned from
Building real-life Recsys
Xavier Amatriain (Quora)
Deepak Agarwal (LinkedIn)

Our Mission
“To share and grow the world’s
knowledge”
• Millions of questions & answers
• Millions of users
• Thousands of topics
• ...

What we care about
• Demand
• Quality
• Relevance

Lots of high-quality textual information

Recommendations at Quora
● Homepage feed ranking
● Email digest
● Answer ranking
● Topic recommendation
● User recommendation
● Trending Topics
● Automated Topic Labelling
● Related Question
● ...
User actions: click, upvote, downvote, expand, share

Models
● Deep Neural Networks
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● LambdaMART
● Matrix Factorization
● LDA
● ...

1. Implicit signals beat explicit ones (almost always)

Implicit vs. Explicit
● Many have acknowledged
that implicit feedback is more
useful
● Is implicit feedback really always
more useful?
● If so, why?

● Implicit data is (usually):
○ More dense, and available for all users
○ More representative of user behavior (vs.
user reflection)
○ More related to final objective function
○ Better correlated with AB test results
● E.g. Rating vs watching

● However
○ It is not always the case that
direct implicit feedback correlates
well with long-term retention
○ E.g. clickbait
● Solution:
○ Combine different forms of
implicit + explicit to better represent
long-term goal

2. Be thoughtful about your training data

Defining training/testing data
● Training a simple binary classifier for
good/bad answer
○ Defining positive and negative labels ->
Non-trivial task
○ Is this a positive or a negative?
■ funny uninformative answer with many
upvotes
■ short uninformative answer by a well-known
expert in the field
■ very long informative answer that nobody
reads/upvotes
■ informative answer with grammar/spelling
mistakes
■ ...

3. Your model will learn what you teach it to learn

Training a model
● Model will learn according to:
○ Training data (e.g. implicit and explicit)
○ Target function (e.g. probability of user reading an answer)
○ Metric (e.g. precision vs. recall)
● Example 1 (made up):
○ Optimize probability of a user going to the cinema to
watch a movie and rate it “highly” by using purchase history
and previous ratings. Use NDCG of the ranking as final
metric using only movies rated 4 or higher as positives.
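
The metric in Example 1 can be sketched as follows. This is a minimal illustration, assuming binary gains (1 for a movie rated 4 or higher, 0 otherwise) and a log2 position discount; the function names and the sample ratings are made up for the sketch and do not come from the tutorial.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a list of gains given in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_binary(ranked_ratings, threshold=4):
    """NDCG where an item counts as a positive only if rated >= threshold."""
    gains = [1.0 if r >= threshold else 0.0 for r in ranked_ratings]
    best = dcg(sorted(gains, reverse=True))  # DCG of the ideal ordering
    return dcg(gains) / best if best > 0 else 0.0

# Ratings of the movies, listed in the order each model ranked them (made-up data).
ranking_a = [5, 4, 2, 3, 1]  # positives ranked first
ranking_b = [2, 3, 5, 4, 1]  # positives pushed down to positions 3 and 4
score_a = ndcg_binary(ranking_a)
score_b = ndcg_binary(ranking_b)
```

Note that a movie rated 3 and a movie rated 1 are both treated as plain negatives here, which is exactly the kind of decision the model will "learn" from.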

Example 2 - Quora’s feed
● Training data = implicit + explicit
● Target function: value of showing a story to a user ~ weighted sum of actions:
$v = \sum_a v_a \, \mathbb{1}\{y_a = 1\}$
○ Predict probabilities for each action, then compute the expected value:
$v_{pred} = E[V \mid x] = \sum_a v_a \, p(a \mid x)$
● Metric: any ranking metric
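
A minimal sketch of the expected-value computation above. The action weights `ACTION_VALUES` and the per-action probabilities are hypothetical (the slides do not give the real values), and in practice each p(a | x) would come from its own trained model.

```python
# Hypothetical per-action values v_a; the real weights are not given in the slides.
ACTION_VALUES = {"click": 1.0, "upvote": 5.0, "share": 10.0, "downvote": -8.0}

def expected_value(action_probs, action_values=ACTION_VALUES):
    """v_pred = sum over actions a of v_a * p(a | x)."""
    return sum(action_values[a] * p for a, p in action_probs.items())

def rank_stories(candidates):
    """Order story ids by decreasing expected value."""
    return sorted(candidates, key=lambda s: expected_value(candidates[s]), reverse=True)

# Predicted action probabilities for two candidate stories (made-up numbers).
candidates = {
    "story_1": {"click": 0.30, "upvote": 0.02, "share": 0.00, "downvote": 0.05},
    "story_2": {"click": 0.10, "upvote": 0.08, "share": 0.01, "downvote": 0.00},
}
ranking = rank_stories(candidates)
```

With these assumed weights, the story that attracts more clicks but also downvotes ranks below the one with fewer clicks and more upvotes, which is one way to keep clickbait from dominating.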

4. Explanations might matter more than the prediction

Explanation/Support for Recommendations
Social Support

5. If you have to pick one single approach, matrix factorization is your best bet

Matrix Factorization
● MF can be interpreted as
○ Unsupervised:
■ Dimensionality Reduction a la PCA
■ Clustering (e.g. NMF)
○ Supervised:
■ Labeled targets ~ regression
● Very useful variations of MF
○ BPR, ALS, SVD++
○ Tensor Factorization, Factorization Machines
● However...
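
For reference, the supervised reading of MF (labeled targets ~ regression) can be sketched with plain stochastic gradient descent on observed ratings. This is an illustrative toy, not any of the variants named above; the hyperparameters and the tiny rating set are assumptions chosen only to make the example run.

```python
import math
import random

def predict(P, Q, u, i):
    """Predicted rating = dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) ~ r."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    """Root mean squared error over a list of (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# (user, item, rating) triples: users 0-1 like items 0-1, users 2-3 like items 2-3.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (2, 2, 5),
           (2, 3, 4), (3, 2, 4), (3, 3, 5), (0, 2, 1), (2, 0, 1)]
P, Q = factorize(ratings, n_users=4, n_items=4)
train_rmse = rmse(P, Q, ratings)
```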

Ensembles
● Netflix Prize was won by an ensemble
○ Initially Bellkor was using GBDTs
○ BigChaos introduced ANN-based ensemble
● Most practical applications of ML run an ensemble
○ Why wouldn’t you?
○ At least as good as the best of your methods
○ Can add completely different approaches
(e.g. CF and content-based)
○ You can use many different models at the
ensemble layer: LR, GBDTs, RFs, ANNs...

Ensembles & Feature Engineering
● Ensembles are the way to turn any model into a feature!
● E.g. Don’t know if the way to go is to use Factorization
Machines, Tensor Factorization, or RNNs?
○ Treat each model as a “feature”
○ Feed them into an ensemble

The Master Algorithm?
It definitely is an ensemble!

7. Building recommender systems is also about feature engineering

Need for feature engineering
In many cases an understanding of the domain will lead to
optimal results.

Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanation
• well formatted
• ...

Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated
into features?
• Features that relate to the answer
quality itself
• Interaction features
(upvotes/downvotes, clicks,
comments…)
• User features (e.g. expertise in topic)

Feature Engineering
● Properties of a well-behaved
ML feature:
○ Reusable
○ Transformable
○ Interpretable
○ Reliable

8. Why you should care about answering questions (about your recsys)

Model debuggability
● Value of a model = value it brings to the product
● Product owners/stakeholders have expectations on
the product
● It is important to be able to answer questions about why
something failed
● Bridge gap between product design and ML algos
● Model debuggability is so important it can
determine:
○ Particular model to use
○ Features to rely on
○ Implementation of tools

Model debuggability
● E.g. Why am I seeing or not seeing
this on my homepage feed?
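
One simple way to answer that kind of question, assuming the ranker is a linear model, is to break a score down into per-feature contributions. The feature names and weights below are hypothetical and only illustrate the debugging idea; they are not Quora's actual features.

```python
def explain_score(weights, features, bias=0.0):
    """Return the linear score and the feature contributions, largest magnitude first."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

# Hypothetical model weights and the feature values of one story for one user.
weights = {"follows_topic": 2.0, "author_followed": 1.5,
           "answer_quality": 3.0, "already_seen": -4.0}
features = {"follows_topic": 1.0, "author_followed": 0.0,
            "answer_quality": 0.8, "already_seen": 1.0}

score, ranked = explain_score(weights, features)
top_reason = ranked[0]  # the feature that moved the score the most
```

Here the largest contribution is the negative one from `already_seen`, which would explain why an otherwise good story is missing from the feed. Non-linear models need other tools, which is part of why debuggability can influence the choice of model.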

9. Data and models are great. You know what's even better? The right evaluation approach!

Offline/Online testing process

Executing A/B tests
● Measure differences in metrics across statistically identical
populations that each experience a different algorithm.
● Decisions on the product always data-driven
● Overall Evaluation Criteria (OEC) = member retention
○ Use long-term metrics whenever possible
○ Short-term metrics can be informative and allow faster decisions
■ But, not always aligned with OEC

Offline testing
● Measure model performance,
using (IR) metrics
● Offline performance = indication
to make decisions on follow-up
A/B tests
● A critical (and mostly unsolved)
issue is how offline metrics
correlate with A/B test results.

10. You don't need to distribute your Recsys

Distributing Recommender Systems
● Most of what people do in practice can fit
into a multi-core machine
○ As long as you use:
■ Smart data sampling
■ Offline schemes
■ Efficient parallel code
● (… but not Deep ANNs)
● Do you care about costs? How about latencies or
system complexity/debuggability?
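
One common form of data sampling that helps a training set fit on a single machine is to keep every positive example and only a fraction of the (usually far more numerous) negatives, attaching a weight to each kept negative so totals stay unbiased. This is a sketch of that general technique under assumed names and data, not necessarily the sampling scheme the authors have in mind.

```python
import random

def downsample_negatives(examples, keep_rate, seed=0):
    """Keep all positives; keep each negative with probability keep_rate.

    Returns (features, label, weight) triples, where kept negatives are
    weighted by 1 / keep_rate to compensate for the dropped ones.
    """
    rng = random.Random(seed)
    sampled = []
    for features, label in examples:
        if label == 1:
            sampled.append((features, label, 1.0))
        elif rng.random() < keep_rate:
            sampled.append((features, label, 1.0 / keep_rate))
    return sampled

# Made-up impression log: 100 positives (clicks) and 10,000 negatives.
examples = [({"id": i}, 1) for i in range(100)] + \
           [({"id": i}, 0) for i in range(100, 10100)]
sampled = downsample_negatives(examples, keep_rate=0.1)
```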

● Recommender Systems are about much more than
just predicting a rating
● Designing a “real-life” recsys means paying
attention to issues such as:
○ Feature engineering
○ Training dataset
○ Metrics
○ Experimentation and AB Testing
○ System scalability
○ ...
● Lots of room for improvement & research

Questions?
Xavier Amatriain (Quora)
xavier@amatriain.net
Deepak Agarwal (LinkedIn)
dagarwal@linkedin.com
