Introduction to Recommender Systems
Fabio Petroni
About me
Fabio Petroni
Sapienza University of Rome, Italy
Current position:
PhD Student in Engineering in Computer Science
Research Interests:
data mining, machine learning, big data
petroni@dis.uniroma1.it
• Slides available at http://www.fabiopetroni.com/teaching
2 of 65
Materials
• Xavier Amatriain Lecture at Machine Learning Summer School 2014, Carnegie Mellon University
  – https://youtu.be/bLhq63ygoU8
  – https://youtu.be/mRToFXlNBpQ
• Recommender Systems course by Rahul Sami at Michigan's Open University
  – http://open.umich.edu/education/si/si583/winter2009
• Data Mining and Matrices Course by Rainer Gemulla at University of Mannheim
  – http://dws.informatik.uni-mannheim.de/en/teaching/courses-for-master-candidates/ie-673-data-mining-and-matrices/
3 of 65
Age of discovery
The Age of Search has come to an end
• ... long live the Age of Recommendation!
• Chris Anderson in “The Long Tail”:
  “We are leaving the age of information and entering the age of recommendation”
• CNN Money, “The race to create a 'smart' Google”:
  “The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.”
4 of 65
Web Personalization & Recommender Systems
• Most of today's Internet businesses deeply root their success in the ability to provide users with strongly personalized experiences.
• Recommender Systems are a particular type of personalized Web-based application that provides users with personalized recommendations about content they may be interested in.
5 of 65
Example 1
6 of 65
Example 2
Example: Amazon
Recommendations
http://www.amazon.com/
7 of 65
Example 3
8 of 65
The tyranny of choice
Information overload
“People read around 10 MB worth of material a day, hear 400 MB a
day, and see 1 MB of information every second” - The Economist, November 2006
In 2015, consumption will rise to 74 GB a day - UCSD Study 2014
9 of 65
The value of recommendations
• Netflix: 2/3 of the movies watched are
recommended
• Google News: recommendations generate
38% more clickthrough
• Amazon: 35% sales from recommendations
• Choicestream: 28% of the people would buy
more music if they found what they liked.
10 of 65
Recommendation process
[Diagram: users, items, feedback]
11 of 65
Input
Sources of information
• Explicit ratings on a numeric scale (5-star, 3-star, etc.)
• Explicit binary ratings (thumbs up/thumbs down)
• Implicit information, e.g.,
– who bookmarked/linked to the item?
– how many times was it viewed?
– how many units were sold?
– how long did users read the page?
• Item descriptions/features
• User profiles/preferences
12 of 65
Methods of aggregating inputs
• Content-based filtering
  – recommendations based on item descriptions/features, and on the profile or past behavior of the “target” user only.
• Collaborative filtering
  – look at the ratings of like-minded users to provide recommendations, with the idea that users who have expressed similar interests in the past will share common interests in the future.
13 of 65
Collaborative Filtering
• Collaborative Filtering (CF) is today a widely adopted strategy to build recommendation engines.
• CF analyzes the known preferences of a group of users to make predictions of the unknown preferences for other users.
14 of 65
Collaborative filtering
• problem
  – set of users
  – set of items (movies, books, songs, ...)
  – feedback
    • explicit (ratings, ...)
    • implicit (purchase, click-through, ...)
• predict the preference of each user for each item
  – assumption: similar feedback ↔ similar taste
• example (explicit feedback):

            Avatar   The Matrix   Up
  Marco        ?         4         2
  Luca         3         2         ?
  Anna         5         ?         3
15 of 65
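The problem setup above maps directly onto a partially observed ratings matrix; a minimal sketch in NumPy, with NaN marking the missing entries to predict (the layout is for illustration only):

```python
import numpy as np

# Rows: Marco, Luca, Anna; columns: Avatar, The Matrix, Up.
# NaN marks a missing (to-be-predicted) rating.
R = np.array([
    [np.nan, 4.0,    2.0],     # Marco
    [3.0,    2.0,    np.nan],  # Luca
    [5.0,    np.nan, 3.0],     # Anna
])

observed = ~np.isnan(R)   # boolean mask of known feedback
print(observed.sum())     # 6 known ratings, 3 to predict
```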
Collaborative filtering taxonomy
[Taxonomy diagram:]
collaborative filtering
├─ memory based (neighborhood models)
│  ├─ user based
│  └─ item based
└─ model based
   ├─ dimensionality reduction / matrix completion: SVD, PMF
   ├─ probabilistic methods: PLS(A/I), latent Dirichlet allocation
   └─ other machine learning methods: Bayesian networks, Markov decision processes, neural networks
• Memory-based methods use the ratings to compute similarities between users or items (the “memory” of the system) that are successively exploited to produce recommendations.
• Model-based methods use the ratings to estimate or learn a model and then apply this model to make rating predictions.
16 of 65
Memory based
neighborhood models
17 of 65
The CF Ingredients
● A list of m Users and a list of n Items
● Each user has a list of items with an associated opinion
  ○ Explicit opinion - a rating score
  ○ Sometimes the opinion is implicit – purchase records or listened tracks
● An active user for whom the CF prediction task is performed
● A metric for measuring similarity between users
● A method for selecting a subset of neighbors
● A method for predicting a rating for items not currently rated by the active user.
18 of 65
Collaborative Filtering
The basic steps:
1. Identify set of ratings for the target/active user
2. Identify set of users most similar to the target/active user
according to a similarity function (neighborhood
formation)
3. Identify the products these similar users liked
4. Generate a prediction - the rating that would be given by the target user to the product - for each of these products
5. Based on these predicted ratings, recommend a set of top-N products
19 of 65
User-based Collaborative Filtering
20 of 65
User-User Collaborative Filtering
[Diagram: ratings from users similar to the target user are combined via a weighted sum to produce a prediction.]
21 of 65
UB Collaborative Filtering
● A collection of users ui, i = 1, …, n and a collection of products pj, j = 1, …, m
● An n × m matrix of ratings vij, with vij = ? if user i did not rate product j
● The prediction for user i and product j is computed as a similarity-weighted sum of other users' ratings (formula shown as an image in the original slide)
● Similarity can be computed by Pearson correlation or cosine similarity
22 of 65
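The prediction and similarity formulas on this slide are images in the original deck; the sketch below implements one standard instantiation (a mean-centered weighted sum with Pearson correlation over co-rated items), so the function names and fallback behavior are my assumptions, not the slide's exact formulas:

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation over co-rated items (NaN = unrated)."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if mask.sum() < 2:
        return 0.0  # not enough co-rated items to correlate
    du = u[mask] - u[mask].mean()
    dv = v[mask] - v[mask].mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return float(du @ dv / denom) if denom > 0 else 0.0

def predict(R, i, j):
    """Predict user i's rating of item j: the user's mean plus a
    similarity-weighted sum of mean-centered ratings of other users."""
    base = np.nanmean(R[i])
    num = den = 0.0
    for k in range(R.shape[0]):
        if k == i or np.isnan(R[k, j]):
            continue
        w = pearson(R[i], R[k])
        num += w * (R[k, j] - np.nanmean(R[k]))
        den += abs(w)
    return base + num / den if den > 0 else base

# Toy matrix from the earlier slide (rows: Marco, Luca, Anna).
R = np.array([[np.nan, 4, 2],
              [3, 2, np.nan],
              [5, np.nan, 3]], dtype=float)
print(predict(R, 0, 0))
```

On this 3 × 3 toy matrix every pair of users shares at most one co-rated item, so all Pearson weights are zero and the prediction falls back to the user's mean rating (3.0 here); on realistic data the weighted term dominates.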
Item-based Collaborative Filtering
28 of 65
Item-Item Collaborative Filtering
29 of 65
Item Based CF Algorithm
● Look into the items the target user has rated
● Compute how similar they are to the target item
  ○ Similarity is computed only using past ratings from other users!
● Select the k most similar items.
● Compute the prediction by taking a weighted average of the target user's ratings on the most similar items.
30 of 65
Item Similarity Computation
● Similarity between items i & j is computed by finding the users who have rated both and then applying a similarity function to their ratings.
● Cosine-based Similarity – items are vectors in the m-dimensional user space (the difference in rating scale between users is not taken into account).
31 of 65
Prediction Computation
● Generating the prediction – look into the target user's ratings and use techniques to obtain predictions.
● Weighted Sum – how the active user rates the similar items.
32 of 65
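The two steps above (cosine similarity computed only over users who rated both items, then a weighted average over the k most similar items) can be sketched as follows; the helper names are mine, not from the slides:

```python
import numpy as np

def cosine_item_sim(R, a, b):
    """Cosine similarity between item columns a and b,
    computed only over users who rated both items."""
    mask = ~np.isnan(R[:, a]) & ~np.isnan(R[:, b])
    u, v = R[mask, a], R[mask, b]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def predict_item_based(R, i, j, k=2):
    """Weighted average of user i's ratings on the k items
    most similar to target item j."""
    sims = []
    for a in range(R.shape[1]):
        if a != j and not np.isnan(R[i, a]):
            sims.append((cosine_item_sim(R, a, j), R[i, a]))
    sims.sort(reverse=True)          # keep the k most similar items
    top = sims[:k]
    den = sum(abs(s) for s, _ in top)
    return sum(s * r for s, r in top) / den if den > 0 else np.nan

# Same toy matrix as before (rows: users, columns: items).
R = np.array([[np.nan, 4, 2],
              [3, 2, np.nan],
              [5, np.nan, 3]], dtype=float)
print(predict_item_based(R, 0, 0))
```

On the tiny toy matrix each pair of items is co-rated by a single user, so the cosine similarities degenerate to 1.0 and the prediction is a plain average; with more users the weighting becomes informative.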
Item-based CF Example
33 of 65
Performance Implications
● Bottleneck - similarity computation.
● Time complexity: highly time consuming with millions of users and items in the database.
  ○ Isolate the neighborhood generation and prediction steps.
  ○ “off-line component” / “model” – similarity computation, done earlier & stored in memory.
  ○ “on-line component” – the prediction generation process.
39 of 65
Challenges of User-based CF Algorithms
● Sparsity – with large item sets, users typically purchase under 1% of the items.
● Difficult to make predictions based on nearest-neighbor algorithms => accuracy of recommendations may be poor.
● Scalability – nearest-neighbor methods require computation that grows with both the number of users and the number of items.
● Poor relationships among like-minded but sparse-rating users.
● Solution: use latent models to capture the similarity between users & items in a reduced dimensional space.
40 of 65
Model based
dimensionality reduction
41 of 65
What we were interested in:
■ High quality recommendations
Proxy question:
■ Accuracy in predicted rating
■ Improve by 10% = $1 million!
42 of 65
SVD/MF
X[m × n] = U[m × r] S[r × r] (V[n × r])T
● X: m × n matrix (e.g., m users, n videos)
● U: m × r matrix (m users, r factors)
● S: r × r diagonal matrix (strength of each ‘factor’) (r: rank of the matrix)
● V: n × r matrix (n videos, r factors)
44 of 65
Recap: Singular Value Decomposition
• SVD is useful in data analysis
  Noise removal, visualization, dimensionality reduction, . . .
• Provides a means to understand the hidden structure in the data
  We may think of Ak and its factor matrices as a low-rank model of the data:
  • Used to capture the important aspects of the data (cf. principal components)
  • Ignores the rest
• Truncated SVD is the best low-rank factorization of the data in terms of the Frobenius norm
• The truncated SVD A_k = U_k Σ_k V_kᵀ of A thus satisfies

  ‖A − A_k‖_F = min_{rank(B)=k} ‖A − B‖_F
45 of 65
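The optimality statement above can be checked numerically: the squared Frobenius error of the rank-k truncation equals the sum of the discarded squared singular values. A small sketch (the matrix A is an arbitrary example, not from the slides):

```python
import numpy as np

# A complete toy matrix; we build its rank-1 truncated SVD.
A = np.array([[4.0, 2.0, 1.0],
              [3.0, 2.0, 1.0],
              [5.0, 3.0, 2.0],
              [4.0, 1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 1
A_k = U[:, :k] * s[:k] @ Vt[:k, :]   # rank-k reconstruction U_k S_k V_k^T

err = np.linalg.norm(A - A_k, "fro")
# Eckart-Young: ||A - A_k||_F^2 equals the sum of the discarded s_i^2.
assert np.isclose(err ** 2, (s[k:] ** 2).sum())
print(round(err, 3))
```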
SVD problems
• SVD assumes a complete input matrix: all entries available and considered
• in CF, a large portion of the values is missing
• heuristics to pre-fill the missing values:
  – item's average rating
  – missing values as zeros
46 of 65
Matrix completion
• Matrix completion techniques avoid the necessity of pre-filling missing entries by reasoning only on the observed ratings.
• They can be seen as an estimate or an approximation of the SVD, computed using application-specific optimization criteria.
• Such solutions are currently considered the best single-model approach to collaborative filtering, as demonstrated, for instance, by the Netflix prize.
47 of 65
Matrix completion for collaborative filtering
• the completion is driven by a factorization

  R ≈ P Q

• associate a latent factor vector with each user and each item
• missing entries are estimated through the dot product

  r_ij ≈ p_i · q_j
48 of 65
Latent factor models (Koren et al., 2009)
49 of 65
Latent factor models
Discover latent factors (r = 1)

            Avatar   The Matrix   Up
  Anni                   4         2
  Bob          3         2
  Charlie      5                   3
50 of 65
Latent factor models
Discover latent factors (r = 1)

                  Avatar   The Matrix     Up
                  (2.24)     (1.92)     (1.18)
  Anni (1.98)                   4          2
  Bob (1.21)         3          2
  Charlie (2.30)     5                     3
51 of 65
Latent factor models
Discover latent factors (r = 1)

                  Avatar    The Matrix      Up
                  (2.24)      (1.92)      (1.18)
  Anni (1.98)                 4 (3.8)     2 (2.3)
  Bob (1.21)      3 (2.7)     2 (2.3)
  Charlie (2.30)  5 (5.2)                 3 (2.7)

Minimum loss

  min_{Q,P} Σ_{(i,j)} (v_ij − [QᵀP]_ij)²
52 of 65
Latent factor models
Discover latent factors (r = 1)

                  Avatar    The Matrix      Up
                  (2.24)      (1.92)      (1.18)
  Anni (1.98)     ? (4.4)     4 (3.8)     2 (2.3)
  Bob (1.21)      3 (2.7)     2 (2.3)     ? (1.4)
  Charlie (2.30)  5 (5.2)     ? (4.4)     3 (2.7)

Minimum loss

  min_{Q,P} Σ_{(i,j)} (v_ij − [QᵀP]_ij)²
53 of 65
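With r = 1 each user and each item has a single scalar factor, and every entry of the table above (observed or predicted) is approximated by their product; a quick check of the slide's numbers:

```python
# Latent factors read off the slide (r = 1: one scalar per user/item).
item = {"Avatar": 2.24, "The Matrix": 1.92, "Up": 1.18}
user = {"Anni": 1.98, "Bob": 1.21, "Charlie": 2.30}

# Each predicted rating is just the product of the two factors.
pred = {(u, i): round(fu * fi, 1)
        for u, fu in user.items() for i, fi in item.items()}

print(pred[("Anni", "The Matrix")])  # 3.8, close to the observed 4
print(pred[("Anni", "Avatar")])      # 4.4, filling the missing entry
```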
Latent factor models
Discover latent factors (r = 1) (same toy example as above)

Minimum loss

  min_{Q,P,u,m} Σ_{(i,j)} (v_ij − µ − u_i − m_j − [QᵀP]_ij)²

Bias
54 of 65
Latent factor models
Discover latent factors (r = 1) (same toy example as above)

Minimum loss

  min_{Q,P,u,m} Σ_{(i,j)} (v_ij − µ − u_i − m_j − [QᵀP]_ij)² + λ(‖Q‖ + ‖P‖ + ‖u‖ + ‖m‖)

Bias, regularization
55 of 65
Latent factor models
Discover latent factors (r = 1) (same toy example as above)

Minimum loss

  min_{Q,P,u,m} Σ_{(i,j,t)} (v_ij − µ − u_i(t) − m_j(t) − [Qᵀ(t)P]_ij)² + λ(‖Q(t)‖ + ‖P‖ + ‖u(t)‖ + ‖m(t)‖)

Bias, regularization, time, . . .
56 of 65
Example: Netflix prize data
Root mean square error of predictions
[Figure 4 from Koren et al. 2009: matrix factorization models' accuracy on the Netflix data. RMSE (roughly 0.875 to 0.91) is plotted against millions of parameters (10 to 100,000) for five variants: plain, with biases, with implicit feedback, and with temporal dynamics (v.1 and v.2); each refinement lowers the RMSE.]
57 of 65
Another matrix
58 of 65
Matrix reconstruction (unregularized)
59 of 65
Stochastic gradient descent
• parameters Θ = {P, Q}
• find the minimum Θ* of the loss function L
• pick a starting point Θ₀
• iteratively update the current estimate of Θ:

  Θ_{n+1} ← Θ_n − η ∂L/∂Θ

• learning rate η
• one update for each given training point

[Plot: the loss (×10⁷) decreasing over ~30 iterations]
63 of 65
Stochastic updates
L_ij(P, Q) = (r_ij − p_i · q_j)²

• SGD to minimize the squared loss iteratively computes:

  p_i ← p_i − η ∂L_ij(P, Q)/∂p_i = p_i + η(ε_ij · q_j)

  q_j ← q_j − η ∂L_ij(P, Q)/∂q_j = q_j + η(ε_ij · p_i)

• where ε_ij = r_ij − p_i · q_j
64 of 65
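Putting the last two slides together, a compact SGD matrix-factorization sketch on the toy ratings; the hyperparameters are my choices, and a small L2 term is included in the spirit of the regularized loss shown earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed ratings as (user, item, rating) triples
# (users: Anni, Bob, Charlie; items: Avatar, The Matrix, Up).
ratings = [(0, 1, 4.0), (0, 2, 2.0), (1, 0, 3.0),
           (1, 1, 2.0), (2, 0, 5.0), (2, 2, 3.0)]
n_users, n_items, r = 3, 3, 1
eta, lam = 0.05, 0.01  # learning rate and L2 strength (my choices)

P = 0.1 * rng.standard_normal((n_users, r))  # user factors p_i
Q = 0.1 * rng.standard_normal((n_items, r))  # item factors q_j

for epoch in range(500):
    for i, j, v in ratings:
        eps = v - float(P[i] @ Q[j])       # eps_ij = r_ij - p_i . q_j
        P[i] += eta * (eps * Q[j] - lam * P[i])
        Q[j] += eta * (eps * P[i] - lam * Q[j])

# Predict the missing Anni/Avatar entry (the slides' fit gives ~4.4).
print(round(float(P[0] @ Q[0]), 1))
```

Note the updates follow the slide's form, with the factor of 2 from the squared-loss gradient absorbed into η; here each update reuses the freshly updated P[i] when adjusting Q[j], a common simplification.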
Suggested reading
• G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. Internet Computing, IEEE, 7(1):76–80, 2003.
• Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
• X. Su and T. M. Khoshgoftaar. A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009:4, 2009.
• F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. Springer, 2011.
• M. D. Ekstrand, J. T. Riedl, and J. A. Konstan. Collaborative filtering recommender systems. Foundations and Trends in Human-Computer Interaction, 4(2):81–173, 2011.
• J. A. Konstan and J. Riedl. Recommender systems: from algorithms to user experience. User Modeling and User-Adapted Interaction, 22(1-2):101–123, 2012.
65 of 65