Lecture 1_Collaborative Filtering
Collaborative Filtering
Tae-Sub Yun
Department of Digital Business, Korea University
[email protected]
In this lecture, we learn about collaborative filtering recommender systems, covering the following topics:
• Memory-based methods
• Model-based methods
[Table: users (Andy, Jessica, Bob, Sophia) × videos; an "O" marks a video the user has watched — Andy: 2 videos, Jessica: 2, Bob: 3, Sophia: 3.]
( ※ The "O" symbol indicates that the user has already viewed that video. )
Questions
• Who is the user with the most similar video preferences to user Andy?
[Table: users × videos with numeric ratings — Andy: 4, 5, 1, ??; Jenny: 4, 4; Mike: 3, 4, 1, 2; Sally: 4, 1, 2, 4.]
• Who is the user with the most similar video preferences to user Andy?
• What is the expected rating from user Andy for video 5? (Is video 5 worth recommending?)
• Explicit ratings:
Users provide direct feedback on individual items through numerical ratings
ex. liking/disliking videos on YouTube, movie ratings
• Implicit feedback:
User preferences are indirectly estimated through user behavior
ex. buying behavior, time spent on a website, clicks on detailed item information
• Unary ratings:
specify a positive preference for an item, but there is no mechanism to specify negative preference.
ex. “like” button on Facebook
• Binary ratings:
only two options are present, corresponding to positive or negative response.
ex. either like or dislike on YouTube Music
• Interval-based ratings:
representing preferences through discrete numbers within a specific range.
ex. 5-point, 10-point rating system for movies
• Continuous ratings:
representing preferences through continuous numbers within a specific range.
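As a rough illustration of these rating scales (the item names and values below are made up, not taken from the lecture), each type could be encoded as follows:

```python
# Hypothetical encodings of the four rating scales for a single user.
unary      = {"video_1": 1, "video_3": 1}        # "like" only; absence means unknown, not dislike
binary     = {"song_1": 1, "song_2": 0}          # 1 = like, 0 = dislike
interval   = {"movie_1": 4, "movie_2": 2}        # discrete scores on a 5-point scale
continuous = {"movie_1": 0.87, "movie_2": 0.15}  # real-valued preference, e.g., normalized to [0, 1]
```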
Ratings matrix
• $R = [r_{u,i}]_{m \times n} = \begin{bmatrix} r_{1,1} & \cdots & r_{1,n} \\ \vdots & \ddots & \vdots \\ r_{m,1} & \cdots & r_{m,n} \end{bmatrix}$
• Assumption:
incomplete $m \times n$ matrix $R = [r_{u,i}]$
→ only a small subset of the rating matrix is specified (or observed)
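A minimal sketch of such an incomplete matrix, assuming NumPy and using np.nan for the unobserved entries (the concrete values are illustrative only):

```python
import numpy as np

# Incomplete m x n ratings matrix; np.nan marks unobserved (unspecified) ratings.
R = np.array([
    [5.0,    3.0,    np.nan, 4.0],
    [np.nan, 1.0,    2.0,    3.0],
    [1.0,    np.nan, 5.0,    np.nan],
])

observed = ~np.isnan(R)                      # boolean mask of the specified entries
print(f"{observed.sum()} of {R.size} ratings are observed")
```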
• Primitive problem:
predicting the missing (unobserved) rating values of a user-item rating matrix.
• Advanced problem:
Determining the top-k items or top-k users.
→ equivalent to the problem of selecting the top-k from the expected ratings.
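As a small sketch of that equivalence (the item names and predicted scores below are hypothetical), selecting the top-k is just a sort over the predicted ratings:

```python
# Predicted ratings for the target user's unrated items (hypothetical values).
predicted = {"item_5": 3.35, "item_6": 0.85, "item_7": 4.10}

k = 2
top_k = sorted(predicted, key=predicted.get, reverse=True)[:k]   # highest predictions first
print(top_k)   # ['item_7', 'item_5']
```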
• Memory-based methods:
directly utilize the ratings matrix for predicting unobserved ratings
(also referred to as neighborhood-based methods)
1. User-based methods:
aggregating ratings or preferences from similar users
→ the goal is to find users similar to the target user
2. Item-based Methods:
aggregating ratings or preferences from similar items
→ the goal is to find items similar to the ones the target user has interacted with
• Model-based methods:
Indirectly utilize the ratings matrix to estimate a representative model
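A rough Python skeleton of the two memory-based variants; the function names, the `ratings` dictionary layout (user → {item: rating}), and the `similarity` callback are assumptions for illustration, not part of the lecture:

```python
def user_based_predict(target_user, item, ratings, similarity):
    """Aggregate the ratings that users similar to `target_user` gave to `item`."""
    neighbors = [u for u in ratings if u != target_user and item in ratings[u]]
    weights = {u: similarity(target_user, u) for u in neighbors}
    num = sum(weights[u] * ratings[u][item] for u in neighbors)
    den = sum(abs(w) for w in weights.values())
    return num / den if den else None

def item_based_predict(user, target_item, ratings, similarity):
    """Aggregate `user`'s own ratings on items similar to `target_item`."""
    rated = [i for i in ratings[user] if i != target_item]
    weights = {i: similarity(target_item, i) for i in rated}
    num = sum(weights[i] * ratings[user][i] for i in rated)
    den = sum(abs(w) for w in weights.values())
    return num / den if den else None
```

Here `similarity` can be any of the measures introduced next (Euclidean-based, cosine, Pearson, Jaccard).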
[Bar chart: "Ratings" per item for each user, reproduced as a table]
User: Andy — Item 1: 5, Item 2: 3, Item 3: 4, Item 4: 4
User: Jenny — Item 1: 3, Item 2: 1, Item 3: 2, Item 4: 3
User: Bob — Item 1: 1, Item 2: 5, Item 3: 5, Item 4: 2
Questions
• Who is the user with the most similar preferences to user Andy?
• Euclidean distance
The differences for each item can be combined into a single value using the Euclidean distance.
$\mathit{distance}(\text{andy}, \text{jenny}) = \sqrt{\sum_{i \in \{1,2,3,4\}} \left(r_{\text{andy},i} - r_{\text{jenny},i}\right)^2}$
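For example, plugging Andy's and Jenny's ratings from the table above into this formula:
$\mathit{distance}(\text{andy}, \text{jenny}) = \sqrt{(5-3)^2 + (3-1)^2 + (4-2)^2 + (4-3)^2} = \sqrt{13} \approx 3.61$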
• What is the similarity, using Euclidean distance, between Andy and Bob?
• In terms of similarity using Euclidean distance, who has preferences closer to Andy's?
$\|\mathit{ratings}_{\text{andy}}\| = \sqrt{\sum_{i \in \{1,2,3,4\}} r_{\text{andy},i}^2} = \sqrt{5^2 + 3^2 + 4^2 + 4^2} = \sqrt{66} = 8.12$
$\|\mathit{ratings}_{\text{jenny}}\| = \sqrt{3^2 + 1^2 + 2^2 + 3^2} = \sqrt{23} = 4.80$
• Cosine similarity
The calculation of cosine similarity between two rating vectors is as follows:
$\mathit{cosine}(\text{andy}, \text{jenny}) = \frac{\sum_{i \in \{1,2,\dots,4\}} r_{\text{andy},i} \cdot r_{\text{jenny},i}}{\|\mathit{ratings}_{\text{andy}}\| \cdot \|\mathit{ratings}_{\text{jenny}}\|} = \frac{5 \cdot 3 + 3 \cdot 1 + 4 \cdot 2 + 4 \cdot 3}{8.12 \times 4.80} = \frac{38}{38.98} = 0.97$
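A quick numerical check of this value, assuming NumPy (the two vectors are simply Andy's and Jenny's rows from the table above):

```python
import numpy as np

andy  = np.array([5, 3, 4, 4])
jenny = np.array([3, 1, 2, 3])

# Dot product divided by the product of the vector norms.
cosine = andy @ jenny / (np.linalg.norm(andy) * np.linalg.norm(jenny))
print(cosine)   # 0.9752..., i.e. the 0.97 above (which uses the rounded norms 8.12 and 4.80)
```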
Questions
$\mathit{Pearson}(\text{andy}, \text{jenny}) = \frac{\sum_{i \in \{1,2,\dots,4\}} (r_{\text{andy},i} - \mu_{\text{andy}})(r_{\text{jenny},i} - \mu_{\text{jenny}})}{\sqrt{\sum_{i \in \{1,2,\dots,4\}} (r_{\text{andy},i} - \mu_{\text{andy}})^2} \cdot \sqrt{\sum_{i \in \{1,2,\dots,4\}} (r_{\text{jenny},i} - \mu_{\text{jenny}})^2}}$
$= \frac{1 \times 0.75 + (-1) \times (-1.25) + 0 \times (-0.25) + 0 \times 0.75}{\sqrt{2} \times \sqrt{2.75}} = \frac{2}{2.35} = 0.85$
(with $\mu_{\text{andy}} = 4$ and $\mu_{\text{jenny}} = 2.25$)
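As a quick check, np.corrcoef reproduces the same value (a sketch assuming NumPy):

```python
import numpy as np

andy  = np.array([5, 3, 4, 4])
jenny = np.array([3, 1, 2, 3])

# Pearson correlation = cosine similarity of the mean-centered vectors.
pearson = np.corrcoef(andy, jenny)[0, 1]
print(round(pearson, 2))   # 0.85
```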
$I_{\text{andy}} = \{1, 2, 3, 4\};\; I_{\text{jenny}} = \{3, 4, 5\};\; I_{\text{bob}} = \{1, 2, 4, 5\} \;\rightarrow\; I_{\text{andy}} \cap I_{\text{jenny}} = \{3, 4\}$
• Similarity (Pearson):
$\mu_u = \frac{\sum_{i \in I_u} r_{u,i}}{|I_u|}$
$\mathit{Pearson}(\text{andy}, \text{jenny}) = \frac{\sum_{i \in I_{\text{andy}} \cap I_{\text{jenny}}} (r_{\text{andy},i} - \mu_{\text{andy}})(r_{\text{jenny},i} - \mu_{\text{jenny}})}{\sqrt{\sum_{i \in I_{\text{andy}} \cap I_{\text{jenny}}} (r_{\text{andy},i} - \mu_{\text{andy}})^2} \cdot \sqrt{\sum_{i \in I_{\text{andy}} \cap I_{\text{jenny}}} (r_{\text{jenny},i} - \mu_{\text{jenny}})^2}}$
$I_{\text{andy}} = \{1, 2, 3, 4\};\; I_{\text{jenny}} = \{3, 4, 5\};\; I_{\text{bob}} = \{1, 2, 4, 5\}$
• Jaccard similarity:
$\mathit{Jaccard}(\text{andy}, \text{jenny}) = \frac{|I_{\text{andy}} \cap I_{\text{jenny}}|}{|I_{\text{andy}} \cup I_{\text{jenny}}|} = \frac{2}{5} = 0.4$
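A small sketch of both measures in Python; the `pearson` helper and its `ratings` dictionary layout are assumptions for illustration (no concrete rating values from the slide are used), while the Jaccard computation uses the item sets shown above:

```python
import math

# Jaccard similarity over the rated-item sets from the slide.
I_andy, I_jenny = {1, 2, 3, 4}, {3, 4, 5}
print(len(I_andy & I_jenny) / len(I_andy | I_jenny))   # 0.4

# Pearson restricted to co-rated items; `ratings` maps user -> {item: rating}.
def pearson(u, v, ratings):
    common = set(ratings[u]) & set(ratings[v])          # co-rated items only
    if not common:
        return 0.0
    mu_u = sum(ratings[u].values()) / len(ratings[u])   # mean over all of u's rated items
    mu_v = sum(ratings[v].values()) / len(ratings[v])
    su = [ratings[u][i] - mu_u for i in common]
    sv = [ratings[v][i] - mu_v for i in common]
    num = sum(a * b for a, b in zip(su, sv))
    den = math.sqrt(sum(a * a for a in su)) * math.sqrt(sum(b * b for b in sv))
    return num / den if den else 0.0
```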
Questions
• Who is the user with the most similar preferences to User 3 from a cosine similarity perspective?
• Who is the user with the most similar preferences to User 3 from a Pearson similarity perspective?
• Hat notation:
The hat notation "^" on top of $r_{u,i}$ (i.e., $\hat{r}_{u,i}$) indicates a predicted rating.
Questions
Mean-centered prediction:
predicting unrated items using the mean-centered ratings of neighbors.
• Mean-centered ratings: $s_{u,i} = r_{u,i} - \mu_u$, where $\mu_u = \frac{\sum_{i \in I_u} r_{u,i}}{|I_u|}$ ( ※ average rating, recap )
• Prediction (with $P_u$ the set of peer users most similar to the target user $u$):
$\hat{r}_{u,i} = \mu_u + \frac{\sum_{v \in P_u} \mathrm{sim}(u,v) \cdot s_{v,i}}{\sum_{v \in P_u} |\mathrm{sim}(u,v)|}$
• Examples for target User 3 ($\mu_3 = 2$; the intermediate terms come from the neighbors' similarities and mean-centered ratings):
$\hat{r}_{3,i} = \mu_3 + \frac{\sum_{v \in P_3} \mathrm{sim}(3,v) \cdot s_{v,i}}{\sum_{v \in P_3} |\mathrm{sim}(3,v)|} = 2 + \cdots = 3.35$
$\hat{r}_{3,j} = \mu_3 + \frac{\sum_{v \in P_3} \mathrm{sim}(3,v) \cdot s_{v,j}}{\sum_{v \in P_3} |\mathrm{sim}(3,v)|} = 2 + \cdots = 0.85$
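A compact sketch of this prediction rule in Python; the function signature, the `ratings` dictionary layout (user → {item: rating}), and the `sim` and `neighbors` arguments are illustrative assumptions:

```python
def predict(u, i, ratings, sim, neighbors):
    """Mean-centered user-based prediction of user u's rating for item i."""
    mu = {v: sum(r.values()) / len(r) for v, r in ratings.items()}   # per-user mean rating
    peers = [v for v in neighbors if i in ratings[v]]                # neighbors who rated item i
    num = sum(sim(u, v) * (ratings[v][i] - mu[v]) for v in peers)    # similarity-weighted s_{v,i}
    den = sum(abs(sim(u, v)) for v in peers)
    if den == 0:
        return mu[u]          # no usable neighbor: fall back to the user's own mean
    return mu[u] + num / den
```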
Questions
To-do: Please fill in the correct values in the cells marked with ??.