
DIGB368 Lecture 1: Collaborative Filtering
Tae-Sub Yun
Department of Digital Business, Korea University

Copyright ⓒ 2024 by Tae-Sub Yun, Dept. of Digital Business, Korea University [ 1 ]


Lecture Objectives
In today's class, we aim to learn about collaborative filtering recommender systems, covering the following topics:

• Definition of Collaborative Filtering
• Input Data for Collaborative Filtering
  • User-item interaction data
• Collaborative Filtering Algorithms
  • Measuring similarity
  • Predicting ratings


Classification of Recommender Systems
Recommender Systems
• Collaborative Filtering
  • Memory-based
    • User-based
    • Item-based
  • Model-based
    • Clustering
    • Matrix Factorization
    • Deep Learning
• Content-based
• Knowledge-based
• Hybrid

In this lecture, we will focus primarily on Memory-Based Collaborative Filtering (CF).


Basic Models of Recommender Systems
Recommender systems work with two kinds of data!

• User-item interaction data:
Data on actions performed by users in relation to items
ex. ratings, buying behavior

• Attribute information data:
Data related to the attributes of users or items
ex. textual profiles, relevant keywords

[Figures: user-item interaction data example; attribute information data example]

Recommender systems are categorized based on the data they use!

Collaborative filtering models are a type of recommender system that uses "user-item interaction data".

Collaborative Filtering Models
Collaborative Filtering (CF) Model
• Utilize the collaborative power of the user-item interactions provided by multiple users

• Three steps in a CF recommender system (user-based)
1. Get user-item interaction data
   Obtain user viewing history data.
2. Identify similar users
   For each user, identify similar users who share similar video-watching patterns.
3. Make recommendations
   Recommend videos that similar users have watched but the target user hasn't seen yet.

[Figure: Example of collaborative filtering model using YouTube watching history data. Panels: 1. Viewed by both users; 2. Identify similar users; 3. Viewed by her, recommended to him]
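The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own implementation: the viewing sets are hypothetical (the layout of the original table did not survive extraction), and "similar" is taken to mean "most videos watched in common", which is only one possible similarity choice.

```python
# Hypothetical viewing history (user -> set of viewed videos); illustrative only.
viewed = {
    "Andy": {"Video 1", "Video 3"},
    "Jessica": {"Video 2", "Video 4"},
    "Bob": {"Video 1", "Video 3", "Video 5"},
    "Sophia": {"Video 2", "Video 4", "Video 5"},
}


def overlap(a: set, b: set) -> int:
    """Number of videos watched by both users (an assumed, simple similarity)."""
    return len(a & b)


def recommend_user_based(target: str, history: dict) -> list:
    # Step 1: `history` is the user-item interaction data.
    seen = history[target]
    # Step 2: pick the other user whose viewing pattern overlaps most with the target's.
    neighbor = max(
        (u for u in history if u != target),
        key=lambda u: overlap(seen, history[u]),
    )
    # Step 3: recommend what the neighbor watched but the target has not seen yet.
    return sorted(history[neighbor] - seen)


print(recommend_user_based("Andy", viewed))  # ['Video 5']
```

Here Bob shares two videos with Andy, so Bob acts as the neighbor and Video 5 is the only candidate Andy has not seen.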


Tabular Representation of CF
Video 1 Video 2 Video 3 Video 4 Video 5

User: Andy O O

User: Jessica O O

User: Bob O O O

User: Sophia O O O

( ※ The "O" symbol indicates that the user has already viewed that video. )
Questions

• Who is the user with the most similar video preferences to user Andy?

• Which video would you most recommend to user Andy?


CF based on Ratings
Movie 1 Movie 2 Movie 3 Movie 4 Movie 5

User: Andy 4 5 1 ??

User: Jenny 4 4

User: Mike 3 4 1 2

User: Sally 4 1 2 4

( ※ Users rate movies on a scale of 1 to 5. )


Questions

• Who is the user with the most similar movie preferences to user Andy?

• What is the expected rating from user Andy for Movie 5? (Is Movie 5 worth recommending?)


User-Item Interaction Data
Types of user-item interaction

• Explicit ratings:
Users provide direct feedback on individual items through ratings
ex. liking/disliking videos on YouTube, movie ratings

• Implicit feedback:
User preferences are estimated indirectly from user behavior
ex. buying behavior, time spent on a website, clicks on detailed item information

Noise in implicit feedback

• Behavior can be misinterpreted as preference
ex. accidental clicks, purchases made as gifts (not reflecting the user's own preferences)


Types of Ratings
Ratings can be defined in a variety of ways, depending on the application at hand!

• Unary ratings:
Specify a positive preference for an item, but provide no mechanism to specify a negative preference.
ex. “like” button on Facebook

• Binary ratings:
Only two options are present, corresponding to a positive or negative response.
ex. either like or dislike on YouTube Music

• Interval-based ratings:
Represent preferences through discrete numbers within a specific range.
ex. 5-point, 10-point rating system for movies

• Continuous ratings:
Represent preferences through continuous numbers within a specific range.


Notations of Rating Matrices
Rating

• $r_{u,j}$: rating of user $u$ for item $j$

Ratings matrix

• $m \times n$ matrix, containing $m$ users and $n$ items, denoted by $R$

$$R = [r_{u,j}] = \begin{pmatrix} r_{1,1} & \cdots & r_{1,n} \\ \vdots & \ddots & \vdots \\ r_{m,1} & \cdots & r_{m,n} \end{pmatrix}$$

• The $u$-th row contains the ratings of user $u$ for individual items

• The $j$-th column is the collection of ratings by users for item $j$
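A minimal sketch of this notation in plain Python, assuming a list-of-lists layout (one inner list per user) with `None` marking unobserved entries. The helper names `rating`, `user_row`, and `item_column` are hypothetical; the values are the 5 × 6 matrix used in the User 1 to User 5 example later in this lecture.

```python
# 5 x 6 ratings matrix from the "Finding Neighbors" example; None = unobserved.
R = [
    [7, 6, 7, 4, 5, 4],        # User 1
    [6, 7, None, 4, 3, 4],     # User 2
    [None, 3, 3, 1, 1, None],  # User 3
    [1, 2, 2, 3, 3, 4],        # User 4
    [1, None, 1, 2, 3, 3],     # User 5
]
m, n = len(R), len(R[0])  # m users, n items


def rating(u, j):
    """r_{u,j} with 1-based u and j, as on the slides; None if unobserved."""
    return R[u - 1][j - 1]


def user_row(u):
    """u-th row: ratings given by user u to the individual items."""
    return R[u - 1]


def item_column(j):
    """j-th column: ratings given by all users to item j."""
    return [row[j - 1] for row in R]


print(m, n)            # 5 6
print(rating(1, 3))    # 7
print(item_column(1))  # [7, 6, None, 1, 1]
```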


Problem Formulation of CF
The problem formulation of collaborative filtering (CF)

• Assumption:
incomplete $m \times n$ matrix $R = [r_{u,j}]$
→ only a small subset of the rating matrix is specified (or observed)

• Primitive problem:
predicting the missing (unobserved) rating values of a user-item rating matrix.

• Advanced problem:
Determining the top-k items or top-k users.
→ equivalent to the problem of selecting the top-k from the expected ratings.

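Once predicted ratings are available, the top-k version reduces to picking the largest predictions. A minimal sketch using the standard-library `heapq.nlargest`; the function name `top_k_items`, the item names, and the predicted values are hypothetical.

```python
import heapq


def top_k_items(predicted: dict, k: int) -> list:
    """Return the k item ids with the highest predicted ratings."""
    # Iterating a dict yields its keys; rank them by their predicted value.
    return heapq.nlargest(k, predicted, key=predicted.get)


# Hypothetical predicted ratings for one user's unobserved items.
predicted = {"Item A": 3.1, "Item B": 4.6, "Item C": 2.0, "Item D": 4.2}
print(top_k_items(predicted, 2))  # ['Item B', 'Item D']
```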


Predictions for Unobserved Ratings
There are two types of methods used in CF recommender systems.

• Memory-based methods:
directly utilize the ratings matrix to predict unobserved ratings
(also referred to as neighborhood-based methods)

1. User-based methods:
aggregate ratings or preferences from similar users
→ the goal is to find users similar to the target user

2. Item-based methods:
aggregate ratings or preferences from similar items
→ the goal is to find items similar to the ones the target user has interacted with

• Model-based methods:
indirectly utilize the ratings matrix: a representative model is first estimated from it and then used to predict ratings
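The practical difference between user-based and item-based methods is which vectors get compared: rows (users) or columns (items) of the ratings matrix. The sketch below is a hypothetical illustration that uses the complete Andy/Jenny/Bob ratings from the next slides and cosine similarity (defined later in this lecture); it only shows that choice and does not perform the prediction step.

```python
import math

R = [
    [5, 3, 4, 4],  # Andy
    [3, 1, 2, 3],  # Jenny
    [1, 5, 5, 2],  # Bob
]


def cosine(a, b):
    """Cosine of the angle between two complete rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def columns(matrix):
    """Transpose: one list per item, holding every user's rating for it."""
    return [list(col) for col in zip(*matrix)]


user_sim = cosine(R[0], R[1])          # user-based: Andy vs Jenny (rows)
items = columns(R)
item_sim = cosine(items[1], items[2])  # item-based: Item 2 vs Item 3 (columns)
print(round(user_sim, 3), round(item_sim, 3))  # 0.975 0.983
```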


Similarity Measures
Ratings database for collaborative recommendation

        Item 1  Item 2  Item 3  Item 4
Andy      5       3       4       4
Jenny     3       1       2       3
Bob       1       5       5       2

[Figure: bar chart of the ratings of Andy, Jenny, and Bob for Items 1 to 4]

Questions

• Who is the user with the most similar preferences to user Andy?

• Can you guess a way to numerically compare similarity between users?


Euclidean Distance
Similarity measure using Euclidean distance

        Item 1  Item 2  Item 3  Item 4
Andy      5       3       4       4
Jenny     3       1       2       3
Bob       1       5       5       2

• Preference difference (gap)
The preference difference for item $j$ between two users (Andy and Jenny) is as follows:
$$\text{difference}_j = r_{andy,j} - r_{jenny,j}$$

• Euclidean distance
The differences for each item can be combined into a single value using the Euclidean distance:
$$\text{distance}(andy, jenny) = \sqrt{\sum_{j \in \{1,2,3,4\}} \left(r_{andy,j} - r_{jenny,j}\right)^2} = \sqrt{(5-3)^2 + (3-1)^2 + (4-2)^2 + (4-3)^2} = \sqrt{13} \approx 3.61$$

Questions

• What is the Euclidean distance between Andy and Bob?

• Based on Euclidean distance, whose preferences are closer to Andy's? (A smaller distance means more similar preferences.)
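The distance formula above translates directly into Python. This is a minimal sketch (the dictionary layout and the name `euclidean_distance` are illustrative choices); calling it with Andy and Bob lets you check your answer to the first question.

```python
import math

ratings = {
    "Andy": [5, 3, 4, 4],
    "Jenny": [3, 1, 2, 3],
    "Bob": [1, 5, 5, 2],
}


def euclidean_distance(a, b):
    """Square root of the summed squared per-item rating differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(round(euclidean_distance(ratings["Andy"], ratings["Jenny"]), 2))  # 3.61
```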


Cosine Similarity in 2D
[Figure: rating vectors plotted in 2D (x-axis: Item 1, y-axis: Item 2): Andy (5, 3), Bob (1, 5), Jenny (3, 1); the angle θ between Andy's and Bob's vectors and the Euclidean distance between their endpoints are marked]

        Item 1  Item 2  Item 3  Item 4
Andy      5       3       4       4
Jenny     3       1       2       3
Bob       1       5       5       2

How to calculate $\text{cosine}(\theta)$ between the rating vectors

For two vectors $A$ and $B$ in $n$-dimensional space:

• Cosine similarity
$$\text{cosine}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$

• Dot product ($A \cdot B$)
$$A \cdot B = \sum_{i \in \{1,2,\ldots,n\}} A_i B_i$$

• Euclidean norm ($\lVert A \rVert$, $\lVert B \rVert$)
$$\lVert A \rVert = \sqrt{\sum_{i \in \{1,2,\ldots,n\}} A_i^2}$$

Example (Items 1 and 2 only):
$$\text{cosine}(Andy, Bob) = \frac{5 \times 1 + 3 \times 5}{\sqrt{5^2 + 3^2}\,\sqrt{1^2 + 5^2}} = \frac{20}{\sqrt{34}\,\sqrt{26}} \approx 0.67$$


Cosine Similarity
Cosine similarity in n dimensions

        Item 1  Item 2  Item 3  Item 4
Andy      5       3       4       4
Jenny     3       1       2       3
Bob       1       5       5       2

• Magnitude of rating vectors
The magnitude of each rating vector is calculated using the Euclidean norm formula.
$$\lVert ratings_{andy} \rVert = \sqrt{\sum_{j \in \{1,2,3,4\}} r_{andy,j}^2} = \sqrt{5^2 + 3^2 + 4^2 + 4^2} = \sqrt{66} \approx 8.12$$
$$\lVert ratings_{jenny} \rVert = \sqrt{3^2 + 1^2 + 2^2 + 3^2} = \sqrt{23} \approx 4.80$$

• Cosine similarity
The cosine similarity between two rating vectors is calculated as follows:
$$\text{cosine}(andy, jenny) = \frac{\sum_{j \in \{1,2,\ldots,n\}} r_{andy,j}\, r_{jenny,j}}{\lVert ratings_{andy} \rVert \, \lVert ratings_{jenny} \rVert} = \frac{5 \times 3 + 3 \times 1 + 4 \times 2 + 4 \times 3}{8.12 \times 4.80} = \frac{38}{38.98} \approx 0.97$$

Questions

• What is the cosine similarity between Andy and Bob?
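Both the 2D and the n-dimensional cases use the same three pieces (dot product, Euclidean norm, their ratio). A minimal Python sketch; the helper names are illustrative. Note that at full precision the Andy/Jenny value is about 0.975; the slide's 0.97 comes from rounding the norms to 8.12 and 4.80 before dividing.

```python
import math


def dot(a, b):
    """A · B = sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm ||A||."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


andy, jenny, bob = [5, 3, 4, 4], [3, 1, 2, 3], [1, 5, 5, 2]

print(round(cosine(andy[:2], bob[:2]), 2))  # 0.67  (2D: Items 1 and 2 only)
print(round(cosine(andy, jenny), 3))        # 0.975 (all four items)
```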


Pearson’s Correlation Coefficient
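Pearson’s correlation coefficient

        Item 1  Item 2  Item 3  Item 4
Andy      5       3       4       4
Jenny     3       1       2       3
Bob       1       5       5       2

• Average rating
Average rating values are used to remove user- or item-specific biases.
$$\mu_{andy} = \frac{\sum_{j \in \{1,2,3,4\}} r_{andy,j}}{|\{1,2,3,4\}|} = \frac{5 + 3 + 4 + 4}{4} = 4$$
$$\mu_{jenny} = \frac{3 + 1 + 2 + 3}{4} = 2.25$$

• Pearson similarity
The Pearson similarity between two rating vectors is calculated as follows:
$$\text{Pearson}(andy, jenny) = \frac{\sum_{j \in \{1,2,\ldots,n\}} (r_{andy,j} - \mu_{andy})(r_{jenny,j} - \mu_{jenny})}{\sqrt{\sum_{j \in \{1,2,\ldots,n\}} (r_{andy,j} - \mu_{andy})^2}\,\sqrt{\sum_{j \in \{1,2,\ldots,n\}} (r_{jenny,j} - \mu_{jenny})^2}}$$
$$= \frac{(5-4)(3-2.25) + (3-4)(1-2.25) + (4-4)(2-2.25) + (4-4)(3-2.25)}{\sqrt{1^2 + (-1)^2 + 0^2 + 0^2}\,\sqrt{0.75^2 + (-1.25)^2 + (-0.25)^2 + 0.75^2}} = \frac{2}{\sqrt{2}\,\sqrt{2.75}} \approx 0.85$$

Questions

• What is the Pearson similarity between Andy and Bob?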
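The same computation as a minimal Python sketch for two complete rating vectors (the function names are illustrative). It mean-centers each vector first, which is what removes the user-specific rating bias mentioned above.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson(a, b):
    """Pearson correlation of two complete rating vectors."""
    mu_a, mu_b = mean(a), mean(b)
    num = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mu_a) ** 2 for x in a)) * math.sqrt(
        sum((y - mu_b) ** 2 for y in b)
    )
    return num / den


andy, jenny, bob = [5, 3, 4, 4], [3, 1, 2, 3], [1, 5, 5, 2]
print(mean(andy), mean(jenny))         # 4.0 2.25
print(round(pearson(andy, jenny), 2))  # 0.85
```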


Similarity in Incomplete Ratings Matrix
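            Movie 1  Movie 2  Movie 3  Movie 4  Movie 5
User: Andy     4        5        1        2        -
User: Jenny    -        -        4        4        1
User: Mike     3        4        -        1        2

Similarity calculation in an incomplete ratings matrix

• Set of item indices:
$I_u$ denotes the set of item indices for which ratings have been specified by user $u$.
$$I_{andy} = \{1, 2, 3, 4\};\quad I_{jenny} = \{3, 4, 5\};\quad I_{mike} = \{1, 2, 4, 5\} \;\rightarrow\; I_{andy} \cap I_{jenny} = \{3, 4\}$$

• Similarity (Pearson):
$$\mu_u = \frac{\sum_{k \in I_u} r_{u,k}}{|I_u|}$$
$$\text{Pearson}(andy, jenny) = \frac{\sum_{k \in I_{andy} \cap I_{jenny}} (r_{andy,k} - \mu_{andy})(r_{jenny,k} - \mu_{jenny})}{\sqrt{\sum_{k \in I_{andy} \cap I_{jenny}} (r_{andy,k} - \mu_{andy})^2}\,\sqrt{\sum_{k \in I_{andy} \cap I_{jenny}} (r_{jenny,k} - \mu_{jenny})^2}}$$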
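A minimal Python sketch of Pearson similarity on an incomplete matrix. It assumes the slide's ratings sit in the columns given by the index sets $I_u$ (Andy: movies 1 to 4, Jenny: movies 3 to 5, Mike: movies 1, 2, 4, 5), computes each $\mu_u$ over the user's own rated items, and sums only over the co-rated items. Returning 0.0 when there is no usable overlap is an assumed convention, not something the slide specifies.

```python
import math

# None marks an unrated movie; column positions follow the index sets I_u.
ratings = {
    "Andy": [4, 5, 1, 2, None],
    "Jenny": [None, None, 4, 4, 1],
    "Mike": [3, 4, None, 1, 2],
}


def rated(v):
    """I_u: 1-based indices of the items the user has rated."""
    return {k + 1 for k, r in enumerate(v) if r is not None}


def mean_rating(v):
    """mu_u: average over the user's own rated items I_u."""
    observed = [r for r in v if r is not None]
    return sum(observed) / len(observed)


def pearson_incomplete(a, b):
    common = sorted(rated(a) & rated(b))  # I_u ∩ I_v
    mu_a, mu_b = mean_rating(a), mean_rating(b)
    da = [a[k - 1] - mu_a for k in common]
    db = [b[k - 1] - mu_b for k in common]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    # Assumed convention: 0.0 when the overlap is empty or has zero variance.
    return num / den if den else 0.0


print(rated(ratings["Andy"]) & rated(ratings["Jenny"]))                  # {3, 4}
print(round(pearson_incomplete(ratings["Andy"], ratings["Jenny"]), 3))   # -0.949
```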


Jaccard Similarity
            Movie 1  Movie 2  Movie 3  Movie 4  Movie 5
User: Andy     O        O        O        O        -
User: Jenny    -        -        O        O        O
User: Mike     O        O        -        O        O

Similarity calculation for a matrix without rating values (viewed or not viewed)

• Set of item indices (repeated):
$I_u$ denotes the set of item indices for which ratings have been specified by user $u$.
$$I_{andy} = \{1, 2, 3, 4\};\quad I_{jenny} = \{3, 4, 5\};\quad I_{mike} = \{1, 2, 4, 5\}$$

• Jaccard similarity:
$$\text{Jaccard}(andy, jenny) = \frac{|I_{andy} \cap I_{jenny}|}{|I_{andy} \cup I_{jenny}|} = \frac{|\{3, 4\}|}{|\{1, 2, 3, 4, 5\}|} = \frac{2}{5} = 0.4$$
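Jaccard similarity maps directly onto Python's set operators. A minimal sketch; treating two empty sets as similarity 0.0 is an assumed convention to avoid division by zero.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| for two sets of item indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


I = {"Andy": {1, 2, 3, 4}, "Jenny": {3, 4, 5}, "Mike": {1, 2, 4, 5}}
print(jaccard(I["Andy"], I["Jenny"]))  # 0.4
```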


Finding Neighbors with Similar Tastes
User-user similarity computation between user 3 and other users

        Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  Avg. rating  cosine(i, 3)  Pearson(i, 3)
User 1    7       6       7       4       5       4        5.5          0.956         0.894
User 2    6       7       -       4       3       4        4.8          0.981         0.939
User 3    -       3       3       1       1       -        2            1.0           1.0
User 4    1       2       2       3       3       4        2.5          0.789        -1.0
User 5    1       -       1       2       3       3        2            0.645        -0.817

Questions

• Which user has the most similar preferences to User 3 from a cosine similarity perspective?

• Which user has the most similar preferences to User 3 from a Pearson similarity perspective?
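The table's last three columns can be recomputed with a short Python sketch. Assumptions, matching the earlier slides: cosine and the Pearson sums use only the items both users rated, while each user's mean uses all of that user's own rated items. The helper names are illustrative.

```python
import math

# Ratings from the table above; None = unobserved.
R = {
    1: [7, 6, 7, 4, 5, 4],
    2: [6, 7, None, 4, 3, 4],
    3: [None, 3, 3, 1, 1, None],
    4: [1, 2, 2, 3, 3, 4],
    5: [1, None, 1, 2, 3, 3],
}


def mean(v):
    """Average over the user's own observed ratings."""
    obs = [x for x in v if x is not None]
    return sum(obs) / len(obs)


def common(a, b):
    """Rating pairs for items rated by both users."""
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]


def cosine(a, b):
    pairs = common(a, b)
    num = sum(x * y for x, y in pairs)
    return num / (math.sqrt(sum(x * x for x, _ in pairs)) * math.sqrt(sum(y * y for _, y in pairs)))


def pearson(a, b):
    mu_a, mu_b = mean(a), mean(b)
    pairs = common(a, b)
    num = sum((x - mu_a) * (y - mu_b) for x, y in pairs)
    den = math.sqrt(sum((x - mu_a) ** 2 for x, _ in pairs)) * math.sqrt(
        sum((y - mu_b) ** 2 for _, y in pairs)
    )
    return num / den


for u in R:
    print(u, round(mean(R[u]), 1), round(cosine(R[u], R[3]), 3), round(pearson(R[u], R[3]), 3))
```

The printed values match the table except in the last digit for two Pearson entries: User 2 prints 0.938 and User 5 prints -0.816, versus 0.939 and -0.817 in the table, a rounding difference only.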


Predicting Unobserved Ratings
        Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  cosine(i, 3)  Pearson(i, 3)
User 1    7       6       7       4       5       4        0.956         0.894
User 2    6       7       -       4       3       4        0.981         0.939
User 3    ??      3       3       1       1       ??       1.0           1.0
User 4    1       2       2       3       3       4        0.789        -1.0
User 5    1       -       1       2       3       3        0.645        -0.817

Neighbors (users with similar preferences to User 3): User 1, User 2

Predicting unobserved (missing) ratings

• Hat notation:
The hat notation "^" on top of $r_{u,j}$ indicates a predicted rating.

$\hat{r}_{3,1}$: the predicted rating for User 3 on Item 1
$\hat{r}_{3,6}$: the predicted rating for User 3 on Item 6


Raw Rating Prediction
        Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  cosine(i, 3)  Pearson(i, 3)
User 1    7       6       7       4       5       4        0.956         0.894
User 2    6       7       -       4       3       4        0.981         0.939
User 3    ??      3       3       1       1       ??       1.0           1.0
User 4    1       2       2       3       3       4        0.789        -1.0
User 5    1       -       1       2       3       3        0.645        -0.817

Neighbors (users with similar preferences to User 3): User 1, User 2

Raw rating prediction:
predicting unrated items using the raw ratings of neighbors.

※ Notations
• $N$ (= {1, 2}): set of users with similar tastes to the target user
• $b \in N$: $b$ represents a user belonging to the set $N$

Based on cosine similarity:
$$\hat{r}_{3,1} = \frac{\sum_{b \in N} \text{cosine}(b, 3) \cdot r_{b,1}}{\sum_{b \in N} \text{cosine}(b, 3)} = \frac{0.956 \times 7 + 0.981 \times 6}{0.956 + 0.981} \approx 6.49$$
$$\hat{r}_{3,6} = \frac{\sum_{b \in N} \text{cosine}(b, 3) \cdot r_{b,6}}{\sum_{b \in N} \text{cosine}(b, 3)} = \frac{0.956 \times 4 + 0.981 \times 4}{0.956 + 0.981} = 4$$


Does something seem odd?
Unobserved rating prediction result with raw ratings of neighbors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 cosine(i, 3) Pearson(i, 3)

User 1 7 6 7 4 5 4 0.956 0.894

User 2 6 7 - 4 3 4 0.981 0.939

User 3 6.49 3 3 1 1 4 1.0 1.0

User 4 1 2 2 3 3 4 0.789 -1.0

User 5 1 - 1 2 3 3 0.645 -0.817

Questions

• Can you guess what's strange?



Mean-Centered Prediction
        Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  Avg. rating  cosine(i, 3)  Pearson(i, 3)
User 1    7       6       7       4       5       4        5.5          0.956         0.894
User 2    6       7       -       4       3       4        4.8          0.981         0.939
User 3    ??      3       3       1       1       ??       2            1.0           1.0

Mean-centered prediction:
predicting unrated items using the mean-centered ratings of neighbors.

• Mean-centered ratings: $s_{u,j} = r_{u,j} - \mu_u$
※ Average rating (recap): $\mu_u = \frac{\sum_{k \in I_u} r_{u,k}}{|I_u|}$

Based on cosine similarity:
$$\hat{r}_{3,1} = \mu_3 + \frac{\sum_{b \in N} \text{cosine}(b, 3) \cdot s_{b,1}}{\sum_{b \in N} \text{cosine}(b, 3)} = 2 + \frac{(7 - 5.5) \times 0.956 + (6 - 4.8) \times 0.981}{0.956 + 0.981} \approx 3.35$$
$$\hat{r}_{3,6} = \mu_3 + \frac{\sum_{b \in N} \text{cosine}(b, 3) \cdot s_{b,6}}{\sum_{b \in N} \text{cosine}(b, 3)} = 2 + \frac{(4 - 5.5) \times 0.956 + (4 - 4.8) \times 0.981}{0.956 + 0.981} \approx 0.85$$
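Mean-centering changes only two things relative to the raw version: the neighbors' ratings are replaced by $s_{b,j} = r_{b,j} - \mu_b$, and the target user's own mean $\mu_3$ is added back. A minimal Python sketch with the same neighbors and similarities; the function names are illustrative, and skipping neighbors who did not rate item $j$ is an added safeguard.

```python
sim = {1: 0.956, 2: 0.981}
R = {
    1: [7, 6, 7, 4, 5, 4],
    2: [6, 7, None, 4, 3, 4],
    3: [None, 3, 3, 1, 1, None],
}


def mean(v):
    """mu_u over the user's own observed ratings."""
    obs = [x for x in v if x is not None]
    return sum(obs) / len(obs)


def predict_mean_centered(u, j, neighbors, sim, R):
    rated = [b for b in neighbors if R[b][j - 1] is not None]
    # s_{b,j} = r_{b,j} - mu_b, weighted by each neighbor's similarity to u.
    num = sum(sim[b] * (R[b][j - 1] - mean(R[b])) for b in rated)
    den = sum(sim[b] for b in rated)
    return mean(R[u]) + num / den


print(round(predict_mean_centered(3, 1, [1, 2], sim, R), 2))  # 3.35
print(round(predict_mean_centered(3, 6, [1, 2], sim, R), 2))  # 0.85
```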


Actual Recommendation
Unobserved rating prediction result with mean-centered ratings of neighbors

Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 cosine(i, 3) Pearson(i, 3)

User 1 7 6 7 4 5 4 0.956 0.894

User 2 6 7 - 4 3 4 0.981 0.939

User 3 3.35 3 3 1 1 0.85 1.0 1.0

User 4 1 2 2 3 3 4 0.789 -1.0

User 5 1 - 1 2 3 3 0.645 -0.817

Questions

• What item would you recommend to User 3?



In-Class Activities
Item-based collaborative filtering case:

                         Item 1  Item 2  Item 3  Item 4  Item 5  Item 6
User 1                     7       6       7       4       5       4
User 2                     6       7       -       4       3       4
User 3                     ??      3       3       1       1       ??
User 4                     1       2       2       3       3       4
User 5                     1       -       1       2       3       3
Avg. rating                ??      ??      ??      ??      ??      ??
cosine(1, j) (item-item)   ??      ??      ??      ??      ??      ??
cosine(6, j) (item-item)   ??      ??      ??      ??      ??      ??

To-do: Please fill in the correct values in the cells marked with ??.


Wrap up!
What we’ve learned in today's class ...

• Definition of Collaborative Filtering
→ A recommendation technique that predicts a user's preferences based on past preference data from multiple users.

• Input Data for Collaborative Filtering
  • User-item interaction data
  → Records of actions performed by users in relation to items, such as ratings or buying behavior.

• Collaborative Filtering Algorithms
  • Types of CF → memory-based (user-based, item-based), model-based
  • Measuring similarity → cosine similarity, Pearson similarity, …
  • Predicting ratings → raw rating prediction, mean-centered prediction
