Module 6_Link Analysis Recommendation Systems.pptx

The document discusses recommendation systems, focusing on their models, including content-based recommendations and collaborative filtering. It explains how these systems utilize user profiles and community behavior to suggest items of interest, highlighting approaches like personalized, social, and item recommendations. Additionally, it presents case studies of Amazon and Google to illustrate the practical applications of these technologies in enhancing user experience.

Recommendation Systems

Contents
• A Model for Recommendation Systems
• Content-Based Recommendations
• Collaborative Filtering
Why Do I Get Recommendations?
Introduction
• An information filtering technology, commonly used on e-commerce websites, that uses collaborative filtering to present information on items and products that are likely to be of interest to the reader.

• In presenting the recommendations, the recommender system uses details of the registered user's profile and the opinions and habits of their whole community of users, and compares this information to reference characteristics to present the recommendations.
Background
• Recommenders are instances of personalization
software.
• Personalization concerns adapting to the individual
needs, interests, and preferences of each user.
• Includes:
– Recommending
– Filtering
– Predicting

From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
4 Main Approaches To Recommendations

1. Personalized recommendation - recommend things based on the individual's past behavior
2. Social recommendation - recommend things based on the past behavior of similar users
3. Item recommendation - recommend things based on the item itself
4. A combination of the three approaches above
How?
Collect Information
Information used for recommendations can
come from different sources:
• browsing and searching data
• purchase data
• feedback explicitly provided by the users
• textual comments
• expert recommendations
• demographic data

Recommendation technologies
Information retrieval (IR) systems:
• allow users to express queries to retrieve information
relevant to a topic of interest or fulfil an information need
• they are not useful in the actual recommendation process
• they cannot capture any information about the users’
preferences
• they cannot retrieve documents based on opinions or quality, as they are text-based

To address these issues, two techniques have been developed:
• Content-based filtering (information filtering)
• Collaborative filtering

Case Study 1: Amazon: King of Recommendations
• Amazon used all 3 approaches (personalized,
social and item). Amazon's system is very
sophisticated, but at heart all of its
recommendations "are based on individual
behavior, plus either the item itself or behavior
of other people on Amazon."
• What's more, the aim of it all is to get you to
add more things to your shopping cart.
Case Study 2: Google: Focus on Personalized
Recommendations
• The most successful Internet company of this era has without a doubt been
Google. It too has been using recommendation technologies to improve its
core search product.
• There are two ways that Google does this:
• 1) Google customizes your search results "when possible" based on
your location and/or recent search activity
• 2) When you're signed in to your Google Account, you "may see even more relevant, useful results based on your web history."

• So Google uses both your location and your personal search history to make its search results supposedly more relevant. This is very much the 'personalized recommendation' approach.

• However, the two other types of recommendation are also present in Google's
core search product:
• Google's search algorithm PageRank is basically dependent on social
recommendations - i.e. who links to a webpage;
• Google also does item recommendations with its "Did you mean" feature.
• Google News, its start page iGoogle, and its ecommerce site Froogle all have
recommendation features.
Contents of the Chapter
• A Model for Recommendation Systems
1. Content-Based Recommendations
2. Collaborative Filtering
A Model for Recommendation Systems

Formal Model
• X = set of Customers
• S = set of Items
• Utility function u: X × S → R
– R = set of ratings
– R is a totally ordered set
– e.g., 0-5 stars, real number in [0,1]
Utility Matrix
[Figure: utility matrix of ratings by Alice, Bob, Carol, and David for the movies Avatar, LOTR, Matrix, and Pirates]
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Key Problems
• (1) Gathering “known” ratings for matrix
– How to collect the data in the utility matrix
• (2) Extrapolate unknown ratings from the known ones
– Mainly interested in high unknown ratings
• We are not interested in knowing what you don’t like but what you like
• (3) Evaluating extrapolation methods
– How to measure success/performance of recommendation methods
(1) Gathering Ratings
• Explicit
– Ask people to rate items
– Doesn’t work well in practice – people
can’t be bothered

• Implicit
– Learn ratings from user actions
• E.g., purchase implies high rating

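A minimal sketch of implicit rating collection, assuming a hypothetical event log of (user, item, action) triples and an illustrative action-to-rating mapping on the [0, 1] scale from the formal model. Only the idea that a purchase implies a high rating comes from the slide; the other actions and their weights are assumptions.

```python
# Hypothetical action-to-rating mapping on a [0, 1] scale (an assumption,
# not taken from the slides): a purchase is treated as the strongest signal.
ACTION_RATING = {"purchase": 1.0, "add_to_cart": 0.5, "view": 0.1}


def implicit_ratings(events):
    """Build implicit ratings from (user, item, action) events.

    For each (user, item) pair, keep the strongest signal seen so far.
    Unknown actions contribute 0.0.
    """
    ratings = {}
    for user, item, action in events:
        score = ACTION_RATING.get(action, 0.0)
        key = (user, item)
        ratings[key] = max(ratings.get(key, 0.0), score)
    return ratings


events = [
    ("alice", "book-1", "view"),
    ("alice", "book-1", "purchase"),
    ("bob", "book-2", "view"),
]
print(implicit_ratings(events))
# {('alice', 'book-1'): 1.0, ('bob', 'book-2'): 0.1}
```

Keeping the maximum per pair means a later "view" never downgrades an earlier purchase.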
A Model for Recommendation Systems
• A goal of a recommendation system is to
predict the blanks in the utility matrix.
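To make "the blanks" concrete, here is a minimal sketch that stores the utility matrix sparsely (only known ratings) and lists the cells a recommender would have to predict. The user and movie names echo the utility-matrix slide, but the ratings are hypothetical.

```python
# Sparse utility matrix: only known ratings are stored.
# Users and movies echo the slide's labels; the ratings are hypothetical.
users = ["Alice", "Bob", "Carol", "David"]
movies = ["Avatar", "LOTR", "Matrix", "Pirates"]
utility = {
    ("Alice", "Avatar"): 5, ("Alice", "Matrix"): 1,
    ("Bob", "LOTR"): 4, ("Bob", "Pirates"): 2,
    ("Carol", "Avatar"): 3, ("Carol", "Matrix"): 5,
    ("David", "Pirates"): 4,
}


def blanks(utility, users, items):
    """List the (user, item) cells without a known rating, i.e. what must be predicted."""
    return [(u, s) for u in users for s in items if (u, s) not in utility]


missing = blanks(utility, users, movies)
print(len(missing))  # 4 users x 4 movies - 7 known ratings = 9
print(missing[:2])   # [('Alice', 'LOTR'), ('Alice', 'Pirates')]
```

A dictionary keyed by (user, item) is used because real utility matrices are overwhelmingly empty.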
Content-Based Recommendations

• Content-based systems focus on the properties of items. The similarity of items is determined by measuring the similarity of their properties.
Content-Based Recommendations
Item Profile
• In a content-based system, we must construct for each item a profile, which is a record or collection of records representing important characteristics of that item.
• In simple cases, the profile consists of some characteristics of the item that are easily discovered.

• For example, consider the features of a movie that might be relevant to a recommendation system:
1. The set of actors of the movie. Some viewers prefer movies with their favourite actors.
2. The director. Some viewers have a preference for the work of certain directors.
3. The year in which the movie was made. Some viewers prefer old movies; others watch only the latest releases.
4. The genre or general type of movie. Some viewers like only comedies, others dramas or romances.
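A minimal sketch of a boolean item profile built from the four movie features listed above. The `Movie` class, the `kind:value` feature strings, bucketing the year by decade, and the example movie are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    title: str
    actors: list
    director: str
    year: int
    genre: str


def item_profile(movie):
    """Boolean item profile: the set of features the movie has."""
    features = {f"actor:{name}" for name in movie.actors}
    features.add(f"director:{movie.director}")
    # Bucketing the year by decade is an illustrative choice.
    features.add(f"decade:{movie.year // 10 * 10}s")
    features.add(f"genre:{movie.genre}")
    return features


def to_vector(profile, vocabulary):
    """1/0 vector over a fixed, ordered feature vocabulary."""
    return [1 if feature in profile else 0 for feature in vocabulary]


movie = Movie("Example Movie", ["Actor A", "Actor B"], "Director X", 1994, "comedy")
vocab = ["actor:Actor A", "actor:Actor C", "decade:1990s", "genre:drama"]
print(to_vector(item_profile(movie), vocab))  # [1, 0, 1, 0]
```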
Content-Based Recommendations
Discovering Features of Documents
• There are other classes of items where it is not
immediately apparent what the values of features should
be.
• Examples: (1) document collections, (2) images.
• There are many kinds of documents for which a
recommendation system can be useful.
• For example, there are many news articles published each
day, and we cannot read all of them.
• A recommendation system can suggest articles on topics a
user is interested in, but how can we distinguish among
topics?
Content-Based Recommendations
Discovering Features of Documents
• Web pages are also a collection of documents. Can we
suggest pages a user might want to see?
• Likewise, blogs could be recommended to interested users if we could classify blogs by topics.
Content-Based Recommendations
Discovering Features of Documents
• Steps:
1. Eliminate the stop words (the several hundred most common words), which say little about the topic of the document.
2. For each remaining word, calculate its TF.IDF score in the document.
3. Pick the n words with the highest TF.IDF scores as the features of the document.
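The three steps can be sketched as follows, assuming a tiny illustrative stop-word list and the TF and IDF definitions from Mining of Massive Datasets (a word's count divided by the count of the most frequent word in the document; IDF = log2(N / n_i)). The documents are hypothetical, and ties are broken alphabetically so the output is deterministic.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists hold several hundred words.
STOP_WORDS = {"a", "an", "and", "but", "how", "or", "what", "the", "of", "is", "to"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words (step 1)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_features(docs, n):
    """Return the n words with the highest TF.IDF score in each document."""
    tokenized = [tokenize(d) for d in docs]
    num_docs = len(tokenized)
    # n_i: number of documents containing word i
    doc_freq = Counter(w for words in tokenized for w in set(words))
    features = []
    for words in tokenized:
        counts = Counter(words)
        if not counts:
            features.append([])
            continue
        max_count = max(counts.values())
        # Step 2: TF (normalized by the most frequent word) times IDF
        scores = {
            w: (c / max_count) * math.log2(num_docs / doc_freq[w])
            for w, c in counts.items()
        }
        # Step 3: highest scores first; ties broken alphabetically
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        features.append(ranked[:n])
    return features


docs = [
    "The match ended and the striker scored a goal",
    "The election result and the vote count",
    "The striker and the goal keeper trained",
]
print(top_features(docs, 2))
# [['ended', 'match'], ['count', 'election'], ['keeper', 'trained']]
```

Words such as "striker" and "goal" appear in two of the three documents, so their IDF is lower and they lose to words unique to one document.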
Content-Based Recommendations

• Stop words
• Commonly used words that are excluded from searches to help index and parse web pages faster.

• Some examples of stop words are: "a", "and", "but", "how", "or", and "what."

• While the majority of Internet search engines use stop words, they do not prevent a user from typing them; the stop words are simply ignored.
• For example, if you were to search for "What is a motherboard?" on Computer Hope, the search engine would only look for the term "motherboard."
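A minimal sketch of how a search engine might drop stop words from the query in the example above; the word list uses the slide's examples plus "is", which is an assumption.

```python
import re

# Stop words from the slide, plus "is" (added here as an assumption).
STOP_WORDS = {"a", "and", "but", "how", "or", "what", "is"}


def query_terms(query):
    """Split a search query into lower-case words and ignore stop words."""
    return [w for w in re.findall(r"[a-z]+", query.lower()) if w not in STOP_WORDS]


print(query_terms("What is a motherboard?"))  # ['motherboard']
```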
Content-Based Recommendations

• TF-IDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
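One common way to compute the score, and the one used in Mining of Massive Datasets (cited in these slides), normalizes each term count by the count of the most frequent term in the document. Here $f_{ij}$ is the number of occurrences of term $i$ in document $j$, $N$ is the number of documents, and $n_i$ is the number of documents containing term $i$; other TF and IDF variants exist.

```latex
\mathrm{TF}_{ij} = \frac{f_{ij}}{\max_k f_{kj}}, \qquad
\mathrm{IDF}_i = \log_2 \frac{N}{n_i}, \qquad
\mathrm{TF.IDF}_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_i
```

A term that appears in every document gets $\mathrm{IDF}_i = \log_2 1 = 0$, so it can never be chosen as a feature.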
Content-Based Recommendations
Obtaining Item Features from Tags
• Example: images

• The problem with images is that their data, typically an array of pixels, does not tell us anything useful about their features.

• There have been a number of attempts to obtain information about features of items by inviting users to tag the items by entering words or phrases that describe the item.
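A minimal sketch of turning user tags into item features, assuming tags arrive as hypothetical (user, item, tag) triples. Requiring agreement from at least `min_users` distinct users (a hypothetical threshold) is one simple way to filter out idiosyncratic tags; normalizing case and whitespace is also an assumption.

```python
def tag_features(tag_events, min_users=2):
    """Derive item features from user-entered tags.

    tag_events: iterable of (user, item, tag) triples.
    A (normalized) tag becomes a feature of an item once at least
    `min_users` distinct users have applied it to that item.
    """
    taggers = {}
    for user, item, tag in tag_events:
        key = (item, tag.strip().lower())
        taggers.setdefault(key, set()).add(user)

    features = {}
    for (item, tag), users in taggers.items():
        if len(users) >= min_users:
            features.setdefault(item, set()).add(tag)
    return features


events = [
    ("u1", "img1", "Sunset"),
    ("u2", "img1", "sunset "),
    ("u3", "img1", "beach"),
    ("u1", "img2", "cat"),
]
print(tag_features(events))  # {'img1': {'sunset'}}
```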
Content-Based Recommendations
Representing Item Profile
• Our ultimate goal for content-based
recommendation is to create both an item
profile consisting of feature-value pairs and a
user profile summarizing the preferences of
the user, based on their row of the utility
matrix.
User Profile
• We not only need to create vectors describing
items; we need to create vectors with the
same components that describe the user’s
preferences. We have the utility matrix
representing the connection between users
and items.
User Profile
• Suppose items are movies, represented by boolean profiles with components corresponding to actors.
• Also, the utility matrix has a 1 if the user has seen the movie and is blank otherwise. If 20% of the movies that user U likes have Salman Khan as one of the actors, then the user profile for U will have 0.2 in the component for Salman Khan.
• On the other hand, suppose the utility matrix holds ratings and user V gives an average rating of 4, and has also rated three movies with Salman Khan. User V gives these three movies ratings of 2, 3, and 5.
• Normalizing each rating by subtracting V's average, the user profile for V has, in the component for Salman Khan, the average of 2 − 4, 3 − 4, and 5 − 4, that is, the value −2/3.
• Consider the same movie information as in the boolean example, but now suppose the utility matrix has nonblank entries that are ratings in the 1–5 range. Suppose user U gives an average rating of 3. There are three movies with Julia Roberts as an actor, and those movies got ratings of 3, 4, and 5.
• Then in the user profile of U, the component for Julia Roberts will have a value that is the average of 3 − 3, 4 − 3, and 5 − 3, that is, a value of 1.
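The profile computations above can be sketched as follows. The movie IDs and the extra ratings that bring V's average to 4 and U's average to 3 are hypothetical; only the Salman Khan and Julia Roberts ratings come from the examples.

```python
def boolean_profile(seen, movie_actors, actor):
    """Fraction of the movies the user has seen (utility entry 1) that feature `actor`."""
    return sum(1 for m in seen if actor in movie_actors[m]) / len(seen)


def normalized_profile(ratings, movie_actors, actor):
    """Average of (rating - user's mean rating) over the user's rated movies featuring `actor`."""
    mean = sum(ratings.values()) / len(ratings)
    deviations = [r - mean for m, r in ratings.items() if actor in movie_actors[m]]
    return sum(deviations) / len(deviations) if deviations else 0.0


# Hypothetical movie IDs; s1-s3 star Salman Khan, j1-j3 star Julia Roberts
movie_actors = {
    "s1": {"Salman Khan"}, "s2": {"Salman Khan"}, "s3": {"Salman Khan"},
    "j1": {"Julia Roberts"}, "j2": {"Julia Roberts"}, "j3": {"Julia Roberts"},
    "o1": set(), "o2": set(), "o3": set(),
}

# U has seen five movies, one of them (20%) with Salman Khan
print(boolean_profile(["s1", "j1", "o1", "o2", "o3"], movie_actors, "Salman Khan"))  # 0.2

# V: Salman Khan movies rated 2, 3, 5; the o-ratings are chosen so V's mean is 4
v = {"s1": 2, "s2": 3, "s3": 5, "o1": 5, "o2": 5, "o3": 4}
print(round(normalized_profile(v, movie_actors, "Salman Khan"), 3))  # -0.667

# U: Julia Roberts movies rated 3, 4, 5; the o-ratings make U's mean 3
u = {"j1": 3, "j2": 4, "j3": 5, "o1": 1, "o2": 2}
print(normalized_profile(u, movie_actors, "Julia Roberts"))  # 1.0
```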
Pros: Content-based Approach
• +: No need for data on other users
– No cold-start or sparsity problems
• +: Able to recommend to users with
unique tastes
• +: Able to provide explanations
– Can provide explanations of recommended items by
listing content-features that caused an item to be
recommended

Cons: Content-based Approach
• –: Finding the appropriate features is hard
– E.g., images, movies, music
• –: Recommendations for new users
– How to build a user profile?
• –: Overspecialization
– Never recommends items outside user’s
content profile
– People might have multiple interests
– Unable to exploit quality judgments of other users

Collaborative Filtering
• A significantly different approach to recommendation.

• Instead of using features of items to determine their similarity, we focus on the similarity of the user ratings for two items.
Collaborative Filtering
• Consider user x

• Find set N of other users whose ratings are “similar” to x’s ratings

• Estimate x’s ratings based on ratings of users in N
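A minimal user-user sketch of these three steps, assuming sparse rating dictionaries, cosine similarity with missing ratings treated as 0, and a similarity-weighted average over the k most similar users who rated the item. The data and the default k are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse rating dicts; missing ratings count as 0."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_user_user(ratings, x, item, k=2):
    """Estimate x's rating of `item` from the k most similar users who rated it."""
    # Candidate neighbours: every other user who has rated the item
    candidates = [
        (cosine(ratings[x], ratings[y]), y)
        for y in ratings
        if y != x and item in ratings[y]
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    # Similarity-weighted average; non-positive similarities are skipped
    num = sum(s * ratings[y][item] for s, y in neighbours if s > 0)
    den = sum(s for s, _ in neighbours if s > 0)
    return num / den if den else None


ratings = {
    "x": {"m1": 5, "m2": 4},
    "y": {"m1": 5, "m2": 4, "m3": 5},
    "z": {"m1": 1, "m3": 1, "m4": 5},
}
print(round(predict_user_user(ratings, "x", "m3"), 2))  # 4.36
```

User y agrees closely with x, so y's rating of 5 dominates the estimate; z's rating of 1 pulls it down only slightly.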
Finding “Similar” Users
• Example ratings of users x and y on five items (each * is one star, _ means unrated):
rx = [*, _, _, *, ***]
ry = [*, _, **, **, _]
• rx, ry as sets (the items each user rated):
rx = {1, 4, 5}
ry = {1, 3, 4}
• rx, ry as points (unrated counted as 0):
rx = (1, 0, 0, 1, 3)
ry = (1, 0, 2, 2, 0)
• Cosine similarity: $\mathrm{sim}(x, y) = \cos(r_x, r_y) = \dfrac{r_x \cdot r_y}{\lVert r_x \rVert \, \lVert r_y \rVert}$
• $\bar{r}_x, \bar{r}_y$ … average rating of x, y
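A quick check of the two representations on the example vectors: Jaccard similarity on the sets of rated items, and the cosine formula on the points.

```python
import math

# Ratings from the slide, with 0 standing for "unrated"
rx = [1, 0, 0, 1, 3]
ry = [1, 0, 2, 2, 0]


def as_set(r):
    """Indices (1-based, as on the slide) of the items the user rated."""
    return {i for i, v in enumerate(r, start=1) if v}


def jaccard(a, b):
    return len(a & b) / len(a | b)


def cosine(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))


print(as_set(rx), as_set(ry))           # {1, 4, 5} {1, 3, 4}
print(jaccard(as_set(rx), as_set(ry)))  # 0.5
print(round(cosine(rx, ry), 3))         # 0.302
```

Here rx · ry = 3, ‖rx‖ = √11 and ‖ry‖ = 3, so the cosine is 1/√11.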
Similarity Metric

• The first question we must deal with is how to measure the similarity of users or items from their rows or columns in the utility matrix.

• Running example: users A, B, and C rate seven movies (_ = unrated):
A = (4, _, _, 5, 1, _, _)
B = (5, 5, 4, _, _, _, _)
C = (_, _, _, 2, 4, 5, _)

• Observe specifically the users A and C. They rated two movies in common, but they appear to have almost diametrically opposite opinions of these movies. We would expect that a good distance measure would make them rather far apart.
Jaccard Distance

• A and B have an intersection of size 1 and a union of size 5.

• Thus, their Jaccard similarity is 1/5, and their Jaccard distance is 4/5; i.e., they are very far apart.

• In comparison, A and C have a Jaccard similarity of 2/4, so their Jaccard distance is also 1/2. Thus, A appears closer to C than to B.

• Yet that conclusion seems intuitively wrong. A and C disagree on the two movies they both watched, while A and B seem both to have liked the one movie they watched in common.
Cosine Distance

• The cosine of the angle between A and B is
A = (4, 0, 0, 5, 1, 0, 0) => A·A = 4×4 + 5×5 + 1×1 = 42, so ‖A‖ = √42 ≈ 6.48
B = (5, 5, 4, 0, 0, 0, 0) => B·B = 5×5 + 5×5 + 4×4 = 66, so ‖B‖ = √66 ≈ 8.12
A·B = 4×5 = 20
cos(A, B) = A·B / (‖A‖ × ‖B‖) = 20 / (6.48 × 8.12) ≈ 0.380
Cosine Distance

• The cosine of the angle between A and C is
A = (4, 0, 0, 5, 1, 0, 0) => A·A = 42, so ‖A‖ = √42 ≈ 6.48
C = (0, 0, 0, 2, 4, 5, 0) => C·C = 2×2 + 4×4 + 5×5 = 45, so ‖C‖ = √45 ≈ 6.71
A·C = 5×2 + 1×4 = 14
cos(A, C) = A·C / (‖A‖ × ‖C‖) = 14 / (6.48 × 6.71) ≈ 0.322
Similarity Metric

• Intuitively we want: sim(A, B) > sim(A, C)

• Jaccard similarity: 1/5 < 2/4
• Cosine similarity: 0.380 > 0.322
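The numbers on the A, B, C slides can be checked with a short sketch. Jaccard similarity looks only at which movies were rated, while cosine treats blanks as 0.

```python
import math

# Ratings of users A, B, C on seven movies (0 = unrated), from the slides
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]


def jaccard_sim(u, v):
    """Jaccard similarity of the sets of rated movies (rating values ignored)."""
    a = {i for i, r in enumerate(u) if r}
    b = {i for i, r in enumerate(v) if r}
    return len(a & b) / len(a | b)


def cosine_sim(u, v):
    """Cosine of the angle between two rating vectors, blanks treated as 0."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))


print(jaccard_sim(A, B), jaccard_sim(A, C))                     # 0.2 0.5
print(round(cosine_sim(A, B), 3), round(cosine_sim(A, C), 3))  # 0.38 0.322
```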


Solution 1: Rounding the Data

• Try to eliminate the apparent similarity between movies a user rates highly and those with low scores by rounding the ratings.
• For instance, we could consider ratings of 3, 4, and 5 as a “1” and consider ratings 1 and 2 as unrated.
• Counting positions in the rating vectors A, B, C, after rounding A has rated movies 1 and 4, B has rated movies 1, 2, and 3, and C has rated movies 5 and 6.

• Now, the Jaccard distance between A and B is 3/4, while between A and C it is 1; i.e., C appears further from A than B does, which is intuitively correct. Applying cosine distance allows us to draw the same conclusion.
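A minimal sketch of the rounding remedy, assuming blanks are stored as 0 and using the slide's threshold of 3.

```python
import math

# Ratings of A, B, C on seven movies (0 = unrated), as on the earlier slides
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]


def round_ratings(r):
    """Ratings 3-5 become 1; ratings 1-2 and blanks become 0 (unrated)."""
    return [1 if v >= 3 else 0 for v in r]


def jaccard_distance(u, v):
    a = {i for i, x in enumerate(u) if x}
    b = {i for i, x in enumerate(v) if x}
    return 1 - len(a & b) / len(a | b)


def cosine(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))


rA, rB, rC = (round_ratings(r) for r in (A, B, C))
print(jaccard_distance(rA, rB), jaccard_distance(rA, rC))  # 0.75 1.0
print(round(cosine(rA, rB), 3), cosine(rA, rC))            # 0.408 0.0
```

After rounding, A and C share no rated movie, so both measures put C as far from A as possible.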
Solution 2: Normalizing Ratings

• If we normalize ratings by subtracting from each rating the average rating of that user, we turn low ratings into negative numbers and high ratings into positive numbers.

• If we then take the cosine distance, we find that users with opposite views of the movies they viewed in common will have vectors in almost opposite directions, and can be considered as far apart as possible.

• However, users with similar opinions about the movies rated in common will have a relatively small angle between them.
• sim(A, B) vs. sim(A, C): 0.092 > −0.559
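A minimal sketch of the normalization remedy, storing blanks as 0 and subtracting each user's mean over the movies they rated. A rating exactly equal to the user's mean also becomes 0 and is then indistinguishable from a blank, which does not change the cosine.

```python
import math

# Ratings of A, B, C on seven movies (0 = unrated), as on the earlier slides
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]


def normalize(r):
    """Subtract the user's mean rating from each rating; blanks (0) stay 0."""
    rated = [v for v in r if v]
    mean = sum(rated) / len(rated)
    return [v - mean if v else 0.0 for v in r]


def cosine(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v)))


nA, nB, nC = normalize(A), normalize(B), normalize(C)
print(round(cosine(nA, nB), 3), round(cosine(nA, nC), 3))  # 0.092 -0.559
```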
Rating Predictions

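Two standard user-user prediction rules are sketched here as an assumption rather than as this slide's exact formulas. Let $N$ be the set of the $k$ users most similar to $x$ who have rated item $i$, and $s_{xy} = \mathrm{sim}(x, y)$; the first rule takes a plain average of their ratings, the second weights each rating by similarity.

```latex
r_{xi} = \frac{1}{k} \sum_{y \in N} r_{yi}
\qquad \text{or} \qquad
r_{xi} = \frac{\sum_{y \in N} s_{xy} \, r_{yi}}{\sum_{y \in N} s_{xy}}
```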
Item-Item Collaborative Filtering
• So far: User-user collaborative filtering
• Another view: Item-item
– For item i, find other similar items
– Estimate rating for item i based
on ratings for similar items
– Can use same similarity metrics and
prediction functions as in user-user model

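• Predicted rating: $r_{xi} = \dfrac{\sum_{j \in N(i;x)} s_{ij} \, r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$
– $s_{ij}$ … similarity of items i and j
– $r_{xj}$ … rating of user x on item j
– $N(i;x)$ … set of items rated by x that are similar to i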
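A minimal item-item sketch of the weighted-average prediction rule above, assuming raw cosine similarity between item columns (missing ratings as 0) and taking N(i;x) to be the k items rated by x that are most similar to i. The data and k are hypothetical; the normalized ratings from Solution 2 could be substituted for the raw ones.

```python
import math


def item_cosine(ratings, i, j):
    """Cosine similarity between the rating columns of items i and j (blanks as 0)."""
    col_i = [user_ratings.get(i, 0) for user_ratings in ratings.values()]
    col_j = [user_ratings.get(j, 0) for user_ratings in ratings.values()]
    dot = sum(p * q for p, q in zip(col_i, col_j))
    norm_i = math.sqrt(sum(p * p for p in col_i))
    norm_j = math.sqrt(sum(q * q for q in col_j))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict_item_item(ratings, x, i, k=2):
    """r_xi = sum(s_ij * r_xj) / sum(s_ij) over the k items rated by x most similar to i."""
    scored = [(item_cosine(ratings, i, j), j) for j in ratings[x] if j != i]
    neighbours = [(s, j) for s, j in sorted(scored, reverse=True)[:k] if s > 0]
    den = sum(s for s, _ in neighbours)
    if not den:
        return None
    return sum(s * ratings[x][j] for s, j in neighbours) / den


ratings = {
    "u1": {"i1": 5, "i2": 5},
    "u2": {"i1": 4, "i2": 4, "i3": 1},
    "u3": {"i3": 5},
    "x": {"i2": 5, "i3": 2},
}
print(round(predict_item_item(ratings, "x", "i1", k=1), 2))  # 5.0
print(round(predict_item_item(ratings, "x", "i1", k=2), 2))  # 4.62
```

Item i2 is rated almost identically to i1 by the same users, so x's rating of i2 dominates the prediction.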
Pros/Cons of Collaborative Filtering
• + Works for any kind of item
– No feature selection needed
• - Cold Start:
– Need enough users in the system to find a match
• - Sparsity:
– The user/ratings matrix is sparse
– Hard to find users that have rated the same items
• - First rater:
– Cannot recommend an item that has not been
previously rated
– New items, Esoteric items
• - Popularity bias:
– Cannot recommend items to someone with
unique taste
– Tends to recommend popular items
Hybrid Approach
• CF + CB (collaborative filtering combined with content-based recommendation)
• Content-based system
– Maintain user profile based on content analysis
• Collaborative system
– Directly compare profiles to determine similar users for recommendation
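One way to read this slide is to build content-based user profiles and then compare those profiles directly to find similar users. A minimal sketch, assuming genre-only item features, mean-centred profiles, and cosine similarity; all data are hypothetical. Users x and y rated no movie in common, yet their profiles match, which is how such a hybrid can ease the sparsity problem of pure collaborative filtering.

```python
import math


def profile(user_ratings, item_features):
    """Content-based user profile: mean-centred rating averaged per item feature."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    totals, counts = {}, {}
    for item, rating in user_ratings.items():
        for feature in item_features[item]:
            totals[feature] = totals.get(feature, 0.0) + (rating - mean)
            counts[feature] = counts.get(feature, 0) + 1
    return {f: totals[f] / counts[f] for f in totals}


def cosine(p, q):
    """Cosine similarity of two sparse feature vectors (dicts)."""
    dot = sum(v * q[f] for f, v in p.items() if f in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def most_similar_user(x, ratings, item_features):
    """Compare x's content-based profile with every other user's profile."""
    px = profile(ratings[x], item_features)
    scored = [
        (cosine(px, profile(ratings[y], item_features)), y)
        for y in ratings if y != x
    ]
    return max(scored)[1]


item_features = {"m1": {"comedy"}, "m2": {"drama"}, "m3": {"comedy"}, "m4": {"drama"}}
ratings = {
    "x": {"m1": 5, "m2": 1},
    "y": {"m3": 4, "m4": 2},  # no movie in common with x
    "z": {"m3": 1, "m4": 5},
}
print(most_similar_user("x", ratings, item_features))  # y
```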
Summary
• Recommender systems allow online retailers to
customize their sites to meet consumer tastes.
– Aid browsing, suggest related items.
• Personalization is one of e-commerce’s advantages compared to brick-and-mortar stores.
• Challenges: obtaining and mining data, making
intelligent and novel recommendations, ethics.
• Can perform comparisons across users or across
items.
– Trade off data needed versus detail of
recommendation.
