
Evaluating Recommender Systems
EVALUATION METRICS

• 1. Mean Average Precision at K
• It measures how relevant the list of recommended items is. Here, Precision at K is the share of the top-K
recommended items that are relevant.
• 2. Coverage
• It is the percentage of items in the training data that the model is able to recommend in the test sets. In other
words, the share of the catalog the system can actually recommend.
• 3. Personalization
• It measures how much the recommended items differ between users, i.e., the dissimilarity between users'
recommendation lists.
• 4. Intra-list Similarity
• It is the average cosine similarity between all pairs of items in a list of recommendations.
• Common Metrics Used
• Predictive accuracy metrics, classification accuracy metrics, rank accuracy metrics, and non-accuracy
measurements are the four major types of evaluation metrics for recommender systems.

• Predictive Accuracy Metrics
• Predictive accuracy or rating prediction metrics address how close a recommender’s estimated ratings are to
the genuine user ratings. This sort of measure is widely used for evaluating non-binary ratings.

• It is best suited for usage scenarios in which accurate prediction of ratings for all products is critical. Mean
Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Normalized
Mean Absolute Error (NMAE) are the most important measures for this purpose.
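• As a rough sketch (not from the original slides), these error metrics can be computed as follows, assuming y_true and y_pred are parallel lists of actual and predicted ratings on a 1–5 scale:

```python
import math

def rating_errors(y_true, y_pred, rating_min=1.0, rating_max=5.0):
    """MAE, MSE, RMSE and NMAE for predicted vs. actual ratings."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    nmae = mae / (rating_max - rating_min)  # MAE normalized by the rating range
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "NMAE": nmae}

# Hypothetical actual vs. predicted ratings
print(rating_errors([4, 3, 5, 2], [3.5, 3.0, 4.0, 2.5]))
```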
Offline recommender system
methodology
Cosine similarity
• To compute the similarity between a purchased item and
the new item for an item-centered system, we simply
take the cosine between the two vectors representing those
items.
• Cosine similarity is a good fit when there are many
high-dimensional features, especially in text mining.
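• A minimal sketch of this computation, assuming each item is represented by a plain list of numeric feature values (the vectors below are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two item feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors of a purchased item and a new item
print(cosine_similarity([1, 0, 2, 3], [0, 1, 2, 2]))  # ≈ 0.89
```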
Jaccard similarity

• Jaccard similarity is the size of the intersection divided
by the size of the union of two sets of items.
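• A minimal sketch, assuming each item is described by a set of attributes (the tag sets below are illustrative):

```python
def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)

print(jaccard_similarity({"action", "thriller", "crime"},
                         {"action", "crime", "drama"}))  # 2/4 = 0.5
```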
Top-K parameter

• The K parameter is the evaluation cutoff point. It
represents the number of top-ranked items to evaluate.
For example, you can focus on the quality of the top-10
recommendations.
• To evaluate a recommendation or ranking system, you need:
• The model predictions. They include the ranked list of
user-item pairs. The complete dataset also contains features
that describe users or items. You’ll need them for some of
the metrics.
• The ground truth. You need to know the actual user-item
relevance to evaluate the quality of predictions. This might
be a binary or graded relevance score. It is often based on
the user interactions, such as clicks and conversions.
• The K. You need to pick the number of top
recommendations to consider. This puts a constraint on
evaluation: you will disregard anything that happens after
this cutoff point.
• 1. Predictive metrics. They reflect the “correctness” of
recommendations and show how well the system finds
relevant items.
• 2. Ranking metrics. They reflect the ranking quality:
how well the system can sort the items from more
relevant to less relevant.
• 3. Behavioral metrics. These metrics reflect specific
properties of the system, such as how diverse or novel
the recommendations are.
Precision at K
• Precision at K shows how many of the recommended
items in the top K are relevant. It gives an assessment
of prediction “correctness.” It is intuitive and easy to
understand: Precision in ranking works the same as its
counterpart in classification quality evaluation.
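• A minimal sketch, assuming recommended is one user's ranked list of item IDs and relevant is the set of ground-truth relevant items (both hypothetical):

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

print(precision_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=4))  # 2/4 = 0.5
```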
Recall at K
• Recall at K measures the coverage of relevant items in
the top K.
• Recall at K shows how many relevant items, out of their
total number, you can successfully retrieve within the
top K recommendations.
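• A corresponding sketch for Recall at K, with the same hypothetical inputs:

```python
def recall_at_k(recommended, relevant, k):
    """Share of all relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

print(recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=4))  # 2/3 ≈ 0.67
```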
F-score
• The F Beta score is a metric that balances Precision and
Recall.
• The F Beta score at K combines Precision and Recall
metrics into a single value to provide a balanced
assessment. The Beta parameter allows adjusting the
importance of Recall relative to Precision.
• If you set the Beta to 1, you will get the standard F1
score, a harmonic mean of Precision and Recall.
• The F Beta score is a good metric when you care about
both properties: correctness of predictions and ability to
cover as many relevant items as possible with the top-K.
The Beta parameter allows you to customize the
priorities.
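• A minimal sketch of the F-Beta score at K, computed directly from the hit count (Beta = 1 gives the standard F1 score; the inputs are hypothetical):

```python
def f_beta_at_k(recommended, relevant, k, beta=1.0):
    """F-Beta score at K, combining Precision@K and Recall@K."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0 and recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

print(f_beta_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=4, beta=1.0))  # ≈ 0.57
```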
• Precision and Recall depend heavily on the total number
of relevant items. Because of this, it might be
challenging to compare the performance across
different lists.
• In addition, metrics like Precision and Recall are not
rank-aware. They are indifferent to the position of
relevant items inside the top K.
• Consider two lists that both have 5 out of 10 matches.
In the first list, the relevant items are at the very top. In
the second, they are at the very bottom. The Precision at
10 will be the same (50%) in both cases, because Precision
only counts relevant items and ignores their positions.
Ranking quality metrics

• Ranking metrics help assess the ability to order the
items based on their relevance to the user or query. In
an ideal scenario, all the relevant items should appear
ahead of the less relevant ones. Ranking metrics help
measure how far you are from this.
• MRR
• MRR calculates the average of the reciprocal ranks of
the first relevant item.
• MRR (Mean Reciprocal Rank) shows how soon you
can find the first relevant item.
• To calculate MRR, you take the reciprocal of the rank of
the first relevant item and average this value across all
queries or users.
• For example, if the first relevant item appears in the
second position, this list's RR (Reciprocal Rank) is 1/2. If
the first relevant item takes the third place, then the RR
equals 1/3, and so on.
• Once you compute the RRs for all lists, you can average
them to get the resulting MRR for all users or queries.
• MRR is an easy-to-understand and intuitive metric. It is
beneficial when the top-ranked item matters: for
example, you expect the search engine to return a
relevant first result.
• However, the limitation is that MRR solely focuses on
the first relevant item and disregards all the rest. In
case you care about overall ranking, you might need
additional metrics.
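• A minimal sketch, assuming recommendations maps each user to a ranked list and relevant maps each user to a set of ground-truth items (all names hypothetical):

```python
def mean_reciprocal_rank(recommendations, relevant):
    """Average of 1/rank of the first relevant item across all users."""
    rr_values = []
    for user, ranked_items in recommendations.items():
        rr = 0.0
        for rank, item in enumerate(ranked_items, start=1):
            if item in relevant.get(user, set()):
                rr = 1.0 / rank
                break
        rr_values.append(rr)
    return sum(rr_values) / len(rr_values)

recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
truth = {"u1": {"b"}, "u2": {"z"}}
print(mean_reciprocal_rank(recs, truth))  # (1/2 + 1/3) / 2 ≈ 0.42
```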
MAP
• MAP measures the average Precision across different
Recall levels for a ranked list.
• Mean Average Precision (MAP) at K evaluates the
average Precision at all relevant ranks within the list of
top K recommendations. This helps get a comprehensive
measure of recommendation system performance,
accounting for the quality of the ranking.
• To compute MAP, you first need to calculate the Average
Precision (AP) for each list: an average of the Precision
values at every position within the top K that holds a
relevant recommendation.
• Once you compute the AP for every list, you can average
it across all users. Here is the complete formula:
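• One common formulation of the formula (where rel(k) is 1 if the item at position k is relevant and 0 otherwise, Rel_u is the set of relevant items for user u, and U is the set of users) is:

```latex
\mathrm{AP@K}(u) = \frac{1}{\min\bigl(K,\ |\mathrm{Rel}_u|\bigr)} \sum_{k=1}^{K} \mathrm{Precision@}k \cdot \mathrm{rel}(k)
\qquad
\mathrm{MAP@K} = \frac{1}{|U|} \sum_{u \in U} \mathrm{AP@K}(u)
```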
• MAP helps address the limitations of “classic” Precision
and Recall: it evaluates both the correctness of
recommendations and how well the system can sort the
relevant items inside the list.
• Due to the underlying formula, MAP heavily rewards
correct recommendations at the top of the list: an error
at the top is factored into every consecutive Precision
computation.
• MAP is a valuable metric when it is important to get the
top predictions right, like in information retrieval. As a
downside, this metric might be hard to communicate and
does not have an immediate intuitive explanation.
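• A minimal sketch of MAP at K under that formulation, with the same hypothetical data structures as in the MRR example:

```python
def average_precision_at_k(ranked_items, relevant_items, k):
    """AP@K: average of Precision@k over the relevant positions in the top k."""
    hits, precisions = 0, []
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant_items:
            hits += 1
            precisions.append(hits / rank)
    denom = min(k, len(relevant_items))
    return sum(precisions) / denom if denom else 0.0

def map_at_k(recommendations, relevant, k):
    """MAP@K: AP@K averaged across all users."""
    scores = [average_precision_at_k(items, relevant.get(user, set()), k)
              for user, items in recommendations.items()]
    return sum(scores) / len(scores)

recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
truth = {"u1": {"a", "c"}, "u2": {"y"}}
print(map_at_k(recs, truth, k=3))  # (0.83 + 0.5) / 2 ≈ 0.67
```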
Hit rate

• Hit Rate measures the share of users that get at least
one relevant recommendation.
• Hit Rate at K calculates the share of users for which at
least one relevant item is present in the top K. This
metric is very intuitive.
• You can get a binary score for each user: “1” if there is
at least a single relevant item in top K or “0” otherwise.
Then, you can compute the average hit rate across all
users.
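• A minimal sketch, again with hypothetical per-user recommendation lists and ground-truth sets:

```python
def hit_rate_at_k(recommendations, relevant, k):
    """Share of users with at least one relevant item in their top-k list."""
    hits = sum(
        1 for user, ranked_items in recommendations.items()
        if any(item in relevant.get(user, set()) for item in ranked_items[:k])
    )
    return hits / len(recommendations)

recs = {"u1": ["a", "b"], "u2": ["x", "y"], "u3": ["m", "n"]}
truth = {"u1": {"b"}, "u2": {"q"}, "u3": {"m"}}
print(hit_rate_at_k(recs, truth, k=2))  # 2/3 ≈ 0.67
```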
Behavioral metrics

• Behavioral metrics help go “beyond accuracy” and
evaluate important qualities of a recommender system,
like the diversity and novelty of recommendations.
• 1. Diversity
• Recommendation diversity assesses how varied the
recommended items are for each user. It reflects the
breadth of item types or categories to which each user
is exposed.
• To compute this metric, you can measure the intra-list
diversity by evaluating the average Cosine Distance
between pairs of items inside the list. Then, you can
average it across all users.
• Diversity is helpful if you expect users to have a better
experience when they receive recommendations that
span a diverse range of topics, genres, or
characteristics.
• However, while diversity helps check if a system can
show a varied mix of items, it does not consider
relevance. You can use this metric with ranking or
predictive metrics to get a complete picture.
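• A minimal sketch of the intra-list computation described above, assuming each recommended item has a numeric feature vector (the vectors are illustrative):

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def intra_list_diversity(recommended_items, item_vectors):
    """Average pairwise Cosine Distance between items in one recommendation list."""
    pairs = list(combinations(recommended_items, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(item_vectors[i], item_vectors[j])
               for i, j in pairs) / len(pairs)

vectors = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 1, 0]}
print(intra_list_diversity(["a", "b", "c"], vectors))  # ≈ 0.53
```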
Novelty

• Novelty assesses how unique or unusual the recommended items
are. It measures the degree to which the suggested items differ
from popular ones.
• You can compute novelty as the negative logarithm (base 2) of
the probability of encountering a given item in a training set. High
novelty corresponds to long-tail items that few users interacted
with, and low novelty corresponds to popular items. Then, you can
average the novelty inside the list and across users.
• Novelty reflects the system's ability to recommend items that are
not well-known in the dataset. It is helpful for scenarios when you
expect users to get new and unusual recommendations to stay
engaged.
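• A minimal sketch, assuming the item probability is the share of training interactions involving that item (one possible choice; the data below is made up):

```python
import math
from collections import Counter

def novelty_at_k(recommendations, train_interactions, k):
    """Mean -log2(p(item)) over recommended items, averaged across users."""
    counts = Counter(train_interactions)
    total = len(train_interactions)
    per_user = []
    for ranked_items in recommendations.values():
        scores = [-math.log2(counts[item] / total)
                  for item in ranked_items[:k] if counts[item] > 0]
        if scores:
            per_user.append(sum(scores) / len(scores))
    return sum(per_user) / len(per_user)

train = ["a", "a", "a", "b", "b", "c"]       # hypothetical interaction log
recs = {"u1": ["a", "c"], "u2": ["b", "c"]}
print(novelty_at_k(recs, train, k=2))        # higher = more long-tail items
```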
Serendipity
• Serendipity measures the unexpectedness or pleasant surprise in
recommendations. Serendipity evaluates the system's ability to
suggest items beyond the user's typical preferences or
expectations.
• Serendipity is challenging to quantify precisely, but one way to
approach it is by considering the dissimilarity (measured via Cosine
Distance) between successfully recommended items and a user's
historical preferences. Then, you can average it across users.
• Serendipity reflects the ability of the system to venture beyond the
predictable and offer new recommendations that users enjoy. It
promotes exploring diverse and unexpected content, adding an
element of delight and discovery.
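• One possible sketch of the dissimilarity-based approach above, for a single user with hypothetical item vectors:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def user_serendipity(successful_recs, history, item_vectors):
    """Mean Cosine Distance between each successfully recommended item and
    the user's historical items; higher means more unexpected."""
    scores = []
    for rec in successful_recs:
        dists = [cosine_distance(item_vectors[rec], item_vectors[h]) for h in history]
        scores.append(sum(dists) / len(dists))
    return sum(scores) / len(scores) if scores else 0.0

vectors = {"r1": [1, 0], "h1": [0, 1], "h2": [1, 1]}
print(user_serendipity(["r1"], ["h1", "h2"], vectors))  # ≈ 0.65
```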
Popularity bias

• Popularity bias refers to a phenomenon where the
recommender favors popular items over more diverse or niche
ones. It can lead to a lack of personalization, causing users to see
the same widely popular items repeatedly. This bias may result in a
less diverse and engaging user experience.
• There are different ways to evaluate the popularity of
recommendations, for example:
• Coverage: the share of all items in the catalog present in
recommendations.
• Average recommendation popularity (ARP).
• Average overlap between the items in the lists.
• Gini index.
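• As an example, catalog coverage can be sketched as follows (the catalog and recommendation lists are hypothetical):

```python
def catalog_coverage(recommendations, catalog):
    """Share of all catalog items that appear in at least one recommendation list."""
    recommended = set()
    for ranked_items in recommendations.values():
        recommended.update(ranked_items)
    return len(recommended & set(catalog)) / len(catalog)

recs = {"u1": ["a", "b"], "u2": ["a", "c"]}
print(catalog_coverage(recs, catalog=["a", "b", "c", "d", "e"]))  # 3/5 = 0.6
```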
What are the steps to validate data from
recommendation systems?

1. Define the objectives
2. Assess the data sources
3. Clean and transform the data
4. Split the data into subsets
5. Apply the recommendation models
6. Evaluate the recommendations
• Data validation is the process of checking the quality,
completeness, and consistency of the data before using it for
analysis or modeling.

Define the objectives
• The first step to validate data from recommendation systems is to
define the objectives and criteria of the validation. What are you
trying to achieve with the recommendations? What are the key
performance indicators (KPIs) that measure the success of the
recommendations? How do you define the quality and relevance of
the data? These questions will help you set the scope and
standards of the validation and align them with the business goals
and user needs.
Assess the data sources

• The next step is to assess the data sources that provide
the input for the recommendation systems. Data
sources can include user profiles, preferences, ratings,
feedback, browsing history, transactions, social media,
etc. You need to evaluate the reliability, availability, and
accessibility of these sources and identify any potential
issues or gaps. For example, you might want to check if
the data is updated regularly, if it covers enough users
and items, if it has enough diversity and variety, and if
it is compatible with the data formats and platforms you
use.
Clean and transform the data
• After assessing the data sources, you need to clean and
transform the data to make it suitable for the
recommendation systems. Cleaning involves removing
or correcting any errors, outliers, duplicates, missing
values, or inconsistencies in the data. Transforming
involves applying any operations or functions that
change the structure, format, or values of the data. For
example, you might want to normalize, standardize,
aggregate, or categorize the data to make it more
uniform and comparable.
Split the data into subsets
• The next step is to split the data into subsets for
different purposes. Typically, you will need three
subsets: training, validation, and testing. Training data
is used to build and train the recommendation models.
Validation data is used to tune and optimize the model
parameters and select the best model. Testing data is
used to evaluate and compare the performance and
accuracy of the models. You need to ensure that the
subsets are representative, balanced, and independent
of each other.
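• As a rough illustration, one simple way to produce the three subsets is a random split of the interaction records; the 80/10/10 proportions and the (user, item, rating) record format are assumptions, not prescribed above:

```python
import random

def split_interactions(interactions, train_frac=0.8, val_frac=0.1, seed=42):
    """Randomly split (user, item, rating) records into train/validation/test."""
    rng = random.Random(seed)
    shuffled = interactions[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test

data = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4), ("u2", "c", 2), ("u3", "b", 1)]
train, validation, test = split_interactions(data)
print(len(train), len(validation), len(test))  # 4 0 1 for this tiny example
```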
Apply the recommendation
models
• The next step is to apply the recommendation models
to the data subsets and generate the recommendations.
Recommendation models can be based on different
techniques, such as collaborative filtering, content-
based filtering, hybrid filtering, or deep learning. You
need to choose the appropriate models for your
objectives and data characteristics and test them on
different scenarios and settings. You also need to
monitor and record the results and outputs of the
models for further analysis.
Evaluate the recommendations

• The final step is to evaluate the recommendations and
validate their quality and effectiveness. You can use
different methods and metrics to measure the
performance of the recommendations, such as
accuracy, precision, recall, coverage, diversity, novelty,
serendipity, etc. You can also use feedback from users,
such as ratings, reviews, clicks, conversions, retention,
etc. to assess the satisfaction and engagement of the
recommendations. You need to compare the results with
the objectives and criteria you defined in the first step
and identify any strengths, weaknesses, or areas for
improvement.
