IR Lecture 5b

Evaluation

Evaluation
 The major goal of IR is to retrieve documents relevant to a user query.
 The evaluation of the performance of IR systems relies on the notion of relevance.
What constitutes relevance?
Relevance
 Relevance is subjective in nature, i.e. it depends upon a specific user's judgment.
 Given a query, the same document may be judged as relevant by one user and non-relevant by another. Only the user can tell the true relevance.
 It is, however, not possible to measure this "true relevance".
 Most of the evaluation of IR systems so far has been done on document test collections with known relevance judgments.
 Another issue with relevance is the degree of relevance.
 Traditionally, relevance has been treated as a binary concept, i.e. a document is judged either relevant or not relevant, whereas relevance is really a continuous notion (a document may be exactly what the user wants, or it may be only closely related).
Why System Evaluation?
 There are many retrieval models/algorithms/systems; which one is the best?
 What is the best component for:
• Ranking function (dot-product, cosine, …)
• Term selection (stop word removal, stemming, …)
• Term weighting (TF, TF-IDF, …)
 How far down the ranked list will a user need to look to find some/all relevant documents?
Evaluation of IR Systems
 The evaluation of an IR system is the process of assessing how well the system meets the information needs of its users (Voorhees, 2001).
 Criteria for evaluation:
- Coverage of the collection
- Time lag
- Presentation format
- User effort
- Precision
- Recall
 Of these criteria, recall and precision have most frequently been applied in measuring information retrieval.
 Both these criteria relate to the effectiveness aspect of an IR system, i.e. its ability to retrieve relevant documents in response to a user query.
 Effectiveness is purely a measure of the ability of the system to satisfy the user in terms of the relevance of documents retrieved.
 Aspects of effectiveness include:
- whether the documents being returned are relevant to the user
- whether they are presented in the order of relevance
- whether a significant number of the relevant documents in the collection are being returned to the user
Evaluation of IR Systems
 IR evaluation models can be broadly classified into system-driven models and user-centered models.
 System-driven models focus on measuring how well the system can rank documents.
 User-centered evaluation models attempt to measure the user's satisfaction with the system.
Effectiveness measures
 The most commonly used measures of
effectiveness are precision and recall.
These measures are based on relevance
judgments.
Evaluation of IR Systems
 The traditional goal of IR is to retrieve all and only the relevant documents in response to a query.
 "All" is measured by recall: the proportion of relevant documents in the collection which are retrieved, i.e. P(retrieved|relevant).
 "Only" is measured by precision: the proportion of retrieved documents which are relevant, i.e. P(relevant|retrieved).
Precision vs. Recall

[Venn diagram: within the set of all docs, the set Retrieved and the set Relevant overlap in RelRetrieved.]

Precision = \frac{|RelRetrieved|}{|Retrieved|} \qquad Recall = \frac{|RelRetrieved|}{|Relevant|}
 These definitions of precision and recall are
based on binary relevance judgment, which
means that every retrievable item is
recognizably “relevant”, or recognizably “not
relevant”.
 Hence, for every search result all retrievable
documents will be either
(i) relevant or non-relevant and
(ii) retrieved or not retrieved.
A B A B
Precision  Recall 
B A

where, A is set of relevant docments,


A  No. of relevant documents in the collection(NRrel)
B is set of retrieved documents
and B  No. of retrieved documents(NRret)
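As a concrete illustration of these set-based definitions, here is a minimal Python sketch; the document IDs and relevance judgments are invented for the example.

```python
# Minimal sketch: set-based precision and recall.
# The document IDs below are made up for illustration.

relevant = {"d1", "d2", "d3", "d5", "d8"}        # A: relevant documents in the collection
retrieved = {"d1", "d3", "d4", "d7", "d8", "d9"} # B: documents returned by the system

rel_retrieved = relevant & retrieved             # A intersect B

precision = len(rel_retrieved) / len(retrieved)  # |A ∩ B| / |B|
recall = len(rel_retrieved) / len(relevant)      # |A ∩ B| / |A|

print(f"Precision = {precision:.2f}")  # 3/6 = 0.50
print(f"Recall    = {recall:.2f}")     # 3/5 = 0.60
```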
Trade-off between Recall and
Precision
[Figure: precision (y-axis, 0 to 1) plotted against recall (x-axis, 0 to 1). At the high-precision end the system returns relevant documents but misses many useful ones too; at the high-recall end it returns most relevant documents but includes lots of junk; the ideal is precision = recall = 1.]
Test collection approach
 The total number of relevant documents in a
collection must be known in order for recall to
be calculated.
 To provide a framework of evaluation of IR
systems, a number of test collections have
been developed (Cranfield, TREC etc.).
 These document collections are accompanied
by a set of queries and relevance judgments.
IR test collections
Collection   Number of documents   Number of queries
Cranfield    1400                  225
CACM         3204                  64
CISI         1460                  112
LISA         6004                  35
TIME         423                   83
ADI          82                    35
MEDLINE      1033                  30
TREC-1       742,611               100
Fixed Recall Levels
 One way to evaluate is to look at average
precision at fixed recall levels
• Provides the information needed for
precision/recall graphs
Document Cutoff Levels
 Another way to evaluate:
• Fix the number of documents retrieved at several levels:
• top 5
• top 10
• top 20
• top 50
• top 100
• Measure precision at each of these levels
• Take (weighted) average over results
 This approach focuses on how well the system ranks the first k documents.
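A minimal sketch of precision at a document cutoff (P@k); the ranked list and relevance judgments below are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Precision over the top-k documents of a ranked list."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    return hits / k

# Hypothetical ranked result list and relevance judgments.
ranking = ["d12", "d03", "d44", "d07", "d19", "d25", "d31", "d08", "d50", "d02"]
relevant = {"d03", "d07", "d25", "d02"}

for k in (5, 10):
    print(f"P@{k} = {precision_at_k(ranking, relevant, k):.2f}")
# P@5  = 0.40 (d03 and d07 appear in the top 5)
# P@10 = 0.40 (4 of the top 10 are relevant)
```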
Computing Recall/Precision Points
 For a given query, produce the ranked list of retrievals.
 Mark each document in the ranked list that is relevant according to the gold standard.
 Compute a recall/precision pair for each position in the ranked list that contains a relevant document.
Computing Recall/Precision Points:
An Example
Let total # of relevant docs = 6. Check each new recall point:

 n   doc #   relevant   recall/precision
 1   588     x          R = 1/6 = 0.167;  P = 1/1 = 1
 2   589     x          R = 2/6 = 0.333;  P = 2/2 = 1
 3   576
 4   590     x          R = 3/6 = 0.5;    P = 3/4 = 0.75
 5   986
 6   592     x          R = 4/6 = 0.667;  P = 4/6 = 0.667
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x          R = 5/6 = 0.833;  P = 5/13 = 0.38
14   990

Missing one relevant document, so 100% recall is never reached.
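The procedure above can be written as a short Python sketch that reproduces the recall/precision points of this example; the boolean flags simply encode the ranked list shown.

```python
def recall_precision_points(ranked_relevance, total_relevant):
    """Return (recall, precision) pairs at each rank that holds a relevant document."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points

# Relevance flags for the 14 retrieved documents in the example above
# (relevant at ranks 1, 2, 4, 6 and 13); 6 relevant documents exist in total.
flags = [True, True, False, True, False, True, False, False,
         False, False, False, False, True, False]

for r, p in recall_precision_points(flags, total_relevant=6):
    print(f"R = {r:.3f}, P = {p:.3f}")
# R = 0.167, P = 1.000  ...  R = 0.833, P = 0.385 (100% recall is never reached)
```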
Interpolating a Recall/Precision Curve
 Interpolate a precision value for each standard recall level:
• r_j ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
• r_0 = 0.0, r_1 = 0.1, …, r_10 = 1.0
 The interpolated precision at the j-th standard recall level is the maximum known precision at any recall level greater than or equal to the j-th level.
Example: Interpolated Precision
Precision at observed recall points:

Recall   Precision
0.25     1.0
0.4      0.67
0.55     0.8
0.8      0.6
1.0      0.5

The interpolated precision at the 11 standard recall levels:

Recall   Interpolated precision
0.0      1.0
0.1      1.0
0.2      1.0
0.3      0.8
0.4      0.8
0.5      0.8
0.6      0.6
0.7      0.6
0.8      0.6
0.9      0.5
1.0      0.5

Interpolated average precision = 0.745
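A small sketch of the 11-point interpolation, using the observed points from this example; it reproduces the interpolated values above and the average of 0.745.

```python
def interpolate_11pt(observed):
    """Interpolated precision at the 11 standard recall levels.

    `observed` is a list of (recall, precision) pairs; the interpolated
    precision at level r_j is the maximum precision at any recall >= r_j.
    """
    levels = [j / 10 for j in range(11)]
    interpolated = []
    for r_j in levels:
        candidates = [p for r, p in observed if r >= r_j]
        interpolated.append(max(candidates) if candidates else 0.0)
    return levels, interpolated

# Observed recall/precision points from the example above.
observed = [(0.25, 1.0), (0.4, 0.67), (0.55, 0.8), (0.8, 0.6), (1.0, 0.5)]

levels, interp = interpolate_11pt(observed)
for r, p in zip(levels, interp):
    print(f"recall {r:.1f}: precision {p:.2f}")
print(f"Interpolated average precision = {sum(interp) / len(interp):.3f}")  # 0.745
```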
Recall-Precision graph
Average Recall/Precision Curve
 Compute average precision at each
standard recall level across all queries.
 Plot average precision/recall curves to
evaluate overall system performance on
a document/query corpus.
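One possible way to macro-average the 11-point interpolated curves over several queries; the per-query (recall, precision) points below are invented for illustration.

```python
# Sketch: macro-average 11-point interpolated precision over several queries.

def interp_11pt(points):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    return [max([p for r, p in points if r >= j / 10], default=0.0)
            for j in range(11)]

per_query_points = [
    [(0.2, 1.0), (0.6, 0.5), (1.0, 0.4)],   # query 1 (hypothetical)
    [(0.5, 0.67), (1.0, 0.44)],             # query 2 (hypothetical)
]

curves = [interp_11pt(points) for points in per_query_points]
average_curve = [sum(vals) / len(vals) for vals in zip(*curves)]
print([round(p, 2) for p in average_curve])  # one value per standard recall level
```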
Average Recall/Precision Curve
Model:
1. doc = "atn", query = "ntc"
2. doc = "atn", query = "atc"
3. doc = "atc", query = "atc"
4. doc = "atc", query = "ntc"
5. doc = "ntc", query = "ntc"
6. doc = "ltc", query = "ltc"
7. doc = "nnn", query = "nnn"

Interpolated precision values at the 11 standard recall levels (0.0-1.0) for these runs:
 1.000000 1.000000 1.000000 0.250000 0.250000 0.150000 0.150000 0.060606 0.016611 0.016611 0.016611
 1.000000 1.000000 0.333333 0.333333 0.022727 0.022727 0.030075 0.026042 0.026042 0.026042 0.026042
 1.000000 0.068966 0.081081 0.031250 0.024155 0.020115 0.022663 0.023377 0.021692 0.010138 0.010138
 0.111111 0.111111 0.166667 0.085714 0.078431 0.078431 0.087719 0.086957 0.063636 0.034335 0.034335
 1.000000 1.000000 1.000000 1.000000 0.200000 0.200000 0.200000 0.029703 0.029703 0.029703 0.029703
 1.000000 1.000000 0.636364 0.142857 0.142857 0.135922 0.100000 0.055866 0.024974 0.014123 0.014123
Problems with Precision/Recall
 Can’t know true recall value
• except in small collections
 Precision/Recall are related
• A combined measure sometimes more
appropriate
 Assumes batch mode
• Interactive IR is important and has different
criteria for successful searches
 Assumes a strict rank ordering matters.
Other measures: R-Precision
 R-Precision is the precision after R
documents have been retrieved, where R
is the number of relevant documents for a
topic.
• It de-emphasizes exact ranking of the
retrieved relevant documents.
• The average is simply the mean R-Precision
for individual topics in the run.
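A minimal sketch of R-Precision for a single topic; the ranking and relevance judgments are hypothetical.

```python
def r_precision(ranked_ids, relevant_ids):
    """Precision after R documents, where R = number of relevant documents for the topic."""
    r = len(relevant_ids)
    top_r = ranked_ids[:r]
    hits = sum(1 for doc in top_r if doc in relevant_ids)
    return hits / r

# Hypothetical topic with 4 relevant documents.
ranking = ["d3", "d9", "d1", "d7", "d5", "d2"]
relevant = {"d3", "d7", "d2", "d8"}
print(f"R-Precision = {r_precision(ranking, relevant):.2f}")  # 2/4 = 0.50
```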
Other measures: F-measure
 The F-measure takes both precision and recall into account. It is defined as the harmonic mean of recall and precision:

F = \frac{2PR}{P + R}

 Compared to the arithmetic mean, both P and R need to be high for the harmonic mean to be high.
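A small sketch of the F-measure; the sample values also show why the harmonic mean punishes an imbalance that the arithmetic mean hides.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.5, 0.6))   # 0.545...
print(f_measure(0.9, 0.1))   # 0.18  -- a low recall drags the harmonic mean down
print((0.9 + 0.1) / 2)       # 0.50  -- the arithmetic mean hides the imbalance
```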
E-measure
 E-measure is a variant of F-measure that allows weighting
emphasis on precision over recall. It is defined as:
E = \frac{(1 + \beta^2) P R}{\beta^2 P + R} = \frac{1 + \beta^2}{\frac{\beta^2}{R} + \frac{1}{P}}

 The value of β controls the trade-off between precision and recall.
 Setting β = 1 gives equal weight to precision and recall (E = F).
 β > 1 gives more weight to recall, whereas β < 1 gives more weight to precision.
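A sketch of the E-measure as reconstructed above; the sample precision/recall values illustrate how β shifts the emphasis.

```python
def e_measure(precision, recall, beta=1.0):
    """E-measure as defined above: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom

p, r = 0.5, 0.8
print(e_measure(p, r, beta=1.0))  # 0.615... -- equals the F-measure
print(e_measure(p, r, beta=2.0))  # 0.714... -- pulled toward recall (0.8)
print(e_measure(p, r, beta=0.5))  # 0.540... -- pulled toward precision (0.5)
```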
Normalized recall
 Normalized recall measures how close the set of retrieved documents is to an ideal retrieval, in which the NRrel relevant documents appear in the first NRrel positions.
 If the relevant documents are ranked 1, 2, 3, …, then the ideal rank (IR) is given by:

IR = \frac{\sum_{r=1}^{NRrel} r}{NRrel}
 Let the average rank (AR) over the set of relevant documents retrieved by the system be:

AR = \frac{\sum_{r=1}^{NRrel} Rank_r}{NRrel}

 Rank_r represents the rank of the r-th relevant document.
 The difference AR - IR represents a measure of the effectiveness of the system.
 This difference ranges from 0 (for perfect retrieval) to N - NRrel (for the worst-case retrieval), where N is the total number of documents in the collection.
 The expression AR - IR can be normalized by dividing it by (N - NRrel) and subtracting the result from 1, giving the normalized recall (NR):

NR = 1 - \frac{AR - IR}{N - NRrel}

 This measure ranges from 1 for the best case to 0 for the worst case.
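A minimal sketch of normalized recall following the formula above; the ranks and collection size are invented for the example.

```python
def normalized_recall(relevant_ranks, n_collection):
    """Normalized recall: NR = 1 - (AR - IR) / (N - NRrel).

    `relevant_ranks` are the ranks at which the relevant documents were
    retrieved; `n_collection` is the total number of documents N.
    """
    nrrel = len(relevant_ranks)
    ar = sum(relevant_ranks) / nrrel        # average rank achieved by the system
    ir = sum(range(1, nrrel + 1)) / nrrel   # ideal average rank (1, 2, ..., NRrel)
    return 1 - (ar - ir) / (n_collection - nrrel)

# Hypothetical run: 3 relevant documents found at ranks 1, 4 and 10
# in a collection of 100 documents.
print(f"NR = {normalized_recall([1, 4, 10], 100):.3f}")  # 0.969
```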
Evaluation Problems
 Realistic IR is interactive; traditional IR methods and measures are based on non-interactive situations.
 Evaluating interactive IR requires human subjects (there is no gold standard or benchmark).
[Ref.: See Borlund, 2000 & 2003; Borlund & Ingwersen, 1997 for IIR evaluation]
