Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Christopher Manning and Pandu Nayak
Lecture 15: Learning to Rank
Sec. 15.4.1
Simple example:
Using classification for ad hoc IR
Collect a training corpus of (q, d, r) triples
Relevance r is here binary (but may be multiclass, with 3 to 7 values)
Sec. 15.4.1
Simple example:
Using classification for ad hoc IR
A linear score function is then
Score(d, q) = Score(α, ω) = aα + bω + c
where α is the cosine score and ω is the term proximity feature
And the linear classifier is
Decide relevant if Score(d, q) > θ
... just like when we were doing text classification
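As a concrete, entirely made-up illustration of this setup: the sketch below trains a linear scorer on two features per (q, d) pair, a cosine score and a term-proximity value, using plain logistic-loss gradient descent, then thresholds the score at θ to decide relevance. The data, learning rate, and threshold are toy values, not anything from the lecture.

# Hedged sketch: learn Score(d, q) = a*cosine + b*proximity + c from toy (q, d, r) data,
# then decide "relevant" when the score exceeds a threshold theta.
import numpy as np

X = np.array([[0.90, 1.0],   # [cosine score, term-proximity feature] per (q, d) pair
              [0.70, 2.0],
              [0.20, 1.0],
              [0.10, 3.0]])
r = np.array([1, 1, 0, 0])   # binary relevance judgments

w, bias = np.zeros(2), 0.0   # (a, b) and c in the slide's notation
for _ in range(2000):        # logistic-loss gradient descent (one of many possible trainers)
    p = 1.0 / (1.0 + np.exp(-(X @ w + bias)))
    w -= 0.5 * (X.T @ (p - r)) / len(r)
    bias -= 0.5 * np.mean(p - r)

def score(cosine, proximity):
    return w[0] * cosine + w[1] * proximity + bias

theta = 0.0
print(score(0.80, 1.5) > theta)   # high cosine, tight proximity: expect "relevant"
print(score(0.15, 3.0) > theta)   # low cosine, loose proximity: expect "nonrelevant"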
Sec. 15.4.1
Simple example:
Using classification for ad hoc IR
[Figure: training examples plotted by term proximity (x-axis, 0 to 5) against cosine score (y-axis, 0 to 0.05); relevant (R) and nonrelevant (N) documents are separated by a linear decision surface]
LM vs. SVM results (test collections across the columns; the two LM/SVM row pairs correspond to training on the Disk 3 and Disk 4-5 collections):

         Disk 3    Disk 4-5   WT10G (web)
LM       0.1785    0.2503     0.2666
SVM      0.1728    0.2432     0.2750
LM       0.1773    0.2516     0.2656
SVM      0.1646    0.2355     0.2675
Sec. 15.4.2
Learning to rank
Classification probably isn't the right way to think about approaching ad hoc IR:
Classification problems: map to an unordered set of classes
Regression problems: map to a real value
Ordinal regression problems: map to an ordered set of classes
A fairly obscure sub-branch of statistics, but what we want here
Learning to rank
Assume a number of categories C of relevance exist
These are totally ordered: c1 < c2 < … < cJ
This is the ordinal regression setup
Point-wise learning
Goal is to learn a threshold to separate
each rank
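A minimal point-wise sketch under assumed data (the scores and three ordered grades below are invented for illustration): pick one threshold between each pair of adjacent relevance grades along the score axis, then read off a new document's grade by counting how many thresholds its score clears. Real ordinal-regression training would fit the scorer and thresholds jointly; this only shows the thresholding idea.

# Hedged sketch of per-rank thresholds (toy scores/grades, not the lecture's method).
import numpy as np

scores = np.array([0.10, 0.30, 0.35, 0.60, 0.80, 0.90])  # f(psi) for training docs
grades = np.array([0,    0,    1,    1,    2,    2])      # ordered classes c1 < c2 < c3

# Crude estimate: midpoint between the top score of grade j and the bottom score of grade j+1.
thresholds = np.array([
    (scores[grades == j].max() + scores[grades == j + 1].min()) / 2.0
    for j in range(grades.max())
])

def grade_of(score):
    # Ordinal decision: count how many thresholds the score exceeds.
    return int(np.sum(score > thresholds))

print(thresholds)      # two thresholds separating three grades
print(grade_of(0.50))  # 1
print(grade_of(0.95))  # 2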
The Ranking SVM optimization problem can be rewritten as
    min_w  (1/(2C)) w^T w + Σ_u ξ_u
subject to, for every pair vector u with label z_u: ξ_u ≥ 1 - z_u⟨w, u⟩ and ξ_u ≥ 0
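To make the pairwise objective above concrete, here is a small hand-rolled subgradient sketch (my own illustration, not SVMrank or the lecture's exact formulation): each u is a feature-difference vector ψi − ψk for a document pair, z_u is +1 if the first document should rank higher and -1 otherwise, and we minimize (1/(2C))·w^T w plus the hinge losses max(0, 1 - z_u⟨w, u⟩). The pair vectors and hyperparameters are invented.

# Hedged sketch: subgradient descent on a pairwise ranking-SVM style objective.
import numpy as np

U = np.array([[ 0.5,  1.0],    # pair difference vectors psi_i - psi_k
              [ 0.2,  0.5],
              [-0.4, -0.8]])
z = np.array([1.0, 1.0, -1.0]) # +1: first doc of the pair should rank higher
C, lr, w = 1.0, 0.1, np.zeros(2)

for _ in range(500):
    margins = z * (U @ w)
    active = margins < 1.0                                  # pairs violating the margin
    grad = w / C - (z[active][:, None] * U[active]).sum(axis=0)
    w -= lr * grad                                          # subgradient step

print(w)   # rank documents for a query by w . psi(d); higher score ranks first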
The OHSUMED collection:
350,000 articles
106 queries
16,140 query-document pairs
3-class judgments: Definitely Relevant (DR), Partially Relevant (PR), Non-Relevant (NR)
Experiments
OHSUMED (from LETOR)
Features:
6 that represent versions of tf, idf, and tf.idf
factors
BM25 score (IIR sec. 11.4.3)
A scoring function derived from a probabilistic
approach to IR, which has traditionally done well in
TREC evaluations, etc.
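Since the BM25 feature comes up here, the following is a sketch of one common BM25 variant (my own toy implementation with the usual k1 and b parameters; LETOR's actual feature extraction may differ in details such as the idf form).

# Hedged sketch of a standard BM25 variant (not necessarily LETOR's exact formula).
import math
from collections import Counter

def bm25(query_terms, doc_terms, doc_freq, num_docs, avgdl, k1=1.2, b=0.75):
    tf = Counter(doc_terms)
    score = 0.0
    for t in query_terms:
        if tf[t] == 0:
            continue
        idf = math.log((num_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1.0)
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc_terms) / avgdl))
        score += idf * norm
    return score

# Toy usage: 1000-document collection with average length 100 terms.
df = {"ranking": 50, "svm": 10}
doc = ["ranking"] * 3 + ["filler"] * 97
print(bm25(["ranking", "svm"], doc, df, num_docs=1000, avgdl=100.0))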
Experimental Results
(OHSUMED)
Relevance judgment counts by grade:
Definitive    8990
Excellent     4403
Good          3735
Fair          20463
Bad           36375
Detrimental   310
Discontinuity Example
NDCG is computed using rank positions
The ranking comes from sorting by retrieval scores
So slight changes to model parameters give only slight changes to retrieval scores, which usually leave the ranking, and hence NDCG, unchanged
[Example table: documents d1, d2, d3 with their ranks and relevance grades]
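A small sketch of why this matters for gradient-based training (the grades and scores here are invented, not the slide's example): NDCG depends on the scores only through the induced ordering, so perturbing the scores without changing the ordering leaves NDCG exactly flat, and it jumps only when two documents swap.

# Hedged sketch: NDCG is piecewise constant in the retrieval scores.
import math

def ndcg(scores, grades):
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    dcg = sum((2 ** grades[i] - 1) / math.log2(rank + 2) for rank, i in enumerate(order))
    idcg = sum((2 ** g - 1) / math.log2(rank + 2)
               for rank, g in enumerate(sorted(grades, reverse=True)))
    return dcg / idcg

grades = [2, 1, 0]                        # relevance of d1, d2, d3
print(ndcg([0.90, 0.50, 0.10], grades))   # 1.0: scores already order docs perfectly
print(ndcg([0.91, 0.52, 0.09], grades))   # still 1.0: scores moved, ranking did not
print(ndcg([0.40, 0.50, 0.10], grades))   # drops below 1.0: the ranking flipped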
Structural SVMs
[Tsochantaridis et
al., 2007]
Structural SVMs are a generalization of SVMs where the
output classification space is not binary or one of a set of
classes, but some complex object (such as a sequence
or a parse tree)
Here, it is a complete (weak) ranking of documents for a
query
The Structural SVM attempts to predict the complete
ranking for the input query and document set
The true labeling is a ranking where the relevant
documents are all ranked in the front, e.g.,
An incorrect labeling would be any other ranking, e.g.,
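To make the "true vs. incorrect labeling" idea concrete, here is a tiny illustration (my own, not the structural SVM's actual encoding or loss): the true output ranks every relevant document ahead of every nonrelevant one, and one simple way to penalize an incorrect ranking is to count relevant/nonrelevant pairs that appear out of order.

# Hedged sketch: a "true" ranking puts relevant docs in front; count mis-ordered pairs.
def pair_swaps(ranking, relevant):
    # Number of (relevant, nonrelevant) document pairs ranked in the wrong order.
    swaps = 0
    for pos, doc in enumerate(ranking):
        if doc in relevant:
            swaps += sum(1 for earlier in ranking[:pos] if earlier not in relevant)
    return swaps

relevant = {"d1", "d2"}
true_ranking = ["d1", "d2", "d3", "d4"]   # all relevant documents ranked in front
bad_ranking  = ["d3", "d1", "d4", "d2"]   # any other ordering is an incorrect labeling
print(pair_swaps(true_ranking, relevant))  # 0
print(pair_swaps(bad_ranking, relevant))   # 3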
Boosting: RankBoost
Ordinal Regression loglinear models
Neural Nets: RankNet
(Gradient-boosted) Decision Trees
Summary
The idea of learning ranking functions has been around for about 20 years
But only recently have ML knowledge, availability of training datasets, a rich space of features, and massive computation come together to make this a hot research area
It's too early to give a definitive statement on which methods are best in this area; it's still advancing rapidly
But machine-learned ranking over many features now easily beats traditional hand-designed ranking functions in comparative evaluations [in part by using the hand-designed functions as features!]
Resources
IIR secs 6.1.2-3 and 15.4
LETOR benchmark datasets
Website with data, links to papers, benchmarks, etc.
http://research.microsoft.com/users/LETOR/
Everything you need to start research in this area!