0% found this document useful (0 votes)

133 views

Lecture15 Learning Ranking

This document provides an introduction to using machine learning techniques for information retrieval ranking tasks. It discusses how early research in this area was limited by small datasets and less sophisticated machine learning models. More recently, major search engines have successfully adopted learning to rank approaches using large numbers of complex features. Two main methods are described: pointwise learning, which aims to separate relevance levels with thresholds, and pairwise learning using Ranking SVMs, which formulates the problem as classifying pairs of documents as correctly or incorrectly ranked. An example Ranking SVM model for IR ranking is presented.

Uploaded by

Đừng Buông Tay Anh

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

133 views

Lecture15 Learning Ranking

Uploaded by

Đừng Buông Tay Anh

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 46

Introduction to Information

Retrieval

Introduction to

Information Retrieval
CS276: Information Retrieval and
Web Search
Christopher Manning and Pandu
Nayak
Lecture 15: Learning to Rank

Introduction to Information
Retrieval

Machine learning for IR

ranking?

Sec. 15.4

Weve looked at methods for ranking documents in

IR
Cosine similarity, inverse document frequency,
proximity, pivoted document length normalization,
Pagerank,

Weve looked at methods for classifying documents

using supervised machine learning classifiers
Nave Bayes, Rocchio, kNN, SVMs

Surely we can also use machine learning to rank

the documents displayed in search results?
Sounds like a good idea
A.k.a. machine-learned relevance or learning to rank

Introduction to Information
Retrieval

Machine learning for IR ranking

This good idea has been actively
researched and actively deployed by
major web search engines in the last 7
or so years
Why didnt it happen earlier?
Modern supervised ML has been around for
about 20 years
Nave Bayes has been around for about 50
years

Introduction to Information
Retrieval

Machine learning for IR ranking

Theres some truth to the fact that the IR
community wasnt very connected to the ML
community
But there were a whole bunch of precursors:
Wong, S.K. et al. 1988. Linear structure in
information retrieval. SIGIR 1988.
Fuhr, N. 1992. Probabilistic methods in information
retrieval. Computer Journal.
Gey, F. C. 1994. Inferring probability of relevance
using the method of logistic regression. SIGIR 1994.
Herbrich, R. et al. 2000. Large Margin Rank
Boundaries for Ordinal Regression. Advances in
Large Margin Classifiers.

Introduction to Information
Retrieval

Why werent early attempts very

successful/influential?
Sometimes an idea just takes time to be
appreciated
Limited training data
Especially for real world use (as opposed to
writing academic papers), it was very hard to
gather test collection queries and relevance
judgments that are representative of real user
needs and judgments on documents returned
This has changed, both in academia and industry

Poor machine learning techniques

Insufficient customization to IR problem
Not enough features for ML to show value

Introduction to Information
Retrieval

Why wasnt ML much needed?

Traditional ranking functions in IR used a
very small number of features, e.g.,
Term frequency
Inverse document frequency
Document length

It was easy to tune weighting coefficients

by hand
And people did

Introduction to Information
Retrieval

Why is ML needed now?

Modern systems especially on the Web use a
great number of features:
Arbitrary useful features not a single unified model

Log frequency of query word in anchor text?

Query word in color on page?
# of images on page?
# of (out) links on page?
PageRank of page?
URL length?
URL contains ~?
Page edit recency?
Page length?

The New York Times (2008-06-03) quoted Amit

Singhal as saying Google was using over 200
such features.

Introduction to Information
Retrieval

Sec. 15.4.1

Simple example:
Using classification for ad hoc IR
Collect a training corpus of (q, d, r) triples
Relevance r is here binary (but may be multiclass,
with 37 values)

Document is represented by a feature vector

x = (, ) is cosine similarity, is minimum query
window size
is the the shortest text span that includes all
query words
Query term proximity is a very important new
weighting factor

Train a machine learning model to predict the

class r of a document-query pair

Introduction to Information
Retrieval

Sec. 15.4.1

Simple example:
Using classification for ad hoc IR
A linear score function is then
Score(d, q) = Score(, ) = a + b + c
And the linear classifier is
Decide relevant if Score(d, q) >
just like when we were doing text
classification

Introduction to Information
Retrieval

Sec. 15.4.1

Simple example:
Using classification for ad hoc IR
cosine score

0.05

R
R

0.025
R

R
R

R
R
R
N

N
N

R
N
N

0
2

R
N

4
5
Term proximity

Decision
surface

Introduction to Information
Retrieval

More complex example of using

classification for search ranking
[Nallapati 2004]

We can generalize this to classifier

functions over more features
We can use methods we have seen
previously for learning the linear classifier
weights

Introduction to Information
Retrieval

An SVM classifier for information

retrieval [Nallapati 2004]

Let g(r|d,q) = wf(d,q) + b

SVM training: want g(r|d,q) 1 for nonrelevant
documents and g(r|d,q) 1 for relevant
documents
SVM testing: decide relevant iff g(r|d,q) 0
Features are not word presence features (how would
you deal with query words not in your training data?)
but scores like the summed (log) tf of all query terms
Unbalanced data (which can result in trivial alwayssay-nonrelevant classifiers) is dealt with by
undersampling nonrelevant documents during
training (just take some at random)
[there are other
ways of doing this cf. Cao et al. later]

Introduction to Information
Retrieval

An SVM classifier for information

retrieval [Nallapati 2004]
Experiments:
4 TREC data sets
Comparisons with Lemur, a state-of-the-art
open source IR engine (Language Model (LM)based see IIR ch. 12)
Linear kernel normally best or almost as good
as quadratic kernel, and so used in reported
results
6 features, all variants of tf, idf, and tf.idf
scores

Introduction to Information
Retrieval

An SVM classifier for information

retrieval [Nallapati 2004]
Train \
Test
Disk 3

Disk 4-5

Disk 3

Disk 4-5

WT10G
(web)

0.1785

0.2503

0.2666

SVM

0.1728

0.2432

0.2750

0.1773

0.2516

0.2656

SVM

0.1646

0.2355

0.2675

At best the results are about equal to LM

Actually a little bit below

Papers advertisement: Easy to add more

features
This is illustrated on a homepage finding task on
WT10G:
Baseline LM 52% success@10, baseline SVM 58%
SVM with URL-depth, and in-link features: 78%
S@10

Introduction to Information
Retrieval

Sec. 15.4.2

Learning to rank
Classification probably isnt the right way to
think about approaching ad hoc IR:
Classification problems: Map to a unordered set of
classes
Regression problems: Map to a real value
Ordinal regression problems: Map to an ordered set
of classes
A fairly obscure sub-branch of statistics, but what we want
here

This formulation gives extra power:

Relations between relevance levels are modeled
Documents are good versus other documents for
query given collection; not an absolute scale of
goodness

Introduction to Information
Retrieval

Learning to rank
Assume a number of categories C of relevance exist
These are totally ordered: c1 < c2 < < cJ
This is the ordinal regression setup

Assume training data is available consisting of

document-query pairs represented as feature
vectors i and relevance ranking ci
We could do point-wise learning, where we try to
map items of a certain relevance rank to a
subinterval (e.g, Crammer et al. 2002 PRank)
But most work does pair-wise learning, where the
input is a pair of results for a query, and the class is
the relevance ordering relationship between them

Introduction to Information
Retrieval

Point-wise learning
Goal is to learn a threshold to separate
each rank

Introduction to Information
Retrieval

Sec. 15.4.2

Pairwise learning: The Ranking

SVM
[Herbrich et al. 1999, 2000; Joachims et al. 2002]

Aim is to classify instance pairs as

correctly ranked or incorrectly ranked
This turns an ordinal regression problem back
into a binary classification problem

We want a ranking function f such that

ci > ck iff f(i) > f(k)
or at least one that tries to do this with
minimal error
Suppose that f is a linear function
f(i) = wi

Introduction to Information
Retrieval

The Ranking SVM

[Herbrich et al. 1999, 2000; Joachims et al. 2002]

Ranking Model: f(i)

Sec. 15.4.2

Introduction to Information
Retrieval

Sec. 15.4.2

The Ranking SVM

[Herbrich et al. 1999, 2000; Joachims et al. 2002]

Then (combining the two equations on the

last slide):
ci > ck iff w(i k) > 0
Let us then create a new instance space
from such pairs:
u = (di, dj, q) = i k
zu = +1, 0, 1 as ci >,=,< ck
We can build model over just cases for
which zu = 1
From training data S = { u}, we train an
SVM

Introduction to Information
Retrieval

Two queries in the original

space

Introduction to Information
Retrieval

Two queries in the pairwise

space

Introduction to Information
Retrieval

Sec. 15.4.2

The Ranking SVM

[Herbrich et al. 1999, 2000; Joachims et al. 2002]

The SVM learning task is then like other

examples that we saw before
Find w and u 0 such that
wTw + C u is minimized, and
for all u such that zu < 0, wu 1 u

We can just do the negative zu, as ordering

is antisymmetric
You can again use SVMlight (or other good
SVM libraries) to train your model (SVMrank
specialization)

Introduction to Information
Retrieval

Aside: The SVM loss function

The minimization
minw wTw + C u
and for all u such that zu < 0, wu 1 u

can be rewritten as
minw (1/2C)wTw + u
and for all u such that zu < 0, u 1 (wu)

Now, taking = 1/2C, we can reformulate this

as
minw [1 (wu)]+ + wTw
Where []+ is the positive part (0 if a term is
negative)

Introduction to Information
Retrieval

Aside: The SVM loss function

Regularizer
Hinge loss
The reformulation
ofw
minw [1 (wu)]+ + wTw
shows that an SVM can be thought of as
having an empirical hinge loss
combined with a weight regularizer
Loss

Introduction to Information
Retrieval

Adapting the Ranking SVM for

(successful) Information Retrieval
[Yunbo Cao, Jun Xu, Tie-Yan Liu, Hang Li, Yalou
Huang, Hsiao-Wuen Hon SIGIR 2006]

A Ranking SVM model already works well

Using things like vector space model scores as
features
As we shall see, it outperforms them in
evaluations

But it does not model important aspects of

practical IR well
This paper addresses two customizations
of the Ranking SVM to fit an IR utility
model

Introduction to Information
Retrieval

The ranking SVM fails to model the IR

problem well
1. Correctly ordering the most relevant
documents is crucial to the success of an IR
system, while misordering less relevant
results matters little
The ranking SVM considers all ordering violations
as the same

2. Some queries have many (somewhat)

relevant documents, and other queries few.
If we treat all pairs of results for a query
equally, queries with many results will
dominate the learning
But actually queries with few relevant results are at
least as important to do well on

Introduction to Information
Retrieval

Based on the LETOR test

collection

From Microsoft Research Asia

An openly available standard test collection with
pregenerated features, baselines, and research
results for learning to rank
Its availability has really driven research in this
area
OHSUMED, MEDLINE subcollection for IR

350,000 articles
106 queries
16,140 query-document pairs
3 class judgments: Definitely relevant (DR), Partially
Relevant (PR), Non-Relevant (NR)

TREC GOV collection (predecessor of GOV2, cf. IIR

p. 142)
1 million web pages
125 queries

Introduction to Information
Retrieval

Principal components projection of 2

queries
[solid = q12, open = q50; circle = DR, square = PR,
triangle = NR]

Introduction to Information
Retrieval

Ranking scale importance

discrepancy
[r3 = Definitely Relevant, r2 = Partially Relevant, r1 =
Nonrelevant]

Introduction to Information
Retrieval

Number of training documents per

query discrepancy [solid = q12, open =
q50]

Introduction to Information
Retrieval

Recap: Two Problems with Direct

Application of the Ranking SVM
Cost sensitiveness: negative effects of making errors
on top ranked documents
d: definitely relevant, p: partially relevant, n: not
relevant
ranking 1: p d p n n n n
ranking 2: d p n p n n n
Query normalization: number of instance pairs varies
according to query
q1: d p p n n n n
q2: d d p p p n n n n n
q1 pairs: 2*(d, p) + 4*(d, n) + 8*(p, n) = 14
q2 pairs: 6*(d, p) + 10*(d, n) + 15*(p, n) = 31

Introduction to Information
Retrieval

These problems are solved with a new

Loss function

weights for type of rank difference

Estimated empirically from effect on NDCG

weights for size of ranked result set

Linearly scaled versus biggest result set

Introduction to Information
Retrieval

Experiments
OHSUMED (from LETOR)
Features:
6 that represent versions of tf, idf, and tf.idf
factors
BM25 score (IIR sec. 11.4.3)
A scoring function derived from a probabilistic
approach to IR, which has traditionally done well in
TREC evaluations, etc.

Introduction to Information
Retrieval

Experimental Results
(OHSUMED)

Introduction to Information
Retrieval

MSN Search [now Bing]

Second experiment with MSN search
Collection of 2198 queries
6 relevance levels rated:

Definitive
Excellent
Good
Fair
Bad
Detrimental

8990
4403
3735
20463
36375
310

Introduction to Information
Retrieval

Experimental Results (MSN

search)

Introduction to Information
Retrieval

Alternative: Optimizing Rank-Based

Measures
[Yue et al. SIGIR 2007]

If we think that NDCG is a good

approximation of the users utility function
from a result ranking
Then, lets directly optimize this measure
As opposed to some proxy (weighted pairwise
prefs)

But, there are problems

Objective function no longer decomposes
Pairwise prefs decomposed into each pair

Objective function is flat or discontinuous

Introduction to Information
Retrieval

Discontinuity Example
NDCG computed using rank positions
Ranking via retrieval scores
Slight changes to model parameters
Slight changes to retrieval scores
No change to ranking
No change to NDCG

NDCG = 0.63Retrieval Score 0.9 0.6 0.3

NDCG discontinuous
w.r.t model
parameters!

Rank

Relevance

Introduction to Information
Retrieval

Structural SVMs

[Tsochantaridis et

al., 2007]
Structural SVMs are a generalization of SVMs where the
output classification space is not binary or one of a set of
classes, but some complex object (such as a sequence
or a parse tree)
Here, it is a complete (weak) ranking of documents for a
query
The Structural SVM attempts to predict the complete
ranking for the input query and document set
The true labeling is a ranking where the relevant
documents are all ranked in the front, e.g.,
An incorrect labeling would be any other ranking, e.g.,

There are an intractable number of rankings, thus

an intractable number of constraints!

Introduction to Information
Retrieval

Structural SVM training

[Tsochantaridis et al., 2007]
Structural SVM training proceeds incrementally by starting with a
working set of constraints, and adding in the most violated
constraint at each iteration

Original SVM Problem

Exponential constraints
Most are dominated by a
small set of important
constraints

Structural SVM Approach

Repeatedly finds the next most

violated constraint
until a set of constraints which
is a good approximation is found

Introduction to Information
Retrieval

Other machine learning methods for

learning to rank
Of course!
Ive only presented the use of SVMs for
machine learned relevance, but other
machine learning methods have also been
used successfully

Boosting: RankBoost
Ordinal Regression loglinear models
Neural Nets: RankNet
(Gradient-boosted) Decisision Trees

Introduction to Information
Retrieval

The Limitation of Machine

Learning

Everything that we have looked at (and most

work in this area) produces linear models of
features by weighting different base features
This contrasts with most of the clever ideas of
traditional IR, which are nonlinear scalings and
combinations of basic measurements
log term frequency, idf, pivoted length
normalization

At present, ML is good at weighting features,

but not at coming up with nonlinear scalings
Designing the basic features that give good signals
for ranking remains the domain of human creativity

Introduction to Information
Retrieval

Summary
The idea of learning ranking functions has been
around for about 20 years
But only recently have ML knowledge, availability of
training datasets, a rich space of features, and
massive computation come together to make this a
hot research area
Its too early to give a definitive statement on what
methods are best in this area its still advancing
rapidly
But machine learned ranking over many features now
easily beats traditional hand-designed ranking
functions in comparative evaluations [in part by using the handdesigned functions as features!]

And there is every reason to think that the

importance of machine learning in IR will only
increase in the future.

Introduction to Information
Retrieval

Resources
IIR secs 6.1.23 and 15.4
LETOR benchmark datasets
Website with data, links to papers, benchmarks, etc.
http://research.microsoft.com/users/LETOR/
Everything you need to start research in this area!

Nallapati, R. Discriminative models for

information retrieval. SIGIR 2004.
Cao, Y., Xu, J. Liu, T.-Y., Li, H., Huang, Y. and
Hon, H.-W. Adapting Ranking SVM to
Document Retrieval, SIGIR 2006.
Y. Yue, T. Finley, F. Radlinski, T. Joachims. A
Support Vector Method for Optimizing Average
Precision. SIGIR 2007.

Lecture15 Learning Ranking
No ratings yet
Lecture15 Learning Ranking
46 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
28 pages
Lecture8-Evaluation 2013
No ratings yet
Lecture8-Evaluation 2013
44 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
40 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
48 pages
Materi Pertemuan Ke-1-Dno 2018-1
No ratings yet
Materi Pertemuan Ke-1-Dno 2018-1
42 pages
Chap 13
No ratings yet
Chap 13
68 pages
Information Storage and Retrival
No ratings yet
Information Storage and Retrival
31 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
32 pages
Evaluation and Result Summaries
No ratings yet
Evaluation and Result Summaries
60 pages
14 Vcat
No ratings yet
14 Vcat
66 pages
Ip 8
No ratings yet
Ip 8
51 pages
Liu 2009
No ratings yet
Liu 2009
109 pages
IR_2 unit
No ratings yet
IR_2 unit
46 pages
Ranked Retrieval: Thus Far, Our Queries Have All Been Boolean
No ratings yet
Ranked Retrieval: Thus Far, Our Queries Have All Been Boolean
40 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
50 pages
Introduction of IR Models
No ratings yet
Introduction of IR Models
62 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
67 pages
1.introduction Information Retrival
No ratings yet
1.introduction Information Retrival
31 pages
Relevance Feedback
No ratings yet
Relevance Feedback
47 pages
An Overview of Information Retrieval Outline: A (Simple) Database Example Databases vs. IR
No ratings yet
An Overview of Information Retrieval Outline: A (Simple) Database Example Databases vs. IR
16 pages
Neural IR
No ratings yet
Neural IR
45 pages
5-Introduction To Information Retrieval
No ratings yet
5-Introduction To Information Retrieval
3 pages
Information Retrieval System and The Pagerank Algorithm
No ratings yet
Information Retrieval System and The Pagerank Algorithm
37 pages
6 Tfidf
No ratings yet
6 Tfidf
48 pages
Manning Christopher, Prabhakar Raghavan, Hinrich Schu Tze: Introduction To Information Retrieval
No ratings yet
Manning Christopher, Prabhakar Raghavan, Hinrich Schu Tze: Introduction To Information Retrieval
4 pages
lecture6-tfidf Vector Space Model (2)
No ratings yet
lecture6-tfidf Vector Space Model (2)
45 pages
Vector Space and IR Evaluation
No ratings yet
Vector Space and IR Evaluation
41 pages
Monday - IR Fundamentals - Grace Yang - AFIRM19-IR
No ratings yet
Monday - IR Fundamentals - Grace Yang - AFIRM19-IR
77 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
54 pages
F-IR
No ratings yet
F-IR
30 pages
IR Evaluation Tugas Kampus
No ratings yet
IR Evaluation Tugas Kampus
25 pages
Introduction To Information Retrieval
No ratings yet
Introduction To Information Retrieval
42 pages
Introduction of IR Models
No ratings yet
Introduction of IR Models
67 pages
Relevance Feedback: Improving Results
No ratings yet
Relevance Feedback: Improving Results
41 pages
C3 IndexConstruction
No ratings yet
C3 IndexConstruction
46 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
49 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
48 pages
IRS Unit 3 by Krishna
No ratings yet
IRS Unit 3 by Krishna
50 pages
lecture16-linkanalysis.pptx
No ratings yet
lecture16-linkanalysis.pptx
58 pages
8.relavance Feedback - II
No ratings yet
8.relavance Feedback - II
52 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
60 pages
5-Probabilistic-Retrieval-FSS20
No ratings yet
5-Probabilistic-Retrieval-FSS20
40 pages
01 Introduction To ISR
No ratings yet
01 Introduction To ISR
48 pages
4_IRModels
No ratings yet
4_IRModels
46 pages
Introduction To Information Retrieval
No ratings yet
Introduction To Information Retrieval
50 pages
Lecture10 Efficient Scoring
No ratings yet
Lecture10 Efficient Scoring
19 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
50 pages
Chapter4 Indexconstruction
No ratings yet
Chapter4 Indexconstruction
49 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
38 pages
Information Retrieval Models: Vector Space Models: Chengxiang Zhai
No ratings yet
Information Retrieval Models: Vector Space Models: Chengxiang Zhai
30 pages
1 Overview
No ratings yet
1 Overview
44 pages
Information Retrieval: Introduction To
No ratings yet
Information Retrieval: Introduction To
48 pages
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
No ratings yet
Ijcet: International Journal of Computer Engineering & Technology (Ijcet)
11 pages
Completed Unit II 17.7.17
No ratings yet
Completed Unit II 17.7.17
113 pages
IR Models
No ratings yet
IR Models
65 pages
I R Rank
No ratings yet
I R Rank
52 pages
Introduction To: Information Retrieval
No ratings yet
Introduction To: Information Retrieval
48 pages
Python Machine Learning By Example: Unlock machine learning best practices with real-world use cases
From Everand
Python Machine Learning By Example: Unlock machine learning best practices with real-world use cases
Yuxi (Hayden) Liu
No ratings yet
Beginning R: The Statistical Programming Language
From Everand
Beginning R: The Statistical Programming Language
Mark Gardener
4.5/5 (4)
Bentley MXROAD V8i Fundamental ASTASOFT Com PDF
No ratings yet
Bentley MXROAD V8i Fundamental ASTASOFT Com PDF
2 pages
Imx183clk-J CQJ-J Flyer
No ratings yet
Imx183clk-J CQJ-J Flyer
2 pages
Current & Saving Account Statement: Milind Nayan S O Late Garib Shah Yoga Niketan Ashram Muni Ki Reti Rishikesh Rishikesh
No ratings yet
Current & Saving Account Statement: Milind Nayan S O Late Garib Shah Yoga Niketan Ashram Muni Ki Reti Rishikesh Rishikesh
3 pages
ETABS-Exmple Using Is 456
No ratings yet
ETABS-Exmple Using Is 456
33 pages
Peterbilt - Medium Duty Trucks - Owners Manual
No ratings yet
Peterbilt - Medium Duty Trucks - Owners Manual
280 pages
Object Oriented Analysis and Design Class Notes
No ratings yet
Object Oriented Analysis and Design Class Notes
72 pages
CH 14
100% (3)
CH 14
4 pages
R&S Receiver Architecture
No ratings yet
R&S Receiver Architecture
78 pages
TEJ Exam Hall Ticket - HTM
No ratings yet
TEJ Exam Hall Ticket - HTM
2 pages
CPS3500 5000 7500pro Series InfoGuide
No ratings yet
CPS3500 5000 7500pro Series InfoGuide
4 pages
Audio Amplifiers
No ratings yet
Audio Amplifiers
63 pages
Lenovo-Legion-Pro-5i-16IRX9-Datasheet
No ratings yet
Lenovo-Legion-Pro-5i-16IRX9-Datasheet
2 pages
Extended Response-1 - 240229 - 120756
No ratings yet
Extended Response-1 - 240229 - 120756
20 pages
TRTS0980EN-US 0317w
100% (3)
TRTS0980EN-US 0317w
600 pages
On The Problem of Finding All Minimum Spanning Trees: João Guilherme Martinez
No ratings yet
On The Problem of Finding All Minimum Spanning Trees: João Guilherme Martinez
159 pages
Iec 60027letter Symbols To Be Used in Electrical Technology
No ratings yet
Iec 60027letter Symbols To Be Used in Electrical Technology
16 pages
PDMS Shortcuts - The Piping Engineering World
100% (1)
PDMS Shortcuts - The Piping Engineering World
10 pages
BookMindAI English
No ratings yet
BookMindAI English
7 pages
Tapping Calc Menu
No ratings yet
Tapping Calc Menu
61 pages
Kistler Pliki Produktow Process Automation Nn188
No ratings yet
Kistler Pliki Produktow Process Automation Nn188
170 pages
VIZ-PRO Magnetic Dry Erase Board, 24 X 18 Inches, Silver Aluminium Frame Office Products
No ratings yet
VIZ-PRO Magnetic Dry Erase Board, 24 X 18 Inches, Silver Aluminium Frame Office Products
1 page
SCWCD Exam
No ratings yet
SCWCD Exam
39 pages
DDoad A Document - Scribd
No ratings yet
DDoad A Document - Scribd
2 pages
278 hw5
No ratings yet
278 hw5
20 pages
Learn Anything From: The Internet
No ratings yet
Learn Anything From: The Internet
3 pages
User Interface Design
No ratings yet
User Interface Design
43 pages
Brno 2dec10 Epm
No ratings yet
Brno 2dec10 Epm
11 pages
Tekla Ts Etabs Link
100% (1)
Tekla Ts Etabs Link
17 pages
Angela Resume
No ratings yet
Angela Resume
1 page
ec200ueg91xu_series_esim_application_note_v1-0
No ratings yet
ec200ueg91xu_series_esim_application_note_v1-0
42 pages

Lecture15 Learning Ranking

Uploaded by

Lecture15 Learning Ranking

Uploaded by

Introduction to Information

Machine learning for IR

Weve looked at methods for ranking documents in

Weve looked at methods for classifying documents

Surely we can also use machine learning to rank

Machine learning for IR ranking

Machine learning for IR ranking

Why werent early attempts very

Poor machine learning techniques

Why wasnt ML much needed?

It was easy to tune weighting coefficients

Why is ML needed now?

Log frequency of query word in anchor text?

The New York Times (2008-06-03) quoted Amit

Document is represented by a feature vector

Train a machine learning model to predict the

More complex example of using

We can generalize this to classifier

An SVM classifier for information

Let g(r|d,q) = wf(d,q) + b

An SVM classifier for information

An SVM classifier for information

At best the results are about equal to LM

Papers advertisement: Easy to add more

This formulation gives extra power:

Assume training data is available consisting of

Pairwise learning: The Ranking

Aim is to classify instance pairs as

We want a ranking function f such that

The Ranking SVM

Ranking Model: f(i)

The Ranking SVM

Then (combining the two equations on the

Two queries in the original

Two queries in the pairwise

The Ranking SVM

The SVM learning task is then like other

We can just do the negative zu, as ordering

Aside: The SVM loss function

Now, taking = 1/2C, we can reformulate this

Aside: The SVM loss function

Adapting the Ranking SVM for

A Ranking SVM model already works well

But it does not model important aspects of

The ranking SVM fails to model the IR

2. Some queries have many (somewhat)

Based on the LETOR test

From Microsoft Research Asia

TREC GOV collection (predecessor of GOV2, cf. IIR

Principal components projection of 2

Ranking scale importance

Number of training documents per

Recap: Two Problems with Direct

These problems are solved with a new

weights for type of rank difference

weights for size of ranked result set

MSN Search [now Bing]

Experimental Results (MSN

Alternative: Optimizing Rank-Based

If we think that NDCG is a good

But, there are problems

Objective function is flat or discontinuous

NDCG = 0.63Retrieval Score 0.9 0.6 0.3

There are an intractable number of rankings, thus

Structural SVM training

Original SVM Problem

Structural SVM Approach

Repeatedly finds the next most

Other machine learning methods for

The Limitation of Machine

Everything that we have looked at (and most

At present, ML is good at weighting features,

And there is every reason to think that the

Nallapati, R. Discriminative models for

You might also like