Probabilistic Information Retrieval Model
http://informationretrieval.org
Hinrich Schütze
2011-05-23
1 / 51
Overview
1 Recap
2 Probabilistic Approach to IR
3 Basic Probability Theory
4 Probability Ranking Principle
5 Appraisal & Extensions
2 / 51
Outline
1 Recap
2 Probabilistic Approach to IR
3 Basic Probability Theory
4 Probability Ranking Principle
5 Appraisal & Extensions
3 / 51
Relevance feedback: Basic idea
4 / 51
Rocchio illustrated
[Figure: Rocchio relevance feedback illustrated in the vector space: the optimal query $\vec{q}_{opt} \propto \vec{\mu}_R - \vec{\mu}_{NR}$, i.e. it points from the centroid $\vec{\mu}_{NR}$ of the nonrelevant documents toward the centroid $\vec{\mu}_R$ of the relevant documents.]
5 / 51
Types of query expansion
6 / 51
Query expansion at search engines
7 / 51
Take-away today
8 / 51
Outline
1 Recap
2 Probabilistic Approach to IR
3 Basic Probability Theory
4 Probability Ranking Principle
5 Appraisal & Extensions
9 / 51
Relevance feedback from last lecture
10 / 51
Probabilistic Approach to Retrieval
11 / 51
Probabilistic IR Models at a Glance
12 / 51
Exercise: Probabilistic model vs. other models
Boolean model
Probabilistic models support ranking and thus are better than
the simple Boolean model.
Vector space model
The vector space model is also a formally defined model that
supports ranking.
Why would we want to look for an alternative to the vector
space model?
13 / 51
Probabilistic vs. vector space model
14 / 51
Outline
1 Recap
2 Probabilistic Approach to IR
3 Basic Probability Theory
4 Probability Ranking Principle
5 Appraisal & Extensions
15 / 51
Basic Probability Theory
For events A and B
Joint probability P(A ∩ B) of both events occurring
Conditional probability P(A|B) of event A occurring given that
event B has occurred
Chain rule gives the fundamental relationship between joint and
conditional probabilities:
$P(AB) = P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$
Odds of an event A: $O(A) = \frac{P(A)}{P(\bar{A})} = \frac{P(A)}{1 - P(A)}$ (the derivation below ranks documents by the odds of relevance)
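To make these definitions concrete, here is a minimal Python sketch (mine, not from the slides) that estimates joint probability, conditional probability, and odds from a small table of event counts; the events and numbers are invented for illustration.

# Minimal sketch: estimating P(A), P(A ∩ B), P(A|B), and odds O(A) from counts.
# The events and numbers are illustrative, not from the slides.

n_total = 100        # total number of trials
n_A = 30             # trials where A occurred
n_B = 40             # trials where B occurred
n_A_and_B = 12       # trials where both A and B occurred

p_A = n_A / n_total                  # P(A)
p_B = n_B / n_total                  # P(B)
p_A_and_B = n_A_and_B / n_total      # joint probability P(A ∩ B)
p_A_given_B = p_A_and_B / p_B        # conditional probability P(A|B)

# Chain rule: P(A ∩ B) = P(A|B) P(B)
assert abs(p_A_and_B - p_A_given_B * p_B) < 1e-12

odds_A = p_A / (1 - p_A)             # odds O(A) = P(A) / P(not A)
print(p_A_given_B, odds_A)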
17 / 51
Outline
1 Recap
2 Probabilistic Approach to IR
3 Basic Probability Theory
4 Probability Ranking Principle
5 Appraisal & Extensions
18 / 51
The Document Ranking Problem
19 / 51
Probability Ranking Principle (PRP)
PRP in brief
If the retrieved documents (w.r.t. a query) are ranked
decreasingly on their probability of relevance, then the
effectiveness of the system will be the best that is obtainable
PRP in full
If [the IR] system’s response to each [query] is a ranking of the
documents [...] in order of decreasing probability of relevance
to the [query], where the probabilities are estimated as
accurately as possible on the basis of whatever data have been
made available to the system for this purpose, the overall
effectiveness of the system to its user will be the best that is
obtainable on the basis of those data
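As a minimal illustration of the PRP (my sketch, not part of the slides): once some model supplies an estimate of P(R = 1|d, q), the ranking step itself is just a sort in decreasing order of that estimate.

# Minimal PRP sketch: rank documents by decreasing estimated probability of relevance.
# prob_relevance is assumed to be supplied by some probabilistic model (e.g. the BIM below).

def rank_by_prp(doc_ids, prob_relevance):
    """Return doc_ids sorted by decreasing P(R=1 | d, q)."""
    return sorted(doc_ids, key=lambda d: prob_relevance[d], reverse=True)

# Illustrative, made-up estimates:
estimates = {"d1": 0.10, "d2": 0.85, "d3": 0.40}
print(rank_by_prp(estimates.keys(), estimates))   # ['d2', 'd3', 'd1']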
20 / 51
Binary Independence Model (BIM)
21 / 51
Binary incidence matrix
            Anthony    Julius    The       Hamlet   Othello   Macbeth   ...
            and        Caesar    Tempest
            Cleopatra
Anthony     1          1         0         0        0         1
Brutus      1          1         0         1        0         0
Caesar      1          1         0         1        1         1
Calpurnia   0          1         0         0        0         0
Cleopatra   1          0         0         0        0         0
mercy       1          0         1         1        1         1
worser      1          0         1         1        1         0
...
Each document is represented as a binary vector ∈ {0, 1}^{|V|}.
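A minimal sketch (mine, not from the slides) of how such binary term incidence vectors could be built from raw documents; the documents are invented and tokenization is just lowercasing plus whitespace splitting, for illustration only.

# Minimal sketch: build binary term incidence vectors (document -> {0,1}^|V|).
# Tokenization is simplistic (lowercase + whitespace split) for illustration only.

docs = {
    "d1": "Anthony and Brutus praise Caesar",
    "d2": "Caesar begs mercy of Calpurnia",
}

vocab = sorted({term for text in docs.values() for term in text.lower().split()})

incidence = {
    doc_id: [1 if term in set(text.lower().split()) else 0 for term in vocab]
    for doc_id, text in docs.items()
}

print(vocab)
print(incidence["d1"])   # binary vector over the vocabulary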
22 / 51
Binary Independence Model
23 / 51
Binary Independence Model
24 / 51
Binary Independence Model
$P(R|d, q)$ is modeled using term incidence vectors as $P(R|\vec{x}, \vec{q})$
25 / 51
Deriving a Ranking Function for Query Terms (1)
26 / 51
Deriving a Ranking Function for Query Terms (2)
So:
$$O(R|\vec{x}, \vec{q}) = O(R|\vec{q}) \cdot \prod_{t=1}^{M} \frac{P(x_t \mid R=1, \vec{q})}{P(x_t \mid R=0, \vec{q})}$$
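A minimal sketch (mine, not from the slides) of this product form: under the term independence assumption, the odds of relevance are the prior odds times one likelihood ratio per position of the incidence vector. The probability tables and prior odds below are made-up placeholders.

# Minimal sketch of the BIM odds product under the term independence assumption:
# O(R|x,q) = O(R|q) * prod_t P(x_t | R=1, q) / P(x_t | R=0, q)
# All probabilities here are illustrative placeholders, not estimated from data.

def odds_of_relevance(x, p_rel, p_nonrel, prior_odds):
    """x: binary incidence vector; p_rel[t] = P(x_t=1|R=1,q); p_nonrel[t] = P(x_t=1|R=0,q)."""
    odds = prior_odds
    for t, x_t in enumerate(x):
        if x_t == 1:
            odds *= p_rel[t] / p_nonrel[t]
        else:
            odds *= (1 - p_rel[t]) / (1 - p_nonrel[t])
    return odds

x = [1, 0, 1]                    # document's term incidence vector
p_rel = [0.6, 0.3, 0.5]          # P(x_t = 1 | R = 1, q)
p_nonrel = [0.2, 0.3, 0.1]       # P(x_t = 1 | R = 0, q)
print(odds_of_relevance(x, p_rel, p_nonrel, prior_odds=0.25))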
27 / 51
Exercise
28 / 51
Deriving a Ranking Function for Query Terms (3)
29 / 51
Deriving a Ranking Function for Query Terms (4)
30 / 51
Deriving a Ranking Function for Query Terms
31 / 51
Deriving a Ranking Function for Query Terms
Including the query terms found in the document into the right
product, but simultaneously dividing by them in the left product,
gives (with $p_t = P(x_t = 1 \mid R=1, \vec{q})$ and $u_t = P(x_t = 1 \mid R=0, \vec{q})$):
$$O(R|\vec{x}, \vec{q}) = O(R|\vec{q}) \cdot \prod_{t:\, x_t = q_t = 1} \frac{p_t(1 - u_t)}{u_t(1 - p_t)} \cdot \prod_{t:\, q_t = 1} \frac{1 - p_t}{1 - u_t}$$
32 / 51
Deriving a Ranking Function for Query Terms
Equivalent: rank documents using the log odds ratio $c_t$ for each term in the query:
$$c_t = \log \frac{p_t(1 - u_t)}{u_t(1 - p_t)} = \log \frac{p_t}{1 - p_t} - \log \frac{u_t}{1 - u_t}$$
The odds ratio is the ratio of two odds: (i) the odds of the
term appearing if the document is relevant (pt /(1 − pt )), and
(ii) the odds of the term appearing if the document is
nonrelevant (ut /(1 − ut ))
ct = 0: term has equal odds of appearing in relevant and
nonrelevant docs
ct positive: higher odds to appear in relevant documents
ct negative: higher odds to appear in nonrelevant documents
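A small sketch (mine, not from the slides) of $c_t$ as a function; the example values just illustrate the sign behaviour described above.

import math

# c_t = log( p_t (1 - u_t) / ( u_t (1 - p_t) ) )  -- the BIM log odds ratio term weight.
def c_t(p_t, u_t):
    return math.log((p_t * (1 - u_t)) / (u_t * (1 - p_t)))

print(c_t(0.5, 0.5))   # 0.0  -> equal odds in relevant and nonrelevant docs
print(c_t(0.7, 0.2))   # > 0  -> higher odds of appearing in relevant docs
print(c_t(0.1, 0.4))   # < 0  -> higher odds of appearing in nonrelevant docs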
33 / 51
Term weight ct in BIM
$c_t = \log \frac{p_t}{1 - p_t} - \log \frac{u_t}{1 - u_t}$ functions as a term weight.
Retrieval status value for document d: $RSV_d = \sum_{t:\, x_t = q_t = 1} c_t$.
So BIM and vector space model are identical on an
operational level . . .
. . . except that the term weights are different.
In particular: we can use the same data structures (inverted
index etc) for the two models.
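To make the operational similarity concrete, here is a minimal sketch (mine, not from the slides) of RSV scoring with an inverted index; it has the same shape as vector space scoring, only the per-term weight is $c_t$. The index and weights are invented placeholders.

from collections import defaultdict

# Minimal RSV sketch: RSV_d = sum of c_t over query terms t present in document d.
# inverted_index maps term -> set of doc ids containing the term;
# term_weight maps term -> c_t. Both are assumed given (e.g. estimated as on the next slide).

def rsv_scores(query_terms, inverted_index, term_weight):
    scores = defaultdict(float)
    for t in query_terms:
        for doc_id in inverted_index.get(t, ()):   # walk the postings list of t
            scores[doc_id] += term_weight[t]        # accumulate c_t, like tf-idf accumulation
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

inverted_index = {"caesar": {"d1", "d2"}, "mercy": {"d2"}}
term_weight = {"caesar": 0.8, "mercy": 1.5}         # illustrative c_t values
print(rsv_scores(["caesar", "mercy"], inverted_index, term_weight))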
34 / 51
How to compute probability estimates
With $N$ documents in the collection, $S$ of them relevant, $\mathrm{df}_t$ documents containing term $t$, and $s$ relevant documents containing term $t$:
$p_t = s/S$
$u_t = (\mathrm{df}_t - s)/(N - S)$
$$c_t = K(N, \mathrm{df}_t, S, s) = \log \frac{s/(S - s)}{(\mathrm{df}_t - s)/((N - \mathrm{df}_t) - (S - s))}$$
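A minimal sketch (mine, not from the slides) of these estimates from raw counts; the counts are invented, and no smoothing is applied yet (see the next slide on avoiding zeros).

import math

# Estimate p_t, u_t and the term weight c_t from document counts.
# N: total docs, S: relevant docs, df_t: docs containing t, s: relevant docs containing t.
# Counts are illustrative; zero counts would break the logs (handled on the next slide).

def term_weight(N, S, df_t, s):
    p_t = s / S                      # P(term present | relevant)
    u_t = (df_t - s) / (N - S)       # P(term present | nonrelevant)
    c_t = math.log((s / (S - s)) / ((df_t - s) / ((N - df_t) - (S - s))))
    return p_t, u_t, c_t

print(term_weight(N=1000, S=20, df_t=150, s=10))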
35 / 51
Avoiding zeros
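A standard way to avoid zero probabilities in these estimates, and the fix used in IIR Chapter 11 (which I assume is what this slide covers), is to add 0.5 to each count of the contingency table; a minimal sketch under that assumption:

import math

# Smoothed BIM estimates: add 0.5 to each count so that no cell of the
# contingency table is zero and the log odds ratio stays finite.
def smoothed_term_weight(N, S, df_t, s):
    p_t = (s + 0.5) / (S + 1)
    u_t = (df_t - s + 0.5) / (N - S + 1)
    c_t = math.log((p_t * (1 - u_t)) / (u_t * (1 - p_t)))
    return p_t, u_t, c_t

# Even with s = 0 (term never seen in a relevant document) the weight is defined:
print(smoothed_term_weight(N=1000, S=20, df_t=150, s=0))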
36 / 51
Exercise
37 / 51
Simplifying assumption
38 / 51
Probability estimates in relevance feedback
39 / 51
Probability estimates in ad hoc retrieval
Outline
1 Recap
2 Probabilistic Approach to IR
3 Basic Probability Theory
4 Probability Ranking Principle
5 Appraisal & Extensions
41 / 51
History and summary of assumptions
42 / 51
How different are vector space and BIM?
43 / 51
Okapi BM25: Overview
44 / 51
Okapi BM25: Starting point
45 / 51
Okapi BM25 basic weighting
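The basic BM25 document score combines an idf-like factor with a saturating, length-normalized term frequency. Below is a minimal sketch written by me rather than taken from the slide, following the form of the weighting in IIR Chapter 11, with the usual parameter choices (k1 around 1.2 to 2, b = 0.75) and invented example numbers.

import math

# Okapi BM25 (basic document scoring), following IIR Chapter 11:
# RSV_d = sum over query terms t of
#         log(N/df_t) * ((k1 + 1) * tf_td) / (k1 * ((1 - b) + b * L_d / L_ave) + tf_td)

def bm25_score(query_terms, tf_d, doc_len, avg_doc_len, df, N, k1=1.2, b=0.75):
    score = 0.0
    for t in query_terms:
        tf = tf_d.get(t, 0)
        if tf == 0 or t not in df:
            continue                      # term absent from document or collection
        idf = math.log(N / df[t])         # idf-like factor
        norm = k1 * ((1 - b) + b * doc_len / avg_doc_len) + tf
        score += idf * (k1 + 1) * tf / norm
    return score

# Illustrative numbers:
print(bm25_score(["caesar", "mercy"], {"caesar": 3, "mercy": 1},
                 doc_len=120, avg_doc_len=100, df={"caesar": 40, "mercy": 5}, N=1000))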
46 / 51
Exercise
47 / 51
Okapi BM25 weighting for long queries
49 / 51
Take-away today
50 / 51
Resources
Chapter 11 of IIR
Resources at http://cislmu.org
51 / 51