Lect 2 - Boolean Retrieval
Alternative models of IR
DS414 information Retrieval & Search Engines 9
Classical IR Model
⚫ based on mathematical knowledge that was easily
recognized and well understood
⚫ simple, efficient and easy to implement
⚫ The three classical information retrieval models are:
-Boolean
-Vector space
-Probabilistic models
Ex. query:
● retrieved (matching)
The Boolean model returns the set of documents that exactly satisfy the query (a Boolean expression).
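To make exact-match retrieval concrete, here is a minimal sketch in Python. The three-document collection and the helper name `postings` are hypothetical and not from the lecture. Each term maps to the set of docIDs that contain it, so AND, OR and NOT become set intersection, union and difference.

```python
# Hypothetical toy collection (docID -> text); not taken from the lecture.
docs = {
    1: "brutus killed caesar",
    2: "caesar was ambitious",
    3: "brutus and calpurnia",
}

# Map each term to the set of docIDs that contain it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

all_docs = set(docs)


def postings(term):
    """Return the set of docIDs containing `term` (empty set if unseen)."""
    return index.get(term, set())


# brutus AND caesar
both = postings("brutus") & postings("caesar")
# brutus OR caesar
either = postings("brutus") | postings("caesar")
# brutus AND NOT caesar
brutus_only = postings("brutus") & (all_docs - postings("caesar"))

print(sorted(both), sorted(either), sorted(brutus_only))
```

A document is either in the result set or not. There is no ranking and no partial match, which is what "exactly" means here.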
Used?
database (its index). The search engine can then analyze and
⚫ What data do we want to search? Format conversion. International?
⚫ Which part contains “meaning”?
⚫ Word units? Stopping? Stemming?
Search process (online)
[Figure: tokens → normalizer → terms (modified tokens, e.g. like, wink, he, like, drink) → indexer → inverted index]
Doc 1: I did enact Julius Caesar I was killed i' the Capitol; Brutus killed me.
Doc 2: So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious
After preprocessing, the indexer builds three tables in turn:
Term, docID: one pair per token, in order of occurrence, e.g. (I, 1), (did, 1), (enact, 1), (julius, 1), ...
Term, docID: the same pairs sorted by term, e.g. (ambitious, 2), (be, 2), (brutus, 1), (brutus, 2), ...
Term, Doc #, Term freq: the dictionary and postings.
• Multiple term entries in a single document are merged.
• Frequency information is added.
Inverted matrix
Indexer steps
Term        Doc #   Term freq
ambitious   2       1
be          2       1
brutus      1       1
brutus      2       1
capitol     1       1
caesar      1       1
caesar      2       2
did         1       1
enact       1       1
hath        2       1
I           1       2
i'          1       1
it          2       1
julius      1       1
killed      1       2
let         2       1
me          1       1
noble       2       1
so          2       1
the         1       1
the         2       1
told        2       1
you         2       1
was         1       1
was         2       1
with        2       1
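The indexer steps that produce a table like the one above can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's implementation. The two document strings are assumed to be the Julius Caesar example the terms appear to come from, the tokenizer is a simple regex with lowercasing (so "I" is stored as "i"), and `tokenize` and `build_index` are hypothetical names.

```python
import re
from collections import Counter

# Assumed example documents (docID -> text).
docs = {
    1: "I did enact Julius Caesar I was killed i' the Capitol; Brutus killed me.",
    2: "So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious",
}


def tokenize(text):
    """Extract runs of letters/apostrophes and lowercase them."""
    return [t.lower() for t in re.findall(r"[A-Za-z']+", text)]


def build_index(docs):
    # Step 1: token sequence of (term, docID) pairs.
    pairs = [(term, doc_id)
             for doc_id, text in docs.items()
             for term in tokenize(text)]
    # Step 2: sort by term, then by docID.
    pairs.sort()
    # Step 3: merge repeated (term, docID) pairs and record the term frequency.
    counts = Counter(pairs)
    index = {}
    for (term, doc_id), tf in sorted(counts.items()):
        index.setdefault(term, []).append((doc_id, tf))
    return index


index = build_index(docs)
for term, plist in index.items():
    for doc_id, tf in plist:
        print(term, doc_id, tf)
```

Each printed row has the form term, Doc #, term freq. Because this sketch sorts strictly by character code, "caesar" comes before "capitol", so the row order may differ slightly from the slide.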
Indexer steps
Brutus:       2 4 8 16 32 64 128
Caesar:       1 2 3 5 8 13 21 34
Intersection: 2 8
If the list lengths are x and y, the merge takes O(x+y) operations.
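The linear-time merge can be sketched as a two-pointer walk over the two postings lists. This is a minimal Python illustration that assumes both lists are sorted by docID; the function name `intersect` is chosen here for illustration.

```python
def intersect(p1, p2):
    """Intersect two sorted postings lists in O(len(p1) + len(p2)) steps."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # docID appears in both lists: keep it and advance both pointers.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            # Advance the pointer that sits on the smaller docID.
            i += 1
        else:
            j += 1
    return answer


brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(brutus, caesar))  # [2, 8]
```

Every loop iteration advances at least one pointer, so the loop runs at most x + y times. The bound relies on the sorted order: with unsorted lists, each docID in one list would have to be searched for in the other.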