Lecture 3-Skip Pointers and Phrase Queries

This document discusses techniques for speeding up the merging of postings lists during information retrieval queries. It introduces skip pointers, which let a merge skip over postings that cannot match. As a simple heuristic, it suggests about √L evenly spaced skip pointers for a postings list of length L. It also discusses biword indexes and positional postings for phrase queries, where the merge additionally checks that the terms appear at compatible positions within a document.

Uploaded by

Yash Gupta


Introduction to Information Retrieval
Faster postings merges:
Skip pointers/Skip lists


Recall basic merge

Walk through the two postings
simultaneously, in time linear in the total
number of postings entries
[Figure: postings lists for Brutus (..., 48, 64, 128) and Caesar (..., 11, 17, 21, 31), walked in parallel]

If the list lengths are m and n, the merge takes O(m+n) operations.
Can we do better?
Yes (if the index isn't changing too fast).
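
The linear merge recalled above can be sketched as follows. This is a minimal illustration, assuming each postings list is a sorted Python list of docIDs; the function name `intersect` and the docID values are assumptions for the example, not taken from the slide figure.

```python
def intersect(p1, p2):
    """Intersect two sorted docID lists using at most len(p1) + len(p2) steps."""
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # Same docID on both lists: it is part of the result.
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the list with the smaller docID
        else:
            j += 1
    return answer


# Illustrative docID lists (assumed values).
brutus = [2, 4, 8, 41, 48, 64, 128]
caesar = [1, 2, 3, 8, 11, 17, 21, 31]
print(intersect(brutus, caesar))  # [2, 8]
```

Each loop iteration advances at least one of the two cursors, which is where the O(m+n) bound comes from.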


Augment postings with skip pointers (at indexing time)

[Figure: postings lists augmented with skip pointers]

Why?
To skip postings that will not figure in the search
results.
How?
Where do we place skip pointers?


Query processing with skip pointers

Suppose we've stepped through the lists until we process 8 on each list. We match it and advance.
We then have 41 on the upper list and 11 on the lower. 11 is smaller.
But the skip successor of 11 on the lower list is 31, which is still smaller than 41, so we can skip ahead past the intervening postings.
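
A merge that follows skip pointers might look like the sketch below. It assumes a representation that the slides do not specify: each list is a sorted Python list, and its skip pointers are a dict mapping a list index to the index it skips to. The docIDs and skip positions are illustrative values chosen to agree with the walkthrough above (8 on both lists, then 41 against 11, with 11 skipping to 31).

```python
def intersect_with_skips(p1, p2, skips1, skips2):
    """Intersect sorted docID lists, following skip pointers when they do not overshoot.

    skips1 / skips2 map an index in the list to the index its skip pointer targets
    (an assumed representation for this sketch).
    """
    answer = []
    i, j = 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            if i in skips1 and p1[skips1[i]] <= p2[j]:
                # Keep skipping while the skip target is not past p2[j].
                while i in skips1 and p1[skips1[i]] <= p2[j]:
                    i = skips1[i]
            else:
                i += 1
        else:
            if j in skips2 and p2[skips2[j]] <= p1[i]:
                while j in skips2 and p2[skips2[j]] <= p1[i]:
                    j = skips2[j]
            else:
                j += 1
    return answer


# Illustrative lists and skip pointers (assumed values).
brutus = [2, 4, 8, 41, 48, 64, 128]
caesar = [1, 2, 3, 8, 11, 17, 21, 31]
brutus_skips = {0: 3, 3: 6}   # 2 -> 41, 41 -> 128
caesar_skips = {0: 4, 4: 7}   # 1 -> 11, 11 -> 31
print(intersect_with_skips(brutus, caesar, brutus_skips, caesar_skips))  # [2, 8]
```

After matching 8, the lower cursor sits on 11; its skip target 31 is not larger than 41, so the cursor jumps straight to 31 without visiting 17 and 21.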


Where do we place skips?

Tradeoff:
More skips → shorter skip spans → more likely to skip. But lots of comparisons to skip pointers.
Fewer skips → fewer pointer comparisons, but then long skip spans → few successful skips.


Placing skips
Simple heuristic: for postings of length L, use √L evenly spaced skip pointers [Moffat and Zobel 1996]
This ignores the distribution of query terms.
Easy if the index is relatively static; harder if L keeps changing because of updates.
This definitely used to help; with modern hardware it may not unless you're memory-based [Bahle et al. 2002]
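
The √L heuristic can be sketched as below, assuming skip pointers are stored as a dict from a list index to the index it skips to. The helper name `place_skips` and the choice of the integer square root as the span are assumptions for illustration.

```python
import math


def place_skips(postings):
    """Return {index: target_index} with roughly sqrt(L) evenly spaced skip pointers."""
    length = len(postings)
    span = math.isqrt(length)  # integer square root of the list length
    if span < 2:
        # A skip of length 1 would just point at the next posting.
        return {}
    # Every pointer lands inside the list because i + span < length.
    return {i: i + span for i in range(0, length - span, span)}


postings = list(range(0, 32, 2))  # 16 illustrative docIDs
print(place_skips(postings))      # {0: 4, 4: 8, 8: 12}
```

Because the span depends on L, the pointers have to be recomputed when the list grows, which is why the heuristic is easier to apply to a relatively static index.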


Positional postings and phrase queries

Many complex or technical concepts and many
organization and product names are multiword
compounds or phrases.
Most recent search engines support a double
quotes syntax ("stanford university") for phrase
queries.
As many as 10% of web queries are phrase
queries, and many more are implicit phrase
queries (such as person names), entered without
use of double quotes.


1. Biword indexes
One approach to handling phrases is to consider
every pair of consecutive terms in a document as
a phrase.
For example, the text "Friends, Romans,
Countrymen" would generate the biwords:
friends romans
romans countrymen

In this model, we treat each of these biwords as a
vocabulary term.
The concept of a biword index can be extended to
longer sequences of words, and if the index
includes variable length word sequences, it is
generally referred to as a phrase index.
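
Generating biwords from a text can be sketched in a few lines. The tokenization here (lowercasing and keeping runs of letters and digits) is an assumption made for the example; the slides do not prescribe one.

```python
import re


def biwords(text):
    """Return every pair of consecutive tokens as a single biword term."""
    # Assumed tokenizer: lowercase, then keep alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


print(biwords("Friends, Romans, Countrymen"))
# ['friends romans', 'romans countrymen']
```

Each returned string would then be indexed like an ordinary vocabulary term.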


2. Positional indexes
A biword index is not the standard solution.
Rather, a positional index is most commonly
employed.
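Here, for each term in the vocabulary, we store
postings of the form docID: (position1,
position2, . . . ). In the example below, the number after
each term is its document frequency, and each entry reads
docID, term frequency: (positions), e.g.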
to, 993427:
(1, 6: (7, 18, 33, 72, 86, 231);
2, 5: (1, 17, 74, 222, 255);
4, 5: (8, 16, 190, 429, 433);
5, 2: (363, 367);
7, 3: (13, 23, 191); ..... . . )
be, 178239:
(1, 2: (17, 25);
4, 5: (17, 191, 291, 430, 434);
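
Building such an index can be sketched as follows. This is a minimal illustration under stated assumptions: documents arrive as a dict of docID to text, tokens are whitespace-separated and lowercased, and positions are counted from 1. The function name and the two sample documents are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}, with positions counted from 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split(), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


# Hypothetical two-document collection.
docs = {1: "to be or not to be", 2: "to do is to be"}
index = build_positional_index(docs)
print(index["to"])  # {1: [1, 5], 2: [1, 4]}
print(index["be"])  # {1: [2, 6], 2: [5]}
```

In this layout the document frequency of a term is `len(index[term])`, and the term frequency within a document is the length of its position list.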


2. Positional indexes
To process a phrase query, we still need to access
the inverted index entries for each distinct term.
As before, we would start with the least frequent
term and then work to further restrict the list of
possible candidates.
In the merge operation, the same general
technique is used as before, but rather than simply
checking that both terms are in a document, we
also need to check that their positions of
appearance in the document are compatible with
the phrase query being evaluated.


Example: Satisfying phrase queries

Suppose the postings lists for "to" and "be" are as in the previous slide, and
the query is "to be or not to be". The postings lists to access are:
to, be, or, not. We will examine intersecting the postings lists for "to"
and "be". We first look for documents that contain both terms. Then,
we look for places in the lists where there is an occurrence of "be"
with a token index one higher than a position of "to", and then we
look for another occurrence of each word with token index 4 higher
than the first occurrence. In the above lists, the pattern of
occurrences that is a possible match is:

to: (. . . ; 4: (. . . , 429, 433); . . . )
be: (. . . ; 4: (. . . , 430, 434); . . . )
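
The position check in this example can be sketched as below, using only the postings entries that appear on the earlier slide. For brevity the sketch looks positions up in a set rather than walking the two position lists in a linear merge; the function name and dict layout are assumptions.

```python
def adjacent_matches(first, second):
    """Return {doc_id: [p, ...]} where `first` is at p and `second` is at p + 1."""
    result = {}
    for doc_id in sorted(first.keys() & second.keys()):
        second_positions = set(second[doc_id])
        hits = [p for p in first[doc_id] if p + 1 in second_positions]
        if hits:
            result[doc_id] = hits
    return result


# Postings entries shown on the earlier slide (docID -> positions).
to_postings = {
    1: [7, 18, 33, 72, 86, 231],
    2: [1, 17, 74, 222, 255],
    4: [8, 16, 190, 429, 433],
    5: [363, 367],
    7: [13, 23, 191],
}
be_postings = {1: [17, 25], 4: [17, 191, 291, 430, 434]}

to_be = adjacent_matches(to_postings, be_postings)
print(to_be)  # {4: [16, 190, 429, 433]}

# In "to be or not to be" the second "to be" starts 4 tokens after the first.
candidates = {}
for doc_id, starts in to_be.items():
    kept = [p for p in starts if p + 4 in starts]
    if kept:
        candidates[doc_id] = kept
print(candidates)  # {4: [429]}
```

Document 4 is only a possible match at this point: positions 431 and 432 would still have to be checked against the postings for "or" and "not".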
