
Ir MCQ-1

This document contains a true/false quiz on information retrieval topics. It includes 100 multiple-choice and true/false questions that cover concepts like structured vs. unstructured data, recall and precision, Boolean retrieval models, inverted indexes, tokenization, normalization, stemming, and query processing. The questions test understanding of foundational information retrieval techniques and how modern search engines and legal search services like Westlaw operate.

Uploaded by

Medo Only

TRUE-FALSE & MCQ

--------------------------------------------------------------------------------------------------------------------------------------

1- Information retrieval is finding material of a structured nature that
satisfies an information need from within large collections.
ANSWER:FALSE
Information retrieval is finding material of an unstructured nature that
satisfies an information need from within large collections.
2- In the mid-nineties, unstructured data exceeded structured data in
market cap
ANSWER:FALSE
In the mid-nineties, structured data exceeded unstructured data in
market cap
----------------------------------------------------------------------------------------
3- Today, unstructured data exceeds structured data in market cap
ANSWER:TRUE
4- Fraction of retrieved docs that are relevant to the user’s information
need is……………………………………………………………………………………………
A-RECALL
B-PRECISION
C-COLLECTION
D-TASK
ANSWER:B
5- Fraction of relevant docs in collection that are retrieved is …………
A-RECALL
B-PRECISION
C-COLLECTION
D-TASK
ANSWER:A
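A minimal Python sketch of the two definitions in questions 4 and 5. The function name and the sample doc IDs are made up for illustration, and returning 0.0 for an empty set is an assumption of this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given two collections of doc IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant docs that were actually retrieved
    # Precision: fraction of retrieved docs that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant docs in the collection that are retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 docs retrieved, 6 docs relevant, 2 in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7, 8])
print(p, r)
```

Here precision is 2/4 and recall is 2/6, so the same result set can score differently on the two measures.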
---------------------------------------------------------------------------------------
6- A model of information retrieval in which we can pose any query in
which search terms are combined with the operators AND, OR, and
NOT is……………………………………………………………………..
A-Ad-hoc Retrieval
B- Ranked Retrieval Model
C- Boolean Retrieval Model
D-Collection
ANSWER:C
--------------------------------------------------------------------------------------------
7- Find documents in a collection of documents relevant to a certain
user need
A-Ad-Hoc Retrieval
B- Ranked Retrieval Model
C- Boolean Retrieval Model
D-Retrieval Model
ANSWER:A
8- grep is a --------- command
A-LINUX
B-UNIX
C-WINDOWS
ANSWER:B
9- For a bigger collection, we can't build the term-document matrix because it is too large and extremely sparse
ANSWER:TRUE
---------------------------------------------------------------------------------------
10- In an inverted index, we need variable-size structures called:
A-Postings Lists
B-QUEUE
C-LISTS
ANSWER:A
----------------------------------------------------------------------------------------------
11- Cutting a character sequence into word tokens is called:
A- Normalization
B- Stemming
C- Stop words
D- Tokenization
ANSWER:D
--------------------------------------------------------------------------------------------------------------------------------------------
12- Mapping text and query terms to the same form is called:
A- Normalization
B- Stemming
C- Stop words
D- Tokenization
ANSWER:A
13- In information retrieval, extremely common words which would
appear to be of little value in helping select documents that are
excluded from the index vocabulary are called:
A- Normalization
B- Stemming
C- Stop words
D- Tokenization
ANSWER:C
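Questions 11 to 13 can be illustrated with a short Python sketch. The regular expression, the tiny stop list and the function names are assumptions made for this example; real systems use more careful tokenizers and longer (or no) stop lists.

```python
import re

# Tiny illustrative stop list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Tokenization: cut the character sequence into word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def normalize(token):
    """Normalization (here only case folding): map tokens to one form."""
    return token.lower()


def index_terms(text):
    """Tokenize, normalize, then drop stop words."""
    return [t for t in map(normalize, tokenize(text)) if t not in STOP_WORDS]


terms = index_terms("Friends, Romans, and Countrymen are in the Forum")
print(terms)  # ['friends', 'romans', 'countrymen', 'forum']
```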
14- Doc. frequency information is added in the sort step
ANSWER:FALSE
Doc. frequency information is added in the Dictionary & Postings step
15- If the list lengths are x and y, the merge takes O(x+y)
operations
ANSWER:TRUE
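The O(x+y) bound in question 15 comes from walking both sorted postings lists once. A sketch in Python, assuming each postings list is a Python list of docIDs in increasing order (the sample lists are made up):

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by docID in O(len(p1) + len(p2))."""
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # docID occurs in both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer with the smaller docID
        else:
            j += 1
    return answer


common = intersect([1, 2, 4, 11, 31, 45, 173, 174], [2, 31, 54, 101])
print(common)  # [2, 31]
```

Each loop iteration advances at least one pointer, so the loop runs at most x+y times. This only works because both lists are sorted by docID.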
16- A document either matches the condition or not; this property of Boolean queries is described as:
A-PRECISE
B-VIEWS
C-BOOLEAN
D-NOT
ANSWER:A
17-Many search systems you still use are Boolean:
A-Email
B- library catalog
C- Mac OS X Spotlight
D-All Of The Above
ANSWER:D
18- Westlaw is the largest commercial (paying subscribers) legal search
service.
ANSWER:TRUE
19- Westlaw started in 1975, and ranking was added in 1992
ANSWER:TRUE
--------------------------------------------------------------------------------------------
20- In Westlaw: tens of terabytes of data; ~700,000 users
ANSWER:TRUE
21- In Westlaw, the majority of users still use Boolean queries
ANSWER:TRUE
22- In Westlaw, SPACE is conjunction, not disjunction.
ANSWER:FALSE
SPACE is disjunction, not conjunction
-------------------------------------------------------------------------------------------
23- Finding the best order for query processing is called:
A-QUERY
B-QUERY PERFORMANCE
C- Query optimization
D-LONG QUERY
ANSWER:C
24- Data is generally of two types
ANSWER:FALSE
Data is generally of three types:
structured data
semi-structured data
unstructured data
-------------------------------------------------------------------------------------------------------------------------------------

25- ………………………..is semi-structured data
A-XML FILES
B-DATABASE
C-VIDEO
D-AUDIO
ANSWER: A
-----------------------------------------------------------------------------------------
26- ………………………..is unstructured data
A-text
B-video
C-audio
D-ALL OF THE ABOVE
ANSWER: D
27- ………………………..is structured data
A-DATABASE
B-video
C-audio
D-text
ANSWER: A
28- A search engine is large scale
ANSWER: TRUE
29- Major steps in inverted index construction are:
A-Collect the documents to be indexed
B-Tokenize the text
C-Do linguistic preprocessing of tokens
D-Index the documents that each term occurs in
E-ALL OF THE ABOVE
ANSWER:E
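The four steps in question 29 can be sketched in a few lines of Python. This is a simplified illustration, not a production indexer: the `\w+` tokenizer, case folding as the only preprocessing step, and the two sample documents are assumptions of this sketch.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """docs maps docID -> text. Returns term -> list of docIDs sorted by docID."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():           # A: collect the documents
        for token in re.findall(r"\w+", text):  # B: tokenize the text
            term = token.lower()                 # C: linguistic preprocessing
            postings[term].add(doc_id)           # D: index the docs per term
    # Dictionary sorted by term; each postings list sorted by docID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {1: "I did enact Julius Caesar", 2: "So let it be with Caesar"}
index = build_inverted_index(docs)
print(index["caesar"])       # [1, 2]
print(len(index["julius"]))  # document frequency of "julius": 1
```

The document frequency of a term is just the length of its postings list.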
---------------------------------------------------------------------------------------------------------------------------------------------

30- An IR system should be designed to offer choices of granularity
ANSWER:TRUE
----------------------------------------------------------------------------------------------------------------------------------------------

31- In obtaining the character sequence in a document,
we need to convert the byte sequence into a sequence of characters
ANSWER:TRUE
32- An encoding scheme is -------------------------------
A-ASCII
B-Unicode UTF-8
C-vendor-specific standards
D-ALL OF THE ABOVE
ANSWER:D
33- A ……….. is an instance of a sequence of characters that are grouped
together as a useful semantic unit
A-TOKENIZATION
B-NORMALIZATION
C-STEMMING
D-TOKEN
ANSWER:D
34-Each such token type is now a candidate for an index entry (Term),
before further processing
ANSWER: FALSE
Each such token type is now a candidate for an index entry (Term),
after further processing
----------------------------------------------------------------------------------------------------------------------------------------------------

35- …………………….. is the class of all tokens containing the same character sequence
A-TOKEN
B-TYPE
C-TERM
D-COLLECTION
ANSWER:B
36- …………………………………………….. is a normalized type that is included in
the dictionary
A-TOKEN
B-TYPE
C-TERM
D-COLLECTION
ANSWER:C
37- Whitespace is an issue in normalization
ANSWER:FALSE
Whitespace is an issue in tokenization
------------------------------------------------------------------------------------------
38- For hyphens, it can be effective to get the user to put in possible
hyphens, and the system generates all three forms (e.g., for over-eager:
over-eager, over eager, overeager)
ANSWER:TRUE
--------------------------------------------------------------------------------------------------------------------------------------------------

39- Approaches to handle the no-spaces issue:
A- word segmentation
B- Machine learning sequence models
C- K-grams
D-ALL OF THE ABOVE
ANSWER:D
40- ……………….. is the process of canonicalizing tokens so that matches
occur despite differences in the character sequences.
A-TOKENIZATION
B-NORMALIZATION
C-STEMMING
D-LEMMATIZATION
ANSWER:B
------------------------------------------------------------------------------------------------------------------------------------------------------

41- In normalization, we most commonly implicitly define equivalence
classes of terms, using mapping rules and lists of synonyms
ANSWER:TRUE
42- ………………….. Reduce all letters to lower case
A-NORMALIZATION
B-TOKENIZATION
C-STEMMING
D-CASE FOLDING
ANSWER:D
43-…………………. Reduce variant forms to base form
A-NORMALIZATION
B- Lemmatization
C-STEMMING
D-CASE FOLDING
ANSWER:B
44-……………………………. Reduce terms to their “roots” before indexing
A-NORMALIZATION
B- Lemmatization
C-STEMMING
D-CASE FOLDING
ANSWER:C
45- ………………… is the commonest algorithm for stemming English
A-INVERTED INDEX
B- SORTED
C- Porter’s
D-BUBBLE SORT
ANSWER:C
46- A positional index expands postings storage substantially
ANSWER:TRUE
47- a positional index is now standardly used because of the power and
usefulness of phrase and proximity queries whether used explicitly or
implicitly in a ranking retrieval system
ANSWER:TRUE
48- Biword indexes index every consecutive pair of terms in the text as
a phrase
ANSWER:TRUE
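A small Python sketch of how the biwords in question 48 are generated. Whitespace splitting and lowercasing are simplifying assumptions, and the helper name is made up.

```python
def biwords(text):
    """Return every consecutive pair of terms in the text as a phrase."""
    terms = text.lower().split()
    return [f"{first} {second}" for first, second in zip(terms, terms[1:])]


pairs = biwords("Friends Romans Countrymen")
print(pairs)  # ['friends romans', 'romans countrymen']
```

Each biword is then treated as a dictionary term, so a two-word phrase query becomes a single term lookup.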
49- Positional index size depends on maximum document size
ANSWER: FALSE
Positional index size depends on average document size
----------------------------------------------------------------------------------------------
50- Combination schemes index particular phrases such as:
A- Michael Jackson
B- Britney Spears
C- Williams et al.
D-BOTH A AND B
ANSWER:D
----------------------------------------------------------------------------------------
51-Williams et al. (2004) evaluate a more sophisticated mixed indexing
scheme
ANSWER: TRUE
-----------------------------------------------------------------------------------------
52- A parametric search interface allows the user to combine a full-text
query with selections on field values
ANSWER:TRUE
53- A ………………. is an identified region within a doc
A-ZONE
B-AREA
C-MODEL
D-TYPE
ANSWER:A
---------------------------------------------------------------------------------------
54- Contents of a zone are free text
ANSWER:TRUE
55- Consider a collection in which each document has three zones:
author, title and body.
Suppose we set g1 = 0.2, g2 = 0.3 and g3 = 0.5, where g1, g2 and g3
represent the author, title and body zone weights.
If the term shakespeare appears in the title and body zones but not the
author zone of a document, the score of this document would be
A-0.6
B-0.7
C-0.8
D-0.9
ANSWER:C
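The arithmetic behind question 55 is a weighted sum over the zones where the term matches: 0.2·0 + 0.3·1 + 0.5·1 = 0.8. A Python sketch, where the dictionaries and the function name are illustrative assumptions:

```python
def weighted_zone_score(weights, matches):
    """Sum the weights of the zones in which the query term occurs."""
    return sum(g for zone, g in weights.items() if matches.get(zone, False))


weights = {"author": 0.2, "title": 0.3, "body": 0.5}      # g1, g2, g3
matches = {"author": False, "title": True, "body": True}  # shakespeare matches
score = weighted_zone_score(weights, matches)
print(round(score, 2))  # 0.8
```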
56- Returning a set of documents satisfying a query expression is called:
A-RANKED RETRIEVAL
B-BOOLEAN RETRIEVAL
C-AD-HOC RETRIEVAL
D-INFORMATION RETRIEVAL
ANSWER:B
57-…………………….. Rather than a query language of operators and expressions,
the user’s query is just one or more words in a human language
A-FREE TEXT QUERY
B-QUERY OPTIMIZATION
C-QUERY LANGUAGE
D-QUERY
ANSWER:A
58- The number of times that a word or term occurs in a document is called the
A-IDF
B-TERM FREQUENCY
C-QUERY OPTIMIZATION
D-GOOD QUERY
ANSWER:B
59- "John is quicker than Mary" and "Mary is quicker than John" have the same
vectors; this is called the bag of words model
ANSWER:TRUE
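Question 59 can be checked directly in Python by counting terms and ignoring their order. Whitespace tokenization and lowercasing are assumptions of this sketch.

```python
from collections import Counter


def bag_of_words(text):
    """Map each term to its count; word order is discarded."""
    return Counter(text.lower().split())


a = bag_of_words("John is quicker than Mary")
b = bag_of_words("Mary is quicker than John")
print(a == b)  # True: same bag, although the meanings differ
```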
60- Relevance increases proportionally with term frequency
ANSWER:FALSE
Relevance does not increase proportionally with term frequency
-------------------------------------------------------------------------------------------------------
61- Rare terms are less informative than frequent terms
ANSWER:FALSE
Rare terms are MORE informative than frequent terms
----------------------------------------------------------------------------------------------------
62- ------ is the document frequency of t: the number of documents that contain t
A-TF
B-IDF
C-DF
D-TF-IDF
ANSWER:C
---------------------------------------------------------------------------------------------------------
63- The tf-idf weight of a term is the sum of its tf weight and its idf weight
ANSWER:FALSE
The tf-idf weight of a term is the product of its tf weight and its idf weight
--------------------------------------------------------------------------------------------------------
64- ………… is the best known weighting scheme in information retrieval
A-TF
B-IDF
C-DF
D-TF-IDF
ANSWER:D
65- tf-idf weighting decreases with the number of occurrences within a
document
ANSWER: FALSE
tf-idf weighting increases with the number of occurrences within a
document
-----------------------------------------------------------------------------------------------------
66- tf-idf weighting increases with the rarity of the term in the collection
ANSWER:TRUE
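Questions 60 to 66 can be tied together with a short Python sketch. It assumes the common log-scaled variant, tf weight = 1 + log10(tf) and idf = log10(N/df); the quiz does not fix a particular formula, so treat these choices and the sample numbers as assumptions.

```python
import math


def tf_weight(tf):
    """Log-scaled term frequency: grows with tf, but not proportionally."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def idf_weight(df, n_docs):
    """Inverse document frequency: larger for rarer terms (smaller df)."""
    return math.log10(n_docs / df)


def tf_idf(tf, df, n_docs):
    """tf-idf is the product (not the sum) of the tf and idf weights."""
    return tf_weight(tf) * idf_weight(df, n_docs)


N = 1_000_000  # hypothetical collection size
print(tf_idf(tf=10, df=1000, n_docs=N))  # (1 + 1) * 3
print(tf_idf(tf=1, df=1000, n_docs=N) < tf_idf(tf=10, df=1000, n_docs=N))
print(tf_idf(tf=1, df=1000, n_docs=N) < tf_idf(tf=1, df=10, n_docs=N))
```

The last two lines print True: the weight increases with the number of occurrences in the document and with the rarity of the term in the collection.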
67- In the vector space model, the set of documents in a collection may be
viewed as a set of vectors in a vector space
ANSWER:TRUE
68- In the vector space model, terms are points or vectors
ANSWER:FALSE
Terms are axes of the space
----------------------------------------------------------------------------------------------------
69- Documents are points or vectors in this space
ANSWER:TRUE
70- The space is very high-dimensional: tens of millions of dimensions when you
apply this to a web search engine. These are very sparse vectors; most entries
are zero
ANSWER:TRUE
71- ----------- : two documents with very similar content can have a significant
vector difference simply because one is much longer than the other
A-FIRST ATTEMPT
B-DRAWBACK
C-SECOND ATTEMPT
D-THIRD ATTEMPT
ANSWER:B
----------------------------------------------------------------------------------------------------------
72- --------------------- : the magnitude of the vector difference between two
document vectors
A-FIRST ATTEMPT
B-DRAWBACK
C-SECOND ATTEMPT
D-THIRD ATTEMPT
ANSWER:A
73- The vector similarity solution to compensate for the effect of document
length is to compute sine similarity
ANSWER:FALSE
The vector similarity solution to compensate for the effect of document length
is to compute cosine similarity
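Questions 71 to 73 can be seen in a small Python experiment: a document and a copy of it that is twice as long have a non-zero vector difference but a cosine similarity of 1. The term-count vectors are made up, and the sketch assumes non-zero vectors of equal dimension.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Magnitude of the vector difference between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d = [1, 2, 0, 3]              # hypothetical term counts
d_doubled = [2 * x for x in d]  # same content, twice as long

print(euclidean_distance(d, d_doubled))           # greater than 0
print(round(cosine_similarity(d, d_doubled), 6))  # 1.0
```

Cosine depends only on the angle between the vectors, so scaling a document by its length does not change the score.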
74- Rank documents in increasing order of the angle between query and
document
ANSWER:FALSE
Rank documents in decreasing order of the angle between query and document
------------------------------------------------------------------------------------------------------
75- Rank documents in increasing order of cosine(query, document)
ANSWER:TRUE
76- Cosine is a monotonically decreasing function for the interval [0°, 180°]
ANSWER:TRUE
77- Number of documents/hour and average document size measure ………
A- Usability
B- Effectiveness
C- Efficiency
D- ALL OF THE ABOVE
ANSWER:C
---------------------------------------------------------------------------------------------------------
78- Precision, recall and F-measure measure ………………………………………………………
A- Usability
B- Effectiveness
C- Efficiency
D- ALL OF THE ABOVE
ANSWER:B
79- Ability to express complex information needs is ………………………………
A- Usability
B- Effectiveness
C- Efficiency
D- ALL OF THE ABOVE
ANSWER:A
80- The information need is translated into a query
ANSWER:TRUE
81- Relevance is assessed relative to the query, not the information need
ANSWER:FALSE
Relevance is assessed relative to the information need, not the query
----------------------------------------------------------------------------------------------
82- TREC - National Institute of Standards and Technology (NIST) has
run a large IR test bed for many years
ANSWER:TRUE
83- Human experts mark, for each query and for each doc, Relevant or
Nonrelevant
ANSWER:TRUE
84- (relevant retrieved / retrieved) is recall
ANSWER:FALSE
Precision=(relevant retrieved / retrieved)
----------------------------------------------------------------------------------------------
85-Recall=(relevant retrieved/relevant)
ANSWER:TRUE
86-Accuracy is a commonly used evaluation measure in machine
learning classification work
ANSWER:TRUE
87-You can get low recall (but high precision) by retrieving all docs for
all queries
ANSWER:FALSE
You can get high recall (but low precision) by retrieving all docs for all
queries
88-Recall is a decreasing function of the number of docs retrieved
ANSWER:FALSE
Recall is a non-decreasing function of the number of docs retrieved
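The claim in question 88 can be checked by computing recall after each retrieved document. The ranking and the relevant set below are made up, and the sketch assumes at least one relevant document exists.

```python
def recall_at_each_rank(ranking, relevant):
    """Return recall after retrieving the top 1, 2, ..., len(ranking) docs."""
    relevant = set(relevant)
    hits = 0
    recalls = []
    for doc_id in ranking:
        if doc_id in relevant:
            hits += 1  # hits can only stay the same or grow
        recalls.append(hits / len(relevant))
    return recalls


recalls = recall_at_each_rank([3, 7, 1, 9, 4], relevant=[7, 4, 8])
print(recalls)  # recall never drops as more docs are retrieved
```

The denominator (number of relevant docs) is fixed and the hit count never decreases, so recall is non-decreasing in the number of docs retrieved.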
---------------------------------------------------------------------------------------------------
89-In a good system, precision decreases as either the number of docs
retrieved or recall increases
ANSWER:TRUE
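90- An advantage of a positional index is that it reduces the asymptotic
complexity of a postings intersection operation
ANSWER:FALSE
A positional index increases the asymptotic complexity of a postings
intersection operation, because the number of items to check is bounded by the
total number of tokens in the collection rather than the number of documents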
91-Which is a good idea for using skip pointers ?
a. Fewer skips, larger skip spans
b. None
c. Depends upon the no. of comparisons needed
d. More skips, shorter skip spans
Answer: c. Depends upon the no. of comparisons needed
--------------------------------------------------------------------------
92- The Boolean retrieval model does not provide for:
a. Ranked search
b. Proximity search
c. Phrase search
d. Both proximity and ranked search
Answer: d. Both proximity and ranked search
93- A large repository of documents in IR is called:
a. Corpus
b. Database
c. Dictionary
d. Collection
Answer: a. Corpus
--------------------
94- Variable-size postings lists are used when:
a. More seek time is desired and the corpus is dynamic
b. Less seek time is desired and the corpus is dynamic
c. Less seek time is desired and the corpus is static
d. More seek time is desired and the corpus is dynamic
Answer: d. More seek time is desired and the corpus is dynamic
95- Postings lists should be sorted by:
a. Document frequency
b. DocID
c. TermID
d. Term frequency
Answer: b
96-Term-document incidence matrix is:
a. Sparse
b. Depends upon the data
c. Dense
d. Cannot predict
Answer: a. Sparse
97- Lemmatization is a technique for:
a. Ranking documents
b. Case folding
c. Normalization
d. Tokenization
Answer: c. Normalization
98- Unstructured data tends to refer to information on the web and
is processed using:
a. Both
b. Database systems
c. IR systems
d. None
Answer: c. IR systems
99- If list lengths are x and y, merge takes:
a. O(Yn) operations
b. O(xy) operations
c. O(xn) operations
d. O(x+y) operations
Answer: d. O(x+y) operations
100- The goal of IR is to:
a. find documents relevant to an information need
b. find documents relevant to an information need from a given document set
c. find documents relevant to an information need from a large document set
d. find documents relevant to an information need from a small document set
Answer: c. find documents relevant to an information need from a large document set
#With_morals_and_knowledge_we_rise
