I. True/False Questions
1. In distributed indexing, the document-partitioned strategy is to store on each node all the documents that contain the terms in a certain range.
T F
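Explanation: F. The statement describes a term-partitioned index, where each node holds the postings for terms in a certain range. In a document-partitioned index, the documents are split by document ID and each node holds the index for only its own subset of documents, not the whole index. Partitioning by term is also less fault-tolerant: if a node's storage fails, every posting for the terms it held is lost.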
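The difference between the two strategies can be sketched on a toy collection. This is a minimal illustration, not a real distributed index: the `docs` data, the modulo assignment of documents to nodes, and the first-letter term ranges are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy collection: doc id -> text.
docs = {
    1: "apple banana",
    2: "banana cherry",
    3: "apple cherry",
    4: "banana date",
}

def build_index(doc_subset):
    """Build a tiny inverted index: term -> sorted list of doc ids."""
    index = defaultdict(list)
    for doc_id in sorted(doc_subset):
        for term in sorted(set(doc_subset[doc_id].split())):
            index[term].append(doc_id)
    return dict(index)

def document_partitioned(docs, num_nodes):
    """Each node indexes only its own subset of documents.
    Assumption: documents are assigned by doc id modulo num_nodes."""
    shards = [{} for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text
    return [build_index(shard) for shard in shards]

def term_partitioned(docs, boundaries):
    """Each node holds the complete posting lists for one range of terms.
    Assumption: ranges are inclusive (low, high) bounds on the first letter."""
    full = build_index(docs)
    return [
        {term: plist for term, plist in full.items() if low <= term[0] <= high}
        for low, high in boundaries
    ]

doc_nodes = document_partitioned(docs, 2)
term_nodes = term_partitioned(docs, [("a", "b"), ("c", "z")])

print(doc_nodes)
print(term_nodes)
```

Under term partitioning, losing the first node removes every posting for "apple" and "banana"; under document partitioning, losing a node only removes some documents from each posting list.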
2. When evaluating the performance of data retrieval, it is important to measure the relevancy of the answer set.
T F
Explanation: F. The statement is about data retrieval, so it is wrong. Only information retrieval needs to measure the relevancy of the answer set.
3. Precision is more important than recall when evaluating the explosive detection in airport security.
T F
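Explanation: F. For detecting dangerous items in airport security, recall matters more: missing a real explosive is far costlier than raising a false alarm.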
4. While accessing a term by hashing in an inverted file index, range searches are expensive.
T F
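Explanation: T. A hash function locates a single term directly in constant time, whereas a search tree takes O(log n). The drawback is that hash storage is inflexible: it does not keep terms in sorted order, so a range query cannot walk a contiguous run of keys and has to examine the table entry by entry.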
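A small sketch of why the range query is the weak spot. The `postings` dictionary and both helper functions are hypothetical names for this example; a Python `dict` stands in for the hash table and a sorted list with binary search stands in for the search tree.

```python
from bisect import bisect_left, bisect_right

# Hypothetical inverted index stored in a hash table (Python dict).
postings = {
    "apple": [1, 3],
    "banana": [1, 2, 4],
    "cherry": [2, 3],
    "date": [4],
}

def range_search_hash(index, low, high):
    """Hash table: keys are unordered, so every key must be examined. O(n)."""
    return sorted(term for term in index if low <= term <= high)

def range_search_sorted(terms, low, high):
    """Sorted term list: two binary searches, then one contiguous slice.
    O(log n + k), where k is the number of matching terms."""
    lo = bisect_left(terms, low)
    hi = bisect_right(terms, high)
    return terms[lo:hi]

sorted_terms = sorted(postings)

hash_result = range_search_hash(postings, "b", "d")
sorted_result = range_search_sorted(sorted_terms, "b", "d")

print(hash_result)    # ['banana', 'cherry']
print(sorted_result)  # ['banana', 'cherry']
```

Both calls return the same terms, but the hash version touches every key, while the sorted version touches only the boundaries and the matches.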
II. Multiple-Choice Questions
1. When measuring the relevancy of the answer set, if the precision is high but the recall is low, it means that:
A. most of the relevant documents are retrieved, but too many irrelevant documents are returned as well
B. most of the retrieved documents are relevant, but still a lot of relevant documents are missed
C. most of the relevant documents are retrieved, but the benchmark set is not large enough
D. most of the retrieved documents are relevant, but the benchmark set is not large enough
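Explanation: B. High precision means most of the retrieved documents are relevant; low recall means only a small share of all relevant documents was retrieved, so many relevant documents are still missed.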
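The two measures can be checked with a short sketch using the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The document IDs below are made up for illustration and are not taken from any question in these notes.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 100 relevant documents exist (ids 1..100),
# but only 10 documents are returned, 9 of them relevant.
relevant = set(range(1, 101))
retrieved = set(range(1, 10)) | {500}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.90, recall=0.09
```

This is the situation in option B: almost everything returned is relevant, yet 91 of the 100 relevant documents are missed.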
2. Which of the following is NOT concerned for measuring a search engine?
A.How fast does it index
B.How fast does it search
C.How friendly is the interface
D.How relevant is the answer set
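Explanation: C. How friendly the interface is clearly is not one of the measures here; the other options concern indexing speed, search speed, and the relevance of the answer set.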
3. There are 28000 documents in the database. The statistics for one query are shown in the following table. The recall is: __
Relevant | Irrelevant | |
---|---|---|
Retrieved |