Computing Reviews

TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing)
Voorhees E., Harman D., The MIT Press, 2005. 368 pp. Type: Book

Date Reviewed: Dec 7 2006

Categories: Information Search And Retrieval (H.3.3); Digital Libraries (H.3.7); Large Text Archives (H.3.6); Library Automation (H.3.6); System Management (K.6.4); Systems And Software (H.3.4)

The Text REtrieval Conference (TREC), coordinated by the US National Institute of Standards and Technology (NIST), is the largest information retrieval (IR) experimentation effort in existence. Starting with TREC-1 in 1992, and continuing yearly, TREC gives participating groups the opportunity to have their IR systems compete in several IR experiments, called tracks. TREC has had a big influence on research in particular approaches to IR: tracks have often initiated small research communities around a problem, and TREC has occupied a large segment of the IR community as a whole. Thus, whatever one may think about the TREC approach to IR testing, a book detailing the methods used and results achieved (through 2003) is important. This book is a useful overview for researchers in the field, a must-read for prospective TREC participants, and a glimpse into a world of research for graduate students.

The book has three parts and an epilogue. Part 1 presents the essentials. TREC is based on the Cranfield paradigm [1]. Chapter 1 quotes the oft-repeated Cranfield “conclusion” that “using words in the texts themselves was very effective” (page 3). What Cranfield actually showed, however, is that systems that compute expected relevance scores by matching query words with title words agree, to a great extent, with human judges, whose relevance judgments are heavily influenced by the match of query words with title words (hardly a surprising result). This can be seen in side studies on the nature of the relevance judgments [1]. Chapter 2 describes the test collection corpus, the creation of topics (descriptions of information needs), and the relevance judgments. I would have liked to see more about the instructions given to relevance judges, and thus the nature of the relevance judgments, which are a crucial element. Chapter 3 discusses retrieval performance measures, with a focus on the monolingual English ad hoc track in TREC-1 and TREC-2.
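
To make the flavor of these measures concrete, here is a minimal sketch in Python of two standard ranked-retrieval measures of the kind chapter 3 covers: precision at a cutoff and average precision for a single topic. The sketch is mine, not the book's, and the document IDs and relevance judgments are invented.

    def precision_at_k(ranking, relevant, k):
        """Fraction of the top-k retrieved documents that are relevant."""
        return sum(1 for d in ranking[:k] if d in relevant) / k

    def average_precision(ranking, relevant):
        """Mean of the precision values at the ranks where relevant documents appear."""
        hits, total = 0, 0.0
        for rank, d in enumerate(ranking, start=1):
            if d in relevant:
                hits += 1
                total += hits / rank
        return total / len(relevant) if relevant else 0.0

    # Hypothetical ranking for one topic; the IDs are made up.
    ranking = ["d3", "d7", "d1", "d9", "d2"]
    relevant = {"d3", "d2", "d8"}
    print(precision_at_k(ranking, relevant, 5))   # 0.4
    print(average_precision(ranking, relevant))   # (1/1 + 2/5) / 3, about 0.467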

Part 2 (chapters 4 through 10) reports on the various TREC tracks. A track is a specific
experiment, defined by, first, the type of task (ad hoc retrieval, filtering,
question-answering, and so on); second, the type of material (printed text, spoken text,
images, music, and so on); third, the presence of errors in the text (from optical character
recognition (OCR) or automatic speech recognition); fourth, whether the data is
monolingual or cross-lingual; and, fifth, the language(s) involved (most tracks are
monolingual (English)). Each track report describes, over the life of the track, the specific
task, the assembly and size of the test collection, the participants, the methods and
evaluation measures used, and the results achieved (unfortunately, not including the
overlap in retrieval by the different systems).
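
As a way of keeping the five defining dimensions straight, the following small record type is my own structuring of the paragraph above, not anything defined in the book; the example instance is the classic monolingual English ad hoc setting.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class Track:
        task: str                   # e.g., "ad hoc", "filtering", "question answering"
        material: str               # e.g., "printed text", "spoken text", "images"
        noisy_text: bool            # errors from OCR or speech recognition present?
        cross_lingual: bool         # queries and documents in different languages?
        languages: Tuple[str, ...]  # the language(s) involved

    # Hypothetical example instance.
    ad_hoc = Track(task="ad hoc", material="printed text",
                   noisy_text=False, cross_lingual=False, languages=("English",))
    print(ad_hoc)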

In Part 3 (chapters 11 through 17), selected participants report on their work at TREC. Parts
2 and 3 give complementary views of the work at TREC. Think of a table, with a column for
each track and a row for each participating group. A chapter in Part 2 reports on the total
work in a column (a track), both globally and by participant (a cell in the table); a chapter
in Part 3 reports on the total work in a row (a research group), both globally and by track (a
cell in the table). There should be more cross-references between Parts 2 and 3 to connect
information on the same table cell given in different places.

The epilogue, by Karen Sparck-Jones, is a “metareflection” on what was learned from TREC,
and on the future of TREC. It is not, nor can it be, a summary and inventory of IR
techniques, evaluation methods, and results emanating from TREC; it is, rather, a
high-level summary, commentary, and development of vision, particularly with regard to
the Web, intranets, and digital libraries. The “Reference Summary” and “TREC Messages”
sections should have been integrated into chapter 1, to provide better structure and a
high-level perspective from the outset. These sections make some bold claims about the
success of fully automated methods in IR, but these claims are not supported (see below).
Sections 7 and 8, short but most important, paint a vision of an integrated information

management system that lets the user execute and combine several tasks, such as
document retrieval, information extraction, topic detection and tracking, (multi-document)
summarization, and translation, and suggests that TREC establish a “common,
multi-purpose evaluation framework.” I could not agree more.

The book is more of a collection of independent chapters than an integrated whole, resulting
in redundancies and inconsistencies. The track reports lack a common format. There are
many inconsistencies in the notation used for the formulas for term weighting and
document scoring. The same quantities, such as term frequency within a document, are
designated with different symbols, making understanding and comparing these formulas
unnecessarily difficult. Results are often given as a family of recall-precision curves labeled
by research group rather than by IR technique used, which is what really matters; the
reader must make this connection by laboriously checking in the text. Throughout the book,
a better layout (for example, of bulleted lists) would support faster reading and better
comprehension.
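
To illustrate what a single, consistent notation would buy the reader, here is one common convention for a basic term weight, sketched in Python; this is an assumption for illustration, not the book's notation or any particular participant's formula. Here tf is the term's frequency in the document, df its document frequency, and N the number of documents in the collection.

    import math

    def tf_idf(tf, df, N):
        """One standard variant: logarithmic term frequency times inverse document frequency."""
        if tf == 0 or df == 0:
            return 0.0
        return (1 + math.log(tf)) * math.log(N / df)

    # Hypothetical numbers: a term occurring 3 times in the document and in
    # 100 of 500,000 collection documents.
    print(tf_idf(tf=3, df=100, N=500_000))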

The book provides a great deal of detail about the work in TREC, and its historical evolution,
but a much more systematic and formalized presentation would be needed to let the reader
see the large picture. For example, an overview table showing the different retrieval
methods used across tracks in different years would be very useful, as would be a table of
evaluation measures used by track and year. (For a very broad-brush overview of basic
techniques and results through TREC-6, see Sparck-Jones’ papers [2,3]).

There are some issues with TREC and the claims made in the book; TREC has fundamental
limitations that, while sometimes acknowledged, are often forgotten when stating results.
The test collection corpora have been assembled primarily based on availability; they are
dominated by news, and are not representative of much else (even though several other
document types are included). This is critical: TREC does not support claims for text
retrieval in general, but only sharply limited claims on retrieving news items from
newspaper text.

Test topics are also problematic; it is unknown whether they are representative of all
possible user topics. Topics induce more variance in retrieval performance than systems
(page 94 and elsewhere). TREC’s comparison of systems on average performance over a set
of topics hides the real story; what is needed is research into the reasons for the
topic-to-topic differences in performance, and methods for adapting IR systems to the
topics at hand (as suggested by the SMART team, page 313), which includes finding out
which system does well with which kind of topic. A similar problem of adaptation is ignored
in TREC’s use of a single measure of performance for a given topic, when, in reality,
different users have different requirements with respect to recall, precision, and other
performance characteristics, and systems should be evaluated on their ability to adapt to
specific user requirements. TREC relevance judgments are problematic. For example, IBM
cites inconsistencies in judging as “undercutting” their approach of using hypernyms from
WordNet for answering “What is” questions (page 412). Finally, TREC takes the query
statement (topic statement) for granted, and has each system work from the same
statement. There is wide intuitive agreement (but no empirical proof) that formulating the
right query is half the battle in IR. So, a system could improve users’ success by helping
them to understand and state their information need, and then formulate it properly. This
very important system function is ignored in TREC.
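
The point about averages can be made concrete with a small, entirely hypothetical sketch: two invented systems with identical mean average precision over a topic set can behave very differently topic by topic, which is precisely what a per-topic analysis would surface.

    from statistics import mean

    per_topic_ap = {
        "system_A": [0.80, 0.10, 0.75, 0.15],  # strong on some topics, weak on others
        "system_B": [0.45, 0.45, 0.45, 0.45],  # uniformly mediocre
    }

    for system, scores in per_topic_ap.items():
        print(system, "MAP =", mean(scores))   # both about 0.45

    # The per-topic breakdown, not the mean, shows which system suits which topic.
    best = [max(per_topic_ap, key=lambda s: per_topic_ap[s][t]) for t in range(4)]
    print(best)   # ['system_A', 'system_B', 'system_A', 'system_B']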

TREC does not measure absolute retrieval performance, but merely compares the
performance of participating systems. This is the justification for limiting relevance
judgments to documents found by the participating systems (the pool). Since almost all
systems use approaches based on words in the text, what is really being measured is the
overlap of relevant documents found by essentially similar systems, which might leave out
whole classes of relevant documents. This methodology does not support claims of absolute
retrieval performance, which is what users are interested in.
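
For readers unfamiliar with pooling, a minimal sketch of the idea follows; the system names, document IDs, and pool depth are hypothetical. The judgment pool is the union of the top-k documents returned by each participating system, and only pooled documents are judged for relevance.

    def build_pool(runs, depth=100):
        """runs maps a system name to its ranked list of document IDs."""
        pool = set()
        for ranking in runs.values():
            pool.update(ranking[:depth])
        return pool

    runs = {
        "sys1": ["d1", "d2", "d3"],
        "sys2": ["d2", "d4", "d5"],
    }
    print(sorted(build_pool(runs, depth=2)))   # ['d1', 'd2', 'd4']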

TREC collections, topics, and protocols change from year to year in order to address
new problems in IR research. This makes longitudinal studies of changes in retrieval
effectiveness difficult, putting further into question the claim that “retrieval effectiveness
approximately doubled during those eight years” (1992-1999) (chapter 4, page 79, and
elsewhere).

Real understanding of how IR systems work often gets buried in the quest to make yet
another ad hoc refinement to a weighting formula. Real understanding requires examining
topic variables (mentioned in the book) and document variables (barely mentioned), in
conjunction with system variables, to explain retrieval results, and then conducting a careful
analysis of successes and failures, looking for patterns, as done in Lancaster’s paper [4].

There is no question that TREC has had a considerable influence on research in information
retrieval: all of the chapters in Part 3 attest to that. While entering a competition may have
enticed research groups to participate, the system rankings (which are based on not very
meaningful averages) may have been less important than what each group learned from its
own experiments, and from discussions with other groups. For example, IBM states, “Of
more ultimate value to IBM was qualitative evidence for how people interpreted the Web
search syntax” (page 407). The testing environment described above would emphasize
learning and the interchange of ideas based on detailed study of experimental results,
rather than competition between systems. Combined with Karen Sparck-Jones’ vision, this
would chart a course for TREC toward a new level, and a broader scope of research, in
information retrieval.

Reviewer: D. Soergel
Review #: CR133676

1) Cleverdon, C.; Mills, J.; Keen, M. Factors determining the performance of indexing systems. Volume 1: Design; Volume 2: Test results. Aslib Cranfield Research Project, Cranfield, UK, 1966.

2) Sparck-Jones, K. Reflections on TREC. Information Processing and Management 31, 3 (1995), 291–314.

3) Sparck-Jones, K. Further reflections on TREC. Information Processing and Management 36, 1 (2000), 37–85.

4) Lancaster, F.W. MEDLARS: report on the evaluation of its operating efficiency. American Documentation 20, 2 (1969), 119–142.

Other reviews under "Information Search And Retrieval":

Google's Pagerank and beyond: the science of search engine rankings. Langville A., Meyer C., Princeton University Press, Princeton, NJ, 2006. 234 pp. Type: Book. (Dec 6 2006)

Temporal pre-fetching of dynamic Web pages. Lam K., Ngan C. Information Systems 31(3): 149-169, 2006. Type: Article. (Sep 6 2006)

Similarity search: the metric space approach (Advances in Database Systems). Zezula P., Amato G., Dohnal V., Batko M., Springer-Verlag New York, Inc., Secaucus, NJ, 2005. 220 pp. Type: Book. (Aug 31 2006)
