Text Databases and Information Retrieval
ELLEN RILOFF and LEE HOLLAAR
Department of Computer Science, University of Utah <riloff,hollaar@cs.utah.edu>

The goal of a traditional information
retrieval (IR) system is to search an
information repository, such as a text
database, and retrieve documents that
are potentially relevant to a query.
Since query-based IR systems must operate in real time, they must be able to
search large volumes of text quickly and
efficiently. Other information-retrieval
applications, such as text categorization, text routing, and text filtering, are
also becoming increasingly important.
These applications are generally concerned with long-term information
needs, where a topic is expected to be of
interest for an extended period of time.
Text categorization systems assign predefined category labels to texts. For example, a text categorization system for
computer science might use categories
such as operating systems, programming languages, artificial intelligence,
or information retrieval. Text routing
systems typically accept a set of user
profiles and automatically classify texts
so that relevant texts can be routed to
appropriate users [Harman 1994]. Text
filtering systems accept a list of topics
that are, or are not, of interest and
allow only texts that satisfy the filter to
pass through to the user [Belkin and
Croft 1992]. Text categorization systems
are typically applied to static databases,
while text routing and text filtering systems are usually applied to incoming
data streams.
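As a rough illustration of text filtering over an incoming stream, the sketch below passes a text to the user only if it mentions a topic of interest and none of the topics marked as unwanted. The keyword-set profile and the names make_filter and filter_stream are assumptions made for this example; the systems cited above use richer topic representations.

```python
def make_filter(wanted, unwanted):
    """Build a predicate from a hypothetical keyword profile.

    A text passes if it contains at least one wanted term and no unwanted term.
    """
    wanted = {w.lower() for w in wanted}
    unwanted = {w.lower() for w in unwanted}

    def passes(text):
        words = set(text.lower().split())
        return bool(words & wanted) and not (words & unwanted)

    return passes


def filter_stream(texts, passes):
    """Lazily yield only the incoming texts that satisfy the filter."""
    return (t for t in texts if passes(t))


incoming = [
    "new compiler for functional programming languages",
    "stock market closes higher",
    "programming contest sponsored by a casino",
]
keep = make_filter(wanted=["compiler", "programming"], unwanted=["casino"])
delivered = list(filter_stream(incoming, keep))
```

Because filter_stream is a generator, texts are examined one at a time as they arrive, which matches the data-stream setting better than loading a whole collection first.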
Information-retrieval systems must
grapple with all of the ambiguities and
idiosyncrasies inherent in natural language, such as synonymy (e.g., “start”,
“begin”, and “initiate” have essentially
the same meaning) and polysemy (e.g.,
“shot” has many different meanings, including the act of shooting, an injection,
a quantity of liquor, a photograph, pellets, or an attempt). Phrases also require special attention because multiword
expressions often have a composite meaning different from the
meanings of the individual words. For example, a “hot
dog” does not usually refer to a warm
canine, and an “operating system” does
not usually refer to a system that is
simply operating.
Most information-retrieval systems
preprocess a document collection into an
inverted file that allows the system to
determine quickly which words appear in
each document. Stopword lists are commonly used to remove highly frequent
words, such as “the” and “of,” under the
assumption that they don’t contribute
much to the meaning of a text. Stemming
algorithms are sometimes used to reduce
a word to its root form so that different
morphological variations will match
[Frakes and Baeza-Yates 1992]. An alternative text-representation scheme uses
superimposed codewords to produce a
fixed-length vector from the binary representations of words. The fixed-length vector is especially useful for parallel and
hardware systems, but this method can
sometimes report false matches: words that don’t
actually appear in the original document.
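The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the stopword list, the toy suffix stripper crude_stem, the 64-bit signature width, and the use of MD5 to pick codeword bits are all assumptions chosen to keep the example short.

```python
import hashlib
from collections import defaultdict

STOPWORDS = {"the", "of", "a", "an", "and", "to", "in"}  # illustrative list only


def crude_stem(word):
    """Toy suffix stripper; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, strip punctuation, drop stopwords, and stem what remains."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [crude_stem(w) for w in words if w and w not in STOPWORDS]


def build_inverted_file(docs):
    """Map each index term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def word_code(word, width=64, bits_per_word=3):
    """Codeword for one word: a width-bit integer with a few hashed bits set."""
    code = 0
    for i in range(bits_per_word):
        digest = hashlib.md5(f"{i}:{word}".encode()).digest()
        code |= 1 << (int.from_bytes(digest[:4], "big") % width)
    return code


def signature(text, width=64):
    """Superimpose (bitwise OR) the codewords of all terms in a text."""
    sig = 0
    for term in tokenize(text):
        sig |= word_code(term, width)
    return sig


def may_contain(sig, word, width=64):
    """True if every bit of the word's codeword is set in the signature.

    True can be a false match; False means the word is definitely absent.
    """
    code = word_code(crude_stem(word.lower()), width)
    return (sig & code) == code


docs = {
    1: "The operating system schedules processes.",
    2: "Indexing of documents in the database.",
}
index = build_inverted_file(docs)
sig1 = signature(docs[1])
```

The inverted file answers "which documents contain this term?" with a dictionary lookup. The signature is a single fixed-length integer per document, but because several codewords share the same bits, may_contain can answer True for a word that was never in the text, which is why candidate matches from a signature are normally checked against the document itself.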
Traditional information-retrieval methods retrieve documents by searching for
relevant words or phrases. Most commercial IR systems allow the user to define a
query using keywords and standard Boolean operators. These systems retrieve
documents that precisely match the
query. The vector-space model [Salton

Copyright © 1996, CRC Press.

ACM Computing Surveys, Vol. 28, No. 1, March 1996

1971] is a well-known method for automatic indexing that views each document
and query as a vector in an N-dimensional space, where N is the number of
relevant terms in the database. The
query vector is compared to all of the
document vectors using a similarity metric. Another retrieval model for automatic
indexing uses probability estimates to determine whether a document satisfies a
user’s query. For example, Bayesian inference networks have been used to compute the belief associated with a query for
each document in a database.
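To make the vector-space comparison concrete, the sketch below represents each text as a sparse vector of term counts and ranks documents by the cosine of the angle between the query vector and each document vector. Raw term counts and the cosine measure are assumptions made for this example; practical systems typically use more refined term weights, and other similarity metrics are possible.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of raw term frequencies."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    q = term_vector(query)
    scores = {doc_id: cosine(q, term_vector(text)) for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "information retrieval systems search text databases",
    "d2": "operating systems schedule processes",
    "d3": "text retrieval and text filtering",
}
ranking = rank("text retrieval", docs)
```

Unlike a Boolean query, which either matches a document or does not, this produces a graded score, so documents that share only some of the query terms still appear in the ranking, below those that share more.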
Relevance feedback techniques can
improve performance by asking the user
for feedback about the retrieved texts
[Salton 1989; Van Rijsbergen 1979]. The
user labels a subset of the retrieved
texts as relevant, and this information
is fed back into the system to modify the
original query, usually by adding new
terms or by changing the weights of the
original query terms. Relevance feedback has consistently been shown to
improve the performance of IR systems.
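One common way to realize this kind of query modification is a Rocchio-style update, sketched below under simplifying assumptions: queries and documents are term-to-weight dictionaries, only texts labeled relevant are used, and the alpha and beta values are illustrative rather than tuned. The text above does not name a specific formula, so this is one possible instance, not the method of the cited works.

```python
from collections import defaultdict


def rocchio(query, relevant_docs, alpha=1.0, beta=0.75):
    """Return a modified query: alpha * query + beta * centroid(relevant_docs).

    Terms that occur only in the relevant texts are added to the query;
    terms already in the query have their weights changed.
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant_docs:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant_docs)
    return dict(new_query)


query = {"jaguar": 1.0}
relevant = [
    {"jaguar": 1.0, "cat": 2.0},
    {"jaguar": 1.0, "habitat": 1.0, "cat": 1.0},
]
updated = rocchio(query, relevant)
```

In this example the original term "jaguar" gains weight, and "cat" and "habitat" enter the query as new terms, which steers a second retrieval pass toward the animal sense of the word.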
Experiments with richer text representations have also been conducted using natural-language processing (NLP)
techniques. Syntactic approaches have
been used to generate more complex
indexing terms consisting of phrases
and head-modifier structures. Knowledge-based NLP systems have been
used to generate conceptual meaning
representations of queries and documents. Information extraction techniques [Lehnert and Sundheim 1991]
have also been shown to be effective for
text classification problems, and represent a compromise between word-based
techniques and in-depth natural-language processing.
The future holds great promise for
integrating information-retrieval techniques with natural-language processing systems. The strengths of these
methodologies are largely complementary. IR systems use shallow text representations, which allows them to process large amounts of text quickly and
efficiently. But the accuracy of these
systems often suffers because of a lack
of semantic analysis, especially for complex information requests. Natural-language processing systems, on the other
hand, usually perform conceptual analyses, which allows them to produce
richer meanings and representations.
However, NLP techniques are more
computationally expensive and therefore are more difficult to scale up to
large text collections.
The information-retrieval community is
facing new challenges posed by larger
and more heterogeneous text databases,
which have led to an explosion of new
approaches and methodologies. As longer
texts become available on-line, new approaches are needed to process texts that
discuss multiple topics. A variety of techniques for subtopic identification and passage-based retrieval are actively being explored. Another area of active research is
intelligent information retrieval, which
draws upon techniques from artificial intelligence to generate richer text representations. Natural-language processing
methods (such as information extraction),
case-based reasoning techniques, and machine learning algorithms are all being
applied to information retrieval tasks in
the hopes of building more effective retrieval systems (for example, see ACM
[1995]). Intelligent information retrieval
is an exciting new direction for IR research.
REFERENCES
ACM. 1995. Proceedings of the 18th Annual
International ACM SIGIR Conference on
Research and Development in Information Retrieval. ACM, New York.
BELKIN, N. AND CROFT, W. B. 1992. Information
filtering and information retrieval: Two sides
of the same coin? Commun. ACM 35, 12,
29–38.
FRAKES, W. B. AND BAEZA-YATES, R., EDS.
1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ.
HARMAN, D., ED. 1994. The Second Text REtrieval Conference (TREC-2). National Institute of Standards and Technology Special
Publication 500–215, Gaithersburg, MD.
LEHNERT, W. G. AND SUNDHEIM, B. 1991. A performance evaluation of text analysis technologies. AI Mag. 12, 3, 81–94.
SALTON, G., ED. 1971. The SMART Retrieval
System: Experiments in Automatic Document
Processing. Prentice-Hall, Englewood Cliffs,
NJ.

SALTON, G. 1989. Automatic Text Processing:
The Transformation, Analysis, and Retrieval
of Information by Computer. Addison-Wesley,
Reading, MA.
VAN RIJSBERGEN, C. J. 1979. Information Retrieval (2nd Ed.). Butterworths, London.

On nonmetric similarity search problems in complex domains
unyil96
 
Nonmetric similarity search
Nonmetric similarity searchNonmetric similarity search
Nonmetric similarity search
unyil96
 
Multidimensional access methods
Multidimensional access methodsMultidimensional access methods
Multidimensional access methods
unyil96
 
Machine transliteration survey
Machine transliteration surveyMachine transliteration survey
Machine transliteration survey
unyil96
 
Machine learning in automated text categorization
Machine learning in automated text categorizationMachine learning in automated text categorization
Machine learning in automated text categorization
unyil96
 
Is this document relevant probably
Is this document relevant probablyIs this document relevant probably
Is this document relevant probably
unyil96
 
Integrating content search with structure analysis for hypermedia retrieval a...
Integrating content search with structure analysis for hypermedia retrieval a...Integrating content search with structure analysis for hypermedia retrieval a...
Integrating content search with structure analysis for hypermedia retrieval a...
unyil96
 

Text Databases and Information Retrieval

ELLEN RILOFF and LEE HOLLAAR
Department of Computer Science, University of Utah (riloff,hollaar@cs.utah.edu)

The goal of a traditional information retrieval (IR) system is to search an information repository, such as a text database, and retrieve documents that are potentially relevant to a query. Since query-based IR systems must operate in real time, they must be able to search large volumes of text quickly and efficiently. Other information-retrieval applications, such as text categorization, text routing, and text filtering, are also becoming increasingly important. These applications are generally concerned with long-term information needs, where a topic is expected to be of interest for an extended period of time. Text categorization systems assign predefined category labels to texts. For example, a text categorization system for computer science might use categories such as operating systems, programming languages, artificial intelligence, or information retrieval. Text routing systems typically accept a set of user profiles and automatically classify texts so that relevant texts can be routed to appropriate users [Harman 1994]. Text filtering systems accept a list of topics that are, or are not, of interest and allow only texts that satisfy the filter to pass through to the user [Belkin and Croft 1992]. Text categorization systems are typically applied to static databases, while text routing and text filtering systems are usually applied to incoming data streams.

Information-retrieval systems must grapple with all of the ambiguities and idiosyncrasies inherent in natural language, such as synonymy (e.g., “start”, “begin”, and “initiate” have essentially the same meaning) and polysemy (e.g., “shot” has many different meanings, including the act of shooting, an injection, a quantity of liquor, a photograph, pellets, or an attempt).
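To make the effect of synonymy and polysemy on a word-based filter concrete, the following is a minimal sketch in Python. It is an illustration only, not a method from the systems cited above: the function names, the tiny `SYNONYMS` table, and the sample stream are all assumptions introduced here, and a real system would rely on a thesaurus or learned associations rather than a hand-written table.

```python
import re

# Hypothetical synonym table, using the "start" example from the text.
SYNONYMS = {"start": {"begin", "initiate"}}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def passes_filter(text, wanted, unwanted=frozenset(), expand=False):
    """Return True if the text mentions a wanted topic word and no unwanted one.

    With expand=True, each wanted word is also matched through its synonyms.
    """
    words = set(tokenize(text))
    terms = set(wanted)
    if expand:
        for term in wanted:
            terms |= SYNONYMS.get(term, set())
    return bool(words & terms) and not (words & set(unwanted))

# A small incoming stream (invented sample texts).
stream = [
    "Engineers start the reactor on Monday",
    "Crews begin the cleanup after the storm",
    "The nurse gave him a shot of vaccine",
]

# Synonymy: exact matching on "start" misses the text that says "begin".
exact = [t for t in stream if passes_filter(t, {"start"})]
expanded = [t for t in stream if passes_filter(t, {"start"}, expand=True)]

# Polysemy: a filter on "shot" meant for photography still admits the
# injection sense, because the word alone does not identify the meaning.
shots = [t for t in stream if passes_filter(t, {"shot"})]
```

In this sketch, synonym expansion recovers the second text, while nothing at the word level can tell the filter which sense of “shot” the user intended.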
Phrases also require special attention because multiword expressions often have a composite meaning different from that of the individual words. For example, a “hot dog” does not usually refer to a warm canine, and an “operating system” does not usually refer to a system that is simply operating.

Most information-retrieval systems preprocess a document collection into an inverted file that allows the system to determine quickly which documents contain each word. Stopword lists are commonly used to remove highly frequent words, such as “the” and “of,” under the assumption that they don’t contribute much to the meaning of a text. Stemming algorithms are sometimes used to reduce a word to its root form so that different morphological variations will match [Frakes and Baeza-Yates 1992]. An alternative text-representation scheme uses superimposed codewords to produce a fixed-length vector from the binary representations of words. The fixed-length vector is especially useful for parallel and hardware systems, but this method can sometimes report words that don’t actually appear in the original document (false drops).

Traditional information-retrieval methods retrieve documents by searching for relevant words or phrases. Most commercial IR systems allow the user to define a query using keywords and standard Boolean operators. These systems retrieve documents that precisely match the query. The vector-space model [Salton 1971] is a well-known method for automatic indexing that views each document and query as a vector in an N-dimensional space, where N is the number of relevant terms in the database. The query vector is compared to all of the document vectors using a similarity metric. Another retrieval model for automatic indexing uses probability estimates to determine whether a document satisfies a user’s query. For example, Bayesian inference networks have been used to compute the belief associated with a query for each document in a database.

Relevance feedback techniques can improve performance by asking the user for feedback about the retrieved texts [Salton 1989; Van Rijsbergen 1979]. The user labels a subset of the retrieved texts as relevant, and this information is fed back into the system to modify the original query, usually by adding new terms or by changing the weights of the original query terms. Relevance feedback has consistently been shown to improve the performance of IR systems.

Experiments with richer text representations have also been conducted using natural-language processing (NLP) techniques. Syntactic approaches have been used to generate more complex indexing terms consisting of phrases and head-modifier structures. Knowledge-based NLP systems have been used to generate conceptual meaning representations of queries and documents. Information extraction techniques [Lehnert and Sundheim 1991] have also been shown to be effective for text classification problems, and represent a compromise between word-based techniques and in-depth natural-language processing.

The future holds great promise for integrating information-retrieval techniques with natural-language processing systems. The strengths of these methodologies are largely complementary. IR systems use shallow text representations, which allows them to process large amounts of text quickly and efficiently. But the accuracy of these systems often suffers because of a lack of semantic analysis, especially for complex information requests. Natural-language processing systems, on the other hand, usually perform conceptual analyses, which allows them to produce richer meanings and representations. However, NLP techniques are more computationally expensive and therefore are more difficult to scale up to large text collections.

The information-retrieval community is facing new challenges posed by larger and more heterogeneous text databases, which have led to an explosion of new approaches and methodologies. As longer texts become available on-line, new approaches are needed to process texts that discuss multiple topics. A variety of techniques for subtopic identification and passage-based retrieval are actively being explored. Another area of active research is intelligent information retrieval, which draws upon techniques from artificial intelligence to generate richer text representations. Natural-language processing methods (such as information extraction), case-based reasoning techniques, and machine learning algorithms are all being applied to information retrieval tasks in the hope of building more effective retrieval systems (for example, see ACM [1995]). Intelligent information retrieval is an exciting new direction for IR research.

REFERENCES

ACM. 1995. Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York.
BELKIN, N. AND CROFT, W. B. 1992. Information filtering and information retrieval: Two sides of the same coin? Commun. ACM 35, 12, 29–38.
FRAKES, W. B. AND BAEZA-YATES, R., EDS. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ.
HARMAN, D., ED. 1994. The Second Text REtrieval Conference (TREC-2). National Institute of Standards and Technology Special Publication 500-215, Gaithersburg, MD.
LEHNERT, W. G. AND SUNDHEIM, B. 1991. A performance evaluation of text analysis technologies. AI Mag. 12, 3, 81–94.
SALTON, G., ED. 1971. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall, Englewood Cliffs, NJ.
SALTON, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.
VAN RIJSBERGEN, C. J. 1979. Information Retrieval (2nd Ed.). Butterworths, London.

Copyright © 1996, CRC Press. ACM Computing Surveys, Vol. 28, No. 1, March 1996
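As a closing illustration of the indexing and ranking steps discussed in the body of this survey, the following is a minimal sketch in Python of an inverted file with stopword removal and of vector-space ranking. Several choices here are assumptions made for illustration and are not specified in the text: the tiny stopword list, TF-IDF term weighting, cosine similarity as the similarity metric, the function names, and the sample documents. Stemming is omitted to keep the sketch short.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "of", "a", "and", "to", "in", "is"}

def terms(text):
    """Lowercase, tokenize, and drop stopwords (no stemming in this sketch)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def build_inverted_file(docs):
    """Map each term to a posting dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(terms(text)).items():
            index[term][doc_id] = tf
    return index

def rank(query, docs, index):
    """Rank documents by cosine similarity between TF-IDF vectors."""
    n = len(docs)
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log(n / len(post)) + 1.0 for t, post in index.items()}
    # Squared length of every document vector.
    norms = defaultdict(float)
    for t, post in index.items():
        for doc_id, tf in post.items():
            norms[doc_id] += (tf * idf[t]) ** 2
    # Query vector restricted to terms that occur in the collection.
    q = {t: tf * idf[t] for t, tf in Counter(terms(query)).items() if t in index}
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    # Accumulate dot products by walking only the relevant posting lists.
    scores = defaultdict(float)
    for t, qw in q.items():
        for doc_id, tf in index[t].items():
            scores[doc_id] += qw * tf * idf[t]
    ranked = [(d, s / (q_norm * math.sqrt(norms[d]))) for d, s in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Invented sample collection.
docs = {
    "d1": "The operating system schedules the processes",
    "d2": "Retrieval of documents in a text database",
    "d3": "Information retrieval systems search text quickly",
}
index = build_inverted_file(docs)
ranked = rank("text retrieval", docs, index)

# A Boolean AND query is the intersection of two posting lists.
boolean_and = set(index["text"]) & set(index["retrieval"])
```

The Boolean query returns the matching documents as an unordered set, whereas the vector-space ranking orders the same documents by similarity to the query; documents that share no term with the query never receive a score.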