Info Retrieval

The document discusses information retrieval (IR), which involves obtaining information from a collection that is relevant to a user's information need. IR systems sort and rank documents based on user queries. Documents are transformed into representations using various models, including set-theoretic, algebraic, probabilistic, and feature-based models. Information extraction differs from IR in that it focuses on extracting specific pieces of information from documents, rather than retrieving entire documents. The document also covers issues and merits of IR systems, as well as some of their applications such as digital libraries, music search, and web search.

Uploaded by

SIDDHI

Module 3: NLP

Aarti Dharmani
What do these images tell?
Information Retrieval
• It is the process of obtaining information system resources that are
relevant to an information need from a collection of those resources.
• It is the science of searching for information in a document,
searching for documents themselves, and also searching for the
metadata that describes data, and for databases of texts, images or
sounds.
• In simple words, it works to sort and rank documents based on the
queries of a user.
• An information retrieval process begins when a user or searcher
enters a query into the system.
• Queries are formal statements of information needs, for example
search strings in web search engines.
• In information retrieval a query does not uniquely identify a single
object in the collection.
• Instead, several objects may match the query, perhaps with different
degrees of relevance.
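
The point that one query can match several objects to different degrees can be sketched in a few lines of Python. This is only an illustration: the three-document collection is made up, and the score (the number of distinct query terms found in a document) is an assumed, deliberately crude stand-in for relevance.

```python
# Toy collection (hypothetical documents, made up for illustration).
DOCS = {
    "d1": "information retrieval ranks documents for a user query",
    "d2": "music search is an application of information retrieval",
    "d3": "digital libraries store documents and metadata",
}


def rank(query, docs):
    """Return (doc_id, score) pairs for matching documents, best first.

    The score is the number of distinct query terms found in the document,
    a crude stand-in for relevance.
    """
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = len(query_terms & set(text.lower().split()))
        if score > 0:  # keep only documents that match at all
            scored.append((doc_id, score))
    # Sort by descending score; break ties by document id for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


results = rank("information retrieval documents", DOCS)
print(results)
```

All three documents match the query, but each shares a different number of terms with it, so the system returns a ranked list rather than a single object.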
Model Types
• For effectively retrieving relevant documents by IR strategies, the
documents are typically transformed into a suitable representation.
• Each retrieval strategy incorporates a specific model for its document
representation purposes.
• The models are categorized according to two dimensions: the
mathematical basis and the properties of the model.
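
As an illustration of transforming documents into a suitable representation, the sketch below builds two common ones: a bag-of-words term count per document, and an inverted index from each term to the documents that contain it. The sample texts and the whitespace tokenizer are assumptions made for the example, not part of any particular IR system.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents.
DOCS = {
    "d1": "web search ranks web pages",
    "d2": "music search finds songs",
}


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def bag_of_words(docs):
    """Map each document id to a Counter of term frequencies."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


bags = bag_of_words(DOCS)
index = inverted_index(DOCS)
print(bags["d1"]["web"])        # term frequency of "web" in d1
print(sorted(index["search"]))  # ids of documents containing "search"
```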
First dimension: mathematical basis
• Set-theoretic models represent documents as sets of words or phrases.
Similarities are usually derived from set-theoretic operations on those sets.
Egs: Standard Boolean model, Extended Boolean model, Fuzzy retrieval
• Algebraic models represent documents and queries usually as vectors,
matrices, or tuples. The similarity of the query vector and document vector is
represented as a scalar value.
Egs: Vector space model, Generalized vector space model, Topic-based
Vector Space Model, Extended Boolean model, Latent semantic indexing
(a.k.a. latent semantic analysis)
• Probabilistic models treat the process of document retrieval as a probabilistic
inference. Similarities are computed as probabilities that a document is
relevant for a given query. Probabilistic theorems such as Bayes' theorem are
often used in these models.
Egs: Binary Independence Model, Probabilistic relevance model (the basis of
the Okapi BM25 relevance function), Uncertain inference, Language
model, Divergence-from-randomness model, Latent Dirichlet allocation
• Feature-based retrieval models view documents as vectors of values of
feature functions (or just features) and seek the best way to combine these
features into a single relevance score, typically by learning-to-rank methods.
Feature functions are arbitrary functions of document and query, and as such
can easily incorporate almost any other retrieval model as just another
feature.
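
The four model families above can be compared on one toy collection. The sketch below is a simplified illustration under stated assumptions: the documents are made up, the Boolean score is a plain AND match, the algebraic score is cosine similarity over TF-IDF vectors, the probabilistic score is one common variant of Okapi BM25 with typical default parameters (k1 = 1.2, b = 0.75), and the feature-based score is a weighted sum with hand-picked placeholder weights that a learning-to-rank method would normally fit from relevance judgments.

```python
import math
from collections import Counter

# Hypothetical three-document collection, tokenized by whitespace.
DOCS = {
    "d1": "boolean retrieval uses sets of words".split(),
    "d2": "vector space retrieval uses term vectors".split(),
    "d3": "probabilistic retrieval estimates relevance".split(),
}
N = len(DOCS)
AVG_LEN = sum(len(d) for d in DOCS.values()) / N
DF = Counter(t for d in DOCS.values() for t in set(d))  # document frequency


def boolean_and(query, doc):
    """Set-theoretic: 1.0 if every query term is in the document, else 0.0."""
    return 1.0 if set(query) <= set(doc) else 0.0


def tfidf(terms):
    """Algebraic: sparse TF-IDF vector as {term: weight}."""
    tf = Counter(terms)
    return {t: tf[t] * math.log(N / DF[t]) for t in tf if DF[t]}


def cosine(query, doc):
    """Cosine similarity between the TF-IDF vectors of query and document."""
    q, d = tfidf(query), tfidf(doc)
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(
        sum(w * w for w in d.values()))
    return dot / norm if norm else 0.0


def bm25(query, doc, k1=1.2, b=0.75):
    """Probabilistic: one common Okapi BM25 variant (assumed defaults)."""
    tf = Counter(doc)
    score = 0.0
    for t in query:
        idf = math.log(1 + (N - DF[t] + 0.5) / (DF[t] + 0.5))
        denom = tf[t] + k1 * (1 - b + b * len(doc) / AVG_LEN)
        score += idf * tf[t] * (k1 + 1) / denom
    return score


def combined(query, doc, weights=(1.0, 1.0, 1.0)):
    """Feature-based: weighted sum of the three scores above.

    The weights are hand-picked placeholders, not learned values.
    """
    features = (boolean_and(query, doc), cosine(query, doc), bm25(query, doc))
    return sum(w * f for w, f in zip(weights, features))


query = "vector retrieval".split()
ranking = sorted(DOCS, key=lambda doc_id: -combined(query, DOCS[doc_id]))
print(ranking)
```

Note how `combined` treats each of the other models as just another feature, which is the property of feature-based models described above. Because "retrieval" occurs in every document, its TF-IDF weight is zero here, so only "vector" contributes to the cosine score.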
Is there a difference between Information
Extraction and Information Retrieval?
Issues with IR systems
• Query evaluation
- Uncertainty
- Vagueness

• Ambiguity

• Document indexing (occurs because of the above issues)


Merits of IR systems
• To save readers' time when they search for the information they need.
• The searching process is easy to understand.
• Current information is available in the storage database.
• Users can access multiple databases and use multiple keywords/concepts at the same time.
• To serve multiple users at the same time.
• There are no geographical barriers to searching for information from anywhere in the
world.
• Easy to store all of our search results.
• To retrieve information in several formats, e.g. books, journals, PDFs,
documents, etc.
• Searching cost is less than manual searching.
• It has a resource-sharing service section.
• To provide user-friendly search logic.
Demerits of IR systems
• High establishment cost.
• Most library users and staff do not have enough IT knowledge to
run this system.
• Lack of training facility.
• Electricity supply problem.
• Lack of networking and internet facility.
• Slow internet speeds delay retrieval.
• Sometimes it gives irrelevant information.
Applications
• Digital Library
• Music search
• Media search
• Web search
• Domain specific (geographic, chemical, software, legal, to name a few!)
