Info Retrieval

The document discusses information retrieval (IR), which involves obtaining information from a collection that is relevant to a user's information need. IR systems sort and rank documents based on user queries. Documents are transformed into representations using various models, including set-theoretic, algebraic, probabilistic, and feature-based models. Information extraction differs from IR in that it focuses on extracting specific pieces of information from documents, rather than retrieving entire documents. The document also covers issues and merits of IR systems, as well as some of their applications such as digital libraries, music search, and web search.

Uploaded by

SIDDHI

Module 3: NLP

Aarti Dharmani
What do these images tell?
Information Retrieval
• It is the process of obtaining information system resources that are
relevant to an information need from a collection of those resources.
• It is the science of searching for information in a document,
searching for documents themselves, and also searching for the
metadata that describes data, and for databases of texts, images or
sounds.
• In simple words, it works to sort and rank documents based on the
queries of a user.
• An information retrieval process begins when a user or searcher
enters a query into the system.
• Queries are formal statements of information needs, for example
search strings in web search engines.
• In information retrieval a query does not uniquely identify a single
object in the collection.
• Instead, several objects may match the query, perhaps with different
degrees of relevance.
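The last two points can be illustrated with a minimal sketch: a free-text query is matched against a tiny collection, and several documents match with different degrees of relevance. The collection, the `rank` function, and the overlap-count score are assumptions made up for illustration; they are not from the slides, and real systems use far richer scoring.

```python
# Hypothetical toy collection; the texts are invented for illustration.
DOCS = {
    "d1": "web search engines rank pages",
    "d2": "music search by melody",
    "d3": "digital library catalogue",
}

def rank(query, docs):
    """Return (doc_id, score) pairs for documents sharing at least one
    query term, highest score first. The score is simply the number of
    distinct query terms that occur in the document."""
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = len(q_terms & set(text.lower().split()))
        if score > 0:
            scored.append((doc_id, score))
    # Sort by descending score, breaking ties by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

results = rank("web search", DOCS)
print(results)
```

Here the query does not identify a single object: two documents match, one more strongly than the other, and the third is not returned at all.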
Model Types
• For effectively retrieving relevant documents by IR strategies, the
documents are typically transformed into a suitable representation.
• Each retrieval strategy incorporates a specific model for its document
representation purposes.
• The models are categorized according to two dimensions: the
mathematical basis and the properties of the model.
First dimension: mathematical basis
• Set-theoretic models represent documents as sets of words or phrases.
Similarities are usually derived from set-theoretic operations on those sets.
Egs: Standard Boolean model, Extended Boolean model, Fuzzy retrieval
• Algebraic models represent documents and queries usually as vectors,
matrices, or tuples. The similarity of the query vector and document vector is
represented as a scalar value.
Egs: Vector space model, Generalized vector space model, Topic-based
Vector Space Model, Extended Boolean model, Latent semantic indexing
(a.k.a. latent semantic analysis)
• Probabilistic models treat the process of document retrieval as probabilistic
inference. Similarities are computed as probabilities that a document is
relevant for a given query. Probabilistic theorems such as Bayes' theorem are
often used in these models.
Egs: Binary Independence Model, Probabilistic relevance model (on which the
Okapi BM25 relevance function is based), Uncertain inference, Language
model, Divergence-from-randomness model, Latent Dirichlet allocation
• Feature-based retrieval models view documents as vectors of values of
feature functions (or just features) and seek the best way to combine these
features into a single relevance score, typically by learning-to-rank methods.
Feature functions are arbitrary functions of document and query, and as such
can easily incorporate almost any other retrieval model as just another
feature.
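As a rough sketch of how three of these model families differ in practice, the code below scores one query against one document in three ways: a set-theoretic measure, an algebraic measure, and a feature-based combination of the two. Jaccard overlap stands in for the set-theoretic family (it is not the Boolean model itself), and cosine similarity over raw term counts stands in for the vector space model. The function names, example texts, and weights are assumptions for illustration; in a real feature-based system the weights would be learned by a learning-to-rank method rather than fixed by hand.

```python
import math
from collections import Counter

def tokens(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def jaccard(query, doc):
    """Set-theoretic view: |Q intersect D| / |Q union D| over sets of words."""
    q, d = set(tokens(query)), set(tokens(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0

def cosine(query, doc):
    """Algebraic view: cosine of the angle between term-count vectors,
    so the similarity is a single scalar value."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def feature_score(query, doc, weights=(0.5, 0.5)):
    """Feature-based view: scores from other models become features,
    combined here by a weighted sum. The weights are hand-picked
    placeholders, not learned values."""
    features = (jaccard(query, doc), cosine(query, doc))
    return sum(w * f for w, f in zip(weights, features))

query = "information retrieval"
doc = "information retrieval ranks documents"
print(jaccard(query, doc), cosine(query, doc), feature_score(query, doc))
```

The feature-based function shows why such models can absorb the others: any retrieval score that is a function of the document and the query can be added to `features` as one more entry.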
Is there a difference between Information
Extraction and Information Retrieval?
Issues with IR systems
• Query evaluation
- Uncertainty
- Vagueness

• Ambiguity

• Document indexing (occurs because of the above issues)

Merits of IR systems
• Saves readers time when they search for the information they need.
• The searching process is easy to understand.
• Current information is available in the storage database.
• Users can search multiple databases with multiple keywords/concepts at the same time.
• Serves multiple users at the same time.
• There are no geographical barriers; information can be searched from anywhere in the
world.
• Search results are easy to store.
• Information can be retrieved in several formats, e.g. books, journals, PDFs, and other
documents.
• Searching costs less than manual searching.
• It includes a resource-sharing service.
• Provides user-friendly search logic.
Demerits of IR systems
• High establishment cost.
• Most library users and staff do not have enough IT knowledge to
run the system.
• Lack of training facilities.
• Electricity supply problems.
• Lack of networking and internet facilities.
• Slow internet speeds delay retrieval.
• Sometimes it gives irrelevant information.
Applications
• Digital Library
• Music search
• Media search
• Web search
• Domain-specific search (geographic, chemical, software, and legal, to name a few)
