1520784495 Lec5 Ir Introduction
IN INFORMATION RETRIEVAL
AND WEB SEARCH
Lecture 1:
Introduction
S. M. Vahidipour
Outline
Text Books
Search Engines: Information Retrieval in Practice (W. B. Croft, D. Metzler, and T. Strohman)
Search and Information Retrieval
10^6: million, 10^9: billion, 10^12: trillion, 10^15: quadrillion, 10^18: quintillion, …
Information Retrieval
Data/Information
- Storage
- Search
Data/Information
- Structured
- Unstructured
Structured vs. Unstructured Data
11
What is a Document?
Examples:
Web pages, email, books, news stories, scholarly papers, text messages, Word™, PowerPoint™, PDF, forum postings, patents, IM (instant messaging) sessions, etc.
Common properties
- Significant text content
- Some structure (analogous to attributes in a database)
  - Papers: title, author, date
  - Email: subject, sender, destination, date
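The two common properties (significant text plus a few database-like attributes) can be pictured as a record with one free-text field and a set of structured fields. The Python sketch below is only an illustration: the `Document` class, its field names, and the email contents are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Document:
    """Hypothetical document record: free text plus structured attributes."""
    doc_id: str
    text: str                                              # unstructured body text
    fields: Dict[str, str] = field(default_factory=dict)   # structured metadata


# A hypothetical email: the body is free text, while subject/sender/
# destination/date behave like columns of a database record.
email = Document(
    doc_id="email-1",
    text="Hi all, the lecture notes for week 1 are now online.",
    fields={
        "subject": "IR lecture notes",
        "sender": "instructor@example.edu",
        "destination": "students@example.edu",
        "date": "2018-03-11",
    },
)

# Structured part: exact lookup, as in a database query.
print(email.fields["subject"])          # IR lecture notes
# Unstructured part: only a crude keyword test without further IR machinery.
print("lecture" in email.text.lower())  # True
```

The structured fields support exact lookups much like a database, while the free-text body is what IR techniques are designed to search and rank.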
Comparing Text
Comparing the query text to the document text and determining what is a good match is the core issue of information retrieval.
Exact matching of words is not enough
- There are many different ways to write the same thing in a “natural language” like English
- Does a news story containing the text “karl benz built the first automobile in 1886” match the query “car inventor”?
Defining the meaning of a word, a sentence, a paragraph, or a story is more difficult than defining the meaning of a database field.
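A quick way to see why exact word matching falls short is to compare the word sets of the Karl Benz news-story text and a car-inventor query. This is a minimal Python sketch; the `terms` helper is hypothetical and uses a deliberately naive tokenizer (lowercase, alphanumeric runs only).

```python
import re
from typing import Set


def terms(text: str) -> Set[str]:
    """Lowercase the text and return the set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


document = "karl benz built the first automobile in 1886"
query = "car inventor"

shared = terms(document) & terms(query)
print(shared)  # set() : no query word occurs in the document

# The story is clearly relevant ("car" vs. "automobile", and building the
# first automobile makes Benz its inventor), yet word identity alone finds
# no overlap at all.
```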
Dimensions of IR
IR is more than just text, and more than just web search, although these are central
People doing IR work with different media, different types of search applications, and different tasks
Three dimensions of IR
- Content
- Applications
- Tasks
The Content Dimension
The Application Dimension
Vertical search
- Restricted domain/topic
- Books, movies, suppliers
P2P search
- No centralized control
- File sharing, shared locality
The Task Dimension
Main Issues in IR
Relevance
- A relevant document contains the information the user was looking for when submitting the query
Evaluation
- How well does the ranking meet the expectations of the user?
Users and information needs
- Users of a search engine are the ultimate judges of quality
IR and Search Engines
Outline
Search Engine
Basic architecture
Main issues
Indexing
- Text acquisition
- Text transformation
- Index creation
Querying
- User interaction
- Ranking
- Evaluation
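The indexing and querying components can be wired together in a few dozen lines. The following is a toy Python sketch under strong simplifying assumptions: text acquisition is replaced by an in-memory dictionary, text transformation is lowercasing plus removal of a small hypothetical stopword list, index creation builds an inverted index of term frequencies, and ranking simply sums query-term frequencies. The function names (`transform`, `build_index`, `rank`) are illustrative rather than taken from any real search engine, and user interaction and evaluation are left out.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is"}


def transform(text: str) -> List[str]:
    """Text transformation: lowercase, tokenize, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Index creation: inverted index mapping term -> {doc_id: term frequency}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in transform(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def rank(query: str, index: Dict[str, Dict[str, int]], k: int = 10) -> List[Tuple[str, int]]:
    """Ranking: score each document by the total frequency of the query terms it contains."""
    scores: Dict[str, int] = defaultdict(int)
    for term in transform(query):  # same transformation as for documents
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by doc_id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Text acquisition, stubbed out as a hypothetical in-memory collection.
collection = {
    "d1": "Search engines rank documents for a query.",
    "d2": "An inverted index maps each term to the documents containing it.",
    "d3": "Web search engines crawl the web and build an index.",
}
index = build_index(collection)
print(rank("search index", index))  # [('d3', 2), ('d1', 1), ('d2', 1)]
```

Applying the same `transform` to documents and queries matters: if the two sides were tokenized differently, matching terms would silently fail to line up.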
Overview of Traditional Retrieval Models
Boolean retrieval
Vector space model
Probabilistic models
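To make the three model families concrete, here is a small Python sketch over a hypothetical, pre-tokenized toy collection. Boolean AND retrieval returns an unranked set; the vector space model ranks by cosine similarity of TF-IDF vectors (using idf = log(N/df), one of several weighting schemes); and BM25 stands in for the probabilistic family. The BM25 idf used here, log(1 + (N - df + 0.5)/(df + 0.5)), is one common variant (used, for example, in Apache Lucene); other texts use slightly different forms, and k1 = 1.2, b = 0.75 are conventional defaults rather than values from this lecture.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy collection, already tokenized.
docs: Dict[str, List[str]] = {
    "d1": ["information", "retrieval", "models"],
    "d2": ["boolean", "retrieval", "uses", "set", "operations"],
    "d3": ["vector", "space", "model", "for", "information", "retrieval"],
}
N = len(docs)
df = Counter(t for tokens in docs.values() for t in set(tokens))  # document frequency


def boolean_and(query: List[str]) -> Set[str]:
    """Boolean retrieval: the (unranked) set of documents containing ALL query terms."""
    return {d for d, tokens in docs.items() if all(t in tokens for t in query)}


def tfidf_vector(tokens: List[str]) -> Dict[str, float]:
    """Vector space model: weight each known term by tf * log(N / df)."""
    tf = Counter(tokens)
    return {t: c * math.log(N / df[t]) for t, c in tf.items() if df[t] > 0}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def bm25(query: List[str], doc_id: str, k1: float = 1.2, b: float = 0.75) -> float:
    """Probabilistic model: BM25 score of one document (one common variant)."""
    tokens = docs[doc_id]
    avgdl = sum(len(t) for t in docs.values()) / N
    tf = Counter(tokens)
    score = 0.0
    for t in query:
        if df[t] == 0:  # term absent from the collection contributes nothing
            continue
        idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
        f = tf[t]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(tokens) / avgdl))
    return score


query = ["information", "retrieval"]
print(boolean_and(query))  # d1 and d3 (set order may vary)
q = tfidf_vector(query)
print(sorted(docs, key=lambda d: cosine(q, tfidf_vector(docs[d])), reverse=True))  # ['d1', 'd3', 'd2']
print(sorted(docs, key=lambda d: bm25(query, d), reverse=True))  # ['d1', 'd3', 'd2']
```

Because "retrieval" occurs in every document, its TF-IDF weight is log(3/3) = 0, so only "information" separates the documents in the vector space ranking; the Boolean model, by contrast, cannot order d1 and d3 at all.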
Overview of Evaluation Metrics
Effectiveness metrics
Efficiency metrics
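Precision and recall are the classic effectiveness metrics for a result list, and precision at a cutoff k is a common variant for rankings. The Python sketch below computes them for a hypothetical ranking and hypothetical relevance judgments; efficiency metrics such as query latency or throughput would instead be measured by timing the system and are not shown.

```python
from typing import List, Set


def precision(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)


def recall(retrieved: List[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for d in set(retrieved) if d in relevant) / len(relevant)


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Precision computed over the top k results of a ranking only."""
    return precision(ranking[:k], relevant)


ranking = ["d3", "d1", "d7", "d2", "d9"]   # hypothetical system output, best first
relevant = {"d1", "d2", "d4"}              # hypothetical relevance judgments
print(precision(ranking, relevant))        # 0.4  (2 of 5 retrieved are relevant)
print(recall(ranking, relevant))           # 2 of 3 relevant found, about 0.667
print(precision_at_k(ranking, relevant, 2))  # 0.5
```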
Advanced Retrieval Models
Word Mismatch Problem
Advanced/Specific IR Tasks
Personalized Search
Information Extraction
Cross-language Retrieval
Question Answering
Recommendation Systems
Enterprise Search
Digital Library
Structured Text Retrieval
Multimedia Retrieval
Questions?