1520784495 Lec5 Ir Introduction

The document outlines a course on advanced topics in information retrieval and web search, including an introduction to the field and its significance in daily web activities. It discusses various aspects of information retrieval, including definitions, dimensions, and applications, as well as the architecture of search engines and traditional retrieval models. Additionally, it highlights advanced retrieval models and specific tasks within the field, such as personalized search and question answering.

Uploaded by

hoanglinh90198


ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH

Lecture 1: Introduction

S. M. Vahidipour
[email protected]
Outline

□ Introduction to the Course

□ Overview of the Semester

2
Text Books

Search Engines: Information Retrieval in Practice

W. Bruce Croft, Donald Metzler, Trevor Strohman

Pearson Education, 2010

3
Text Books

Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition)

Ricardo Baeza-Yates, Berthier Ribeiro-Neto

ACM Press Books, 2010

4
Text Books

Introduction to Information Retrieval

C. Manning, P. Raghavan, and H. Schütze

Cambridge University Press, 2008

5
Search and Information Retrieval

● Search on the Web is a daily activity for many people throughout the world
□ Google: 40,000 searches per second (3.5 billion per day; 1.2 trillion per year)
□ Yahoo: 3,200 searches per second (280 million per day; 8.4 billion per month)
□ Bing: 927 searches per second (80 million per day; 2.4 billion per month)

10^6: million, 10^9: billion, 10^12: trillion, 10^15: quadrillion, 10^18: quintillion, …
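The per-day, per-month and per-year figures follow from the per-second rates by multiplication. A quick check, assuming a 30-day month and a 365-day year (both assumptions; the slide does not say how its totals were rounded):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day(per_second: float) -> float:
    """Convert a per-second rate into a per-day rate."""
    return per_second * SECONDS_PER_DAY


# Per-second figures quoted on the slide.
rates = {"Google": 40_000, "Yahoo": 3_200, "Bing": 927}

for engine, rate in rates.items():
    daily = per_day(rate)
    # 30-day month and 365-day year are assumptions made for this check.
    print(f"{engine}: {daily:.2e}/day, {daily * 30:.2e}/month, {daily * 365:.2e}/year")
```

The results agree with the slide's figures once they are rounded to two significant digits.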

6
Search and Information Retrieval

□ Search and communication are the most popular uses of the computer.

□ Applications involving search are everywhere.
□ The field of computer science that is most involved with R&D for search is information retrieval (IR).

7
Information Retrieval

“Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)

□ General definition that can be applied to many types of information and search applications
□ Still appropriate after 40 years
□ Primary focus of IR since the 1950s has been on text and documents

8
Data/Information

□ Storage

□ Search

9
Data/Information

□ Structured

□ Unstructured

10
Structured vs. Unstructured Data

11
What is a Document?

● Examples:
○ Web pages, email, books, news stories, scholarly papers, text messages, Word™, PowerPoint™, PDF, forum postings, patents, IM (instant message) sessions, etc.
● Common properties
○ Significant text content
○ Some structure (≈ attributes in DB)
□ Papers: title, author, date
□ Email: subject, sender, destination, date
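One way to picture "significant text plus some structure" is a record that carries free text alongside a few database-like attributes. The sketch below is only an illustration: the `Document` class, the `has_attribute` helper and all field values are hypothetical, not part of the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Unstructured text plus a few structured, database-like attributes."""
    text: str
    attributes: dict[str, str] = field(default_factory=dict)


def has_attribute(doc: Document, name: str, value: str) -> bool:
    """Exact match on a structured attribute, as a database query would do."""
    return doc.attributes.get(name) == value


# Hypothetical email; every value here is made up for illustration.
email = Document(
    text="Hi all, the reading group on retrieval models meets on Tuesday.",
    attributes={
        "subject": "Reading group",
        "sender": "alice@example.org",
        "destination": "ir-group@example.org",
        "date": "2024-01-15",
    },
)

print(has_attribute(email, "sender", "alice@example.org"))  # structured lookup
print("retrieval" in email.text.lower().split())            # naive text lookup
```

The attribute lookup is well defined; deciding whether the text is "about" a topic is the harder problem that IR addresses.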

12
Comparing Text

Comparing the query text to the document text and determining what is a good match is the core issue of information retrieval.
Exact matching of words is not enough
● Many different ways to write the same thing in a “natural language” like English
○ Does a news story containing the text “karl benz built the first automobile in 1886” match the query “car inventor”?
● Defining the meaning of a word, a sentence, a paragraph, or a story is more difficult than defining the meaning of a database field.
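The point about exact matching can be made concrete with a deliberately naive matcher. This is a minimal sketch, assuming whitespace tokenization and lowercasing; `tokenize` and `exact_overlap` are illustrative names, and the query is a two-word request for the inventor of the car.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace; deliberately naive."""
    return set(text.lower().split())


def exact_overlap(query: str, document: str) -> set[str]:
    """Return the words that query and document share exactly."""
    return tokenize(query) & tokenize(document)


story = "karl benz built the first automobile in 1886"
query = "car inventor"

shared = exact_overlap(query, story)
print(shared)  # no shared words, although the story answers the query
```

An exact-match system would return nothing here, which is why ranking models have to go beyond literal word overlap.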

13
Dimensions of IR

IR is more than just text, and more than just web search
● although these are central
People doing IR work with different media, different types of search applications, and different tasks

Three dimensions of IR
□ Content
□ Applications
□ Tasks

14
The Content Dimension

Textual data, but…

New applications increasingly involve new media
□ Video, photos, music, speech
□ Scanned documents (for legal purposes)
Like text, content is difficult to describe and compare
□ Text may be used to represent them (e.g., tags)
IR approaches to search and evaluation are appropriate

15
The Application Dimension

● Web search
□ Most common
● Vertical search
□ Restricted domain/topic
□ Books, movies, suppliers
● Enterprise search
□ Corporate intranet
□ Databases, emails, web pages, documentation, code, wikis, tags, directories, presentations, spreadsheets …
● Desktop search
□ Personal enterprise search
□ See above plus recent web pages
● P2P search
□ No centralized control
□ File sharing, shared locality
● Literature search
● Forum search

16
The Task Dimension

● User queries / ad-hoc search
□ Range of queries is enormous, not pre-specified
● Filtering
□ Given a profile (interests), notify about interesting news stories
□ Identify relevant user profiles for a new document
● Classification / categorization
□ Automatically assign text to one or more classes of a given set
□ Identify relevant labels for documents
● Question answering
□ Similar to search
□ Automatically answer a question posed in natural language
□ Provide a concrete answer, not a list of documents
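Filtering inverts the ad-hoc setting: the profile stays fixed while documents stream in. A minimal sketch, assuming a profile is just a set of interest terms and that two shared terms are enough to notify; both the `min_hits` threshold and the sample stories are illustrative assumptions.

```python
def matches_profile(profile: set[str], story: str, min_hits: int = 2) -> bool:
    """True if the story mentions at least `min_hits` of the profile terms."""
    words = set(story.lower().split())
    return len(profile & words) >= min_hits


# Hypothetical user interests and incoming news stories.
profile = {"search", "retrieval", "ranking"}
stream = [
    "new ranking method improves web search quality",
    "local team wins the championship final",
]

notifications = [story for story in stream if matches_profile(profile, story)]
print(notifications)
```

Real filtering systems would learn the profile from feedback rather than take it as a fixed term set.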

17
Main Issues in IR

Relevance
□ A relevant document contains the information a user was looking for when he/she submitted the query
Evaluation
□ How well does the ranking meet the expectations of the user?
Users and information needs
□ Users of a search engine are the ultimate judges of quality

18
IR and Search Engines

A search engine is the practical application of information retrieval techniques to large-scale text collections
Big issues include main IR issues but also some others…

Information Retrieval
● Relevance: Effective ranking
● Evaluation: Testing and measuring
● Information needs: User interaction

Search Engines (additional)
● Performance: Efficient search and indexing
● Incorporating new data: Coverage and freshness
● Scalability: Growing with data and users
● Adaptability: Tuning for applications
● Specific problems: e.g., spam

19
Outline

□ Introduction to the Course

□ Overview of the Semester

20
Search Engine

● Basic architecture
Main issues
Indexing
○ Text acquisition
○ Text transformation
○ Index creation
Querying
○ User interaction
○ Ranking
○ Evaluation
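The indexing and querying halves of the architecture can be sketched in a few lines. This is a toy illustration, assuming documents are already acquired as strings, text transformation is just lowercasing and whitespace splitting, and the query is answered by intersecting posting sets with no ranking; `build_index` and `search` are illustrative names.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Indexing: transform each text into terms and record term -> document ids."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Querying: return ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical three-document collection.
docs = {
    1: "information retrieval and web search",
    2: "web search engines index documents",
    3: "database systems store structured data",
}
index = build_index(docs)
print(search(index, "web search"))
```

A production engine adds crawling, richer text processing, compressed posting lists and a ranking function on top of this skeleton.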

21
Overview of Traditional Retrieval Models

Boolean retrieval
Vector space model
Probabilistic models
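As a preview of the vector space model, the sketch below scores documents by cosine similarity between TF-IDF vectors. The weighting used here (raw term frequency times the log of inverse document frequency) is one common choice and an assumption of this example, as are the function names and the sample collection.

```python
import math
from collections import Counter


def build_tf_idf(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(len(docs) / df) for term, df in doc_freq.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical collection and query.
docs = [
    "web search engines rank web pages",
    "boolean retrieval uses set operations",
    "probabilistic models rank documents by relevance",
]
doc_vectors, idf = build_tf_idf(docs)
query_tf = Counter("web search".split())
query_vector = {term: tf * idf.get(term, 0.0) for term, tf in query_tf.items()}

scores = [cosine(query_vector, vector) for vector in doc_vectors]
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Unlike Boolean retrieval, this produces a graded score, so documents can be ranked rather than merely accepted or rejected.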

22
Overview of Evaluation Metrics

● Effectiveness metrics

● Efficiency metrics

● Training, testing, and statistics
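Two widely used effectiveness metrics are precision and recall at a rank cutoff. The slide does not name specific metrics, so the choice of these two, the function names and the sample judgments below are assumptions made for illustration.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / len(relevant)


# Hypothetical system output and relevance judgments.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}

print(precision_at_k(ranking, relevant, 3))
print(recall_at_k(ranking, relevant, 5))
```

Efficiency metrics, by contrast, measure cost (for example query latency or index size) rather than result quality.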

23
Advanced Retrieval Models

Language model-based retrieval

Learning to rank
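A common form of language model-based retrieval is query likelihood: rank documents by the probability that their language model generates the query. The sketch below assumes a unigram model with Jelinek-Mercer smoothing and a mixing weight `lam` of 0.5; these choices, the function name and the sample collection are assumptions of this example, not details given in the lecture.

```python
import math
from collections import Counter


def query_log_likelihood(query: str, doc: str, collection: list[str],
                         lam: float = 0.5) -> float:
    """log P(query | doc) under a unigram model with Jelinek-Mercer smoothing."""
    doc_tf = Counter(doc.lower().split())
    coll_tf = Counter(term for d in collection for term in d.lower().split())
    doc_len = sum(doc_tf.values())
    coll_len = sum(coll_tf.values())
    score = 0.0
    for term in query.lower().split():
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len if coll_len else 0.0
        # Mix the document model with the collection model.
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term never occurs anywhere in the collection
        score += math.log(p)
    return score


# Hypothetical collection.
collection = [
    "web search engines",
    "language models for retrieval",
    "retrieval with language models and smoothing",
]
scores = [query_log_likelihood("language models", d, collection) for d in collection]
print(scores)
```

Smoothing keeps a document from being ruled out just because it lacks one query term, which links this model to the word mismatch problem.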

24
Word Mismatch Problem

Language model-based approaches

□ Translation model
□ Topic model
□ Word cluster model
□ WordNet
□ Dependency model

Query expansion approaches
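Query expansion attacks word mismatch by adding related terms to the query before matching. A minimal sketch, assuming a small hand-written table of related terms; the `RELATED` table is hypothetical, and real systems would derive such relations from resources like WordNet, query logs or co-occurrence statistics.

```python
# Hypothetical related-term table, written by hand for this example.
RELATED = {
    "car": {"automobile", "vehicle"},
    "inventor": {"creator"},
}


def expand_query(query: str, related: dict[str, set[str]]) -> set[str]:
    """Return the query terms together with their related terms."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for term in terms:
        expanded |= related.get(term, set())
    return expanded


story = "karl benz built the first automobile in 1886"
expanded = expand_query("car inventor", RELATED)
matched = expanded & set(story.split())
print(matched)
```

With expansion, the query now shares a term with the story, whereas the unexpanded query shares none.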

25
Advanced/Specific IR Tasks

● Query log and query suggestion
● Personalized search
● Information extraction
● Cross-language IR
● Question answering
● Recommendation systems
● Enterprise search
● Digital library
● Structured text retrieval
● Multimedia retrieval
26
Query Log and Query Suggestion

27
Personalized Search

28
Information Extraction

29
Cross-language Retrieval

30
Question Answering

31
Recommendation Systems

32
Enterprise Search

33
Digital Library

34
Structured Text Retrieval

35
Multimedia Retrieval

36
Questions?
