Information Retrieval
Course Contents:
Information retrieval is the process through which a computer system can respond to a user's
query for text-based information on a specific topic. IR was one of the first and remains one of
the most important problems in the domain of natural language processing (NLP). Web search
is the application of information retrieval techniques to the largest corpus of text anywhere --
the web -- and it is the area in which most people interact with IR systems most frequently.
In this course, we will cover basic and advanced techniques for building text-based
information systems, including the following topics:
Efficient text indexing
Boolean and vector-space retrieval models
Evaluation and interface issues
IR techniques for the web, including crawling, link-based algorithms, and metadata
usage
Document clustering and classification
Traditional and machine learning-based ranking approaches
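As a small illustration of the first two topics, the following is a minimal sketch of an inverted index with Boolean AND retrieval. It is not taken from the course materials: the function names `build_index` and `boolean_and`, the whitespace tokenization, and the three sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the sorted ids of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical toy corpus, used only to exercise the sketch.
docs = {
    1: "web search engines crawl the web",
    2: "boolean retrieval uses an inverted index",
    3: "search engines use an inverted index",
}
index = build_index(docs)
print(boolean_and(index, "inverted", "index"))  # [2, 3]
print(boolean_and(index, "search", "index"))    # [3]
```

A production system would typically keep sorted postings lists and merge them rather than intersect in-memory sets; the set-based version is used here only for brevity.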
Course Objectives:
By the end of this course, the student should:
understand the theoretical basis behind the standard models of IR (Boolean,
Vector-space, Probabilistic and Logical models),
understand the difficulty of representing and retrieving documents, images,
speech, etc.,
and be able to implement, run, and test a standard IR system.
Teaching Methodology:
Lectures, Written Assignments, Practical labs, Semester Project, Presentations
Course Assessment:
The course will be assessed through a combination of written examinations, assignments, and
quizzes.
Reference Materials:
There are several good textbooks for the topic of information retrieval. The first book listed
below is our official textbook, and the others are recommended references.
1. Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, and
Hinrich Schütze, Cambridge University Press, 2008.
2. Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, and
Trevor Strohman, Pearson Education, 2009.
3. Modern Information Retrieval. Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2nd
edition, Addison-Wesley, 2011.
4. Information Retrieval: Implementing and Evaluating Search Engines. Stefan Büttcher,
Charles L. A. Clarke, and Gordon V. Cormack, MIT Press, 2010.
Week Contents Theory
1 Introduction to Information Retrieval
Motivation
Information Retrieval vs Data Retrieval
Flashback
Boolean Model
Vector Space Model
Probabilistic Model
Alternative Models
3 Retrieval Evaluation
Keywords
Boolean Queries
Context Queries
Natural Language Queries
Structural Queries
Relevance Feedback
Query Expansion
Automatic Local Analysis
Automatic Global Analysis
7 Text Searching
Knuth-Morris-Pratt
Boyer-Moore family
Suffix automaton
Phrases and Proximity
8 Document Clustering
MID TERM
9 Multimedia Information Retrieval
Similarity Queries
Feature-based Indexing and Searching
Spatial Access Methods
Searching in Multidimensional Spaces
11 Meta-Ranking
12 Web Search
History of Web
Indexing
Spidering/Crawling
Link Analysis (HITS, PageRank)
16 Search applications
Introduce modern applications in search systems, including recommendation,
personalization, and online advertising, if time allows.
Final Exam