Information Retrieval

This document outlines the course contents and objectives for an information retrieval course. The course covers basic and advanced techniques for building text-based information systems, including indexing, retrieval models, evaluation, query languages, advanced query operations, text preprocessing, searching, document clustering, multimedia retrieval, parallel and distributed systems, meta-ranking, web search, user interfaces, link analysis, crawling, and applications of search systems.

Uploaded by

Noureen Zafar
Copyright
© All Rights Reserved

PIR MEHR ALI SHAH ARID AGRICULTURE UNIVERSITY

University Institute of Information Technology

CS-802 Information Retrieval


Credit Hours: 3(3-0)
Prerequisites: None
Course Learning Outcomes (CLOs)
At the end of the course, students will be able to:
1. Understand the basic concepts of information retrieval. (Domain: C, BT Level*: 1)
2. Learn tools and techniques to do cutting-edge research in the area of information retrieval or text mining. (Domain: C, BT Level*: 2)
3. Identify the role of information retrieval in modern lifestyles and social media. (Domain: C, BT Level*: 3)
4. Get hands-on project experience by developing real-world applications, such as intelligent tools for improving search accuracy from user feedback, email spam detection, recommendation systems, or scientific literature organization and mining. (Domain: C, BT Level*: 4)
*BT = Bloom's Taxonomy, C = Cognitive domain, P = Psychomotor domain, A = Affective domain

Course Contents:
Information retrieval is the process through which a computer system can respond to a user's
query for text-based information on a specific topic. IR was one of the first and remains one of
the most important problems in the domain of natural language processing (NLP). Web search
is the application of information retrieval techniques to the largest corpus of text anywhere,
the web, and it is where most people interact with IR systems most often.
In this course, we will cover basic and advanced techniques for building text-based
information systems, including the following topics:
• Efficient text indexing
• Boolean and vector-space retrieval models
• Evaluation and interface issues
• IR techniques for the web, including crawling, link-based algorithms, and metadata usage
• Document clustering and classification
• Traditional and machine learning-based ranking approaches

Course Objective:
By the end of this course the student should:
• understand the theoretical basis behind the standard models of IR (Boolean, Vector-space, Probabilistic and Logical models),
• understand the difficulty of representing and retrieving documents, images, speech, etc.,
• be able to implement, run and test a standard IR system,
• understand the standard methods for Web indexing and retrieval,
• understand how techniques from natural language processing, artificial intelligence, human-computer interaction and visualization integrate with IR, and
• be familiar with various algorithms and systems.

Teaching Methodology:
Lectures, Written Assignments, Practical labs, Semester Project, Presentations
Course Assessment:
Exams, Assignments, Quizzes. The course is assessed through a combination of written
examinations, assignments, and quizzes.
Reference Materials:
There are several good textbooks for the topic of information retrieval. The first book listed
below is our official textbook, and the others are recommended references.
1. Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze. Cambridge University Press, 2007.
2. Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, and Trevor Strohman. Pearson Education, 2009.
3. Modern Information Retrieval. Ricardo Baeza-Yates and Berthier Ribeiro-Neto. 2nd edition, Addison-Wesley, 2011.
4. Information Retrieval: Implementing and Evaluating Search Engines. Stefan Buttcher, Charlie Clarke, and Gordon Cormack. MIT Press, 2010.
Week Contents Theory
1 Introduction to Information Retrieval

• Motivation
• Information Retrieval vs. Data Retrieval
• Flashback

2 Models of Information Retrieval

• Boolean Model
• Vector Space Model
• Probabilistic Model
• Alternative Models

3 Retrieval Evaluation

• Recall and Precision
• Alternative Measures
• Reference Collections and Evaluation of IR Systems

4 Query Languages for IR

• Keywords
• Boolean Queries
• Context Queries
• Natural Language Queries
• Structural Queries

5 Advanced Query Operations

• Relevance Feedback
• Query Expansion
• Automatic Local Analysis
• Automatic Global Analysis

6 Text Indexing, Preprocessing and File Organization

• Stopwords, stemming, thesauri
• File (text) organization (inverted, suffix)
• Text statistics (properties)
• Text compression

7 Text Searching

• Knuth-Morris-Pratt
• Boyer-Moore family
• Suffix automaton
• Phrases and Proximity

8 Document Clustering
MID TERM
9 Multimedia Information Retrieval

• Similarity Queries
• Feature-based Indexing and Searching
• Spatial Access Methods
• Searching in Multidimensional Spaces

10 Parallel and Distributed IR

• MIMD and SIMD Architectures
• Collection Partitioning
• Source Selection
• Query Processing
• Peer-to-Peer Architectures and Systems

11 Meta-Ranking

• Integrated vs. Isolated Methods
• Interleaving
• Voting

12 Web Search

• History of the Web
• Indexing
• Spidering/Crawling
• Link Analysis (HITS, PageRank)

13 User Interfaces and Visualization

14 Link Analysis

• Ranking the web frontier
• The WebGraph framework I: Compression techniques
• Extrapolation methods for accelerating PageRank computations
• Searching the workplace web

15 Crawling and Near-Duplicate Pages

• Mercator: A scalable, extensible web crawler
• A standard for robot exclusion

16 Search Applications

• Modern applications of search systems, including recommendation, personalization, and online advertising (if time allows)
Final Exam
