Syllabus
Preamble:
Information Extraction and Retrieval is a course that focuses on the techniques and methodologies
for extracting relevant information from large volumes of unstructured data and retrieving it
efficiently. The course explores various approaches, algorithms, and tools used to process and
analyze textual data, enabling students to gain insights and make informed decisions. Topics
covered include text mining, information retrieval models, document indexing, query processing,
and evaluation techniques. Through this course, students will develop the skills necessary to extract
valuable information from diverse sources and build effective retrieval systems to support
information needs.
Prerequisite: Basic knowledge of machine learning.
CO4 Describe text and multimedia languages. Implement efficient indexing techniques and
search algorithms. (Cognitive Knowledge Level: Apply)
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1
CO2
CO3
CO4
CO5
PO4 - Conduct investigations of complex problems
PO10 - Communication
Assessment Pattern
Bloom's Category    Test 1 (%)    Test 2 (%)    End Semester Examination (%)
Remember
Understand          30            30            30
Apply               70            70            70
Analyze
Evaluate
Create
Mark Distribution
Total Marks    CIE Marks    ESE Marks    ESE Duration
150            50           100          3 hours
There will be two parts: Part A and Part B. Part A contains 10 questions, with 2 questions from
each module, each carrying 3 marks. Students should answer all questions. Part B contains 2 full
questions from each module, of which the student should answer any one. Each question can have a
maximum of 2 sub-divisions and carries 14 marks.
Syllabus
Module – 1 (Introduction and Basic Concepts)
Introduction: Information versus Data Retrieval, IR: Past, present, and future. Basic concepts: The
retrieval process, logical view of documents. Modeling: A Taxonomy of IR models, ad-hoc
retrieval and filtering
Web search basics - Background and history, Web characteristics, Advertising as the economic
model, The search user experience, Index size and estimation, Near-duplicates and shingling
Web crawling and indexes – Crawling, Distributing indexes, Connectivity servers
Link analysis - The Web as a graph, PageRank
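PageRank, the last topic listed above, is usually introduced as a power iteration over the Web graph. A minimal sketch on a made-up three-page graph follows; the graph itself, the damping factor of 0.85, and the convergence tolerance are illustrative assumptions, not part of the syllabus.

```python
# Minimal PageRank via power iteration on a toy web graph.
# d = 0.85 is the damping factor commonly used in the literature.

def pagerank(links, d=0.85, tol=1e-10):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with a uniform distribution
    while True:
        new = {}
        for p in pages:
            # Sum contributions from every page q that links to p,
            # each divided by q's out-degree.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * incoming
        if max(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new

# Toy graph: A and B link to each other; C links only to A.
graph = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks = pagerank(graph)
```

Since every page here has at least one outlink, the simple sketch above needs no special handling for dangling nodes, and the ranks remain a probability distribution (they sum to 1).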
Text Book
2. Let X_t be a random variable indicating whether the term t appears in a document. Suppose
we have |R| relevant documents in the document collection and that X_t = 1 in s of the
documents. Take the observed data to be just these observations of X_t for each document
in R. Show that the MLE for the parameter p_t = P(X_t = 1 | R = 1, q), that is, the value
of p_t which maximizes the probability of the observed data, is p_t = s/|R|.
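A sketch of the standard Bernoulli maximum-likelihood argument behind this exercise (each relevant document contributes an independent Bernoulli(p_t) observation):

```latex
% s successes (X_t = 1) in |R| independent Bernoulli(p_t) trials:
L(p_t) = p_t^{\,s}\,(1-p_t)^{\,|R|-s},
\qquad
\ell(p_t) = \log L(p_t) = s\log p_t + (|R|-s)\log(1-p_t)
% Setting the derivative of the log-likelihood to zero:
\ell'(p_t) = \frac{s}{p_t} - \frac{|R|-s}{1-p_t} = 0
\;\Longrightarrow\; s(1-p_t) = (|R|-s)\,p_t
\;\Longrightarrow\; p_t = \frac{s}{|R|}
```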
3. What is the relationship between the value of F1 and the break-even point?
1. Construct a Boolean query that retrieves documents containing the words "machine learning"
and "classification" but excludes any documents with the word "neural networks" present.
2. Explain the significance of reference collections in information retrieval research, and describe
the characteristics and importance of well-known collections like TREC and CACM.
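The kind of Boolean query asked for above can be sketched against a toy inverted index. The documents below are made-up examples, and phrase queries such as "machine learning" are approximated as conjunctions of their words, since this sketch has no positional index.

```python
# Toy inverted index supporting a Boolean AND / AND-NOT query.
docs = {
    1: "machine learning methods for text classification",
    2: "neural networks for machine learning classification",
    3: "classification with machine learning and decision trees",
    4: "machine learning overview",
}

# Build the inverted index: term -> set of doc IDs containing it.
index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

# ("machine learning" AND "classification") AND NOT "neural networks".
# Phrases are approximated by requiring all their words to co-occur.
result = (
    (index["machine"] & index["learning"] & index["classification"])
    - (index.get("neural", set()) & index.get("networks", set()))
)
```

Set intersection (`&`) implements AND, and set difference (`-`) implements AND NOT; here doc 2 is excluded because it contains both "neural" and "networks".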
QP CODE:
PART A
4. What is the relationship between the value of F1 and the break-even point?
Part B
(Answer any one question from each module. Each question carries 14 Marks)
OR
12. (a) Discuss the evolution of information retrieval over time. (7)
(b) What are the key differences between information retrieval and data (7)
retrieval? Provide examples to illustrate their distinctions.
13. (a) Compare and contrast the strengths and limitations of set-theoretic and (8)
probabilistic IR models, and discuss real-world scenarios where one
model may outperform the other.
(b) How can you find the similarity between a document and a query under the (6)
probabilistic principle, using Bayes' rule?
OR
14. (a) Explain vector-space retrieval models in detail with an example. (7)
15. (a) Construct a Boolean query that retrieves documents containing the words (6)
"machine learning" and "classification" but excludes any documents with
the word "neural networks" present.
OR
16. (a) Explain the significance of reference collections in information retrieval (14)
research, and describe the characteristics and importance of well-known
collections like TREC and CACM.
17. (a) How can clustering be classified using statistical techniques? Describe in detail. (7)
OR
18. (a) Describe text compression techniques. (6)
19. (a) What are the benefits of distributing Web search indexes? Explain the (7)
challenges and solutions for distributing indexes in a scalable and fault-
tolerant way.
OR
Teaching Plan
No    Contents    No. of Lecture Hours (35 hrs)
Module-1(Introduction) (4 hours)
1.1 Information versus Data Retrieval, IR: Past, present, and future. 1 hour
1.2 Basic concepts: The retrieval process, logical view of documents. 1 hour
2.7 Retrieval evaluation: Performance evaluation of IR - Recall and Precision, other measures  1 hour
Module-3 (Reference Collections and Query Languages) (5 hours)
3.1 Reference Collections such as TREC, CACM, and ISI data sets  2 hours
3.2 Query Languages: Keyword based queries, single word queries, context queries, Boolean Queries  2 hours
3.3 Query protocols 1 hour
Module-4 (Text and Multimedia Languages, Indexing, and Searching) (9 hours)
4.1 Text and Multimedia Languages and properties - Metadata, Text formats, Markup languages, Multimedia data formats  2 hours
4.2 Text Operations - Document preprocessing, Document Clustering  2 hours
4.3 Text Compression, Comparing text compression techniques  2 hours
4.4 Indexing and searching - Inverted files, other indices for text  1 hour
4.5 Sequential searching - Brute force, Knuth-Morris-Pratt  1 hour
4.6 Pattern matching - String matching allowing errors  1 hour
Module-5 (Fuzzy Applications) (7 hours)
5.1 Web search basics - Background and history, Web characteristics, Advertising as the economic model  1 hour
5.2 The search user experience, Index size and estimation, Near-duplicates and shingling  2 hours
5.3 Web crawling and indexes - Crawling, Distributing indexes, Connectivity servers  2 hours
5.4 Link analysis - The Web as a graph, PageRank  2 hours