UNIT-II
IV Year / VIII Semester
By
K.Karthick AP/CSE
KNCET.
KONGUNADU COLLEGE OF ENGINEERING AND
TECHNOLOGY
(Autonomous)
NAMAKKAL- TRICHY MAIN ROAD, THOTTIAM
DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING
CS8080 – Information Retrieval Techniques
Syllabus
MODELING AND RETRIEVAL EVALUATION
• Basic Retrieval Models
• An IR model governs how a document and a
query are represented and how the relevance
of a document to a user query is defined.
• There are three main IR models:
– Boolean model
– Vector space model
– Probabilistic model
• Each term is associated with a weight. Given a
collection of documents D, let
• V = {t1, t2, ..., t|V|} be the set of distinct
terms in the collection, where ti is a term.
• The set V is usually called the vocabulary of
the collection, and |V| is its size,
• i.e., the number of terms in V.
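As a minimal sketch of the vocabulary definition above, the snippet below collects the distinct terms of a toy collection. The three documents and the whitespace tokenizer are illustrative assumptions, not part of the original slides.

```python
# Hypothetical toy collection D (made up for illustration).
docs = {
    "d1": "information retrieval models",
    "d2": "boolean retrieval model",
    "d3": "vector space model",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# V is the set of distinct terms over the whole collection; |V| is its size.
vocabulary = sorted({term for text in docs.values() for term in tokenize(text)})
vocabulary_size = len(vocabulary)

print(vocabulary)
print(vocabulary_size)
```

Note that "model" and "models" count as different terms here because no stemming is applied.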
• An IR model is a quadruple [D, Q, F, R(qi, dj)]
where
• 1. D is a set of logical views for the documents
in the collection
• 2. Q is a set of logical views for the user queries
• 3. F is a framework for modeling documents
and queries
• 4. R(qi, dj) is a ranking function
Unit 4: MODELING AND RETRIEVAL EVALUATION
Boolean Model
• The Boolean model is one of the earliest and
simplest information retrieval models.
• It uses the notion of exact matching to match
documents to the user query.
• Both the query and the retrieval are based on
Boolean algebra.
• In the Boolean model, documents and queries
are represented as sets of terms.
• That is, each term is only considered present
or absent in a document.
• Boolean Queries:
• Query terms are combined logically using the Boolean
operators AND, OR, and NOT, which have their usual
semantics in logic.
• Thus, a Boolean query has a precise semantics.
• For instance, the query ((x AND y) AND (NOT z)) says that
a retrieved document must contain both the terms x and y
but not z.
• As another example, the query expression (x OR y) means
that at least one of these terms must be in each retrieved
document.
• Here, we assume that x, y and z are terms. In general, they
can be Boolean expressions themselves.
• Document Retrieval:
• Given a Boolean query, the system retrieves
every document that makes the query
logically true.
• Thus, retrieval is based on a binary
decision criterion, i.e., a document is either
relevant or irrelevant. This is known as
exact match.
• Most search engines support some limited
forms of Boolean retrieval using explicit
inclusion and exclusion operators.
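The two example queries from the slides can be sketched with Python sets, where AND is intersection, OR is union, and NOT is the complement with respect to the whole collection. The four documents and the helper name `postings` are hypothetical, chosen only to illustrate exact matching.

```python
# Hypothetical collection: each document is represented as a set of terms,
# so a term is simply present or absent.
docs = {
    "d1": {"x", "y"},
    "d2": {"x", "y", "z"},
    "d3": {"x"},
    "d4": {"y", "w"},
}

all_docs = set(docs)


def postings(term):
    """Return the ids of all documents whose term set contains `term`."""
    return {doc_id for doc_id, terms in docs.items() if term in terms}


# ((x AND y) AND (NOT z)): intersection for AND, set difference for NOT.
q1 = (postings("x") & postings("y")) & (all_docs - postings("z"))

# (x OR y): union for OR.
q2 = postings("x") | postings("y")

print(sorted(q1))
print(sorted(q2))
```

The result is an unordered set of matching documents, which illustrates the first drawback listed next: nothing in the model says which match is better.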
• Drawbacks of the Boolean Model
• No ranking of the documents is provided
(absence of a grading scale)
• The information need has to be translated into a
Boolean expression, which most users find
awkward
• The Boolean queries formulated by the users
are most often too simplistic.
TF-IDF (Term Frequency/Inverse Document
Frequency) Weighting
• We assign to each term in a document a
weight for that term that depends on the
number of occurrences of the term in the
document.
• We would like to compute a score between a
query term t and a document d, based on the
weight of t in d. The simplest approach is to
assign the weight to be equal to the number of
occurrences of term t in document d.
• This weighting scheme is referred to as term
frequency and is denoted $\mathrm{tf}_{t,d}$, with the
subscripts denoting the term and the
document in order.
• For a document d, the set of weights
determined by the tf weights above (or indeed
any weighting function that maps the number
of occurrences of t in d to a positive real
value) may be viewed as a quantitative digest
of that document.
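A minimal sketch of raw term frequency, assuming whitespace tokenization and a made-up example sentence: the weight of a term is simply the number of times it occurs in the document.

```python
from collections import Counter


def term_frequencies(text):
    """Map each term to its raw count in the document (tf weights)."""
    return Counter(text.lower().split())


# Hypothetical document text, used only for illustration.
tf_d = term_frequencies("to be or not to be")

print(tf_d["to"], tf_d["be"], tf_d["or"], tf_d["not"])
# A Counter returns 0 for terms that do not occur in the document.
print(tf_d["retrieval"])
```

The resulting mapping from terms to counts is the "quantitative digest" of the document: word order is discarded and only the counts remain.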
• How is the document frequency df of a term
used to scale its weight? Denoting as usual the
total number of documents in a collection by
N, we define the inverse document frequency
(idf) of a term t as follows:
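• $\mathrm{idf}_t = \log \dfrac{N}{\mathrm{df}_t}$, where $\mathrm{df}_t$ is the number of documents in the collection that contain t.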
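The sketch below computes idf under the usual definition, the logarithm of N divided by the number of documents containing the term. The base-10 logarithm, the function name, and the toy collection are assumptions made for illustration; the slides do not fix a base.

```python
import math


def inverse_document_frequency(term, docs):
    """idf of `term`: log10(N / df), with df = number of docs containing it."""
    n = len(docs)
    df = sum(1 for terms in docs if term in terms)
    if df == 0:
        # idf is undefined for a term that occurs in no document.
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log10(n / df)


# Hypothetical collection of four documents, each given as a set of terms.
collection = [
    {"the", "boolean", "model"},
    {"the", "vector", "space", "model"},
    {"the", "probabilistic", "model"},
    {"the", "retrieval", "evaluation"},
]

print(inverse_document_frequency("the", collection))
print(inverse_document_frequency("retrieval", collection))
```

A term that appears in every document gets an idf of zero, while a rare term gets a higher value, which is how document frequency scales a term's weight.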
• Tf-idf weighting
• We now combine the definitions of term
frequency and inverse document frequency, to
produce a composite weight for each term in each
document.
• The tf-idf weighting scheme assigns to term t a
weight in document d given by
• $\text{tf-idf}_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_t$
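As a worked illustration with made-up numbers, assume a base-10 logarithm, a collection of N = 1000 documents, a term t that appears in 10 of them, and 3 occurrences of t in document d:

```latex
\mathrm{idf}_t = \log_{10}\frac{N}{\mathrm{df}_t} = \log_{10}\frac{1000}{10} = 2,
\qquad
\text{tf-idf}_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_t = 3 \times 2 = 6
```

If the same term appeared in all 1000 documents, its idf would be 0 and its tf-idf weight would be 0 no matter how often it occurred in d.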
• In the simplest scoring scheme, the score of
document d is the sum, over all query terms,
of the number of times each of the query
terms occurs in d.
• We can refine this idea so that we add up not
the number of occurrences of each query
term t in d, but instead the tf-idf weight of
each term in d.
• $\mathrm{Score}(q, d) = \sum_{t \in q} \text{tf-idf}_{t,d}$
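A minimal sketch of this scoring idea: for each query term, add its tf-idf weight in the document, then rank documents by the total. The toy collection, the base-10 logarithm, and the choice to give an idf of 0 to terms that occur nowhere are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical collection (made up for illustration).
docs = {
    "d1": "cheap flights cheap hotels",
    "d2": "cheap car rental",
    "d3": "hotels near the airport",
}
tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
N = len(tokenized)


def idf(term):
    """log10(N / df); terms absent from the collection get 0 (an assumption)."""
    df = sum(1 for tokens in tokenized.values() if term in tokens)
    return math.log10(N / df) if df else 0.0


def score(query, doc_id):
    """Sum of tf-idf weights in the document over all query terms."""
    tf = Counter(tokenized[doc_id])
    return sum(tf[t] * idf(t) for t in query.split())


# Rank documents by descending score for the query.
ranking = sorted(tokenized, key=lambda d: score("cheap hotels", d), reverse=True)
print(ranking)
```

Unlike Boolean retrieval, this produces an ordering: d1 contains both query terms (one of them twice) and therefore ranks first.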
Cosine similarity
• Documents could be ranked by computing the distance between the
points representing the documents and the query.
• More commonly, a similarity measure is used (rather than a distance or
dissimilarity measure), so that the documents with the highest scores are
the most similar to the query.
• A number of similarity measures have been proposed and tested for this
purpose.
• The most successful of these is the cosine correlation similarity measure.
• The cosine correlation measures the cosine of the angle between the
query and the document vectors.
• When the vectors are normalized so that all documents and queries are
represented by vectors of equal length, the cosine of the angle between
two identical vectors will be 1 (the angle is zero), and for two vectors that
do not share any non-zero terms, the cosine will be 0.
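• The cosine measure is defined as:
  $\mathrm{Cosine}(D_i, Q) = \dfrac{\sum_{j=1}^{|V|} d_{ij}\, q_j}{\sqrt{\sum_{j=1}^{|V|} d_{ij}^2 \cdot \sum_{j=1}^{|V|} q_j^2}}$
  where $d_{ij}$ and $q_j$ are the weights of term j in document $D_i$ and query Q.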
• The numerator of this measure is the sum of the products
of the term weights for the matching query and
document terms (known as the dot product or inner
product).
• The denominator normalizes this score by dividing by the
product of the lengths of the two vectors. There is no
theoretical reason why the cosine correlation should be
preferred to other similarity measures, but it does
perform somewhat better in evaluations of search quality.
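The cosine measure can be sketched as follows, with each vector stored as a dictionary from terms to weights. The sparse dictionary representation, the function name, and the example weights are illustrative assumptions.

```python
import math


def cosine_similarity(doc_vec, query_vec):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    # Numerator: dot product over the terms the two vectors share.
    dot = sum(w * query_vec[t] for t, w in doc_vec.items() if t in query_vec)
    # Denominator: product of the Euclidean lengths of the two vectors.
    doc_len = math.sqrt(sum(w * w for w in doc_vec.values()))
    query_len = math.sqrt(sum(w * w for w in query_vec.values()))
    if doc_len == 0 or query_len == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (doc_len * query_len)


# Hypothetical term weights, for illustration only.
d = {"cheap": 3.0, "hotels": 4.0}
print(cosine_similarity(d, d))                  # identical vectors
print(cosine_similarity(d, {"airport": 2.0}))   # no shared terms
print(cosine_similarity({"a": 1.0, "b": 1.0}, {"a": 1.0}))
```

Because the score is divided by both lengths, a long document is not favored merely for containing more term occurrences.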