Unit-3 Irs
UNIT-3: SYLLABUS
Natural language
Concept
This has the advantage of being able to use the developed formal theory of
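Query: “What is java”
[Figure: the query is matched against a database of documents (Doc 1, Doc 3, Doc 7, ..., Doc m).]
• Determine which document is more specific to the user query.
• Rank the documents according to the occurrence of the query terms; the system displays them to the user in that order.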
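The ranking idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names (`tokenize`, `rank_by_occurrence`) and the three sample documents are hypothetical, and the score is a plain count of query-term occurrences, not a weighting scheme prescribed by these slides.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_occurrence(query, documents):
    """Return (doc_id, score) pairs, highest score first.

    The score is the total number of times any query term occurs
    in the document (an assumed, deliberately simple measure).
    """
    query_terms = set(tokenize(query))
    scores = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        scores.append((doc_id, score))
    # Sort by descending score; ties are broken by document id.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data, invented for illustration only.
documents = {
    "Doc 1": "Java is a programming language. Java runs on the JVM.",
    "Doc 3": "Java is an island in Indonesia.",
    "Doc 7": "Python is a programming language.",
}
ranking = rank_by_occurrence("What is java", documents)
for doc_id, score in ranking:
    print(doc_id, score)
```

Because common words such as "is" also count here, a real system would normally remove stop words or weight terms before ranking.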
PROBABILISTIC WEIGHTING
Applications of Probabilistic Statistical Indexing:
• It is used in the logistic regression / reference model. This model
consists of a special system called the “Model O System”.
• This system includes the following components:
(a) Number of words in the document (d).
(b) Number of words in the query (q).
• In addition to these, there are attributes, which are classified
into the following types:
(i) Query attributes: how many times a particular word occurs
in the query.
(ii) Document attributes: how many times a particular word occurs
in the document.
Set of attributes (v1, ..., vn) from the query
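As a rough illustration of the components and attributes listed above, the following Python sketch counts them for one query and one document. The function name `term_attributes`, the dictionary layout, and the sample tokens are assumptions made for this example; the slides do not define a concrete data structure.

```python
from collections import Counter


def term_attributes(query_tokens, doc_tokens):
    """Collect the counts described above for one query/document pair.

    Returns the query length (q), the document length (d), and, for each
    term present in both, its query and document occurrence counts.
    """
    query_counts = Counter(query_tokens)
    doc_counts = Counter(doc_tokens)
    shared_terms = sorted(set(query_counts) & set(doc_counts))
    return {
        "query_length": len(query_tokens),  # number of words in the query (q)
        "doc_length": len(doc_tokens),      # number of words in the document (d)
        "terms": {
            term: {"query_tf": query_counts[term], "doc_tf": doc_counts[term]}
            for term in shared_terms
        },
    }


# Hypothetical, already tokenized sample input.
attrs = term_attributes(
    ["what", "is", "java"],
    ["java", "is", "a", "language", "java"],
)
print(attrs)
```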
PROBABILISTIC WEIGHTING
Log O is the logarithm of the odds (log odds) of relevance for term t_k,
which is present in document d_j and query q_i:
where O(R) is the odds that a document chosen at random from the
database is relevant to query q_i.
• The inverse logistic transformation is applied to obtain the
probability of relevance of a document to a query:
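The equations themselves are not reproduced here, so the Python sketch below fills them in under stated assumptions. It assumes the per-term log odds is a linear combination c0 + c1*v1 + ... + cn*vn of the attributes, and that the per-term values are combined as log O(R) plus the sum of (term log odds minus log O(R)); both forms and all function names are illustrative assumptions. Only the last step, p = 1 / (1 + e^(-log odds)), is the standard inverse logistic transformation.

```python
import math


def term_log_odds(coefficients, attributes):
    """Assumed linear form: c0 + c1*v1 + ... + cn*vn for one term."""
    intercept, weights = coefficients[0], coefficients[1:]
    if len(weights) != len(attributes):
        raise ValueError("need exactly one coefficient per attribute plus c0")
    return intercept + sum(c * v for c, v in zip(weights, attributes))


def document_log_odds(per_term_log_odds, prior_log_odds):
    """Assumed combination: log O(R) + sum over terms of (log O_k - log O(R))."""
    return prior_log_odds + sum(lo - prior_log_odds for lo in per_term_log_odds)


def probability_from_log_odds(log_odds):
    """Inverse logistic transformation: map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients and attribute values, for illustration only.
lo_term = term_log_odds([1.0, 2.0, 3.0], [1.0, 1.0])
lo_doc = document_log_odds([0.5, 1.0], prior_log_odds=0.0)
print(probability_from_log_odds(lo_doc))
```

Log odds of 0 correspond to a probability of 0.5, and larger log odds move the probability toward 1.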
• where the slope was set to 0.2 and the pivot was set to the average
number of unique terms occurring in the collection.
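The formula that uses the slope and pivot is not shown here. One common normalization that matches this description is pivoted unique normalization, where a document's normalization factor is (1 - slope) * pivot + slope * (number of unique terms in the document). The Python sketch below assumes that form; the function names and sample documents are hypothetical.

```python
def average_unique_terms(documents):
    """Pivot: the average number of unique terms per document."""
    return sum(len(set(tokens)) for tokens in documents) / len(documents)


def pivoted_unique_norm(num_unique_terms, pivot, slope=0.2):
    """Assumed pivoted unique normalization factor for one document."""
    return (1.0 - slope) * pivot + slope * num_unique_terms


# Hypothetical tokenized documents with 4, 2 and 6 unique terms.
docs = [
    ["java", "is", "a", "language"],
    ["java", "java", "island"],
    ["python", "is", "a", "language", "too", "indeed"],
]
pivot = average_unique_terms(docs)
norms = [pivoted_unique_norm(len(set(d)), pivot) for d in docs]
print(pivot, norms)
```

Under this assumed form, a document with exactly the pivot number of unique terms keeps a factor equal to the pivot, shorter documents get a factor slightly below it, and longer ones slightly above it; a term weight would then be divided by this factor.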
INVERSE DOCUMENT FREQUENCY