Signature File Structure in Information Retrieval System

Uploaded by

amit.sagu
Signature File Structure

Information Retrieval System

Amit Sagu
Subscribe to our YouTube channel: www.youtube.com/@cssimplified51
Signature File Structure

In the context of Information Retrieval (IR) systems, a "signature file" is a method used for indexing
and searching text documents.

[Figure: a Query is run against a Database (Documents) containing Doc-1, Doc-2, Doc-3, ..., Doc-n]
The signature file structure is particularly useful for handling large collections of documents and supporting
efficient search queries.

[Figure: sample document with the fields Name: Amit Sagu, Address: ****, Mobile: ****, Email: *****]
Each document in the database is represented by a "signature." A signature is a fixed-size bit string.

Signatures File                  Database (Documents)

Doc 1 - Signature: 1100100…      Doc-1
Doc 2 - Signature: 1100100…      Doc-2
Doc 3 - Signature: 1100100…      Doc-3
Doc n - Signature: 1100100…      Doc-n


Creation of Signatures

A hash function is used to create the signatures.

Hashing is a process that transforms input data of any size into a fixed-size value or key, typically a
string of characters or a numerical value.

Input (Document): “This is String”  ->  Hash Function  ->  Output (Document Signature): 0001000
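The hashing step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name a hash function, so the 16-bit width, the three bits set per word, the use of MD5, and the names `word_signature` and `as_bits` are all hypothetical choices. The bit patterns it produces will not match the ones shown in the slides.

```python
import hashlib

SIGNATURE_BITS = 16   # assumed width, chosen to mirror the 16-bit examples
BITS_PER_WORD = 3     # assumed number of bit positions each word may set


def word_signature(word: str) -> int:
    """Map a word of any length to a fixed-size bit pattern."""
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    signature = 0
    for i in range(BITS_PER_WORD):
        # Each digest byte selects one bit position in the signature.
        position = digest[i] % SIGNATURE_BITS
        signature |= 1 << position
    return signature


def as_bits(signature: int) -> str:
    """Format a signature as space-separated 4-bit groups."""
    bits = format(signature, f"0{SIGNATURE_BITS}b")
    return " ".join(bits[i:i + 4] for i in range(0, SIGNATURE_BITS, 4))


print(as_bits(word_signature("Computer")))
```

Whatever the input length, the result always fits in 16 bits, and the same word always yields the same pattern. Those are the two properties the signature file relies on.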
Example:

TEXT: “Computer Science graduate students study” = 1001 0111 1110 0110

WORD         Signature

Computer     0001 0110 0000 0110
Science      1001 0000 1110 0000
graduate     1000 0101 0100 0010
students     0000 0111 1000 0100
study        0000 0110 0110 0100

             ( ORing, e.g. 0110 OR 0100 = 0110 )

Signature    1001 0111 1110 0110

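The ORing step in this example can be reproduced with a short Python sketch. The word signatures are copied from the table. The names `WORD_SIGNATURES`, `document_signature` and `as_bits` are hypothetical helpers introduced only for illustration.

```python
from functools import reduce
from operator import or_

# Word signatures taken from the example table.
WORD_SIGNATURES = {
    "Computer": 0b0001_0110_0000_0110,
    "Science":  0b1001_0000_1110_0000,
    "graduate": 0b1000_0101_0100_0010,
    "students": 0b0000_0111_1000_0100,
    "study":    0b0000_0110_0110_0100,
}


def document_signature(words, table=WORD_SIGNATURES) -> int:
    """Superimpose (bitwise OR) the signatures of all words in a document."""
    return reduce(or_, (table[word] for word in words), 0)


def as_bits(signature: int, width: int = 16) -> str:
    """Format a signature as space-separated 4-bit groups."""
    bits = format(signature, f"0{width}b")
    return " ".join(bits[i:i + 4] for i in range(0, width, 4))


doc_1 = "Computer Science graduate students study".split()
print(as_bits(document_signature(doc_1)))  # 1001 0111 1110 0110
```

A bit in the document signature is 1 if it is 1 in at least one word signature, which is why the result reproduces the Signature row of the table.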


Example: Searching or Matching

Query "Computer" : 0001 0110 0000 0110
Doc-1            : 1001 0111 1110 0110

( Doc-1 : “Computer Science graduate students study” )

      1001 0111 1110 0110    ( Doc-1 signature )
AND   0001 0110 0000 0110    ( query signature )

      0001 0110 0000 0110    Match Found ( the result equals the query signature )

      0000 0000 0000 0000    Match Not Found

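The matching test can be written as a single bitwise check. In this sketch the Doc-1 and "Computer" signatures come from the example above. The function name `matches` and the second query pattern are hypothetical, added only to show a non-matching case.

```python
DOC_1_SIGNATURE = 0b1001_0111_1110_0110   # Doc-1, from the example
QUERY_COMPUTER = 0b0001_0110_0000_0110    # query "Computer", from the example
QUERY_OTHER = 0b0000_1000_0000_0001       # hypothetical query pattern


def matches(doc_signature: int, query_signature: int) -> bool:
    """A document is a candidate match when every 1-bit of the query
    signature is also set in the document signature."""
    return (doc_signature & query_signature) == query_signature


print(matches(DOC_1_SIGNATURE, QUERY_COMPUTER))  # True
print(matches(DOC_1_SIGNATURE, QUERY_OTHER))     # False
```

Comparing the AND result with the query signature, rather than only checking that it is nonzero, ensures that a query whose bits are only partly present in the document is rejected.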


Problem with Signature File Structure: False Positive

( Doc-1 : “Computer Science graduate students study” )

Computer       : 0001 0110 0000 0110
Query ( “Hard” ) : 0001 0110 0000 0110

      1001 0111 1110 0110    ( Doc-1 signature )
AND   0001 0110 0000 0110    ( “Hard” )

      0001 0110 0000 0110    Match Found ( although “Hard” does not occur in Doc-1 )

False Positives: Because matching compares bit patterns rather than the words themselves, signature files can
generate false positives. This means the system might identify documents as relevant to a query when they are not.
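The false positive for "Hard" can be reproduced in Python, together with one common way of handling it: checking each candidate document's actual text after the signature test. The slides do not describe this verification step, so it is an assumption here, and the function names are hypothetical.

```python
DOC_1_TEXT = "Computer Science graduate students study"
DOC_1_SIGNATURE = 0b1001_0111_1110_0110
HARD_SIGNATURE = 0b0001_0110_0000_0110   # pattern shown for "Hard" in the slide


def signature_match(doc_signature: int, query_signature: int) -> bool:
    """Signature-level test: all query bits are set in the document."""
    return (doc_signature & query_signature) == query_signature


def verified_match(doc_text: str, doc_signature: int,
                   word: str, query_signature: int) -> bool:
    """Assumed second step: confirm a signature hit against the real text."""
    if not signature_match(doc_signature, query_signature):
        return False  # the signature test never misses a true match
    return word.lower() in (w.lower() for w in doc_text.split())


print(signature_match(DOC_1_SIGNATURE, HARD_SIGNATURE))                     # True
print(verified_match(DOC_1_TEXT, DOC_1_SIGNATURE, "Hard", HARD_SIGNATURE))  # False
```

The signature test alone reports a match for "Hard", while the text check rejects it. The signature file still saves work, because only documents that pass the bit test need to be read.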
