unit8
• Search engines allow a user to search the web for information.
– Bulk:
– Diversity:
– Growth:
– Demanding Users:
• users demand immediate results.
– Quality of document:
• Text documents are usually of high quality, but web documents may not be.
– Hyperlinks:
• very important components of web documents
– Queries:
• web queries are short and ambiguous.
– Indexer
1. Web crawling
2. Indexing
3. Searching
– The user’s query is parsed into words by the query parser.
– The parsed words are matched against the words in the inverted
list of the indexed documents.
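The parse-and-match step above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the function names (`build_inverted_index`, `parse_query`, `search`), the sample documents, the whitespace tokenizer, and the rule that a document must contain every query word are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def parse_query(query):
    """Hypothetical query parser: lowercase and split on whitespace."""
    return query.lower().split()


def search(index, query):
    """Return ids of documents containing every parsed query word (assumed AND semantics)."""
    words = parse_query(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample documents, for illustration only.
docs = {
    1: "web search engines crawl the web",
    2: "an inverted index maps words to documents",
    3: "search engines use an inverted index",
}
index = build_inverted_index(docs)
hits = search(index, "inverted index")
print(sorted(hits))
```

A real engine would also normalise words (stemming, stop-word removal) and rank the matching documents rather than returning an unordered set.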
– Robustness:
– Distributed:
– Scalable:
• Quality:
– Given that a significant fraction of all web pages are of poor utility
for serving user query needs, the search engine should be biased
towards fetching “useful” pages first.
• Extensible:
– Scarcity problem
– Young to old
– Navigational
– Informational
– Transactional
• To reach a website that the user has in mind. The user may
know that the site exists, or may have visited it earlier, but
does not know its URL.
– Informational:
– Transactional:
– Recall:
• First screen:
• Speed:
– Presenting results
If there are no links to a web page, there is no support for that page, so it gets
a low rank.
documents.
A page rank of 0.5 means there is a 50% chance that a person clicking on a random link will be directed to that document.
– Hence, with four documents (A, B, C and D), each document would begin with an estimated page rank of 0.25.
– If pages B, C and D each only link to A, they would each confer 0.25 page
rank to A.
• The value of link votes is divided among all the outbound links on the page.
• Thus B, with two outbound links, gives a vote worth 0.125 to page A and a vote worth 0.125 to page C.
• Similarly, D, with three outbound links, gives a vote worth approximately 0.083 (0.25/3) to each page it links to.
• i.e. PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3
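The arithmetic above can be checked with a short Python sketch of one iteration of this simplified page rank (no damping factor). The function name `simplified_pagerank_step` and the link tables are assumptions for illustration: the text only fixes that B has two outbound links (to A and C), that C has one (to A), and that D has three, so D linking to A, B and C, and A having no outbound links, are assumed here.

```python
def simplified_pagerank_step(links, ranks):
    """One iteration: each page splits its current rank equally among its outbound links."""
    new_ranks = {page: 0.0 for page in ranks}
    for page, outlinks in links.items():
        if not outlinks:
            continue  # assumption: a page with no outbound links casts no votes
        share = ranks[page] / len(outlinks)
        for target in outlinks:
            new_ranks[target] += share
    return new_ranks


pages = ["A", "B", "C", "D"]
initial = {page: 1 / len(pages) for page in pages}  # 0.25 each

# Case 1: B, C and D each link only to A.
links_simple = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}
simple = simplified_pagerank_step(links_simple, initial)

# Case 2: B -> A, C; C -> A; D -> A, B, C (D's targets are assumed).
links_example = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
example = simplified_pagerank_step(links_example, initial)

print(round(simple["A"], 3))   # 0.75
print(round(example["A"], 3))  # 0.458
```

In the second case A receives 0.125 from B, 0.25 from C and about 0.083 from D, which matches PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3.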
• Describe the page rank algorithm. Using an example, show how it works.