

Unit 8: Search Engines LH 3

Presented By : Tekendra Nath Yogi

College Of Applied Business And Technology
• Outline:
– 8.1 Characteristics of search engine

– 8.2 Search Engine functionality

– 8.3 Ranking of Web pages

7/18/2019 By: Tekendra Nath Yogi 2

Introduction
• Most libraries have a relatively small collection of documents and a
catalogue to search documents.

• The web on the other hand is a very large collection of documents.

• The search engines allow a user to carry out the task of searching the
web for information.

• Google dominates the search engine market (about 75%).


Differences between web search and information retrieval

• A web search differs from a conventional information retrieval (IR)
search over a document collection because of the following factors:

– Bulk:

• The web is much larger than any document collection used in traditional IR.

– Diversity:

• Web pages may contain text, tables, images, video and audio
instead of only text.

– Growth:

• The web grows exponentially.

– Dynamic:
• The web changes significantly over time, whereas a conventional text collection is largely static.

– Demanding users:
• Users demand immediate results.

– Quality of documents:
• Text documents are usually of high quality, but web documents may not be.

– Hyperlinks:
• Hyperlinks are a very important component of web documents.

– Queries:
• Web queries are short and ambiguous.


Search engine
• A search engine is a program that searches documents for
specified keywords and returns a list of the documents in which
those keywords are found.

• A search engine consists of the following main components:

– Crawler(spider)

– Indexer

– Search engine user interface

• [Figure: a typical search engine architecture]


• How does a search engine work?

A search engine operates in the following order:

1. Web crawling

2. Indexing

3. Searching

• Web Crawling:

– A search engine has a huge database of web pages. This database
is built and updated automatically by the web crawler.

– The web crawler performs web crawling as follows:

• The crawler begins with one or more URLs that constitute a
URL set.
• It picks a URL from this URL set, and then fetches the web
page at that URL.
• The fetched page is then parsed to extract both the text and the
links from the page.
• The extracted links (URLs) are then added to the URL set.
• The extracted text is fed to a text indexer.
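
The crawling steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `FAKE_WEB` dictionary stands in for real HTTP fetching so that the sketch runs without network access, and the names `LinkAndTextParser`, `crawl`, `fetch` and `index_text` are hypothetical and do not come from the slides.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML) used instead of real HTTP requests.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> home page',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a> page a',
    "http://example.com/b": "page b has no links",
}


class LinkAndTextParser(HTMLParser):
    """Collects href values of <a> tags and all text data of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(seed_urls, fetch, index_text):
    """Visit every URL reachable from the seeds and return the set of URLs seen."""
    frontier = deque(seed_urls)      # the URL set still to be visited
    seen = set(seed_urls)            # avoids fetching the same URL twice
    while frontier:
        url = frontier.popleft()     # pick a URL from the URL set
        html = fetch(url)            # fetch the page at that URL
        if html is None:
            continue
        parser = LinkAndTextParser() # parse the page into text and links
        parser.feed(html)
        parser.close()
        text = " ".join("".join(parser.text_parts).split())
        index_text(url, text)        # hand the extracted text to the indexer
        for link in parser.links:    # add the extracted links to the URL set
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return seen


pages = {}
visited = crawl(["http://example.com/"], FAKE_WEB.get, pages.__setitem__)
```

Here `pages` plays the role of the text indexer's input. A real crawler would replace `FAKE_WEB.get` with an HTTP client and would also need to respect politeness policies.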
• Indexing:
– The indexer module of the search engine is responsible for
indexing the extracted text supplied by the web crawler.

– The most commonly used index structure is the inverted index.
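
As a rough illustration of inverted indexing, the sketch below maps each term to the set of documents that contain it. The document IDs, the sample texts, and the function names `tokenize` and `build_inverted_index` are assumptions made for this example only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map every term to the set of document IDs in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


# Hypothetical mini collection: document ID -> extracted text.
docs = {
    1: "Web search engines crawl the web",
    2: "Libraries keep a catalogue",
    3: "Search the catalogue",
}
index = build_inverted_index(docs)
```

With this structure, looking up a term such as `index["search"]` gives the matching documents directly, without scanning every document.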

• Searching:

– When a user submits a query to the search engine, the user is not
searching the entire web. Instead, the user is searching only the
database that has been compiled by the search engine.

– The user’s query is parsed into words by the query parser.

– The parsed words are matched against the words in the inverted
lists of the indexed documents.

– The matching documents are returned to the user in ranked order.
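
The searching steps above can be sketched as follows. The small `INDEX` dictionary is a hypothetical inverted index, and ranking by the number of matched query words is a simplification chosen for illustration, not the ranking method of any particular search engine.

```python
from collections import Counter

# Hypothetical inverted index: term -> set of document IDs containing it.
INDEX = {
    "web": {1},
    "search": {1, 3},
    "engines": {1},
    "catalogue": {2, 3},
    "libraries": {2},
}


def search(query, index):
    """Return document IDs ordered by how many query words they match."""
    terms = query.lower().split()          # parse the query into words
    scores = Counter()
    for term in terms:                     # match words against the index
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


results = search("search catalogue", INDEX)
```

Document 3 matches both query words, so it is ranked ahead of documents 1 and 2, which match one word each.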
Characteristics of search engines
• Features a search engine must provide:

– Robustness:

• The search engine must be distributed over a large number of
machines so that the failure of an individual machine does not
cause the whole search engine to fail.
– Politeness:
• Web servers have policies regulating the rate at which a search
engine can visit them. These politeness policies must be
respected.
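
Politeness policies are commonly published in a site's robots.txt file. The sketch below checks such a policy with Python's standard `urllib.robotparser` module. The robots.txt lines, the crawler name `ExampleBot` and the URLs are hypothetical and serve only as an illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# May the (hypothetical) crawler "ExampleBot" fetch these URLs?
allowed_public = parser.can_fetch("ExampleBot", "http://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "http://example.com/private/x")

# Seconds the site asks crawlers to wait between requests, if specified.
delay = parser.crawl_delay("ExampleBot")
```

A polite crawler would skip any URL for which `can_fetch` returns `False` and wait at least `delay` seconds between requests to the same server.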

• Features a search engine should provide

– Distributed:

• The search engine should be able to execute in a
distributed fashion across multiple machines.

– Scalable:

• The search engine architecture should permit scaling up

the search rate by adding extra machines.

• Performance and efficiency:

– The search system should make efficient use of various

system resources including processor, storage and network.

• Quality:

– Given that a significant fraction of all web pages are of poor utility
for serving user query needs, the search engine should be biased
towards fetching “useful” pages first.

• Freshness:
– The search engine should obtain fresh copies of previously fetched pages.

• Extensible:

– Crawlers should be designed to be extensible in many ways – to

cope with new data formats, new fetch protocols, and so on. This
demands that the crawler architecture be modular.


Problems with search using search engines
• Specifying query keywords can be challenging:

– Search results are affected by the structure of the query phrase.

– Search results may also be affected by ambiguous English words,
e.g., “current”, which has several unrelated meanings.

– Scarcity problem

• It is difficult for the search engine to be certain about what users
want.

– Some may be seeking a destination, while others may want only
a small number of highly relevant results.

• Diversity of search engine and web users

– Users range from young to old.

– A search engine is therefore attempting to meet the needs of a
diverse group of users.


The goals of web search
• Depending on the nature of search engine queries, the
information needs of user may be divided into three classes:

– Navigational

– Informational

– Transactional

– Navigational:

• To reach a website that the user has in mind. The user may
know that the site exists, or may have visited it earlier, but
does not know its URL.

– Informational:

• To find a website that provides useful information about a topic

of interest. The user may not have a particular website in mind.

– Transactional:

• To go to a site to perform some kind of transaction. E.g., buy a

book
Quality of search result
• The quality of search results from a search engine should ideally
satisfy the following requirements:
– Precision:

• Precision is the percentage of retrieved documents that are
relevant.

• So, only relevant documents should be returned.

– Recall:

• Recall is the percentage of all relevant documents on the web
that are retrieved.

• So, all relevant documents should be returned.
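
The two measures above can be computed as in the following sketch. The function name `precision_recall` and the document IDs are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of retrieved and relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 documents actually relevant.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
```

Two of the four retrieved documents are relevant, so precision is 0.5. Two of the five relevant documents were retrieved, so recall is 0.4.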

• Ranking:

– A ranking of the documents providing some indication of the

relative relevance of the results should be returned.

• First screen:

– The first page of results should include the most relevant

results.

• Speed:

– Results should be provided quickly.


Search engine functionality
• A search engine is a complex collection of software modules. A
search engine carries out a variety of tasks:
– Collecting information

– Evaluating and categorizing information

– Creating a database and creating indexes

– Computing ranks of the web documents

– Checking queries and executing them

– Presenting results

– Profiling the users

• Collecting information:

– A search engine would normally collect web pages or information

about them by web crawling.

• Evaluating and categorizing information:

– The search engine evaluates the pages before submission and categorizes
the information.

• Creating a database and creating indexes:

– The information collected needs to be stored either in a database

or some kind of file system. Indexes must be created so that the
information may be searched efficiently.
• Computing ranks of the web documents: the search engine ranks web pages
before returning them in response to user queries.

• Checking queries and executing them: queries posed by users need
to be checked, for example, for spelling errors. Once checked, a query
is executed by searching the search engine database.

• Presenting results: the search engine determines which results to present
and how to display them.

• Profiling the users: to improve search performance, search
engines carry out user profiling, which studies the way users use
search engines.
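
The query-checking step described above could look roughly like the sketch below, which corrects misspelled query words against a known vocabulary using `difflib.get_close_matches` from the Python standard library. The vocabulary, the cutoff value and the function name `correct_query` are assumptions for illustration. Real search engines use far more elaborate methods.

```python
import difflib

# Hypothetical vocabulary, e.g. the terms that occur in the index.
VOCABULARY = ["search", "engine", "ranking", "crawler"]


def correct_query(query, vocabulary, cutoff=0.8):
    """Replace each unknown query word with its closest vocabulary word, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        # Keep the original word when nothing in the vocabulary is similar enough.
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


fixed = correct_query("serch engin", VOCABULARY)
```

Here `fixed` becomes `"search engine"`, which would then be executed against the search engine database.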
Page Ranking
• The web consists of a huge number of documents that have been
published without any quality control.

• Page ranking is a method for determining the relative
importance and quality of a page for a given query.

• The best-known ranking algorithm is the page rank
algorithm.

• Page rank algorithm:

– Was developed by Larry Page and Sergey Brin at Stanford University.

– A hyperlink to a page counts as a vote of support.

• A page that is linked to by many pages receives a high rank. If
no pages link to a page, there is no support for that page, so it gets a
low rank.

– Assigns a numerical score between 0 and 1 to every node in the
web graph, i.e., to each element of a hyperlinked set of
documents.

– The rank value indicates the importance of a particular page.

• A page rank of 0.5 means there is a 50% chance that a person clicking

on a random link will be directed to the document with 0.5 page rank.

• Algorithm with illustrative example:
– Assume a small universe of four web pages A, B, C and D.

– The initial approximation of Page Rank would be evenly divided between

the four documents.

– Hence each document would begin with an estimated Page Rank of 0.25.

– If pages B, C and D each only link to A, they would each confer 0.25 page
rank to A.

– i.e. PR(A) = PR(B) + PR(C) + PR(D) = 0.75

• Suppose instead that page B links to page C as well as to page A, page D
links to all three other pages, and page C links only to A.

• The value of a page's link vote is divided equally among all the outbound links on the page.
• Thus B gives a vote worth 0.125 to page A and a vote worth 0.125 to page C.
• Similarly, D gives a vote worth approximately 0.083 (0.25/3) to each of A, B and C.
• i.e. PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3 = 0.125 + 0.25 + 0.083 ≈ 0.458

• In general: $PR(m) = \sum_{v \in B_m} PR(v) / L(v)$

• Where $L(v)$ = number of outbound links on page v

• $B_m$ = set of all pages that link to page m
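
The two worked examples can be reproduced with the sketch below, which applies one update step of the simplified formula used in these slides. As assumptions for illustration, it omits the damping factor of the full page rank algorithm, ignores pages with no outbound links, and uses the hypothetical names `pagerank_step`, `simple_links` and `shared_links`.

```python
def pagerank_step(ranks, links):
    """One update: each page splits its current rank equally among its outbound links."""
    new_ranks = {page: 0.0 for page in ranks}
    for page, outlinks in links.items():
        if not outlinks:
            continue  # a page without outbound links casts no votes in this sketch
        share = ranks[page] / len(outlinks)  # PR(v) / L(v)
        for target in outlinks:
            new_ranks[target] += share
    return new_ranks


# Initial approximation: rank divided evenly between the four pages.
initial = {page: 0.25 for page in "ABCD"}

# First example: B, C and D each link only to A.
simple_links = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}

# Second example: B links to A and C, C links to A, D links to A, B and C.
shared_links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}

first = pagerank_step(initial, simple_links)
second = pagerank_step(initial, shared_links)
```

In the first example `first["A"]` is 0.75. In the second, `second["A"]` is 0.125 + 0.25 + 0.25/3, which is approximately 0.458.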


Home work
• What is a search engine? Explain the various components of search engine
architecture.

• What are the roles of the crawler and the indexer?

• Explain the different functions of a search engine.

• What are the primary goals of web search?

• Describe the page rank algorithm. Using an example, show how it works.

• How is web search different from text retrieval?


Thank You !

