Web Search IIITB
T.B. Rajashekar National Centre for Science Information Indian Institute of Science Bangalore - 560 012 (E-Mail: [email protected])
databases)
How the database is organised, record content, fields, search elements
Indexing and query language, thesaurus, Boolean logic, truncation, etc.
Our information need formulated as a search expression using the query language
T.B. Rajashekar November 2000 3
and services:
Education and research
Entertainment
Business and commerce
Personal home pages
Estimated to contain over 1 billion indexable web pages
Doubling each year
Over 80 million web sites
vocabulary, unlike library catalogues or journal article indexes
Impossible to reach all related pages/sites directly
Need to use intermediate, resource-finding tools
catalogues
Organised collection of descriptions and links to Internet sources
Organisation: by subject categories (hierarchical); by resource type (patents, e-journals, institutes, etc.)
Most use human experts for source selection, indexing and classification
Some include reviews/ratings of listed sites
Disadvantages:
One needs to be aware of such directories/guides
May not be up-to-date
May not be exhaustive
Categories (subject hierarchy) vary across directories
search engines build a full-text index to web pages gathered from web sites and provide a keyword search interface to this index
Spider programs periodically visit web sites and gather the web pages for indexing
Also index web sites submitted by site developers
A brief summary of the indexed web page is also prepared
The index usually contains URLs, titles, headings, and other words from the HTML document
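The idea of a full-text index with a keyword search interface can be illustrated with a minimal sketch. It assumes the pages have already been gathered by a spider; the `pages` dictionary, the example.org URLs, and the function names are hypothetical and chosen only for illustration. Real search engines also store titles, headings, summaries, and ranking information, which this sketch omits.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def keyword_search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for what a spider would have gathered.
pages = {
    "http://example.org/a": "Digital libraries and web search engines",
    "http://example.org/b": "Web directories organised by subject",
}
index = build_index(pages)
print(sorted(keyword_search(index, "web search")))
```

Only the first page contains both "web" and "search", so it is the only one returned for that query. Looking words up in a prebuilt index, rather than scanning every page at query time, is what keeps keyword search fast over a large collection.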
interface for entering the queries
Support simple and advanced search interfaces
Search results are returned in the form of a list of web sites matching the query
Some key features supported:
Phrase searching (double quotes)
Boolean searching (AND, OR, NOT)
Implied Boolean: Term inclusion (+), term exclusion (-)
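How phrase searching and implied Boolean operators might be interpreted can be shown with a small sketch. It is not the behaviour of any particular search engine: the function names `parse_query` and `matches` are hypothetical, matching is on whole words only, and the explicit operators AND, OR and NOT are not handled.

```python
import re


def words(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms.

    A double-quoted phrase is kept together as a single term.
    """
    required, excluded, optional = [], [], []
    # Each match: an optional +/- sign, then a quoted phrase or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = " ".join(words(phrase or word))
        if not term:
            continue
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def matches(text, query):
    """Return True if the text satisfies the query."""
    required, excluded, optional = parse_query(query)
    # Pad with spaces so that terms and phrases match whole words only.
    padded = " " + " ".join(words(text)) + " "

    def has(term):
        return f" {term} " in padded

    if any(has(term) for term in excluded):
        return False
    if not all(has(term) for term in required):
        return False
    # Without required terms, at least one optional term must be present.
    return bool(required) or any(has(term) for term in optional)


print(matches("Digital library projects in India", '+"digital library" -resume'))
print(matches("Resume: digital library developer", '+"digital library" -resume'))
```

The first call prints `True` and the second prints `False`, because the second text contains the excluded word. Treating unsigned terms as "any of these" is one possible default; some engines treat them as "all of these" instead.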
WebCrawler (www.webcrawler.com)
Worldwide Web Worm (www.goto.com)
Best suited for complex keyword/concept searches
Control over search: search terms can be combined as required
Searches can be limited to period of time, fields, source type, etc.
Currency of information, made possible by regular addition by web spiders
Exhaustive information can be retrieved (with lots of patience!)
Disadvantages:
Time consuming
False positives
Search engines vary in terms of search techniques/syntax
LexiBot (www.completeplanet.com)
ProFusion (www.profusion.com)
3. Translate the search terms into search statements of the selected search engine
4. Perform search
5. Refine the search based on results
6. Visit the actual site(s) and save the information (using the File > Save option of the browser)
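Step 3, translating search terms into a search statement, can be sketched as a small helper. The two styles below correspond to the implied Boolean (+/-) and explicit Boolean (AND, NOT) notations; the function name `build_statement` and its parameters are hypothetical, and the exact syntax accepted varies between search engines, so the help pages of the selected engine should be checked.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_statement(include, exclude=(), style="implied"):
    """Combine search terms into a single search statement.

    include: terms that must appear; exclude: terms that must not appear.
    style: "implied" for +/- prefixes, "boolean" for AND / NOT operators.
    """
    include = [quote(term) for term in include]
    exclude = [quote(term) for term in exclude]
    if style == "implied":
        parts = [f"+{term}" for term in include] + [f"-{term}" for term in exclude]
        return " ".join(parts)
    if style == "boolean":
        parts = [" AND ".join(include)] + [f"NOT {term}" for term in exclude]
        return " ".join(parts).strip()
    raise ValueError(f"unknown style: {style}")


terms = ["digital library", "india"]
unwanted = ["resume"]
print(build_statement(terms, unwanted))
print(build_statement(terms, unwanted, style="boolean"))
```

The first call prints `+"digital library" +india -resume` and the second prints `"digital library" AND india NOT resume`. Keeping the terms separate from the notation makes it easier to repeat the same search on an engine with a different syntax.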
Use specific keywords; rare or unusual words are better than common ones
Use the "More like this" option, if supported by the search engine (e.g. Excite, Google)
Use the NOT operator to exclude unwanted pages (e.g. biodata, resumes, courses)
Go through at least 5 pages of search results before giving up the scan
indices and directories)
www.searchpower.com (a very comprehensive search engine directory; claims over 16,000 search engine listings!)
www.123go.com/drw/search/search.htm (Dr. Webster's Big Page of Search Engines)
www.finderseeker.com (The search engine of search engines)
www.virtualfreesites.com (Over 1,000 specialised search engines)
Keeping Current
AskScott (www.askscott.com): Provides a very comprehensive tutorial on search engines
SearchEngineWatch (www.searchenginewatch.com): Offers information about new developments in search engines and provides reviews and tutorials
Botspot (www.botspot.com): Collection of, and guide to, a variety of bots (intelligent agents)