Search Engines & Research Methods

Uploaded by

danifunflouis
Copyright
© All Rights Reserved

Kwame Nkrumah University of

Science & Technology, Kumasi, Ghana

COLLEGE OF ENGINEERING
Engineering in Society CENG 291

Part I: Introduction to Search Engines


Part II: Introduction to Survey

Part I: Introduction to
Search Engines
Lecture Plan
• World Wide Web
• Examples of Search Engines
• Google, Yahoo, Bing

www.knust.edu.gh
World Wide Web
• The World Wide Web (WWW) is a global
hypermedia system on the Internet.
• It can be described as a wide-area hypermedia
information retrieval initiative aiming to give
universal access to a large universe of
documents.
World Wide Web
• Through the WWW it is possible to deliver
hypertext, graphics, animation, and sound
between different computer environments.
• To use the WWW, the user needs a browser (for
example, Internet Explorer) and a set of
viewers that display complex graphics,
animation, and sound.
• Browsers are currently available on Windows,
Linux, and Macintosh.
Search Engine
▪ A Web search engine is a software system designed to
search for information on the World Wide Web.
▪ Information may consist of web pages, images, and
other types of files.
▪ E.g.: Google.com, Yahoo!, Altavista.com, Excite.com
How does it work?
• The searcher types a query into a search
engine.
• Search engine software quickly sorts through
the millions of pages in its database to
find matches to this query.
• The search engine's results are ranked in order
of relevance.
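The three steps above (query, match, rank) can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine scores pages: the `PAGES` dictionary and the scoring rule (count how often the query words occur on each page) are assumptions made for the example.

```python
from collections import Counter

# Hypothetical mini "database" of pages; a real engine stores millions.
PAGES = {
    "page1": "search engines rank web pages",
    "page2": "web pages contain text images and other files",
    "page3": "cooking recipes for the weekend",
}

def search(query, pages):
    """Return (page_id, score) pairs for matching pages, best match first."""
    terms = query.lower().split()
    results = []
    for page_id, text in pages.items():
        counts = Counter(text.lower().split())
        # Toy relevance score: total occurrences of the query terms.
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((page_id, score))
    # Highest score first; ties are broken by page id for a stable order.
    return sorted(results, key=lambda r: (-r[1], r[0]))

print(search("web search", PAGES))
```

Pages that contain none of the query words are left out, and the remaining pages are listed from the highest score to the lowest.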

Examples of Search Engines
• Google is a safe bet for most search queries;
most of the time you will find what you need
on the very first page of search results.
• Yahoo is also a good choice, and it finds many
pages that Google does not necessarily pick up.
Examples of Search Engines
• Some search engines are able to answer factual
questions; among these are Answers.com,
BrainBoost, Factbites, and Ask Jeeves.
• Quite a few search engines will help you narrow
a search with clustered results or search
suggestions.
• These include Clusty, WiseNut, AOL Search,
and Teoma, in addition to Gigablast, AllTheWeb, and
SurfWax.
Examples of Search Engines
• Many search engines deal primarily in academic
and research-oriented results.
• Included among them are:
– Google Scholar, ResearchGate, Scirus, Yahoo
Reference, National Geographic Map Machine,
MagPortal, CompletePlanet, FirstGov, and
EducationWorld.
Examples of Search Engines
• Images on the Web are easy to find, especially
with targeted image search engines such as
Picsearch and Ditto; Google also has strong
image search capabilities.
• Directories and collections of images, clip art,
buttons, graphics, and icons are also available
on the Web.
Examples of Search Engines
• There is so much multimedia on the Web that
your main problem will be finding enough
time to look at it all.
• A few places to search for sounds, movies,
and music on the Web are:
– YouTube, Loomia, Torrent Typhoon, The Internet
Movie Database, SingingFish, and Podscope.
Examples of Search Engines
• Finding someone with similar interests on the Web
via a blog or online community is simple.
• Use LjSeek, Technorati, and Daypop to search for
blogs.
• Find people with ZoomInfo, Pretrieve, or Zabasearch.
• Search for discussion groups and message boards
with BoardTracker.
• For programming solutions, forums such as
Stack Overflow are helpful.
Indexing
• Search engine indexing collects, parses, and stores
data to facilitate fast and accurate information
retrieval.
• Index design incorporates interdisciplinary concepts
from linguistics, cognitive psychology, mathematics,
informatics, physics and computer science.
• An alternate name for the process in the context of
search engines designed to find web pages on the
Internet is Web indexing.
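The "collect, parse, store" idea can be illustrated with a small inverted index, a common data structure for fast text retrieval. The sketch below is a simplified assumption for teaching purposes: the three sample documents, the function names `build_index` and `lookup`, and the simple word-splitting rule are all made up for the example.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each word to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():                    # collect
        for word in re.findall(r"[a-z0-9]+", text.lower()):   # parse
            index[word].add(doc_id)                           # store
    return {word: sorted(ids) for word, ids in index.items()}

def lookup(index, query):
    """Return ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)

# Hypothetical documents used only for illustration.
docs = {
    1: "The World Wide Web is a hypermedia system.",
    2: "A search engine indexes web pages.",
    3: "Indexing makes retrieval fast.",
}
index = build_index(docs)
print(lookup(index, "web"))
print(lookup(index, "web pages"))
```

Because the index is built once in advance, answering a query only needs a few dictionary lookups instead of rereading every document.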
Google Strengths
• The interface is clear and simple.
• Pages load instantly.
• Placement in search results is never sold to
anyone.
• No other search engine accesses more of the
Internet or delivers more useful information
than Google.
• Google Search is fast, with most results coming
back to the user in less than one second.
Google Weaknesses
• Some people love the results they get from Google; others
are often disappointed.
• To a large extent, both the pluses and the minuses derive
from Google's ranking system, which (as the folks at
Google explain at http://www.google.com/technology/)
depends largely on:
– the number of links to a particular page,
– the relevance of the content on those linking pages to the
content on the target page, and
– the quality of the pages doing the linking.
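The first and third factors (how many pages link to a page, and how good those linking pages are) are the idea behind the published PageRank algorithm. The sketch below is a simplified textbook version and not Google's actual ranking system; it ignores content relevance entirely. The four-page `web` graph, the damping value 0.85, and the fixed iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web: A, B and D all link to C; C links back to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

In this toy graph C ranks highest because three pages link to it, and A ranks above B and D because its single incoming link comes from the highly ranked page C.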

Conclusion
• Google is designed to crawl and index the Web
efficiently and to produce much more satisfying search
results than existing systems.
• Google's technology uses the collective intelligence
of the web to determine a page's importance.
• There is no human involvement or manipulation of
results, which is why users have come to trust
Google as a source of objective information
untainted by paid placement.
Google Search Tips:
• You can search for an exact phrase by using quotation marks ["like this"]
or with a hyphen between words [like-this].
• You can search by a date range by using two dots between the years
[2004..2007].
• When searching with a question mark [?] at the end of your phrase,
you will see sponsored Google Answer links, as well as definitions if
available.
• Google searches are not case sensitive.
• By default, Google returns results that include all of your search
terms.
• Google automatically searches for variations of your term, with
variants of the term shown in yellow highlight.
• Google lets you enter up to 32 words per search query.
Yahoo Search Tips:
• By default, Yahoo returns results that include all of your search
terms.
• To exclude words, use a minus sign: [cat -tabby] would show all
results about cats with no mention of tabby.
• Yahoo search results also show related searches, which are
based on other users' searches with similar terms.
• To search for a map, use map [location].
• To search for dictionary definitions, use "define": [define
harddrive].
• To search a single domain, use site: [site:webopedia.com DVD]
would search Webopedia for the term DVD.
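The operators in these tips (quoted phrases, the minus sign, and site:) are plain text typed into the search box, so a query can be assembled with ordinary string handling. The helper below is a hypothetical sketch: the function name `build_query` and its parameters are invented for this example and are not part of any search engine's API. The standard-library function `quote_plus` only shows how such a query would be encoded inside a URL.

```python
from urllib.parse import quote_plus

def build_query(words=(), phrase=None, exclude=(), site=None):
    """Assemble a search string from the operators described in the tips."""
    parts = []
    if site:
        parts.append(f"site:{site}")             # restrict to a single domain
    parts.extend(words)                          # ordinary search terms
    if phrase:
        parts.append(f'"{phrase}"')              # exact phrase in quotation marks
    parts.extend(f"-{w}" for w in exclude)       # minus sign excludes a word
    return " ".join(parts)

q1 = build_query(words=["cat"], exclude=["tabby"])
q2 = build_query(words=["DVD"], site="webopedia.com")
q3 = build_query(phrase="like this")
encoded = quote_plus(q2)   # URL-safe form of the query text

print(q1)
print(q2)
print(q3)
print(encoded)
```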
Bing
• Bing is Microsoft's search engine; it was unveiled
on May 28, 2009.
• Microsoft calls it a "Decision Engine," because it's designed to
return search results in a format that organizes answers to
address your needs.
• When you search on Bing, in addition to providing relevant
search results, the search engine also shows a list of related
searches on the left-hand side of the search engine results
page (SERP).
• You can also access a quick link to see recent search history.
• Bing uses technology from a company called Powerset, which
Microsoft acquired.
Bing
• Bing launched with several features that were unique in the
search market at the time.
• For example, when you mouse over a Bing result, a small pop-up
provides additional information for that result, including a
contact e-mail address if available.
• The main search box offers suggestions as you type, and
Bing's travel search was touted as being the best on the net.
• Bing replaced Microsoft Live Search.
Bing Search Tips:
• You can search for feeds using feeds: before the
query.
• To search Bing without a background image, use
http://www.bing.com/?rb=0
• To turn the background image back on, use
http://www.bing.com/?rb=1
• To change the number of search results returned per
page, click "Extras" (at the top right of the page) and select
"Preferences". Under Web Settings / Results you can
choose 10, 15, 30, or 50 results.
MathBrowser
• MathSoft has announced MathBrowser, a WWW
browser that can display HTML and MathCAD
documents.
• MathBrowser has a computational engine and
interface similar to MathCAD, allowing the
student to edit MathCAD documents through the
Internet.
• MathBrowser is used to distribute a collection of
Schaum's Outline Series titles in electronic form.
Wikipedia
• An online encyclopedia
• Open content
• Created and edited collaboratively by volunteers;
anyone can contribute, not only experts in the field
• Be careful when citing Wikipedia; if possible,
go to the sources that Wikipedia cites.

PART II
Research Methods
Introduction to Survey
Data Collection and Analysis
Lecture Plan
• General Introduction
• Collecting Data with Surveys
• Summarizing Data


Part 1: General Introduction


Introduction
• Applied statistics can be generalized into a 5-step process:
– Problem Definition/Description
– “Research Design”
– Data Collection
– Data Description/Summarization
– Data Analysis/Interpretation/Communication
• Many methods exist to complete this process for
many types of data and experimental settings
Examples of Practical Problems
• New Drug Testing: Does a new drug improve patient
condition more so than an existing drug or placebo?
• Quality Control: Does a manufacturing facility continue to
have an acceptable rate of defectives?
• Political Polling: What proportion of adults favor a particular
political stance or candidate?
• Measuring Animal Populations: How many fish of a
particular species live in a lake?
• Agricultural Production: Which of 4 varieties of fertilizer gives
the best results in grain production?
• Marketing: What levels of pricing and advertising produce
the highest profits for a new product?
Populations and Samples
• Population: All measurements of interest
to the researcher
• Sample: Subset of the population that is
obtained and analyzed

[Figure: diagram of a Population containing a Sample]
Statistical Inference
• Goal: Make statements regarding a population
(or true state of nature) based on observations
from a sample.
– What can be said of the average measurement in a
population, when we obtain the average of a sample
from the population?
– Can we conclude that success rates differ for two
treatments in nature (across a conceptual population
of units), based on a certain difference in success
rates in two samples of units that were assigned the
treatments?

Part 2: Collecting Data with Surveys
Surveys
• Instruments used to obtain demographic
characteristics and attitudes or behavioral
tendencies from subjects
• Passive in nature, obtaining “naturally
occurring” information
• Many fields conduct surveys regularly:
– Public Opinions: Gallup, CNN, WSJ, TV Networks
– Government Bureaus: Census, Labor Statistics
– Business: Customer satisfaction, Quality, Practices
– Recreation: State parks and wildlife area usage
Sampling Methods
• Simple Random Sampling: Frame listing all N elements of
population exists. Random numbers used to obtain a
sample of n elements such that all samples of size n had
equal chance of selection
• Stratified Random Sampling: Population split into
homogeneous groups (strata) based on auxiliary variable(s)
such as gender, income, race. Simple random samples
taken from each stratum.
• Cluster Sampling: Population broken into a set of clusters
(often based on location), and a sample of clusters is
selected, with all elements in each sampled cluster measured
• Systematic Sampling: Element selected at random near
top of list, then every kth element subsequently measured
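The four sampling methods above can be sketched with Python's standard `random` module. Everything concrete here is an assumption made for illustration: the frame of 100 numbered units, the even/odd strata, the ten clusters of ten units, and the fixed seed (used only so that the run is repeatable).

```python
import random

def simple_random_sample(frame, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Take a simple random sample from each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and measure every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

def systematic_sample(frame, k, rng):
    """Random start among the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(291)         # fixed seed for a repeatable illustration
frame = list(range(1, 101))      # hypothetical frame listing N = 100 units

srs = simple_random_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda u: "even" if u % 2 == 0 else "odd", 5, rng)
clusters = {c: frame[c * 10:(c + 1) * 10] for c in range(10)}
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(frame, 10, rng)
print(srs, strat, clus, syst, sep="\n")
```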
Survey Problems
• Nonresponse: If people who do not respond tend to
differ systematically from responders, results will be
biased
• Measurement Problems
– Recall: Tendency to forget occurrences of certain things or be
unable to give accurate counts of frequency of occurrence
– Leading Questions: Wording of questions can lead to certain
responses that can bias survey results
– Unclear Wording: Different people can interpret the same
question in different ways, making results inaccurate when
responses depend on interpretations
Survey Techniques
• Personal Interviews: In person, face-to-face meetings
between interviewer and interviewee. Biases can occur
due to the interaction.
• Telephone Interviews: Interview over the phone. Less
costly than personal interviews. Bias can occur due to
unlisted numbers and different schedules for different
people.

• Self-administered Questionnaire:
Inexpensive, but notoriously low response
rates. Can be done by mail or on the Internet.
• Direct Observation: Measurements made
directly using monitoring equipment or public
records

Part 3: Summarizing Data


Graphical Methods - 1 Variable
• After data are collected, they are sorted into categories/ranges
of values so that each individual observation falls in
exactly one category/range
– Numeric Responses: Break “range” of values into non-
overlapping bins and count number of units in each bin
– Categorical Responses: List all possible categories (with
“Other” if needed), and count numbers of units in each
• Pie Chart: Displays percent in each category/range
• Bar Chart: Displays frequency/percent per category
• Histogram: Displays frequency/percent per “range”
Constructing Pie Charts
• Select a small number of categories (say 5
or 6 at most) to avoid many narrow
“slivers”
• If possible, arrange categories in ascending
or descending order for categorical
variables

Philly Monthly Rainfall 1825-1869 (1/100 inches)
Category   Range     Count
1          <100      17
2          100-199   78
3          200-299   132
4          300-399   115
5          400-499   86
6          500-599   55
7          600-699   27
8          700-799   17
9          800-899   6
10         900-999   3
11         >1000     4
[Figure: pie chart of the rainfall counts by category (1 to 11)]
Constructing Bar Charts

• Put frequencies on one axis (typically vertical,
unless there are many categories) and categories on the other
• Draw rectangles over categories with
height = frequency
• Leave spaces between categories
Constructing Histograms
• Used for numeric variables, so need Class Intervals
– Let Range = Largest - Smallest Measurement
– Break range into (say) 5-20 intervals depending on sample size
– Make the width of the subintervals a convenient unit, and make
“break points” so that no observations fall on them
– Obtain Class Frequencies, the number in each subinterval
– Obtain Relative Frequencies, proportion in each subinterval
• Construct Histogram
– Draw bars over each subinterval with height representing class
frequency or relative frequency (shape will be the same)
– Leave no space between bars to imply adjacency of class
intervals
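The construction steps above can be followed in a short Python sketch. The ten data values are hypothetical, and the break points (9.5, 19.5, 29.5, ...) are an assumed choice that gives a convenient width of 10 while making sure no whole-number observation falls on a break point.

```python
def class_frequencies(data, start, width, n_classes):
    """Count observations in n_classes adjacent intervals of equal width.

    Interval i runs from start + i*width to start + (i+1)*width.
    """
    counts = [0] * n_classes
    for y in data:
        i = int((y - start) // width)
        if not 0 <= i < n_classes:
            raise ValueError(f"{y} lies outside the class intervals")
        counts[i] += 1
    return counts

# Hypothetical measurements (smallest 12, largest 47, so the range is 35).
data = [12, 15, 21, 22, 25, 31, 33, 34, 38, 47]
start, width, n_classes = 9.5, 10, 4

freq = class_frequencies(data, start, width, n_classes)   # class frequencies
rel = [f / len(data) for f in freq]                        # relative frequencies

# Text histogram: adjacent classes, bar length equal to class frequency.
for i, (f, r) in enumerate(zip(freq, rel)):
    low = start + width * i
    print(f"{low:5.1f} to {low + width:5.1f} | {'#' * f} {f} ({r:.0%})")
```

Plotting `rel` instead of `freq` only rescales the bars by the sample size, which is why the shape of the histogram stays the same.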
Histogram
[Figure: histogram of monthly rainfall (variable rain100); vertical axis: Frequency, 0 to 140; horizontal axis: bins labeled 100 to 1200 and "More"]
Interpreting Histograms
• Probability: Heights of bars over the class intervals
are proportional to the “chances” an individual
chosen at random would fall in the interval
• Unimodal: A histogram with a single major peak
• Bimodal: Histogram with two distinct peaks (often
evidence of two distinct groups of units)
• Uniform: Interval heights are approximately equal
• Symmetric: Right and Left portions are same shape
• Right-Skewed: Right-hand side extends further
• Left-Skewed: Left-hand side extends further
Numerical Descriptive Measures

• Numeric summaries of a set of measurements


• Measures of Central Tendency describe the “location”
or center of a set of measurements
• Measures of Variability describe the “spread” or
dispersion of a set of measurements
• Parameters: Numeric descriptive measures based on
Populations of measurements
• Statistics: Numeric descriptive measures based on
Samples of measurements

Measures of Central Tendency - I
• Mode: Most often occurring outcome (typically only of
interest for variables taking on only “discrete” values)
• Median: Middle value when measurements ordered
from smallest to largest
• Mean: Sum of all measurements, divided by total
number of measurements (equal distribution of total)
Population (N elements): $\mu = \frac{\sum_{i=1}^{N} y_i}{N}$
Sample (n elements): $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$
In practice, we only observe the sample, and use $\bar{y}$ to estimate $\mu$.
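The three measures can be computed with Python's standard `statistics` module. The small data set, the population of the numbers 1 to 1000, the sample size of 50, and the random seed are all hypothetical values chosen for illustration; the second half of the sketch shows a sample mean being used as an estimate of a population mean.

```python
import random
import statistics

# Hypothetical set of six measurements.
sample = [2, 3, 3, 5, 7, 10]
mode = statistics.mode(sample)       # most often occurring value
median = statistics.median(sample)   # middle value (average of the two middle ones)
mean = statistics.mean(sample)       # sum of measurements divided by their number
print(mode, median, mean)

# Hypothetical population of N = 1000 measurements.
population = list(range(1, 1001))
mu = sum(population) / len(population)    # population mean

rng = random.Random(291)                  # fixed seed for a repeatable illustration
drawn = rng.sample(population, 50)        # simple random sample, n = 50
y_bar = sum(drawn) / len(drawn)           # sample mean, used to estimate mu
print(mu, y_bar)
```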
Summarizing Data of More than One Variable
• Contingency Table: Cross-tabulation of units based on
measurements of two qualitative variables simultaneously
• Stacked Bar Graph: Bar chart with one variable
represented on the horizontal axis, second variable as
subcategories within bars
• Cluster Bar Graph: Bar chart with one variable forming
“major groupings” on horizontal axis, second variable
used to make side-by-side comparisons within major
groupings (displays all combinations in a factorial experiment)
• Scatterplot: Plot with quantitative variables y and x plotted
against each other for each unit
• Side-by-Side Boxplot: Compares distributions by groups
Example - Ginkgo and Acetazolamide for Acute
Mountain Sickness (AMS) Among Himalayan Trekkers

Counts
Treatment   AMS     No AMS   Total
Placebo     40      79       119
Acet        14      104      118
Ginkgo      43      81       124
Acc+Gi      18      108      126
Total       115     372      487

Percent within treatment
Treatment   AMS     No AMS   Total
Placebo     33.61   66.39    100
Acet        11.86   88.14    100
Ginkgo      34.68   65.32    100
Acc+Gi      14.29   85.71    100
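The percentage table can be derived from the count table by dividing each cell by its row total. The short sketch below reproduces that calculation; the counts are taken from the slide, while the variable and function names are made up for the example.

```python
# Counts from the contingency table: (AMS, No AMS) for each treatment.
counts = {
    "Placebo": (40, 79),
    "Acet": (14, 104),
    "Ginkgo": (43, 81),
    "Acc+Gi": (18, 108),
}

def row_percentages(table):
    """Express each cell as a percentage of its row total, to 2 decimals."""
    result = {}
    for row, cells in table.items():
        total = sum(cells)
        result[row] = tuple(round(100 * c / total, 2) for c in cells)
    return result

pct = row_percentages(counts)
column_totals = tuple(sum(col) for col in zip(*counts.values()))

for row, (ams, no_ams) in pct.items():
    print(f"{row:8s} {ams:6.2f} {no_ams:6.2f}")
print("Column totals:", column_totals)
```

Comparing percentages within each treatment, rather than raw counts, makes the groups comparable even though their sizes differ slightly.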
• Microsoft Excel is a simple but excellent tool
for summarizing data.
• Search YouTube to learn how to use it to
construct the various types of charts using the
data provided in the previous slide.
[Figure: Stacked Bar Graph of AMS Incidence (Percent); horizontal axis: Treatment (Placebo, Acet, Ginkgo, Acc+Gi); each bar split into AMS and No AMS]
[Figure: Cluster Bar Graph of AMS Incidence (Counts); vertical axis: Frequency; horizontal axis: Treatment; side-by-side bars for AMS and No AMS]
[Figure: 3-D Barchart of Incidence of AMS; vertical axis: Percent within Treatment; other axes: Treatment and Outcome (AMS, No AMS)]
