Week-1 Why Do We Need Statistics
and Tourism
Week 1
Tadayuki Hara, PhD
The Research Process
Types of Data Analysis
Quantitative Methods
Testing theories using numbers
Qualitative Methods
Testing theories using language
Magazine articles/Interviews
Conversations
Newspapers
Media broadcasts
Initial Observation
Find something that needs explaining
Observe the real world
Read other research
Test the concept: collect data
Collect data to see whether your guess is
correct
To do this, you need to define variables
Anything that can be measured and can differ
across entities or time.
Generating and Testing Theories
Theories
A hypothesized general principle or set of principles
that explain known findings about a topic and from
which new hypotheses can be generated.
Hypothesis
A prediction from a theory.
A person with increased disposable income is more
likely to travel.
Guests who are satisfied with hotel services are more
likely to come back.
A hotel employee who is more satisfied with
his/her job (salary, bosses, benefits) is less likely to quit
than comparable employees.
Collect Data to Test Your Theory
Hypothesis:
Propose a hypothesis
Independent Variable
Possible associations
A predictor variable
A manipulated variable (in experiments)
Dependent Variable
The proposed effect
An outcome variable
Measured not manipulated (in experiments)
Levels of Measurement
Categorical (entities are divided into distinct categories):
Binary variable: There are only two categories
e.g. dead or alive.
Nominal variable: There are more than two categories (the order of the categories
has no meaning)
e.g. race, nationality
Ordinal variable: The same as a nominal variable but the categories have a
logical order (e.g. a Likert scale)
e.g. “How do you like the streaming video course so far?”
(5. Like it very much, 4. Like it, 3. Neutral, 2. Do not like it, 1. Hate it. Ordinal categories can be ranked!)
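The difference between nominal and ordinal categories can be sketched in code. This is only an illustration: the class names `Nationality` and `Liking` and the example responses are assumptions made for this sketch, not part of the course material.

```python
from enum import Enum, IntEnum

class Nationality(Enum):
    """Nominal: labels only, no meaningful order (hypothetical categories)."""
    JAPAN = "Japan"
    USA = "USA"
    BRAZIL = "Brazil"

class Liking(IntEnum):
    """Ordinal: the categories of the Likert item can be ranked."""
    HATE_IT = 1
    DO_NOT_LIKE_IT = 2
    NEUTRAL = 3
    LIKE_IT = 4
    LIKE_IT_VERY_MUCH = 5

# Hypothetical responses to the Likert item.
responses = [Liking.LIKE_IT, Liking.HATE_IT, Liking.NEUTRAL]

# Ordinal categories can be compared and sorted.
ranked = sorted(responses)
likes_more = Liking.LIKE_IT > Liking.NEUTRAL

# Nominal categories cannot be ranked: a plain Enum refuses "<".
try:
    Nationality.JAPAN < Nationality.USA
    nominal_is_ordered = True
except TypeError:
    nominal_is_ordered = False
```

Note that the codes 1 to 5 only carry rank information: the gap between "Neutral" and "Like it" is not necessarily the same size as the gap between "Like it" and "Like it very much".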
Measurement Error
Measurement error
The discrepancy between the actual value we’re trying to
measure, and the number we use to represent that value.
Example:
You (in reality) weigh 80 kg.
You stand on your bathroom scales and they say 83 kg.
The measurement error is 3 kg.
Analysing Data: Histograms
The Normal Distribution
Properties of Frequency Distributions
Skew
The symmetry of the distribution.
Positive skew (scores bunched at low values with
the tail pointing to high values).
Negative skew (scores bunched at high values
with the tail pointing to low values).
Kurtosis
The ‘heaviness’ of the tails.
Leptokurtic = heavy tails.
Platykurtic = light tails.
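Skew and kurtosis can also be expressed as numbers. The sketch below uses simple moment-based formulas; the function names and both data sets are assumptions made for illustration, and statistics packages often apply slightly different (bias-corrected) formulas.

```python
def central_moment(scores, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** k for x in scores) / len(scores)

def skewness(scores):
    """0 = symmetric, positive = tail towards high values,
    negative = tail towards low values."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """0 for a normal distribution, positive = heavier tails
    (leptokurtic), negative = lighter tails (platykurtic)."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

# Hypothetical data, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 234]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```

In the second data set the single very high score (234) pulls the tail towards high values, so the skewness comes out positive.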
Skew
Kurtosis
Central tendency: The Mode
Mode
The most frequent score
Bimodal
Having two modes
Multimodal
Having several modes
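A minimal sketch of finding the mode(s) with Python's standard library. The two score lists are hypothetical and invented for illustration only.

```python
from statistics import multimode

# Hypothetical scores, not taken from the course data.
unimodal = [1, 2, 2, 3, 4]
bimodal = [3, 4, 4, 5, 6, 6, 7]

# multimode returns every score that shares the highest frequency.
modes_unimodal = multimode(unimodal)
modes_bimodal = multimode(bimodal)

print(modes_unimodal)  # a single mode
print(modes_bimodal)   # two modes, so the data are bimodal
```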
Bimodal and Multimodal Distributions
Central Tendency: The Median
Data: the numbers of friends of 11 Facebook users
The Median is the middle score when scores are
ordered:
Median
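The median can be found by sorting the scores and picking the middle one. The 11 scores below are an assumed, illustrative data set (the actual scores are not reproduced in these notes); they were chosen only so that the lowest score is 22 and the highest is 234.

```python
from statistics import median

# Assumed illustrative data: friends of 11 Facebook users.
friends = [108, 103, 234, 121, 93, 57, 40, 53, 22, 116, 98]

ordered = sorted(friends)

# With an odd number of scores, the median is the (n + 1) / 2-th score.
n = len(ordered)
middle_position = (n + 1) // 2          # 1-based position
manual_median = ordered[middle_position - 1]

# The standard library gives the same answer (and also handles even n).
library_median = median(friends)

print(ordered)
print(middle_position, manual_median)
```

With an even number of scores there is no single middle score, and the median is the average of the two middle scores.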
Central Tendency: The Mean
Mean
The sum of scores divided by the number of
scores.
Number of friends of 11 Facebook users.
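The definition of the mean translates directly into code. The scores below are an assumed, illustrative data set of 11 values, not a reproduction of the original data.

```python
# Assumed illustrative data: friends of 11 Facebook users.
friends = [108, 103, 234, 121, 93, 57, 40, 53, 22, 116, 98]

total = sum(friends)        # sum of scores
n = len(friends)            # number of scores
mean_friends = total / n    # sum divided by number of scores

print(total, n, mean_friends)  # 1045 11 95.0
```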
The Dispersion: Range
The Range
The smallest score subtracted from the largest
For our Facebook friends data the highest score is 234
and the lowest is 22; therefore the range is:
234 − 22 = 212
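The same calculation as a short sketch. The scores are an assumed, illustrative data set whose lowest and highest values match the ones quoted above (22 and 234); the other values are not taken from the original data.

```python
# Assumed illustrative data: friends of 11 Facebook users.
friends = [108, 103, 234, 121, 93, 57, 40, 53, 22, 116, 98]

highest = max(friends)
lowest = min(friends)
score_range = highest - lowest

print(highest, lowest, score_range)  # 234 22 212
```

Because the range depends only on the two most extreme scores, a single unusual score (such as 234 here) can change it dramatically.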
The Dispersion: The Interquartile Range
Quartiles
The three values that split the sorted data into four
equal parts.
Second Quartile = median.
Lower quartile = median of lower half of the data
Upper quartile = median of upper half of the data
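The quartile rules above can be sketched as follows. Two things are assumptions here: the data set is illustrative, and with an odd number of scores the median itself is left out of both halves (statistics packages use several slightly different quartile conventions, so their results may differ).

```python
from statistics import median

def quartiles(scores):
    """Return (lower quartile, median, upper quartile) using the
    'median of each half' rule; the median is excluded from the
    halves when the number of scores is odd."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)

# Assumed illustrative data: friends of 11 Facebook users.
friends = [108, 103, 234, 121, 93, 57, 40, 53, 22, 116, 98]

q1, q2, q3 = quartiles(friends)
iqr = q3 - q1   # interquartile range: spread of the middle 50% of scores

print(q1, q2, q3, iqr)
```

Unlike the range, the interquartile range is not affected by the extreme scores at either end of the distribution.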
Deviance
Sum of Squared Errors, SS
Indicates the total dispersion, or total deviance
of scores from the mean:
$SS = \sum_{i=1}^{N} (X_i - \bar{X})^2$
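A minimal sketch of the sum of squared errors: take each score's deviance from the mean, square it, and add the squares up. The scores are an assumed, illustrative data set, not a reproduction of the original data.

```python
# Assumed illustrative data: friends of 11 Facebook users.
friends = [108, 103, 234, 121, 93, 57, 40, 53, 22, 116, 98]

mean_friends = sum(friends) / len(friends)

# Deviance of each score from the mean.
deviances = [x - mean_friends for x in friends]

# Raw deviances cancel out (they sum to zero), which is why they are squared.
total_deviance = sum(deviances)
sum_of_squares = sum(d ** 2 for d in deviances)

print(total_deviance, sum_of_squares)
```

The sum of squares grows with the number of scores, so it is hard to compare across data sets of different sizes; dividing by N − 1 (the variance) addresses this.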
Standard Deviation
The variance gives us a measure in units squared.
In our Facebook example we would have to say that the
average error in our data was 3224.6 friends squared.
This problem is solved by taking the square root of the
variance, which is known as the standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$
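The step from sum of squares to variance to standard deviation can be sketched as below. The data set is an assumption made for illustration; it happens to give a variance of 3224.6 when the sum of squares is divided by N − 1, which matches the figure quoted above, but it should not be read as the confirmed original data.

```python
import math

# Assumed illustrative data: friends of 11 Facebook users.
friends = [108, 103, 234, 121, 93, 57, 40, 53, 22, 116, 98]

n = len(friends)
mean_friends = sum(friends) / n
sum_of_squares = sum((x - mean_friends) ** 2 for x in friends)

# Variance: average squared error, in units squared (friends squared).
variance = sum_of_squares / (n - 1)

# Standard deviation: back in the original units (friends).
standard_deviation = math.sqrt(variance)

print(round(variance, 1), round(standard_deviation, 2))
```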
Using a Frequency Distribution to go Beyond the Data
Important Things to Remember
The Sum of Squares, Variance, and Standard Deviation
represent the same thing with different expressions:
The ‘Fit’ of the mean to the data
The variability in the data
How well the mean represents the observed data
Error
The Normal Probability Distribution
Going beyond the data: Z‐scores
Z‐scores
Standardising a score with respect to the other scores in
the group.
Expresses a score in terms of how many standard
deviations it is away from the mean.
The distribution of z‐scores has a mean of 0 and SD = 1.
$z = \frac{X - \bar{X}}{s}$
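A short sketch of standardising scores: subtract the mean from each score and divide by the standard deviation. The data set is an assumption made for illustration, and the sample standard deviation (dividing by N − 1) is used.

```python
from statistics import mean, stdev

# Assumed illustrative data: friends of 11 Facebook users.
friends = [108, 103, 234, 121, 93, 57, 40, 53, 22, 116, 98]

x_bar = mean(friends)   # mean of the raw scores
s = stdev(friends)      # sample standard deviation (N - 1)

# z = (X - mean) / s for every score.
z_scores = [(x - x_bar) / s for x in friends]

for x, z in zip(friends, z_scores):
    print(x, round(z, 2))

# After standardising, the scores have mean 0 and SD 1.
z_mean = mean(z_scores)
z_sd = stdev(z_scores)
```

A positive z‐score means the score lies above the mean, and a negative one means it lies below.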
Probability Density Function of a Normal Distribution
Properties of z‐scores
1.96 cuts off the top 2.5% of the distribution.
−1.96 cuts off the bottom 2.5% of the distribution.
As such, 95% of z‐scores lie between −1.96 and 1.96.
99% of z‐scores lie between −2.58 and 2.58, and
99.9% of them lie between −3.29 and 3.29.
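These cut-off values can be checked against the standard normal distribution using Python's standard library. This is a sketch for verification only; the variable names are arbitrary.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# z-value that leaves a given proportion in the upper tail.
z_95 = standard_normal.inv_cdf(1 - 0.025)     # top 2.5%  -> middle 95%
z_99 = standard_normal.inv_cdf(1 - 0.005)     # top 0.5%  -> middle 99%
z_999 = standard_normal.inv_cdf(1 - 0.0005)   # top 0.05% -> middle 99.9%

# Proportion of z-scores lying between -1.96 and 1.96.
middle_share = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(round(z_95, 2), round(z_99, 2), round(z_999, 2))  # 1.96 2.58 3.29
print(round(middle_share, 3))                           # 0.95
```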