Inferential Statistics
Disclaimer
• This presentation contains materials adapted or referenced from
various online sources. These materials are included solely for
educational and informational purposes. I do not claim ownership or
original authorship of the content sourced from external references,
and full credit is due to the respective authors and creators of these
works.
Branches of Biostatistics
Inferential Statistics
Inferential statistics is the branch of statistics that focuses on drawing conclusions by:
- generalizing from samples to populations
- testing hypotheses
- establishing relationships among variables in a sample, or determining differences between samples or groups
Inferential Statistics
Null hypothesis (Ho):
"The two groups will not differ."
Alternative hypothesis (Ha):
"Group A will do better than group B."
"Group A and group B will not perform the same."
Possible Outcomes in Hypothesis Testing
• Reject Ho when Ho is true: Type I error
• Reject Ho when Ho is false: correct decision
• Fail to reject Ho when Ho is true: correct decision
• Fail to reject Ho when Ho is false: Type II error
p-value
• A measure of confidence in the observed difference
• Allows researchers to assess the probability that the observed difference is due to chance rather than a real effect
• A p-value of less than 0.05 (< 0.05) is the common criterion for statistical significance
• It means the probability that the results are due to chance alone is less than 5 times out of 100
• One can be 95% certain that the results are real and not due to chance alone
Types of inferential statistical tests
• Parametric Tests
• Non-parametric tests
• Tests of association
Parametric tests
[Diagram: the research process cycle - research hypothesis, approach and designs, variables and measurement, sampling and sample size, data collection, and data analysis]
Recap
Variables
• A variable is a characteristic that may assume more than one set of
values to which a numerical measure can be assigned
• A variable is a characteristic that changes.
• This differs from a constant, which does not change but remains the
same.
• Quantitative (numeric) variables: continuous or discrete
• Qualitative (categorical) variables: nominal or ordinal
Measurement Scales
• Measurement scales are used to quantify numeric values of variables
or observations
There are 4 types of measurement scales:
• Nominal Scale
• Ordinal Scale
• Interval Scale
• Ratio Scale
The Hierarchy of Measurement Scales
• Data are classified as quantitative (numeric) or qualitative (categorical).
• Causation always implies correlation but correlation does not
necessarily imply causation.
• The degree of relationship between the variables under consideration is measured through correlation analysis.
• Correlation analysis enables us to get an idea of the degree and direction of the relationship between the two variables under study.
• A correlation typically evaluates three aspects of the relationship:
• the direction
• the form
• the degree
• The direction of the relationship is measured by the sign of the
correlation (+ or -).
• A positive correlation means that the two variables tend to change
in the same direction; as one increases, the other also tends to
increase.
• A negative correlation means that the two variables tend to change
in opposite directions; as one increases, the other tends to
decrease.
Direction of the Correlation
• Positive relationship: variables change in the same direction; indicated by a plus (+) sign.
• As X increases, Y increases.
• As X decreases, Y decreases.
• E.g., as height increases, so does weight.
• Negative relationship: variables change in opposite directions; indicated by a minus (-) sign.
• As X increases, Y decreases.
• As X decreases, Y increases.
• E.g., as TV time (home video) increases, grades decrease.
Positive and negative relationships
Positive or direct relationships
• If the points cluster around a line that runs from the lower left to the upper right of the graph area, then the relationship between the two variables is positive or direct.
• Linear correlation: Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other.
• The graph of variables having a linear relationship forms a straight line.
Example: X = 1, 2, 3, 4, 5, 6, 7, 8
         Y = 5, 7, 9, 11, 13, 15, 17, 19
         (Y = 3 + 2X)
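As a quick illustration (an addition, not part of the original slides), the short Python sketch below checks that every listed pair lies exactly on the line Y = 3 + 2X:

```python
# Quick check that the example values follow Y = 3 + 2X exactly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 7, 9, 11, 13, 15, 17, 19]

for x, y in zip(xs, ys):
    assert y == 3 + 2 * x  # every point lies on the straight line
print("All points satisfy Y = 3 + 2X, so the scatter plot is a perfect straight line.")
```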
• Non-Linear correlation: The correlation would be non-linear if
the amount of change in one variable does not bear a constant
ratio to the amount of change in the other variable.
A perfect positive correlation
[Figure: weight of A against height of A and weight of B against height of B - a linear relationship]
Linear Correlation
[Figure: example scatter plots of Y against X showing linear relationships and curvilinear relationships]
• A correlation typically evaluates three aspects of the relationship:
• the direction
• the form
• the degree
• The measure of correlation is called the correlation coefficient (r).
• The degree of relationship (the strength or consistency of the relationship) is measured by the numerical value of the correlation.
• A value of 1.00 indicates a perfect relationship and a value of zero indicates no relationship.
• The correlation coefficient ranges from -1 to +1 (-1 ≤ r ≤ +1).
Degree of correlation
• High positive correlation: r = +0.80 (e.g., height and weight)
• Moderate positive correlation: r = +0.40 (e.g., shoe size and weight)
• Perfect negative correlation: r = -1.0 (e.g., TV watching per week and exam score)
• Moderate negative correlation: r = -0.80 (e.g., TV watching per week and exam score)
• Weak negative correlation: r = -0.20 (e.g., shoe size and weight)
• No correlation (horizontal line): r = 0.0 (e.g., IQ and height)
[Figure: example scatter plots for r = +.80, +.60, +.40, and +.20]
Types of Correlation
• Simple correlation: only two variables are studied.
• Multiple correlation: three or more variables are studied.
• Partial correlation: recognizes more than two variables but considers only two variables, keeping the others constant.
• Total correlation: is based on all the relevant variables, which is normally not feasible.
Hypothesis testing for correlation
• Step 1
• Ho : There is no correlation between variable A and variable B
• Ha : There is a correlation between variable A and variable B (this can be
positive or negative)
• Step 2: Calculate 'r' (the correlation coefficient)
• Step 3: Check the corresponding p-value
• For example, if r = -0.4 and the p-value is 0.007, it can be interpreted as follows: there is statistically significant evidence of an inverse correlation between variable A and variable B.
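A minimal Python sketch of Steps 2 and 3 (an illustration added here, assuming SciPy is available; the variable names and data are hypothetical):

```python
# Steps 2-3: compute r and its p-value, then compare the p-value to alpha.
from scipy import stats

variable_a = [2.1, 3.4, 3.9, 4.8, 5.5, 6.0, 7.2, 8.1]   # hypothetical data
variable_b = [9.0, 8.2, 7.9, 6.5, 6.1, 5.8, 4.9, 4.1]   # hypothetical data

r, p_value = stats.pearsonr(variable_a, variable_b)  # Step 2: r, Step 3: p-value
print(f"r = {r:.2f}, p = {p_value:.3f}")

alpha = 0.05
if p_value < alpha:
    print("Reject Ho: statistically significant evidence of a correlation.")
else:
    print("Fail to reject Ho: no statistically significant correlation.")
```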
Methods of Studying Correlation
• Scatter Diagram Method
• Karl Pearson’s Correlation
• Spearman’s Rank correlation coefficient
Scatter Diagram Method
• A scatter diagram is a graph of plotted points where each point represents a pair of X and Y values as a coordinate. It portrays the relationship between these two variables graphically.
Table 1. BP (SBP) and age of 20 children
SBP: 90, 88, 100, 70, 80, 90, 100, 102, 120, 110, 89, 80, 90, 100, 87, 93, 82, 102, 93, 86
Age: 12.5, 12.1, 13.6, 10.0, 11.2, 12.0, 13.4, 13.8, 16.8, 15.6, 12.3, 12.0, 12.7, 13.7, 12.0, 12.8, 11.6, 14.0, 13.0, 11.9
[Figure: scatter plot of SBP against age]
Scatter plot of the relationship between weight and age of children
Weight (kg): 38, 45, 35, 50, 60, 45, 30, 51, 53, 40, 43, 39, 41, 40, 50, 56, 52, 62, 39, 44
Age:         12.5, 12.1, 13.6, 10.0, 11.2, 12.0, 13.4, 13.8, 16.8, 15.6, 12.3, 12.0, 12.7, 13.7, 12.0, 12.8, 11.6, 14.0, 13.0, 11.9
[Figure: scatter plot of weight against age]
• Scatter plot - positive relationship (e.g., weight and height)
• Scatter plot - moderate positive correlation (e.g., shoe size and weight)
• Scatter plot - perfect negative correlation (e.g., TV watching per week and exam score)
• Scatter plot - moderate negative correlation (e.g., TV watching per week and exam score)
• Scatter plot - weak negative correlation (e.g., shoe size and weight)
• Scatter plot - no correlation (e.g., IQ and height)
Advantages of the Scatter Plot
• Simple and non-mathematical method
• First step in investigating the relationship between two variables
Disadvantage of the scatter diagram
• It does not give an exact numerical measure of the degree of correlation
The Pearson Correlation
• The Pearson correlation measures the direction and degree of
linear (straight line) relationship between two variables.
• Pearson's 'r' is the most common correlation coefficient
• The degree of correlation is expressed by the value of the coefficient
• r always lies between -1 and +1 (-1 ≤ r ≤ +1)
• To compute the Pearson correlation r (Method 1), you first measure
the variability of X and Y scores separately by computing SS for the
scores of each variable (SSX and SSY).
• Then, the co-variability (tendency for X and Y to vary together) is
measured by the sum of products (SP).
• The Pearson correlation r is then found by computing the ratio r = SP / √(SSX × SSY).
Sample Table
x    y    x - x̄    y - ȳ    (x - x̄)²    (y - ȳ)²    (x - x̄)(y - ȳ)
43   99     1.83     18        3.36        324          33.00
21   65   -20.17    -16      406.69        256         322.67
25   79   -16.17     -2      261.36          4          32.33
42   75     0.83     -6        0.69         36          -5.00
57   87    15.83      6      250.69         36          95.00
59   81    17.83      0      318.03          0           0.00
Sums:                        SSX = 1240.83  SSY = 656    SP = 478
r = SP / √(SSX × SSY) = 478 / √(1240.83 × 656) ≈ 0.5298
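A minimal Python sketch (added for illustration, not part of the original slides) that reproduces Method 1 on the six (x, y) pairs from the table above:

```python
# Method 1: compute SSX, SSY, and SP, then r = SP / sqrt(SSX * SSY).
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

ss_x = sum((xi - mean_x) ** 2 for xi in x)                        # SSX
ss_y = sum((yi - mean_y) ** 2 for yi in y)                        # SSY
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))   # SP

r = sp / (ss_x * ss_y) ** 0.5
print(f"SSX = {ss_x:.2f}, SSY = {ss_y:.2f}, SP = {sp:.2f}, r = {r:.4f}")  # r ≈ 0.5298
```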
Another method for calculating "r"
Procedure for computing the correlation coefficient (Method 2)
• Calculate the sum (∑) of the two series 'x' and 'y' (∑x and ∑y)
• Square each value of 'x' and 'y', then obtain the sums of the squares, i.e. ∑x² and ∑y²
• Multiply each value of x by the corresponding value of y to obtain the products 'xy'
• Then obtain the sum of the products, i.e. ∑xy
• Substitute the values into the formula
Sample question: Find the value of the
correlation coefficient from the following table:
Subject Age x Glucose Level y
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Subject  Age x  Glucose level y  xy  x²  y²
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.
Subject  Age x  Glucose level y  xy  x²  y²
1 43 99 4257
2 21 65 1365
3 25 79 1975
4 42 75 3150
5 57 87 4959
6 59 81 4779
Step 3: Square the numbers in the x column and put the result in the x² column.
Subject  Age x  Glucose level y  xy  x²  y²
1 43 99 4257 1849
2 21 65 1365 441
3 25 79 1975 625
4 42 75 3150 1764
5 57 87 4959 3249
6 59 81 4779 3481
Step 4: Square the numbers in the y column and put the result in the y² column.
Subject  Age x  Glucose level y  xy  x²  y²
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561
Step 5: Add up all of the numbers in each column and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying "sum of."
Subject  Age x  Glucose level y  xy  x²  y²
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561
Σ 247 486 20485 11409 40022
Step 6: Use the correlation coefficient formula:
r = [n∑xy - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
From our table:
Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
n is the sample size, in our case = 6
The correlation coefficient:
r = [6(20,485) - (247)(486)] / √{[6(11,409) - 247²][6(40,022) - 486²]}
  = 2,868 / √(7,445 × 3,936)
  = 0.5298
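The same value can be obtained programmatically from the column sums in Step 5; a small Python sketch (illustrative only):

```python
# Method 2: correlation coefficient from the column sums.
n = 6
sum_x, sum_y = 247, 486
sum_xy, sum_x2, sum_y2 = 20485, 11409, 40022

numerator = n * sum_xy - sum_x * sum_y
denominator = ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)) ** 0.5
r = numerator / denominator
print(f"r = {r:.4f}")   # 0.5298, matching Method 1
```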
Examples of SPSS output for Correlation
• There was a moderate positive correlation between height and weight, r(352) = .51, p < .001
Some things to note:
1. There are two ways to report p-values. The first way is to cite the alpha value. The second way, the preferred way, is to report the exact p-value. Also note that if your p-value is less than .001, it is conventional to state p < .001 rather than give the exact value.
2. The r statistic should be stated at 2 decimal places.
3. Remember to drop the leading 0 from both r and the p-value (i.e., not 0.34, but
rather .34).
4. You don't need to provide the formula for r.
5. Degrees of freedom for r is N - 2 (the number of data points minus 2).
Methods of Studying Correlation
• Scatter Diagram Method
• Karl Pearson’s Correlation
• Spearman’s Rank correlation coefficient
Spearman’s Rank Coefficient of Correlation
• When the variables under study cannot be measured quantitatively but can be arranged in serial order (ranked), Pearson's correlation coefficient cannot be used; Spearman's rank correlation can be used instead.
• R = 1 - (6∑D²) / [N(N² - 1)]
• R = rank correlation coefficient
• D = difference in ranks between paired items in the two series
• N = number of pairs
The Spearman Correlation (cont.)
The calculation of the Spearman correlation requires (see the sketch below):
• ranking the scores in each series separately,
• finding the difference in ranks (D) for each pair, and
• substituting ∑D² and N into the formula above.
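A minimal Python sketch of these steps (an added illustration with hypothetical, untied scores; SciPy is assumed to be available for the cross-check):

```python
# Spearman's rank correlation: R = 1 - 6*sum(D^2) / (N*(N^2 - 1)), valid when there are no ties.
from scipy import stats

written = [85, 78, 92, 66, 74, 81, 90, 70]   # hypothetical scores, series 1
oral    = [75, 80, 95, 60, 72, 78, 68, 88]   # hypothetical scores, series 2

def ranks(values):
    """Rank scores from 1 (smallest) to N (largest); assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

d = [rx - ry for rx, ry in zip(ranks(written), ranks(oral))]
n = len(written)
r_spearman = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))

rho, p = stats.spearmanr(written, oral)   # cross-check with SciPy
print(f"manual R = {r_spearman:.3f}, scipy rho = {rho:.3f}, p = {p:.3f}")
```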
Class example
• MICROSOFT shortlisted 10 candidates for final selection interview.
They were examined in written and oral communication skills. They
were ranked as follows:
Performance in test:   90.0, 92.4, 98.5, 98.3, 95.4, 91.3, 98.0, 92.0
% time spent in study: 76.0, 74.2, 75.0, 77.4, 78.3, 78.8, 73.2, 76.5
2. Establish Level of Significance
• α is a predetermined value
• The convention
• α = .05
• α = .01
• α = .001
3. Determine The Hypothesis: Whether There is an
Association or Not
• The null hypothesis (Ho) states that there is NO statistically significant difference between the observed values and the expected values.
• In other words, any differences that exist between observed and expected values are entirely random and occurred by chance alone.
• The alternative hypothesis (Ha) states that there is a statistically significant difference between the observed values and the expected values.
4. Calculating Test Statistics
• Contrasts observed frequencies in each cell of a contingency table
with expected frequencies.
• The expected frequencies represent the number of cases that would
be found in each cell if the null hypothesis were true ( i.e. the nominal
variables are unrelated).
• The expected frequency for each cell is the product of its row total (Fr) and column total (Fc) divided by the total number of cases (N):
Fe = (Fr × Fc) / N
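A small Python sketch of this expected-frequency calculation (an added illustration, using the 2 × 3 observed table from the worked example later in this section):

```python
# Expected frequencies: Fe = (Fr * Fc) / N for every cell of the table.
observed = [[10, 10, 30],
            [15, 15, 10]]

row_totals = [sum(row) for row in observed]          # Fr for each row
col_totals = [sum(col) for col in zip(*observed)]    # Fc for each column
n = sum(row_totals)                                  # N, the total number of cases

expected = [[fr * fc / n for fc in col_totals] for fr in row_totals]
for row in expected:
    print([round(fe, 2) for fe in row])   # [13.89, 13.89, 22.22] and [11.11, 11.11, 17.78]
```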
4. Calculating Test Statistics
χ² = Σ (O - E)² / E
or equivalently
χ² = Σ (Fo - Fe)² / Fe
5. Determine Degrees of Freedom
df = (R - 1)(C - 1)
6. Compare the computed test statistic against a tabled/critical value
Example
• Suppose a researcher is interested in examining the relationship between heart disease (dx) and COVID-19 infection.
• A questionnaire was developed and administered to a random sample of 90 elderly patients.
• The researcher also collects information about heart dx (present or absent) and tests for COVID-19 infection status (negative, indeterminate, positive) for the 90 respondents.
Bivariate Frequency Table or Contingency Table

                    Negative   Indeterminate   Positive   f row
Heart dx present       10            10           30        50
Heart dx absent        15            15           10        40
f column               25            25           40      n = 90

• f row = row frequency (row totals); f column = column frequency (column totals)
1. Determine Appropriate Test
• Both variables (heart disease status and COVID-19 infection status) are categorical, so the chi-square test of independence is appropriate.
2. Establish Level of Significance
• Alpha of .05
3. Determine The Hypothesis
• Ho: There is NO relationship between heart disease and COVID-19 infection among elderly patients
• Ha: There is a relationship between heart disease and COVID-19 infection among elderly patients
4. Calculating Test Statistics (recall the study findings)

                    Negative   Indeterminate   Positive   f row
Heart dx present       10            10           30        50
Heart dx absent        15            15           10        40
f column               25            25           40      n = 90
4. Calculating Test Statistics
Remember: Fe = (Fr × Fc) / N
Expected frequencies for each cell:
• Heart dx present: (50 × 25)/90 = 13.89, (50 × 25)/90 = 13.89, (50 × 40)/90 = 22.22
• Heart dx absent:  (40 × 25)/90 = 11.11, (40 × 25)/90 = 11.11, (40 × 40)/90 = 17.78
4. Calculating Test Statistics
χ² = Σ (O - E)² / E
χ² = (10 - 13.89)²/13.89 + (10 - 13.89)²/13.89 + (30 - 22.22)²/22.22
   + (15 - 11.11)²/11.11 + (15 - 11.11)²/11.11 + (10 - 17.78)²/17.78
   = 11.03
5. Determine Degrees of Freedom
df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2
6. Compare the computed test statistic against a tabled/critical value
• Check the critical value in the χ² table with the following parameters:
• α = 0.05
• df = 2
• Remember: if the calculated χ² is greater than the χ² table value, reject Ho
• Critical tabled value = 5.991
• The test statistic, 11.03, exceeds the critical value
• The null hypothesis is rejected
• There is a relationship between heart disease and COVID-19 infection among elderly patients
The Hypothesis
If the calculated χ² is greater than the χ² table value, reject Ho
11.03 > 5.991
SPSS Output for COVID-19 Infection Example

Chi-Square Tests
                                Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square              11.025a    2   .004
Likelihood Ratio                11.365     2   .003
Linear-by-Linear Association     8.722     1   .003
N of Valid Cases                90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
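The same result can be reproduced outside SPSS; a minimal Python sketch with SciPy (an added illustration, assuming SciPy is installed):

```python
# Chi-square test of independence for the heart disease / COVID-19 example.
from scipy.stats import chi2_contingency

observed = [[10, 10, 30],    # heart dx present: negative, indeterminate, positive
            [15, 15, 10]]    # heart dx absent

chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.3f}")   # chi2 ≈ 11.03, df = 2, p ≈ .004
print(expected)                                       # expected counts under Ho
```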
How to report Chi-Square
• A chi-squared analysis showed a significant relationship between heart disease and COVID-19 infection among elderly patients (χ² = 11.03, df = 2, p = 0.004).
• If any expected count is ≤ 5 and the table is 2 × 2, use the Fisher's Exact Test p-value
• If expected counts are > 5 and the table is larger than 2 × 2, use the Chi-Squared Test p-value
• If expected counts are > 5 and the table is 2 × 2, use the Continuity Correction p-value
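For the small-expected-count 2 × 2 case, a hedged sketch of how a Fisher's exact p-value could be obtained with SciPy (the counts below are purely illustrative, not from the slides):

```python
# Fisher's exact test for a 2 x 2 table with small expected counts.
from scipy.stats import fisher_exact

table_2x2 = [[3, 7],     # illustrative counts only
             [12, 8]]
odds_ratio, p = fisher_exact(table_2x2)
print(f"odds ratio = {odds_ratio:.2f}, Fisher exact p = {p:.3f}")
```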
• For example, suppose the observed counts are 20, 30, and 10:

            APC   PDP   Labor
Observed     20    30     10
Expected     20    20     20

• Plug into the formula χ² = Σ (O - E)² / E
• (The expected counts are obtained by spreading the observed total, 60, equally across the three parties)
χ² = (20 - 20)²/20 + (30 - 20)²/20 + (10 - 20)²/20 = 0 + 5 + 5 = 10
χ²(2) = 10, which exceeds the critical value χ² (α = .05, df = 2) = 5.99
• Remember: if the calculated χ² is greater than the χ² table value, reject Ho
• Reject Ho
• The district will probably vote PDP
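For reference, a minimal Python sketch (an addition, not part of the original slides) reproducing this goodness-of-fit calculation with SciPy:

```python
# Chi-square goodness-of-fit test for the voting example.
from scipy.stats import chisquare

observed = [20, 30, 10]
expected = [20, 20, 20]   # total of 60 spread equally across the three parties

chi2, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")   # chi2 = 10.00, p ≈ .007; compare with 5.99 (df = 2)
```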
More Examples
• What do doctors do with their free time?

           Watch TV   Sleep   Plan Strike   Play
Males         30        40         20        10
Females       20        30         40        10

• Question: Is there a relationship between gender and what doctors do with their free time?
• Can you state the null and the alternative hypotheses?
           Watch TV   Sleep   Plan Strike   Play   Total
Males         30        40         20        10     100
Females       20        30         40        10     100
Total         50        70         60        20     200

• Expected = (Ri × Cj)/N
• Example for males watching TV: (100 × 50)/200 = 25
• df = (R-1)(C-1)
• R = number of rows
• C = number of columns
Interpretation
• df = (2 - 1)(4 - 1) = 3
• χ²(3) = 10.10, which exceeds the critical value χ² (α = .05, df = 3) = 7.82
• Reject Ho: there is a significant relationship between gender and what doctors do with their free time.
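The manual result can be cross-checked; a small Python sketch with SciPy (illustrative addition, assuming SciPy is installed):

```python
# Chi-square test of independence for the gender / free-time example.
from scipy.stats import chi2_contingency

observed = [[30, 40, 20, 10],   # males: watch TV, sleep, plan strike, play
            [20, 30, 40, 10]]   # females

chi2, p, df, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.3f}")   # chi2 ≈ 10.10, df = 3
```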
Assumptions
• Normality
• Rule of thumb is that we need expected frequencies of at least 5 per cell
• Inclusion of non-occurrences
• Must include all responses, not just the positive ones
• Independence
• Not that the variables are independent of one another (that is what the test assesses), but rather, as with t-tests, that the observations (data points) do not have any bearing on one another
• To help with the last two, make sure that your N equals the total number of people who responded
Exercise
• A study was conducted to assess the effectiveness of motorcycle safety helmets in preventing head injury. The data consist of a random sample of 793 people involved in motorcycle accidents during a one-year period. Use the data below to test whether there is an association between having a head injury and wearing a helmet.

                    Head injury
Wearing helmet       Yes      No
Yes                   17     130
No                   218     428
Total                235     558