Inferential Statistics

The document provides an overview of inferential statistics, focusing on hypothesis testing, correlation, and various statistical tests. It explains the concepts of null and alternative hypotheses, types of errors in hypothesis testing, and the importance of p-values. Additionally, it discusses correlation analysis, including the types of correlation and methods for studying correlations.

Inferential Statistics

3/5/2025
Disclaimer
• This presentation contains materials adapted or referenced from
various online sources. These materials are included solely for
educational and informational purposes. I do not claim ownership or
original authorship of the content sourced from external references,
and full credit is due to the respective authors and creators of these
works.

Branches of Biostatistics

Inferential Statistics
This branch of statistics focuses on drawing conclusions from data:
- generalizing from samples to populations
- testing hypotheses
- establishing relationships among variables in a sample, or determining differences between samples or groups
Inferential Statistics

Inferential statistics use sample data to evaluate the credibility of a hypothesis about a population.

A hypothesis is a statement about the expected findings in a study, e.g.:
- establishing a relationship between an outcome and an exposure, or a difference between samples or groups
Inferential Statistics

Null hypothesis:
"the two groups will not differ"

Alternative hypothesis:
"group A will do better than group B" (directional)
"group A and group B will not perform the same" (non-directional)
Possible Outcomes in Hypothesis Testing

Decision    Null is True        Null is False
Accept      Correct Decision    Type II Error
Reject      Type I Error        Correct Decision

Type I Error: rejecting a true null hypothesis

Type II Error: failing to reject (accepting) a false null hypothesis
Guide to deciding on the Hypotheses

p-value
• A measure of confidence in the observed difference
• Allows researchers to determine the probability that the observed difference is due to chance rather than real
• A p-value of LESS than 0.05 (p < 0.05) is the common criterion for statistical significance
• The probability that the results are due to chance alone is less than 5 times out of 100
• One can be 95% certain that the results are real and not due to chance alone
Types of inferential statistical tests

• Parametric tests
• Non-parametric tests
• Tests of association
Parametric tests

• Fulfil the assumption of normality
• Utilize the parameters of the normal distribution
• Executed on quantitative data
Normal distribution properties
• The mean, median and mode are equal (Mean = Median = Mode)
• The total area under the curve is equal to 1
• The normal distribution curve is symmetrical about the center
• Half of the values lie to the left of the center and half to the right
• The normal distribution curve is defined by the mean and standard deviation
• The normal distribution curve has only one peak
• The curve approaches the x-axis but never touches it, extending farther and farther from the mean
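These properties can be checked numerically with Python's standard-library NormalDist (a sketch added for illustration, not part of the original slides):

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
nd = NormalDist(mu=0, sigma=1)

# Symmetry: half of the total area lies on each side of the mean
print(nd.cdf(0))  # 0.5

# The total area under the curve approaches 1 far from the mean
print(nd.cdf(10))  # ~1.0

# Symmetry about the mean: P(X <= -1) equals P(X >= 1)
print(abs(nd.cdf(-1) - (1 - nd.cdf(1))) < 1e-12)  # True
```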
Correlation and
Chi-Square Tests
RESEARCH PROCESS
[Cycle diagram: Research Title and Question → Literature Review → Research Hypothesis → Approach and Designs → Variables and Measurement → Sample size → Sampling → Data collection → Data Analysis → Report / Presentations]
Recap
Variables
• A variable is a characteristic that may assume more than one set of values to which a numerical measure can be assigned
• A variable is a characteristic that changes.
• This differs from a constant, which does not change but remains the same.

Quantitative: Continuous, Discrete
Qualitative: Ordinal, Nominal
Measurement Scales
• Measurement scales are used to quantify numeric values of variables
or observations
There are 4 types of measurement scales:
• Nominal Scale
• Ordinal Scale
• Interval Scale
• Ratio Scale
The Hierarchy of Measurement Scales

• Ratio – absolute zero
• Interval – distance is meaningful
• Ordinal – attributes can be ordered
• Nominal – attributes are only named; weakest

Types of Data

Data
• Quantitative or Numeric: Continuous Data, Discrete Data
• Qualitative or Categorical: Nominal Data, Ordinal Data

Today
Inferential Statistics
• Inferential statistics allows researchers to make decisions or inferences by interpreting data patterns
• Inferential statistics can be classified as either Estimation or Hypothesis testing.
Hypothesis Testing
• Hypothesis testing is a statistical method used to determine whether there is enough evidence in sample data to draw conclusions about a population
• Common tests include t-tests, chi-square tests, ANOVA, and regression analysis. The selection depends on data type, distribution, and sample size
• Depending on data distributions, hypothesis testing can be parametric or nonparametric
Parametric and Nonparametric tests
• The key difference between parametric and nonparametric tests is that parametric tests rely on the statistical distribution of the data, whereas nonparametric tests do not depend on any distribution.
• In the literal meaning of the terms, a parametric statistical test makes assumptions about the parameters (defining properties) of the population distribution(s) from which one's data are drawn.
• In contrast, a non-parametric test makes no such assumptions.
• Nonparametric statistics are most commonly used for variables at the nominal or ordinal level of measurement, that is, for variables that do not have a normal distribution.
Sample vs. Population
Today
• Understanding and interpreting Correlation
• Understanding and interpreting Chi-Square Tests
Correlation
Correlation
• Correlation is a statistical tool that helps to measure, describe
and analyse the degree of relationship between two variables.
• Correlation analysis deals with the association between two or
more variables.
Thinking about lines, What can we measure?:
• Gradient – a measure of how the line slopes
• Intercept – where the line cuts the y axis
• Correlation – a measure of how well the line
fits the data
y
5
Equation for a line: y = 1.5 + 0.5x
y = a + bx 4
3
a is the point at which the line
crosses the y axis (when x=0). 2
1
b is a measure of the slope
(the amount of change in y that 00 5
x
1 2 3 4
occurs with a 1-unit change in x).
• A relationship exists when changes in one variable tend to be accompanied by consistent and predictable changes in the other variable.
• If two variables vary in such a way that movement in one is accompanied by movement in the other, the variables are said to have a cause-and-effect relationship.
• Causation always implies correlation but correlation does not
necessarily imply causation.
• The degree of relationship between the variables under consideration
is measured through the correlation analysis.
• The correlation analysis enable us to have an idea about the degree &
direction of the relationship between the two variables under study.
• A correlation typically evaluates three aspects of the relationship:
• the direction
• the form
• the degree
• The direction of the relationship is measured by the sign of the
correlation (+ or -).
• A positive correlation means that the two variables tend to change
in the same direction; as one increases, the other also tends to
increase.
• A negative correlation means that the two variables tend to change
in opposite directions; as one increases, the other tends to
decrease.

Direction of the Correlation
• Positive relationship – variables change in the same direction; indicated by sign (+).
• As X increases, Y increases
• As X decreases, Y decreases
• E.g., as height increases, so does weight.
• Negative relationship – variables change in opposite directions; indicated by sign (−).
• As X increases, Y decreases
• As X decreases, Y increases
• E.g., as TV time (home video) increases, grades decrease
Positive and negative relationships
Positive or direct relationships
• If the points cluster around a line
that runs from the lower left to upper
right of the graph area, then the
relationship between the two
variables is positive or direct.

Negative or inverse relationships


• If the points tend to cluster around
a line that runs from the upper left
to lower right of the graph, then the
relationship between the two
variables is negative or inverse.
• A correlation typically evaluates three aspects of the relationship:
• the direction
• the form
• the degree
• The most common form of relationship is a straight line or linear
relationship which is measured by the Pearson correlation.
• There can also be a non-linear relationship

• Linear correlation: Correlation is said to be linear when the
amount of change in one variable tends to bear a constant ratio to
the amount of change in the other.
• The graph of the variables having a linear relationship will form a
straight line.
Ex: X = 1, 2, 3, 4, 5, 6, 7, 8
Y = 5, 7, 9, 11, 13, 15, 17, 19
Y = 3 + 2X
• Non-Linear correlation: The correlation would be non-linear if
the amount of change in one variable does not bear a constant
ratio to the amount of change in the other variable.
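A quick sketch of the constant-ratio property for the linear example above (illustrative only):

```python
# In the linear example Y = 3 + 2X, each unit change in X produces a
# constant change of 2 in Y, so the ratio of changes never varies.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3 + 2 * x for x in xs]
print(ys)  # [5, 7, 9, 11, 13, 15, 17, 19]

# Successive changes in Y per unit change in X are all equal:
diffs = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
print(diffs)  # [2, 2, 2, 2, 2, 2, 2]
```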
A perfect positive correlation
[Scatter plot: weight (y-axis) against height (x-axis); the points for subjects A and B lie exactly on an upward-sloping straight line (a linear relationship).]
Linear Correlation
[Four example plots of Y against X: two showing linear relationships, with points along straight lines, and two showing curvilinear relationships, with points along curves.]
• A correlation typically evaluates three aspects of the relationship:
• the direction
• the form
• the degree
• The measure of correlation is called the correlation coefficient (r).
• The degree of relationship (the strength or consistency of the
relationship) is measured by the numerical value of the correlation.
• A value of 1.00 indicates a perfect relationship and a value of zero
indicates no relationship.
• The degree of relationship is expressed by the correlation coefficient, which ranges from −1 to +1 (−1 ≤ r ≤ +1)
Degree of correlation: examples
• High positive correlation: r = +.80 (weight vs. height)
• Moderate positive correlation: r = +0.4 (shoe size vs. weight)
• Perfect negative correlation: r = −1.0 (TV watching per week vs. exam score)
• Moderate negative correlation: r = −.80 (TV watching per week vs. exam score)
• Weak negative correlation: r = −0.2 (shoe size vs. weight)
• No correlation (horizontal line): r = 0.0 (IQ vs. height)
[Panel of example scatter plots for r = +.80, +.60, +.40 and +.20]
Types of Correlation
• Simple correlation: only two variables are studied.
• Multiple correlation: three or more variables are studied.
• Partial correlation: recognizes more than two variables, but considers only two at a time while keeping the others constant.
• Total correlation: based on all the relevant variables, which is normally not feasible.
Hypothesis testing for correlation
• Step 1
• Ho : There is no correlation between variable A and variable B
• Ha : There is a correlation between variable A and variable B (this can be
positive or negative)
• Step 2: Calculate ‘r’ (correlation coefficient)
• Step 3: Check the corresponding p-value
• For example, if r = -0.4 and the P value is 0.007, it can be interpreted
as follows: There is statistically significant evidence of an inverse
correlation between variable A and variable B
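These steps can be sketched in Python. The data here are made up for illustration, and the t statistic (with n − 2 degrees of freedom) is shown as the standard bridge from r to a p-value read from a t table; neither comes from the slides:

```python
from math import sqrt

# Step 1: Ho: no correlation between A and B; Ha: a correlation exists.
# Hypothetical data for illustration:
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 7]

# Step 2: calculate r
n = len(a)
ma, mb = sum(a) / n, sum(b) / n
sp = sum((x - ma) * (y - mb) for x, y in zip(a, b))    # sum of products
ssa = sum((x - ma) ** 2 for x in a)                    # SS for A
ssb = sum((y - mb) ** 2 for y in b)                    # SS for B
r = sp / sqrt(ssa * ssb)
print(round(r, 3))  # 0.824

# Step 3: the p-value comes from the t distribution with n - 2 df
t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(t, 2))  # 2.52
```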
Methods of Studying Correlation
• Scatter Diagram Method
• Karl Pearson’s Correlation
• Spearman’s Rank correlation coefficient
Scatter Diagram Method
• Scatter Diagram is a graph of observed plotted points where
each points represents the values of X & Y as a coordinate.
It portrays the relationship between these two variables
graphically.
Table 1. BP, weight and age of children

SBP    Weight (kg)    Age
90     38             12.5
88     45             12.1
100    35             13.6
70     50             10.0
80     60             11.2
90     45             12.0
100    30             13.4
102    51             13.8
120    53             16.8
110    40             15.6
89     43             12.3
80     39             12.0
90     41             12.7
100    40             13.7
87     50             12.0
93     56             12.8
82     52             11.6
102    62             14.0
93     39             13.0
86     44             11.9
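As a sketch, the Pearson correlation (introduced later in these slides) between SBP and age in Table 1 can be computed directly; the function name pearson_r is illustrative:

```python
from math import sqrt

# SBP and age for the 20 children in Table 1
sbp = [90, 88, 100, 70, 80, 90, 100, 102, 120, 110,
       89, 80, 90, 100, 87, 93, 82, 102, 93, 86]
age = [12.5, 12.1, 13.6, 10.0, 11.2, 12.0, 13.4, 13.8, 16.8, 15.6,
       12.3, 12.0, 12.7, 13.7, 12.0, 12.8, 11.6, 14.0, 13.0, 11.9]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ssx * ssy)

r = pearson_r(age, sbp)
print(round(r, 2))  # strong positive correlation (close to 1)
```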
Scatter plot of the relationship between SBP and age of children
[Scatter plot: SBP (y-axis, approx. 70–120) against age (x-axis, approx. 10–18 years); the points cluster closely around an upward-sloping straight line.]
Scatter plot of the relationship between weight and age of children
[Scatter plot: weight in kg (y-axis, approx. 30–60) against age (x-axis); a mistyped age value of 111.6 instead of 11.6 stretches the x-axis to 0–100 and compresses the remaining points.]
Scatter plot of the relationship between weight and age of children
[Scatter plot: the same weight and age data with the age value corrected to 11.6; the points are widely scattered with no clear linear pattern.]
Scatter plot examples:
• Positive relationship: weight vs. height
• Moderate positive correlation: shoe size vs. weight
• Perfect negative correlation: TV watching per week vs. exam score
• Moderate negative correlation: TV watching per week vs. exam score
• Weak negative correlation: shoe size vs. weight
• No correlation: IQ vs. height
Advantages of Scatter Plot
• Simple and non-mathematical method
• First step in investigating the relationship between two variables
Disadvantage of scatter diagram
• Cannot determine the exact degree of correlation
Methods of Studying Correlation
• Scatter Diagram Method
• Karl Pearson’s Correlation
• Spearman’s Rank correlation coefficient
• To compute a correlation you need two scores, X and Y, for each
individual in the sample.
• The Pearson correlation requires that the scores be numerical values
from an interval or ratio scale of measurement.
• Other correlational methods exist for other scales of measurement
(e.g. Spearman’s).

The Pearson Correlation
• The Pearson correlation measures the direction and degree of linear (straight-line) relationship between two variables.
• Pearson's 'r' is the most common correlation coefficient
• The degree of correlation is expressed by the value of the coefficient
• r always lies between −1 and +1 (−1 ≤ r ≤ +1)
• To compute the Pearson correlation r (Method 1), you first measure the variability of the X and Y scores separately by computing the sum of squares for each variable (SSX and SSY).
• Then, the co-variability (tendency for X and Y to vary together) is measured by the sum of products (SP).
• The Pearson correlation r is found by computing the ratio r = SP / √(SSX × SSY).
Sample Table (x = Age, y = Glucose level; x̄ = 41.2, ȳ = 81; Σx = 247, Σy = 486):

x     y     x − x̄    y − ȳ    (x − x̄)²    (y − ȳ)²    (x − x̄)(y − ȳ)
43    99      1.8      18        3.24        324           32.4
21    65    −20.2     −16      408.04        256          323.2
25    79    −16.2      −2      262.44          4           32.4
42    75      0.8      −6        0.64         36           −4.8
57    87     15.8       6      249.64         36           94.8
59    81     17.8       0      316.84          0            0
Σ                              1240.84        656          478
                               (SSX)         (SSY)        (SP)

r = SP / √(SSX × SSY) = 478 / √(1240.84 × 656) ≈ 0.53

Where x̄ = mean of x and ȳ = mean of y
Another method for calculating "r"
Procedure for computing the correlation coefficient (Method 2)
• Calculate the sum (Σ) of the two series 'x' and 'y' (Σx and Σy)
• Square each value of 'x' and 'y', then obtain the sums of the squared values, i.e. Σx² and Σy²
• Multiply each value of x with the corresponding value of y to obtain the product 'xy'
• Then obtain the sum of the products, i.e. Σxy
• Substitute the values into the formula.
Sample question: Find the value of the
correlation coefficient from the following table:
Subject Age x Glucose Level y
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81
Step 1:Make a chart. Use the given data, and add three more
columns: xy, x2, and y2.

Glucose
Subject Age x xy x2 y2
Level y
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81
Step 2: Multiply x and y together to fill the xy column. For
example, row 1 would be 43 × 99 = 4,257.

Glucose
Subject Age x xy x2 y2
Level y
1 43 99 4257
2 21 65 1365
3 25 79 1975
4 42 75 3150
5 57 87 4959
6 59 81 4779
Step 3: Take the square of the numbers in the x column, and put
the result in the x2 column.

Glucose
Subject Age x xy x2 y2
Level y
1 43 99 4257 1849
2 21 65 1365 441
3 25 79 1975 625
4 42 75 3150 1764
5 57 87 4959 3249
6 59 81 4779 3481
Step 4: Take the square of the numbers in the y column, and put
the result in the y2 column.

Glucose
Subject Age x xy x2 y2
Level y
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561
Step 5: Add up all of the numbers in the columns and put the
result at the bottom of the column. The Greek letter sigma (Σ) is
a short way of saying “sum of.”
Glucose
Subject Age x xy x2 y2
Level y
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561
Σ 247 486 20485 11409 40022
Step 6: Use the following correlation coefficient formula.
From our table:

Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
n is the sample size, in our case = 6

r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²] × [nΣy² − (Σy)²]}

= [6(20,485) − (247 × 486)] / √{[6(11,409) − 247²] × [6(40,022) − 486²]}

= 0.5298
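Method 2 can be sketched directly in Python (the variable names are illustrative):

```python
from math import sqrt

# Age (x) and glucose level (y) for the 6 subjects in the example
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# r = [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²])
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))  # 0.5298
```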
Examples of SPSS output for Correlation

For example: Var 1 = Age, Var 2 = Income, Var 3 = Wt, Var 4 = Ht, Var 5 = Exam score
Assumptions of Pearson’s Correlation Coefficient

• There is a linear relationship between the two variables, i.e. when the two variables are plotted on a scatter diagram, the points form a straight line.
• A cause-and-effect relationship exists between the different forces operating on the items of the two variable series.
Advantages of Pearson’s Coefficient

• It summarizes in one value both the degree and the direction of correlation.

Limitations of Pearson’s Coefficient

• Always assumes a linear relationship
• Interpreting the value of r is difficult.
• The value of the correlation coefficient is affected by extreme values.
• Time-consuming method (if done manually)
Coefficient of Determination
• The convenient way of interpreting the value of correlation
coefficient is to use the square of coefficient of correlation
which is called Coefficient of Determination.
• The Coefficient of Determination = r2.
• Suppose: r = 0.9, r2 = 0.81 this would mean that 81% of the
variation in the dependent variable has been explained by the
independent variable.
• For example, correlation between height and weight or
• Correlation between exercise time and blood pressure or
• Correlation between time spent reading and exam score
Coefficient of Determination
• The maximum value of r2 is 1 because it is possible to explain
all of the variation in y but it is not possible to explain more
than all of it.
• Coefficient of Determination =
Explained variation / Total variation
Coefficient of Determination: An example
• Suppose: r = 0.60
r = 0.30
It does not mean that the first correlation is twice as strong as
the second. The ‘r’ can be understood by computing the value
of r2 .
When r = 0.60 r2 = 0.36 -----(1)
r = 0.30 r2 = 0.09 -----(2)
This implies that in the first case 36% of the total variation is
explained whereas in second case 9% of the total variation is
explained .
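A one-line check of the example arithmetic (illustrative):

```python
# Comparing two correlations by their coefficients of determination (r²)
r1, r2 = 0.60, 0.30
print(round(r1 ** 2, 2))  # 0.36 -> 36% of the total variation explained
print(round(r2 ** 2, 2))  # 0.09 -> 9% explained: four times less, not half
```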
How to Report Correlation
• If you report your statistics in text: r(degrees of freedom) = the r statistic, p = p value. The r statistic should be reported to 2 decimal places. The p values should be reported to 3 decimal places.

• There was a moderately positive correlation between height and weight, r(352) = .51, p < .001

Some things to note:
1. There are two ways to report p values. The first way is to cite the alpha value. The second, preferred way is to report the exact p-value. Note that if your p-value is less than .001, it is conventional to state p < .001 rather than give the exact value.
2. The r statistic should be stated to 2 decimal places.
3. Remember to drop the leading 0 from both r and the p-value (i.e., not 0.34, but rather .34).
4. You don't need to provide the formula for r.
5. Degrees of freedom for r is N − 2 (the number of data points minus 2).
Methods of Studying Correlation
• Scatter Diagram Method
• Karl Pearson’s Correlation
• Spearman’s Rank correlation coefficient
Spearman’s Rank Coefficient of Correlation

• When the variables under study are not capable of quantitative measurement but can be arranged in serial order, Pearson's correlation coefficient cannot be used, but the Spearman rank correlation can.
• R = 1 − (6 ΣD²) / [N (N² − 1)]
• R = rank correlation coefficient
• D = difference of ranks between paired items in the two series
• N = total number of pairs of observations of X and Y
• It is usually denoted by the symbol ρ (rho).

The Spearman Correlation
• The Spearman correlation is used in two general situations:
(1) It measures the relationship between two ordinal variables; that
is, X and Y both consist of ranks.
(2) It measures the consistency of direction of the relationship
between two variables. In this case, the two variables must be
converted to ranks before the Spearman correlation is computed.

The Spearman Correlation (cont.)
The calculation of the Spearman correlation requires:

1. Two variables are observed for each individual.


2. The observations for each variable are rank ordered. Note that the X values and
the Y values are ranked separately.
3. After the variables have been ranked, the Spearman correlation is computed by
either:
a. Using the Pearson formula with the ranked data.
b. Using the special Spearman formula (assuming there are few, if any, tied
ranks).
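A minimal sketch of the special Spearman formula with hypothetical ranks for five candidates (not the MICROSOFT data, which is not reproduced in this copy):

```python
# Hypothetical ranks of 5 candidates in written (X) and oral (Y) skills
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

n = len(rank_x)
d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # ΣD²

# Spearman formula (assumes no tied ranks): R = 1 − 6ΣD² / [N(N² − 1)]
R = 1 - (6 * d2) / (n * (n ** 2 - 1))
print(R)  # 0.8
```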

Class example
• MICROSOFT shortlisted 10 candidates for a final selection interview. They were examined in written and oral communication skills and ranked as follows:

• Find out whether there is any correlation between the written and oral communication skills of the short-listed candidates.
• Recall the formula
• Recall the formula
Interpretation of Rank Correlation Coefficient (R)

• R = 0.82 from the class work
• The value of the rank correlation coefficient, R, ranges from −1 to +1
• If R = +1, there is complete agreement in the order of the ranks and the ranks are in the same direction
• If R = −1, there is complete agreement in the order of the ranks and the ranks are in the opposite direction
• If R = 0, there is no correlation
Another class work (Assignment).
• Problem (Conversion of scores into ranks)
• Calculate the rank correlation to determine the relationship
between performance in a Mathematics test and percentage time
spent in study given by the following data on 8 College students.

Performance in Test 90.0 92.4 98.5 98.3 95.4 91.3 98.0 92.0
% Time spent in study 76.0 74.2 75.0 77.4 78.3 78.8 73.2 76.5

• Remember to convert scores into ranks first!


Limitations of Spearman’s Correlation

• Cannot be used for finding correlation in a grouped frequency distribution.
• This method should not be applied where N exceeds 30.
Chi-square
Objectives
• Learn to identify situations for which Chi square is the appropriate
test
• Calculate the Chi Square and determine the significance of the test
statistic
• Interpret the Chi Square statistic
• Relate results of your analysis to your null hypothesis: very important
Research Questions
Types of questions which can be answered using the Chi Square:
• Are the observed results different from the expected ones?
• Does a relationship exist between two variables?
• Does group A have a different outcome from group B?
• Is there an association between group A and group B?
Different Scales, Different Measures of Association

Scale of Both Variables                       Measure of Association
Nominal scale                                 Pearson chi-square: χ²
Ordinal scale                                 Spearman’s rho
Interval or ratio scale                       Pearson r
Continuous variable and a binary variable     Point-biserial correlation
Why used?
• Chi-square analysis is primarily used to deal with categorical
(frequency) data
• The data that we analyze consists of frequencies; that is, the number
of individuals falling into categories. In other words, the variables are
measured on a nominal scale.
• The test statistic for frequency data is Pearson Chi-Square. The
magnitude of Pearson Chi-Square reflects the amount of discrepancy
between observed frequencies and expected frequencies.
The Chi Square Statistic
The chi-square (χ²) statistic:

χ² = Σ [(observed − expected)² / expected]

χ² = Σ (O − E)² / E

• If observed and expected are very similar, χ² is small.
• If observed and expected are very different, χ² is large. Perhaps the expectation is wrong or something else is happening.
• If you also wanted to know the magnitude of the difference, you could calculate an odds ratio.
Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degree of freedom
6. Compare computed test statistic against a tabled/critical value
STEPS IN HYPOTHESIS TESTING PROCEDURE
1. Enumerate data
2. Review assumptions
3. State hypotheses
4. Select test statistic
5. Determine distribution of test statistic
6. Calculate test statistic
7. State decision rule
8. Make statistical decision
9. Conclude Ho may be true or HA is true
10.Determine p value
1. Determine Appropriate Test
• Chi Square is used when both variables are measured on a nominal
scale.
• It can be applied to interval or ratio data that have been categorized
into a small number of groups.
• It assumes that the observations are randomly sampled from the
population.
• All observations are independent (an individual can appear only
once in a table and there are no overlapping categories).
• It does not make any assumptions about the shape of the
distribution nor about the homogeneity of variances.

2. Establish Level of Significance
• α is a predetermined value
• The convention
• α = .05
• α = .01
• α = .001

3. Determine The Hypothesis: Whether There is an Association or Not

• Ho : The two variables are independent
• Ha : The two variables are associated
• The null hypothesis states
• Ho: that there is NO statistically significant difference between the observed values and the expected values.
• In other words, any differences that do exist between observed and expected are totally random and occurred by chance alone.
• The alternative hypothesis states
• Ha: that there is a statistically significant difference between the observed values and the expected values.
4. Calculating Test Statistics
• Contrasts observed frequencies in each cell of a contingency table with expected frequencies.
• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated).
• The expected frequency for a cell is the product of its row and column totals divided by the number of cases:

Fe = (Fr × Fc) / N
4. Calculating Test Statistics

χ² = Σ (O − E)² / E

or, in terms of observed (Fo) and expected (Fe) frequencies:

χ² = Σ [(Fo − Fe)² / Fe]

5. Determine Degrees of Freedom
df = (R − 1)(C − 1)
6. Compare computed test statistic against a tabled/critical value

• The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable
• The critical tabled values are based on sampling distributions of the Pearson chi-square statistic
• If the calculated χ² is greater than the χ² table value, reject Ho
Example
• Suppose a researcher is interested in examining the relationship
between heart dx and COVID-19 infection
• A questionnaire was developed and administered to a random sample
of 90 elderly patients.
• The researcher also collects information about heart dx (present or
absent) and tests for COVID-19 infection status (negative,
indeterminate, positive) for the 90 respondents.

Bivariate Frequency Table or Contingency Table

                   Negative   Not sure   Positive   f row
Heart Dx Present      10         10         30        50
Heart Dx Absent       15         15         10        40
f column              25         25         40      n = 90
The row totals (50, 40) are the row frequencies (f row); the column totals (25, 25, 40) are the column frequencies (f column).
1. Determine Appropriate Test

1. Heart Disease (2 levels: present or absent), a Nominal variable
2. COVID-19 infection (3 levels: negative, indeterminate, positive), a Nominal variable
2. Establish Level of Significance

Alpha of .05
3. Determine The Hypothesis

• Ho: There is NO relationship between heart disease and COVID-19
infection among elderly patients
• Ha: There is a relationship between heart disease and COVID-19
infection among elderly patients
4. Calculating Test Statistics (Recall study findings)

                   Negative   Not sure   Positive   f row
Heart Dx Present      10         10         30        50
Heart Dx Absent       15         15         10        40
f column              25         25         40      n = 90
4. Calculating Test Statistics

Remember: Fe = Fr Fc / N
(e.g., Present/Negative: 50 × 25 / 90 = 13.9; Absent/Negative: 40 × 25 / 90 = 11.1)

                   Negative     Not sure     Positive     f row
Heart Dx Present   fo = 10      fo = 10      fo = 30       50
                   fe = 13.9    fe = 13.9    fe = 22.2
Heart Dx Absent    fo = 15      fo = 15      fo = 10       40
                   fe = 11.1    fe = 11.1    fe = 17.8
f column              25           25           40        n = 90

4. Calculating Test Statistics

χ² = Σ (O - E)² / E

χ² = (10 - 13.89)²/13.89 + (10 - 13.89)²/13.89 + (30 - 22.2)²/22.2
   + (15 - 11.11)²/11.11 + (15 - 11.11)²/11.11 + (10 - 17.8)²/17.8

   = 11.03
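The whole calculation can be cross-checked with a short Python sketch (standard library only; note the closed-form p-value exp(-χ²/2) is valid only because df = 2 in this example):

```python
import math

# Observed counts: rows = heart disease (present, absent),
# columns = COVID-19 status (negative, not sure, positive)
observed = [[10, 10, 30],
            [15, 15, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency per cell under independence: Fe = Fr * Fc / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over all cells of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the chi-square survival function simplifies to exp(-x / 2)
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

Using the exact expected counts rather than the rounded 13.9/22.2 values gives χ² = 11.025, which matches the SPSS output rather than the hand-rounded 11.03.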

5. Determine Degrees of Freedom

df = (R-1)(C-1) = (2-1)(3-1) = 2

• Check the critical value in the Table with the following parameters
• α = 0.05
• df = 2
6. Compare computed test statistic against a tabled/critical value

• α = 0.05
• df = 2
• Remember: if the calculated χ² is greater than the χ² table value, reject Ho
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds the critical value
• The null hypothesis is rejected
• There is a relationship between heart disease and COVID-19 infection
among elderly patients
The Hypothesis

If the calculated χ² is greater than the χ² table value, reject Ho:
11.03 > 5.991

• Ho: There is NO relationship between heart disease and COVID-19
infection among elderly patients
• Ha: There is a relationship between heart disease and COVID-19
infection among elderly patients
SPSS Output for COVID-19 infection Example

Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             11.025a    2   .004
Likelihood Ratio               11.365     2   .003
Linear-by-Linear Association    8.722     1   .003
N of Valid Cases               90
a. 0 cells (.0%) have expected count less than 5. The minimum
expected count is 11.11.
How to report Chi Square
• A Chi-squared analysis showed a significant relationship between
heart disease and COVID-19 infection among elderly patients
(χ² = 11.03, df = 2, P = 0.004).
• If any expected count is ≤ 5 and the table is 2 × 2, use Fisher's Exact Test
p-values
• If expected counts are > 5 and the table is larger than 2 × 2, use Chi-squared
Test p-values
• If expected counts are > 5 and the table is 2 × 2, use continuity-corrected
p-values
• For example:
• "Amongst those that currently smoked, ___% had experienced
symptoms of asthma whereas ___% of non-smokers experienced
such symptoms. This was statistically significant/non-significant
at a 5% level using a two-tailed continuity-corrected chi-squared
test with p = ___"
• "Amongst those that currently smoked, ___% had experienced
symptoms of asthma whereas ___% of non-smokers experienced
such symptoms. This was statistically significant/non-significant
at a 5% level using a two-tailed Fisher's Exact test with p = ___"
One-dimensional
• Also known as the Chi Square Goodness of Fit test
• Suppose we want to know how people in a particular area will vote in
the general election, and we go around asking them:

          APC   PDP   Labor
Observed   20    30     10

• How will we see what's really going on?
• Research Question: Can PDP win the district?
• Ho: There is no association between how people will vote and PDP
winning in the district
• Ha: There is an association between how people will vote and PDP
winning in the district
• Solution: Chi-square analysis to determine if our outcome differs
from what would be expected if there were no preference

2  
2
(O E)
E
APC PDP Labor

Observed 20 30 10
Expected 20 20 20
• Plug in to formula
• (Expected observation is gotten from the mean of the observed
proportions)
 (20  20)2 (30  20)2 (10  20)2
2  
2
(O E)
 
E 20 20 20
 2 (2)  10
.05
2
 5.99
• Remember: if the calculated χ² is greater than the χ² table value, reject Ho
• 10 > 5.99, so reject H0
• The district will probably vote PDP
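The same goodness-of-fit arithmetic as a minimal Python sketch (standard library only; the equal-preference expected counts follow the no-preference null from the example above):

```python
observed = [20, 30, 10]          # APC, PDP, Labor

# Under Ho (no preference) the total is spread equally across the parties
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # [20.0, 20.0, 20.0]

# Pearson goodness-of-fit statistic: sum of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1           # k - 1 categories

print(f"chi2 = {chi2:.2f}, df = {df}")  # compare against the 5.99 critical value
```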
More Examples
• What do Doctors do with their free time?

          Watch TV   Sleep   Plan Strike   Play
Males         30       40         20        10
Females       20       30         40        10

• Question: Is there a relationship between gender and what Doctors
do with their free time?
• Can you state the null and the alternative hypotheses?

          Watch TV   Sleep   Plan Strike   Play   Total
Males         30       40         20        10     100
Females       20       30         40        10     100
Total         50       70         60        20     200

• Expected = (Ri * Cj) / N
• Example for Males / Watch TV: (100 * 50) / 200 = 25
              Watch TV   Sleep     Plan Strike   Play      Total
Males (E)     30 (25)    40 (35)     20 (30)     10 (10)    100
Females (E)   20 (25)    30 (35)     40 (30)     10 (10)    100
Total            50         70          60          20      200

• df = (R-1)(C-1) = (2-1)(4-1) = 3
• R = number of rows
• C = number of columns
Interpretation

χ²(3) = 10.10
Critical χ² at α = 0.05 with df = 3 is 7.82

• Remember: if the calculated χ² is greater than the χ² table value, reject Ho
• 10.10 > 7.82, so reject H0: there is a statistically significant relationship
between gender and what Resident Doctors do with their free time
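Since this table is computed exactly like the earlier heart-disease example, the cell-by-cell work can be wrapped in a small reusable helper (a sketch; the function name pearson_chi2 is my own, not from the slides):

```python
def pearson_chi2(observed):
    """Pearson chi-square statistic and df for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # Fe = Fr * Fc / N
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Doctors' free-time table: rows = Males, Females;
# columns = Watch TV, Sleep, Plan Strike, Play
chi2, df = pearson_chi2([[30, 40, 20, 10],
                         [20, 30, 40, 10]])
print(f"chi2 = {chi2:.2f}, df = {df}")  # 10.10 exceeds the 7.82 critical value
```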
Interpretation of the test will consider the frequencies
• What do Doctors do with their free time?

          Watch TV   Sleep   Plan Strike   Play
Males         30       40         20        10
Females       20       30         40        10
Assumptions
• Normality
  • Rule of thumb: we need expected frequencies of at least 5 in each cell
• Inclusion of non-occurrences
  • Must include all responses, not just the positive ones
• Independence
  • This does not refer to whether the variables are independent or related
(that is what the test assesses); rather, as with t-tests, the observations
(data points) must not have any bearing on one another
• To help with the last two, make sure that your N equals the total number of
people who responded
Exercise
• A study was conducted to assess the effectiveness of motorcycle safety
helmets in preventing head injury. The data consist of a random sample of
793 people involved in motorcycle accidents during a one-year period. Use
the data below to test whether there is an association between having a
head injury and wearing a helmet.

                     Head injury
Wearing helmet       Yes      No
Yes                   17     130
No                   218     428
Total                235     558
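One way to check your answer to the exercise is the same cell-by-cell computation (a sketch; note that, per the reporting rules above, a 2 × 2 table with large expected counts would normally get a continuity correction, which this plain Pearson statistic omits):

```python
# Helmet exercise: rows = helmet worn (yes, no), columns = head injury (yes, no)
observed = [[17, 130],
            [218, 428]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)          # 793 accident victims

# Uncorrected Pearson chi-square over the four cells
chi2 = sum((o - row_totals[i] * col_totals[j] / n) ** 2
           / (row_totals[i] * col_totals[j] / n)
           for i, row in enumerate(observed)
           for j, o in enumerate(row))

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2-1)(2-1) = 1
print(f"chi2 = {chi2:.2f}, df = {df}")  # compare against the df = 1 critical value
```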
