STAT 101 Module Handout 5.1
In the previous module, you learned how to make inferences about a single population. You are now
equipped for estimation and hypothesis testing on one population mean and one population proportion, as well
as on one population with several proportions and one population variance. But what if our study deals with
two or more populations?
In this module, you will learn about inference on two or more populations. Studies are often
conducted to compare two or more populations. For example, studies are done to compare the average
length of time that men and women spend on social media sites, to determine whether a difference exists
between the mean pre-test and post-test scores of students in a bridging program, or to compare the cigarette
consumption of heavy smokers at three time points: before, one month after, and six months after a
hypnotherapy program. Estimation and hypothesis testing on parameters such as the population mean and
population proportion may also be done in such cases.
To better understand and appreciate inference on two or more populations, you can find some real-life
applications in this YouTube video.
LEARNING OBJECTIVES
At the end of this module, you must be able to:
1. differentiate related from independent samples;
2. use the most appropriate statistical method(s) for comparing two population means;
3. interpret the output of statistical software for comparing two population means; and
4. recommend appropriate actions based on statistical results.
In analyzing several populations, you need to first correctly distinguish between the two types of samples,
namely, related (or paired) and independent samples.
DEFINITION
Related or paired samples are obtained by matching two similar units with respect to some important
characteristic or by self-pairing in which two measurements are taken from the same unit. Independent
samples, on the other hand, are obtained when two unrelated sets of units are measured for a variable.
How the set of samples was selected from one population has no relationship with how the other set of
samples was selected from the other population.
For additional reference on the definition of related and independent samples, you may check this
supplementary reading material.
Let us consider the following scenarios to better illustrate the use of paired and independent samples.
1. A professor wants to know if there are differences between the scores in midterm and pre-final exams
of her students.
a. She randomly selected 25 students from whom she obtained their midterm scores and another 25
students for their pre-final scores.
b. She randomly selected 25 students from whom she obtained both the midterm and pre-final scores.
In scenario (a), the professor used independent samples since the midterm scores from
the 25 selected students have no relationship with the pre-final scores from the other 25
selected students. On the other hand, in scenario (b), the professor used related samples.
This is because both midterm and pre-final scores are taken from each of the 25 selected students.
2. Thirty individuals were randomly selected to rate the taste of Milk Teh, a weight-gain milk.
Afterward, they also rated the taste of a newly introduced competitor, Milk Koya. The ratings for the
two brands of weight-gain milk were then compared.
Each individual in the study rated both milk brands, so the collected data can be organized
by pairs wherein every individual is associated with two ratings, one for each milk brand. Hence,
this scenario illustrates the use of related samples.
3. A badminton player wants to test if there is a difference in speed between two brands of shuttlecocks.
Twenty shuttlecocks from each brand were randomly selected from a production batch and each
sampled shuttlecock was subjected to a speed test.
The speed was measured on the twenty randomly selected shuttlecocks from each brand.
In this case, all the measurements are independent. Thus, this problem uses independent samples.
To further evaluate if you have learned how to correctly identify the type of sample used for every scenario,
here are some more illustrations. For each item, identify if the type of sample is related or independent.
1. A study was conducted to examine the effect of thermal pollution on the growth of clams. A random
sample of six clams was collected at the intake site without thermal pollution while a random sample
of four clams was taken from the discharge site with thermal pollution. The length of each clam (in cm)
was measured.
2. An experiment aims to determine which of two concentrations of a certain chemical enhances the
growth of a certain type of plant better. The growth of 10 randomly selected plants in concentration A
and 10 randomly selected plants in concentration B were recorded.
3. Studies show that Filipinos are relatively friendlier compared to other races. To verify this, a Japanese
national randomly selected 50 Filipinos and another 50 non-Filipinos and asked them whether or not
they will talk to a stranger while on public transportation.
4. To verify if first-born children tend to be more independent than those who are not, five first-born
children with their second-born siblings were randomly chosen. All sampled children answered a test
that determines if a person can be considered independent. Their test scores were then recorded.
5. High concentration of trace metals in drinking water affects the flavor and poses a health hazard to
drinkers. A health officer wanted to compare the zinc concentration found in bottom water and surface
water for 20 randomly selected brands of bottled water.
6. It is of interest to determine the effect of a very difficult exam on a student’s belief in the existence of
a divine being. To do this, 240 randomly selected students were initially asked if they believe in the
existence of the divine being or not. Then, they were given a very difficult exam and asked again their
belief after taking the said exam.
7. A manager randomly selected 20 of his employees to investigate if losing one night’s sleep affects
employees’ work performance. Each sampled employee was given a problem-solving task at noon for
2 days, in which the employees had a night of good sleep for the first day, while they had no sleep at
all for the other day, then their scores were recorded.
8. An ear specialist wants to examine if the right and left ears of children have a different mean threshold
of pain to noise. He randomly selected 20 children and recorded their maximum level of tolerance to
noise for both right and left ears.
9. A researcher is interested to determine if two instruments (A and B) differ in their measurement of the
diameter of the ball bearing. He randomly selects 10 ball bearings and measures their diameter using
instrument A. He again randomly selects another 10 ball bearings and measures their diameter using
instrument B.
10. It is of interest to compare the efficacy of a new feed formulation (NF) with the standard (SF) in
increasing the milk yield (kg) of cows. Dr. Beh Ca measured the milk yield from five pairs of cows,
each pair of the same parent and the same health status. For each pair, he assigned one cow to NF
and the other cow to SF.
Many studies compare two populations to determine whether they differ significantly. To answer such
objectives, estimation and hypothesis testing on two population means are commonly done.
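                   Population 1                                Population 2
Observations       $x_{11}, x_{12}, \ldots, x_{1n_1}$          $x_{21}, x_{22}, \ldots, x_{2n_2}$
Sample Size        $n_1$                                       $n_2$
Sample Mean        $\bar{x}_1$                                 $\bar{x}_2$
Sample Variance    $s_1^2$                                     $s_2^2$
where
$x_{ij}$ represents the observed value of the random variable $X$ taken from the jth unit of the ith population,
where i = 1, 2; j = 1, 2, …, $n_i$;
$\bar{x}_i$ is the mean of the random sample obtained from the ith population, $\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$; and
$s_i^2$ is the variance of the random sample obtained from the ith population, $s_i^2 = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1}$.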
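The sample summaries in the table can be computed directly from raw data. The sketch below is only an illustration: the two samples are hypothetical values, not data from this handout, and the helper name `summarize` is an assumption. It uses Python's standard `statistics` module, whose `variance` function divides by n − 1, matching the formula for the sample variance above.

```python
from statistics import mean, variance

# Hypothetical measurements from two independent samples (not from the handout).
sample_1 = [12.0, 15.0, 11.0, 14.0, 13.0]
sample_2 = [9.0, 10.0, 12.0, 8.0]


def summarize(values):
    """Return (sample size, sample mean, sample variance with divisor n - 1)."""
    return len(values), mean(values), variance(values)


n1, xbar1, s2_1 = summarize(sample_1)
n2, xbar2, s2_2 = summarize(sample_2)

print("Sample 1:", n1, xbar1, s2_1)
print("Sample 2:", n2, xbar2, s2_2)
```

Note that the two samples do not need to have the same size, which is typical of independent samples.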
Estimation
A point estimator for the difference between two population means ($\mu_D$) is given by $\hat{\mu}_D = \bar{x}_1 - \bar{x}_2$.
Furthermore, the sampling distribution of $\hat{\mu}_D$ is normal with mean $\mu_D$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
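The mean and variance of this sampling distribution follow from the independence of the two samples. A short derivation, assuming that each sample mean $\bar{x}_i$ has mean $\mu_i$ and variance $\sigma_i^2 / n_i$ and that $\mu_D = \mu_1 - \mu_2$:

```latex
\begin{aligned}
E(\hat{\mu}_D) &= E(\bar{x}_1) - E(\bar{x}_2) = \mu_1 - \mu_2 = \mu_D, \\
\operatorname{Var}(\hat{\mu}_D) &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.
\end{aligned}
```

The variances add, rather than subtract, because the two sample means are independent.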
Hypothesis Testing
In hypothesis testing, note that the null hypothesis is Ho: µD = D0, where D0 is the hypothesized population mean
difference, which could take values from −∞ to +∞ but is commonly set to 0. To test the significance of µD, we
can use different parametric tests depending on our knowledge about the population variances.
REMARK
In formulating the null and alternative hypotheses for one-tailed tests, note that the inequality sign may differ
according to how the parameter is defined. For example, suppose the parameter is defined as µD = µ1 − µ2
and you want to test the hypothesis µD > 0. The same parameter can be defined as µD = µ2 − µ1 but the
corresponding hypothesis will now be µD < 0. Both are correct as long as they are consistent with the defined
populations. Whichever the case, the results of the hypothesis test will be the same.
However, before using these parametric tests, we first need to satisfy the following assumptions:
1. The variable of interest should be measured on at least an interval scale.
2. Independent samples must be taken using simple random sampling.
3. In addition to the independence of samples, each of the two populations should be normally
distributed.
The table below provides a summary of the different interval estimators, test procedures, and test statistics to
be applied depending on the assumed condition of the population variances.
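                         Case 1              Case 2                 Case 3
Population variances     known               unknown but equal      unknown and unequal
Hypothesis testing
Test procedure           Z-test              Pooled t-test          Non-pooled t-test (Welch's test)
A (1−α) × 100% confidence interval estimator for each case:
Case 1: $\hat{\mu}_D \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Case 2: $\hat{\mu}_D \pm t_{\alpha/2,\,(n_1+n_2-2)}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
Case 3: $\hat{\mu}_D \pm t_{\alpha/2,\,(df)} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$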
From the given table, note that when the population variances are known (Case 1), the Z-test is the
most appropriate test. On the other hand, the t-test is more appropriate when the
population variances are unknown. Under this scenario, two cases may arise: the two populations may
have equal (Case 2) or unequal (Case 3) variances. Thus, a test on the equality of variances must be
performed first to determine whether the pooled or the non-pooled t-test is more appropriate.
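For reference, the test statistics that pair with the three cases take the usual standardized form. The expressions below are the standard textbook versions written in this handout's notation, offered as a sketch rather than a reproduction of the original table. Here $s_p$ is the pooled standard deviation, and the degrees of freedom in Case 3 come from the Welch-Satterthwaite approximation.

```latex
\begin{aligned}
\text{Case 1 (Z-test):}\quad
  & Z_c = \frac{\hat{\mu}_D - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \\[6pt]
\text{Case 2 (pooled t-test):}\quad
  & t_c = \frac{\hat{\mu}_D - D_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
  \qquad df = n_1 + n_2 - 2 \\[6pt]
\text{Case 3 (non-pooled t-test):}\quad
  & t_c = \frac{\hat{\mu}_D - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
  \qquad df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                   {\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
\end{aligned}
```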
TAKE NOTE!
▪ The Z-test, pooled t-test, and the non-pooled t-test require independent and simple random samples,
and normal populations.
▪ If the population variances are different, the pooled t-test can result in a substantially larger Type I error rate.
▪ The non-pooled t-test applies whether or not the population variances are equal.
▪ However, the pooled t-test is slightly more powerful, on average, if the population variances are equal.
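To see how the pooled and non-pooled procedures differ in practice, the sketch below computes both test statistics and their degrees of freedom for the same data. The data are hypothetical, and the function names `pooled_t` and `welch_t` are illustrative assumptions, not R Commander functions. Only the Python standard library is used, so no p-value is computed; the statistic would still need to be compared with a tabular t value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_1, sample_2, d0=0.0):
    """Pooled t statistic and degrees of freedom (assumes equal variances)."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    sp = sqrt(((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df)
    t = (mean(sample_1) - mean(sample_2) - d0) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, df


def welch_t(sample_1, sample_2, d0=0.0):
    """Non-pooled (Welch) t statistic and approximate degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = variance(sample_1) / n1  # s1^2 / n1
    v2 = variance(sample_2) / n2  # s2^2 / n2
    t = (mean(sample_1) - mean(sample_2) - d0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data (not from the handout).
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [9.0, 10.0, 12.0, 8.0]

t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print("pooled:", t_pooled, df_pooled)
print("welch: ", t_welch, df_welch)
```

When the sample variances are similar, as in this illustration, the two statistics are close and the Welch degrees of freedom are only slightly smaller than n1 + n2 − 2.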
Indigenous people (IP) or native people are ethnic groups who are the original settlers of
a place. They maintain the traditions, ways, language, religion, dress, or other aspects of
an early culture even in modern times. Originally, they occupied vast tracts of land that are
usually rich in natural resources. As such, they play the role of protector of nature and
preserver of heritage.
However, IPs are threatened with extinction due to poverty, encroachment by the outside world,
colonization, modernization, land grabbing, profit-oriented businesses, and corrupt governments, leading
to loss of cultural identity and ancestral lands. The Hinereben Foundation, a donor-based organization, is
aimed at promoting the welfare of IP in the country. Its activities, however, are limited, relying heavily on
donors for funds. In Alakan Valley, one of its project sites, IP farmers from Sitio Pawagaon and Sitio
Magsaysay were randomly selected and interviewed as part of a profiling procedure by the foundation.
Because of a limited budget, the executive director of the Hinereben Foundation plans to concentrate the
foundation's efforts on Sitio Magsaysay, as he believes that the IP farmers at that site are more financially in need. He
believes that the IP farmers in Sitio Pawagaon, being located near the Poblacion, have higher monthly
household incomes compared to those in Sitio Magsaysay. Is there enough evidence to support the
executive director’s claim? Would you advise him to pursue his plan?
To analyze this problem, let us first define the populations. Let population 1 be the set of IP farmers from
Sitio Pawagaon and population 2 be the set of IP farmers from Sitio Magsaysay. Since the IP farmers were
taken independently from each Sitio, we know that we are dealing with independent samples.
R COMMANDER OUTPUT #1
Sitio = Pawagaon
Shapiro-Wilk normality test
W = 0.96596, p-value = 0.3775
--------
Sitio = Magsaysay
Shapiro-Wilk normality test
W = 0.97861, p-value = 0.8690
The variable of interest for each population is the monthly household income, which is measured on a ratio
scale. Based on the Shapiro-Wilk normality test (see R Commander Output #1), the assumption of normality is
satisfied for each of the two populations (the p-values are relatively large).
Parameter of interest: µD = µP − µM; the difference between the mean monthly household incomes of IP farmers
from Sitio Pawagaon and Sitio Magsaysay.
To obtain a point estimate of µD, we simply get the difference between the mean monthly household
incomes of IP farmers from Sitio Pawagaon and Sitio Magsaysay (see R Commander Output #2).
R COMMANDER OUTPUT #2
95 percent confidence interval:
1188.810 3165.008
sample estimates:
mean in group Pawagaon mean in group Magsaysay
4862.242 2685.333
Thus,
$\hat{\mu}_D = \bar{x}_P - \bar{x}_M = 4862.242 - 2685.333 = 2176.909$
Aside from a point estimate, a confidence interval estimate about µD may also be constructed. Based on
the results (see R Commander Output #2), we are 95% confident that the difference between the mean
monthly household incomes of IP farmers from Sitio Pawagaon and Sitio Magsaysay lies between
1,188.810 and 3,165.008 pesos.
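As a quick arithmetic check on R Commander Output #2, the sketch below recomputes the point estimate and confirms that it sits at the midpoint of the reported interval, as expected for a symmetric t-based confidence interval. The numbers are copied from the output; the variable names are illustrative only.

```python
# Values copied from R Commander Output #2.
mean_pawagaon = 4862.242
mean_magsaysay = 2685.333
ci_lower, ci_upper = 1188.810, 3165.008

# Point estimate of the difference between the two means.
point_estimate = mean_pawagaon - mean_magsaysay

# A symmetric interval is "estimate +/- margin of error".
ci_midpoint = (ci_lower + ci_upper) / 2
margin_of_error = (ci_upper - ci_lower) / 2

print("point estimate :", round(point_estimate, 3))
print("CI midpoint    :", round(ci_midpoint, 3))
print("margin of error:", round(margin_of_error, 3))
```

Because the whole interval lies above zero, it is consistent with a positive difference between the two mean incomes.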
To test if there is enough evidence to support the director’s claim that the IP farmers from Sitio Magsaysay
are more financially in need than those from Sitio Pawagaon, the appropriate hypotheses are:
Ho: µD (µP−µM) = 0; There is no difference in the mean monthly household incomes between IP farmers
from Sitio Pawagaon and Sitio Magsaysay.
Ha: µD (µP−µM) > 0; The mean monthly household income of IP farmers from Sitio Pawagaon is greater
than that of IP farmers from Sitio Magsaysay.
R COMMANDER OUTPUT #3
We already know that the assumption of normality for each population was satisfied. Since we do not have
information on the population variances, we need to test if the variances are equal. Based on the results of
the Bartlett test of homogeneity of variances (see R Commander Output #3), the population variances are
not equal (the p-value is very small). Therefore, the non-pooled t-test or Welch’s test should be used.
R COMMANDER OUTPUT #4
(Consider R Commander Output #4) Since the p-value is very small, we reject the null hypothesis. Hence,
there is sufficient evidence to say that the mean monthly household income of IP farmers from Sitio
Pawagaon is greater than that of IP farmers from Sitio Magsaysay. Thus, the results support the
director’s claim, and he may be advised to pursue his plan.
What if the assumption(s) is(are) not satisfied for the original and transformed data?
The Mann-Whitney test (Wilcoxon rank-sum test), a non-parametric test, is also applicable to two independent
samples when it is of interest to know if the two groups have been drawn from the same population. Instead of
using the actual values of the observations, it uses the ranks of the data, so it is applicable to variables
measured on at least an ordinal scale. Among the non-parametric alternatives to the parametric t-test, this
test is the most practical and useful.
For the results to be valid, the assumptions for this non-parametric test are the following:
1. The variable of interest is at least ordinal in scale.
2. Independent samples must be taken using simple random sampling from two populations.
3. The distributions of the populations must have the same shape.
The Mann-Whitney test can be used to perform a hypothesis test on either the population median or the
population mean. If the variable of interest is measured on an ordinal scale, only the median can be used.
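To make the idea of "using ranks instead of actual values" concrete, the sketch below computes the Mann-Whitney U statistics by hand. It is a minimal illustration under stated assumptions: the data are hypothetical, the function names are not from R Commander, tied values receive the average of the ranks they occupy, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_1, sample_2):
    """Return (U1, U2) computed from the rank sum of the first sample."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


# Hypothetical measurements for two independent groups (not from the handout).
group_a = [31, 35, 42, 38]
group_b = [40, 45, 42, 50, 47]

u1, u2 = mann_whitney_u(group_a, group_b)
print("U1 =", u1, "U2 =", u2)
```

A very small U for one group means that its observations mostly rank below those of the other group, which is evidence against the hypothesis that both groups come from the same population.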
Studies have shown that listening to music alters a person’s level of dopamine, a
neurotransmitter associated with pleasure and learning. Since different types of music may
have different effects on the concentration, memory, and learning states of an individual, an
experiment was conducted using different songs from jazz and pop music.
Twenty randomly selected STAT 101 students were asked to solve a maze while listening to music. Ten
students were assigned to listen to jazz music and the other 10 students to pop music. The time it took (in
seconds) for each student to solve the maze was recorded. Determine if there is a significant difference in
the completion time of students who listened to jazz and pop music, on the average.
The given scenario still exhibits the use of independent samples since students who were assigned to
listen to jazz and pop music were taken independently. Let population 1 be the set of STAT 101 students
assigned to listen to jazz music and population 2 be the set of STAT 101 students assigned to listen to pop
music while solving a maze.
R COMMANDER OUTPUT #5
Music = Jazz
Shapiro-Wilk normality test
W = 0.82269, p-value = 0.02077
--------
Music = Pop
Shapiro-Wilk normality test
W = 0.80915, p-value = 0.01872
Based on the Shapiro-Wilk normality test (see R Commander Output #5), neither population
satisfied the normality assumption (say α = 0.10). Thus, instead of using the mean difference as the
parameter of interest, we use the median difference.
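The decision rule behind this conclusion can be written out explicitly. The sketch below applies it to the p-values reported in R Commander Output #5 with α = 0.10; the function and variable names are illustrative assumptions.

```python
ALPHA = 0.10  # level of significance used for this example

# p-values copied from R Commander Output #5.
shapiro_p_values = {"Jazz": 0.02077, "Pop": 0.01872}


def normality_rejected(p_value, alpha):
    """The null hypothesis of normality is rejected when p-value < alpha."""
    return p_value < alpha


decisions = {group: normality_rejected(p, ALPHA) for group, p in shapiro_p_values.items()}

# If normality is rejected for any group, a non-parametric test is considered.
use_nonparametric = any(decisions.values())

print(decisions)
print("Consider a non-parametric test:", use_nonparametric)
```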
Parameter of interest: MdD = Md1−Md2; the difference between the median completion time (in seconds)
of STAT 101 students who listened to jazz and pop music
To test if there is a difference in completion time between the students who listened to jazz music and
those who listened to pop music, the appropriate hypotheses are:
Ho: MdD = 0; There is no difference in the median completion time of students who listened to jazz and pop
music.
Ha: MdD ≠ 0; There is a difference in the median completion time of students who listened to jazz and pop
music.
R COMMANDER OUTPUT #6
TAKE NOTE!
Under the condition of normality, the pooled t-test is more powerful. Alternatively, under non-normality
but with distributions of the same shape, the Mann-Whitney test is more powerful.
Suppose a simple random sample of n related (or paired) units is drawn, and a characteristic of interest
X is measured for each member of a pair. The data obtained may be presented as in the table below.
Pair Number (i)    X1     X2     di
1                  x11    x21    d1 = x11 − x21
2                  x12    x22    d2 = x12 − x22
⋮                  ⋮      ⋮      ⋮
n                  x1n    x2n    dn = x1n − x2n
where
x1i is the value of the random variable X observed from the first member of the ith pair;
x2i is the value of the random variable X observed from the second member of the ith pair;
di is the difference for the ith pair, di = x1i − x2i, i = 1, 2, …, n; and
n is the number of pairs of observations.
A similar data representation can be used when the observations are taken using the self-pairing method where
X1 represents the measurement at time 1 while X2 represents the measurement at time 2 taken on the ith unit.
Point Estimation
We now consider the difference (di) as the random variable of interest. A point estimator of µD is $\bar{d}$, which is
computed as the mean of the observed sample differences given by
$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$.
Interval Estimation
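A (1−α) × 100% confidence interval estimator of µD is given by
$\bar{d} \pm t_{\alpha/2,\,(n-1)} \frac{s_d}{\sqrt{n}}$, where $s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$, the standard deviation of the observed sample differences.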
Hypothesis Testing
Similar to the previous case, the null hypothesis for determining if there is a difference between two populations
using related samples is Ho: µD = D0, where D0 is the hypothesized population mean difference, which could
take values from −∞ to +∞ but is commonly set to 0. To test the significance of µD, the test procedure to be
performed is the paired t-test.
For the results to be valid, the following assumptions need to be satisfied first:
1. The variable of interest should be measured on at least an interval scale.
2. Paired samples must be taken using simple random sampling.
3. The paired-differences (di’s) must be normally distributed or the sample size is large enough for the
Central Limit Theorem to apply.
The test statistic is $t_c = \frac{\bar{d} - D_0}{s_d / \sqrt{n}}$, where $\bar{d}$ and $s_d$ are the mean and standard deviation
of the observed sample differences, which follows the Student’s t distribution with (n−1) degrees of freedom.
For a large sample (n ≥ 25), the test statistic tc is approximately distributed as standard normal, N(0,1). This
implies that the Z-table can be used to obtain the tabular values should you decide to make a decision by
comparing the test statistic with its corresponding tabular value.
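The computations for the paired t-test can be sketched in a few lines. The example below is an illustration only: the scores are hypothetical, the function name `paired_t` is an assumption, and the statistic follows the usual form (mean difference minus D0, divided by the standard error of the mean difference). No p-value is computed, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second, d0=0.0):
    """Return (d_bar, s_d, t_c, df) for paired samples, with d_i = first_i - second_i."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same number of observations")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)   # point estimate of the mean difference
    s_d = stdev(diffs)    # sample standard deviation (divisor n - 1)
    t_c = (d_bar - d0) / (s_d / sqrt(n))
    return d_bar, s_d, t_c, n - 1


# Hypothetical scores of six units measured twice (not from the handout).
before = [12, 15, 11, 14, 13, 16]
after = [14, 15, 14, 17, 13, 19]

d_bar, s_d, t_c, df = paired_t(after, before)
print("mean difference:", d_bar)
print("std. deviation :", s_d)
print("t statistic    :", t_c, "with df =", df)
```

Notice that the analysis works entirely on the differences, which is why only the differences need to satisfy the normality assumption.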
Romantic couples with a large age gap often raise eyebrows. Studies have found that partners with an age
gap of more than ten years experience social disapproval. It is of interest to find out whether the mean age
of husbands differs from the mean age of their wives. Ten Filipino married couples were selected
at random and their ages (in years) were obtained.
The study involved married couples wherein each husband and wife were asked about their age. This
illustrates the use of related (or paired) samples. Thus, let di = xhusband − xwife be the difference between the
ages of the ith couple where xhusband is the age of the husband and xwife is the age of the wife.
R COMMANDER OUTPUT #7
Shapiro-Wilk normality test
W = 0.95629, p-value = 0.7429
The variable of interest for each population is age, which is measured on a ratio scale. Based on the
results of the Shapiro-Wilk normality test (see R Commander Output #7), the assumption that the differences
in the ages of the couples are normally distributed is satisfied (the p-value is large).
Let’s consider the parameter of interest: µD; the mean difference of the ages between the couple.
R COMMANDER OUTPUT #8
95 percent confidence interval:
0.04394139 7.15605861
sample estimates:
mean of the differences
3.6
Based on the results (see R Commander Output #8), the estimated mean difference of the ages between the
couple is 3.6 years. Moreover, a confidence interval estimate can also be constructed. At a 95% confidence
level, the mean difference of the ages between the couple lies from 0.04 to 7.16 years.
To determine if the mean age of husbands differs from the mean age of their wives, the appropriate
hypotheses are:
Ho: µD = 0; The mean difference of the ages between the couple is equal to zero.
Ha: µD ≠ 0; The mean difference of the ages between the couple is not equal to zero.
Given that the normality assumption is satisfied, we can use the paired t-test, a parametric test, to verify
the given claim (Consider R Commander Output #9).
R COMMANDER OUTPUT #9
Paired t-test
data: Husband and Wife
t = 2.2901, df = 9, p-value = 0.04777
alternative hypothesis: true difference in means is not equal to 0
We reject the null hypothesis at α = 0.05 since the p-value (0.04777) is less than α. Therefore, we have
sufficient evidence to say that the mean difference in the ages between couples is not equal to zero.
Thus, the data provide evidence that the mean age of the husband differs from the mean age of his wife.
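The numbers in R Commander Outputs #8 and #9 can be cross-checked against the paired-sample formulas. The sketch below recovers the standard error from the reported mean difference and t statistic (with D0 = 0), and then the critical value implied by the reported confidence interval. All inputs are copied from the outputs; the variable names are illustrative. The implied critical value should be close to the tabular t value for α/2 = 0.025 and 9 degrees of freedom, which is about 2.262.

```python
# Values copied from R Commander Outputs #8 and #9.
d_bar = 3.6
t_c = 2.2901
df = 9
ci_lower, ci_upper = 0.04394139, 7.15605861

n_pairs = df + 1  # df = n - 1 for the paired t-test

# With D0 = 0, t_c = d_bar / SE, so the standard error is d_bar / t_c.
standard_error = d_bar / t_c

# The interval is d_bar +/- (critical value) * SE.
margin_of_error = (ci_upper - ci_lower) / 2
implied_critical_value = margin_of_error / standard_error

print("number of pairs       :", n_pairs)
print("standard error        :", round(standard_error, 4))
print("margin of error       :", round(margin_of_error, 4))
print("implied critical value:", round(implied_critical_value, 3))
```

The lower limit of the interval is just above zero, which agrees with a p-value just below 0.05 for the two-sided test.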
What if the assumption(s) is(are) not satisfied for the original and transformed data?
In the case in which one of the assumptions of the paired t-test has not been satisfied, the paired Wilcoxon
signed-ranks test should be used. This test considers the magnitude as well as the direction of the differences.
Since the mean and median of a symmetric distribution are equal, the paired Wilcoxon signed-ranks test can
be used to perform a hypothesis test for both median difference and mean difference (unless the variable of
interest is in ordinal scale, then only median can be used).
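The phrase "magnitude as well as the direction of the differences" can be illustrated by computing the signed-rank sums directly. The sketch below is a minimal illustration under stated assumptions: the differences are hypothetical, the function name is not from R Commander, zero differences are dropped before ranking (a common convention), tied absolute differences share their average rank, and no p-value is computed.

```python
def signed_rank_sums(differences):
    """Return (W_plus, W_minus): rank sums of |d| for positive and negative d."""
    nonzero = [d for d in differences if d != 0]  # assumption: zeros are discarded
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        end = start
        # Group tied absolute differences so that they share one average rank.
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return w_plus, w_minus


# Hypothetical paired differences (not from the handout).
diffs = [2, 0, 3, -1, 4, 3, -2, 5]

w_plus, w_minus = signed_rank_sums(diffs)
print("W+ =", w_plus, "W- =", w_minus)
```

If the median difference were zero, the two rank sums would be expected to be similar; a large imbalance between them is evidence against the null hypothesis.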
A carefully designed pre- and post-test can be used as a diagnostic and developmental tool
to assess students’ learning preparedness and improve instructors’ teaching strategy. To
evaluate the knowledge of first-year students on basic statistics concepts, Prof. Remia
recorded the scores of 20 randomly selected first-year students on 20-item multiple-choice
exams administered before and after a series of review sessions.
Prof. Remia wants to measure the amount of learning the students have acquired in their three-week review
sessions. She believes that her students have made great progress since they have been more
participatory in her lecture.
Since both the pre- and post-test scores were recorded for each student, this problem illustrates the use of
related samples. Thus, let di = xafter − xbefore be the difference of the pre- and post-test scores of the ith
student where xafter is the post-test score and xbefore is the pre-test score.
R COMMANDER OUTPUT #10
Shapiro-Wilk normality test
W = 0.8866, p-value = 0.0233
Based on the results of the Shapiro-Wilk normality test (see R Commander Output #10), the assumption of
normality is not satisfied (at a given α = 0.05). Hence, we can use the median difference as the parameter
of interest.
Consider the parameter of interest: MdD, the median difference between the pre- and post-test scores.
To find if Prof. Remia’s students have made great progress, the appropriate hypotheses are:
Ho: MdD = 0; The median difference between the pre- and post-test scores is equal to zero.
Ha: MdD (Mdafter − Mdbefore) > 0; The median difference between the pre- and post-test scores is greater
than zero.
TAKE NOTE!
Both the paired t-test and the paired Wilcoxon signed-ranks test require simple random samples. However, for
a normally distributed paired-difference variable, the paired t-test is more powerful.
REFERENCE