
Chapter 3 Test of Difference Between Means

The document discusses tests of difference between means, including the z-test for difference between means, t-test for independent and correlated samples, one-way ANOVA, multiple comparisons tests, and two-way ANOVA. The learning outcomes are to discuss the conditions of each test, assumptions of ANOVA, and how to perform and interpret each test.

CHAPTER III: TEST OF DIFFERENCE BETWEEN MEANS

Topic Outline:
1. Z-test for Difference Between Means
2. t-test for Independent or Uncorrelated Samples
3. t-test for Correlated Samples
4. One-Way Analysis of Variance (One-Way ANOVA)
5. Multiple Comparisons Tests
6. Two-Way Analysis of Variance (Two-Way ANOVA)

Learning Outcomes:
At the end of the unit, the students must have:
1. discussed the conditions imposed by the different tests of difference between
means;
2. discussed the assumptions in ANOVA; and
3. performed each test and discussed the results.

Prepared by:
Prof. Jeanne Valerie Agbayani-Agpaoa
Statistical Methods I
Dr. Virgilio Julius P. Manzano, Jr.
Engr. Lawrence John C. Tagata

TOPIC 1: Z-TEST FOR DIFFERENCE BETWEEN MEANS

THE Z-TEST: TEST ON MEANS AND PROPORTIONS OF A NORMAL DISTRIBUTION WITH KNOWN VARIANCE

The z test is a statistical test procedure that uses a sample statistic having a normal distribution. Statistical
testing is a way of making statistical inferences about unknown population parameters. The z test is usually used
to test hypotheses about
a. the mean of a population based on a single sample,
b. the proportion of successes in a population based on a single sample,
c. the difference between the means of two populations based on samples from each population, or
d. the difference between the proportions of successes in two populations based on samples from each
population.

The z-test is used when the population standard deviation is known. If the population standard deviation is
unknown, a z-test is still applicable provided that the sample is sufficiently large. Sufficiently large means a
sample size of at least 30 (n ≥ 30) if the variable is normally distributed, and at least 50 (n ≥ 50) for any other
distribution.

Critical Values of z
For convenience, the critical values of z for commonly used levels of significance are summarized in the
table below. These values were taken from Table A.3: Areas under the Normal Curve.

Level of          Confidence   Type of Test
Significance, α   Level        2-tailed    1-tailed, left    1-tailed, right
0.10              0.90         ±1.645      −1.282            +1.282
0.05              0.95         ±1.96       −1.645            +1.645
0.01              0.99         ±2.575      −2.326            +2.326
0.001             0.999        ±3.29       −3.08             +3.08

Rejection and Non-Rejection Regions


Depending on the type of statistical test to be used (based on how the alternative hypothesis Ha is
expressed), the following illustrations may be used as a guide to whether the null hypothesis (Ho) is to be
rejected or not.
[Figure: rejection and non-rejection regions under the standard normal curve for the 2-tailed test, the 1-tailed left test, and the 1-tailed right test]


STAT 201: Statistical Methods I


ONE-SAMPLE Z-TEST: TEST ON THE MEAN OF A NORMAL DISTRIBUTION, VARIANCE KNOWN

The one-sample z-test is used to test whether the mean of a population is greater than, less than, or not
equal to a specific value. Because the standard normal distribution is used to calculate critical values for the test,
this test is often called the one-sample z-test. The z-test assumes that the population standard deviation or variance
is known.

Test Procedure for a Single Mean (Variance Known)
Null Hypothesis: Ho: μ = μo
Test Statistic:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
where:
X̄ = sample mean
μ = population mean (the value μo stated in Ho)
σ = population standard deviation
n = number of cases or observations (sample size)

Alternative Hypothesis (Ha), P-Value, and Rejection Criterion for Fixed-Level Tests

Ha: μ ≠ μo
    P-value: probability above |z0| and probability below −|z0|, P = 2[1 − Φ(|z0|)]
    Rejection criterion: z0 > z_{α/2} or z0 < −z_{α/2}
Ha: μ > μo
    P-value: probability above z0, P = 1 − Φ(z0)
    Rejection criterion: z0 > z_α
Ha: μ < μo
    P-value: probability below z0, P = Φ(z0)
    Rejection criterion: z0 < −z_α

The P-values and critical regions for these situations are shown in the figure above.

Example 1: A random sample of 100 recorded deaths in the United States during the past year showed an
average life span of 71.8 years. Assuming a population standard deviation of 8.9 years, does
this seem to indicate that the mean life span today is greater than 70 years? Use a 0.05 level
of significance.
Solutions:
Given: 𝜇 = 70 years
n = 100 recorded deaths
𝑥̅ = 71.8 years
𝜎 = 8.9 years
𝛼 = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: The mean life span today is 70 years. 𝜇 = 70
Ha: The mean life span today is greater than 70 years. 𝜇 > 70

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of “greater than” expression, it would be appropriate to use
a 1–tail, right test.

Step 4: Determine the tabular value for the test from the table above.
zcrit = +1.645.

Step 5: Compute for the required statistical test.

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{71.8 - 70}{8.9/\sqrt{100}}$$

Zcomp = 2.02

Step 6: Decide.

[Figure: right-tailed rejection region, with zcrit = +1.645 and zcomp = +2.02 inside the rejection region]

Decision: Since the computed value of z is greater than the critical value, zcomp (2.02) > zcrit
(1.645), Ho is rejected.

The P-value corresponding to z = 2.02 is given by the area of the shaded region in the figure
below.
Using Table A.3, we have 𝑃 = 𝑃(𝑍 > 2.02) = 0.0217. As a result, the evidence in favor of Ha
is even stronger than that suggested by a 0.05 level of significance.

Step 7: Conclusion.
Since Ho was rejected, conclude:
“The mean life span today is greater than 70 years.”
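
The computation in Example 1 can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `one_sample_z` and its `tail` parameter are illustrative assumptions. Because the script keeps the unrounded z (about 2.0225), its P-value differs slightly in the fourth decimal from the table lookup at z = 2.02.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n, tail="right"):
    """One-sample z-test with known population standard deviation.

    Returns (z, p_value). tail is "right", "left", or "two".
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf  # standard normal cumulative distribution
    if tail == "right":
        p_value = 1 - phi(z)
    elif tail == "left":
        p_value = phi(z)
    else:
        p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

# Example 1: life span data, Ha: mu > 70, alpha = 0.05
z, p_value = one_sample_z(xbar=71.8, mu0=70, sigma=8.9, n=100, tail="right")
print(round(z, 2), round(p_value, 4))
print("reject Ho" if p_value < 0.05 else "fail to reject Ho")
```

Comparing the P-value with α gives the same decision as comparing zcomp with zcrit.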

Example 2: A manufacturer of sports equipment has developed a new synthetic fishing line that the
company claims has a mean breaking strength of 8 kilograms with a standard deviation of
0.5 kilogram. Test the hypothesis that μ = 8 kilograms against the alternative that μ ≠ 8
kilograms if a random sample of 50 lines is tested and found to have a mean breaking strength
of 7.8 kilograms. Use a 0.01 level of significance.
Solutions:
Given: 𝜇 = 8 kilograms
n = 50 lines
𝑥̅ = 7.8 kilograms
𝜎 = 0.5 kilograms
𝛼 = 0.01

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: The mean breaking strength of the newly-developed synthetic fishing line is 8
kilograms. 𝜇 = 8 𝑘𝑔
Ha: The mean breaking strength of the newly-developed synthetic fishing line is not
equal to 8 kilograms. 𝜇 ≠ 8 𝑘𝑔

Step 2: Set the level of significance.


𝛼 = 0.01

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of a non-directional expression (not equal to), it would be
appropriate to use a 2–tailed test.

Step 4: Determine the tabular value for the test from the table.
zcrit = ±2.575.

Step 5: Compute for the required statistical test.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}}$$

zcomp = –2.83


Step 6: Decide.

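[Figure: two-tailed rejection regions, with zcrit = ±2.575 and zcomp = –2.83 inside the left rejection region]

Decision: Since the computed value of z is less than the lower critical value, zcomp (–2.83) < zcrit
(–2.575), it falls in the rejection region and Ho is rejected.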

Step 7: Conclusion.
Since Ho was rejected, there is sufficient evidence against the manufacturer's claim that the
mean breaking strength of the newly developed synthetic fishing line is 8 kilograms.
Conclude Ha: “The average breaking strength is not equal to 8 kilograms.”

ONE-SAMPLE PROPORTION TEST: Z-TEST ON A POPULATION PROPORTION


The One-Sample Proportion Test is used to assess whether a population proportion (P1) is significantly
different from a hypothesized value (P0). This is called the hypothesis of inequality. The hypotheses may be stated
in terms of the proportions, their difference, their ratio, or their odds ratio, but all four hypotheses result in the
same test statistics.

For example, suppose that the current treatment for a disease cures 62% of all cases. A new treatment
method has been proposed and studied. In a sample of 80 subjects with the disease that were treated with the new
method, 63 were cured. Do the results of this study support the claim that the new method has a higher response
rate than the existing method?

This procedure calculates sample size and statistical power for testing a single proportion using either
the exact test or other approximate z-tests. Exact test results are based on calculations using the binomial (and
hypergeometric) distributions. Because the analysis of several different test statistics is available, their statistical
power may be compared to find the most appropriate test for a given situation.

This procedure has the capability for computing power using both the normal approximation and
binomial enumeration for all tests. Some sample size programs use only the normal approximation to the binomial
distribution for power and sample size estimates. The normal approximation is accurate for large sample sizes and
for proportions between 0.2 and 0.8, roughly. When the sample sizes are small or the proportions are extreme (i.e.
less than 0.2 or greater than 0.8) the binomial calculations are much more accurate.

Test Procedure on a Binomial Proportion
Null Hypothesis: Ho: p = po
Test Statistic:
$$z = \frac{x - np_0}{\sqrt{np_0q_0}} = \frac{\hat{p} - p_0}{\sqrt{p_0q_0/n}}$$
where:
p is the parameter of the binomial distribution,
x is the number of successes in the sample, p̂ = x/n is the sample proportion, and q0 = 1 − p0.

Alternative Hypothesis (Ha), P-Value, and Rejection Criterion for Fixed-Level Tests

Ha: p ≠ po
    P-value: probability above |z0| and probability below −|z0|, P = 2[1 − Φ(|z0|)]
    Rejection criterion: z0 > z_{α/2} or z0 < −z_{α/2}
Ha: p > po
    P-value: probability above z0, P = 1 − Φ(z0)
    Rejection criterion: z0 > z_α
Ha: p < po
    P-value: probability below z0, P = Φ(z0)
    Rejection criterion: z0 < −z_α

Example 1: A builder claims that heat pumps are installed in 70% of all homes being constructed today
in the city of Richmond, Virginia. Would you agree with this claim if a random survey of
new homes in this city showed that 8 out of 15 had heat pumps installed? Use a 0.10 level of
significance.
Solutions:
Given: p̂ = 8/15
p0 = 0.70
n = 15
𝛼 = 0.10

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: 70% of all homes being constructed today in the city of Richmond, Virginia are
installed with heat pumps. (𝑝 = 0.70)
Ha: The proportion of homes being constructed today in the city of Richmond,
Virginia installed with heat pumps is not equal to 70%. (𝑝 ≠ 0.70)

Step 2: Set the level of significance.


𝛼 = 0.10

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of non-directional expression “not equal to”, it would be
appropriate to use a 2–tailed test.

Step 4: Determine the tabular value for the test from the table above.
zcrit = 1.645.
Step 5: Compute for the required statistical test.
8
𝑝̂ − 𝑝0 − 0.70
𝑧= = 15
𝑝 𝑞
√ 0 0 √(0.70)(1 − 0.70)
𝑛 15
𝒁𝒄𝒐𝒎𝒑 = −𝟏. 𝟒𝟏

Step 6: Decide.
[Figure: two-tailed rejection regions, with zcrit = –1.645 and zcrit = +1.645, and zcomp = –1.41 inside the non-rejection region]

Decision: Since the computed value of z lies between the two critical values, –1.645 < zcomp
(–1.41) < +1.645, it falls within the non-rejection region; fail to reject Ho.

Step 7: Conclusion.
Since Ho was not rejected, conclude that there is insufficient reason to doubt the builder’s
claim that 70% of all homes being constructed today in the city of Richmond, Virginia are
installed with heat pumps.
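
Because n = 15 is small, the normal approximation used above is rough, and as noted earlier the binomial calculations are more accurate in such cases. As a hedged cross-check, the sketch below computes an exact binomial P-value with the Python standard library. Doubling the lower-tail probability is one common convention for a two-sided exact P-value and is an assumption here; the function name `binom_cdf` is illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, x, p0, alpha = 15, 8, 0.70, 0.10

# The observed count (8) is below the expected count n*p0 = 10.5,
# so the relevant tail is the lower one: P(X <= 8 | p = 0.70).
lower_tail = binom_cdf(x, n, p0)

# Two-sided P-value by doubling the tail (assumed convention), capped at 1.
p_value = min(1.0, 2 * lower_tail)

print(round(lower_tail, 4), round(p_value, 4))
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

The exact calculation leads to the same decision as the z-test: Ho is not rejected at the 0.10 level.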

Example 2: A commonly prescribed drug for relieving nervous tension is believed to be only 60%
effective. Experimental results with a new drug administered to a random sample of 100
adults who were suffering from nervous tension show that 70 received relief. Is this sufficient
evidence to conclude that the new drug is superior to the one commonly prescribed? Use a
0.05 level of significance.
Solutions:
Given: p̂ = 70/100
p0 = 0.60
n = 100
𝛼 = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: There is no significant difference between the effectivity of the new and commonly
prescribed drugs. 𝑝 = 0.60
Ha: The new drug is superior to the one commonly prescribed. 𝑝 > 0.60

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of a directional expression (greater than), it would be
appropriate to use a 1–tailed, right test.

Step 4: Determine the tabular value for the test from the table.
zcrit = +1.645.

Step 5: Compute for the required statistical test.


$$z = \frac{\hat{p} - p_0}{\sqrt{p_0q_0/n}} = \frac{\frac{70}{100} - 0.60}{\sqrt{\frac{(0.60)(1 - 0.60)}{100}}}$$

Zcomp = 2.04

Step 6: Decide.

[Figure: right-tailed rejection region, with zcrit = +1.645 and zcomp = +2.04 inside the rejection region]

Decision: Since the computed value of z is greater than the critical value, zcomp (+2.04) > zcrit
(+1.645), Ho is rejected.

Step 7: Conclusion.
Since Ho was rejected, conclude that the new drug is superior.
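
The proportion test statistic used in the two examples above can be sketched as follows. This is a minimal illustration with the Python standard library; the function name `one_proportion_z` is an assumption made for the example.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """z statistic for Ho: p = p0 using the normal approximation."""
    p_hat = x / n          # sample proportion
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)

# Example 2: 70 of 100 adults received relief, Ho: p = 0.60, Ha: p > 0.60
z = one_proportion_z(x=70, n=100, p0=0.60)
p_value = 1 - NormalDist().cdf(z)   # right-tailed P-value
print(round(z, 2), round(p_value, 4))
print("reject Ho" if z > 1.645 else "fail to reject Ho")
```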


TWO-SAMPLE Z-TEST: INFERENCE ON THE DIFFERENCE IN MEANS OF TWO NORMAL DISTRIBUTIONS WITH KNOWN VARIANCES

A z-test for two means is a hypothesis test that attempts to make a claim about the population means (μ1
and μ2). More specifically, we are interested in assessing whether or not it is reasonable to claim that the two
population means μ1 and μ2 are equal, based on the information provided by the samples.

Test Procedure on the Difference in Means, Variance Known
Null Hypothesis: Ho: (μ1 − μ2) = ∆0
Test Statistic:
$$z = \frac{(\bar{X}_1 - \bar{X}_2) - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
where:
X̄1 and X̄2 are sample means
μ1 and μ2 are population means
∆0 is the hypothesized difference in means (μ1 − μ2)
σ1 and σ2 are population standard deviations
n1 and n2 are the number of cases or observations (sample sizes) in each group

Alternative Hypothesis (Ha), P-Value, and Rejection Criterion for Fixed-Level Tests

Ha: (μ1 − μ2) ≠ ∆0
    P-value: probability above |z0| and probability below −|z0|, P = 2[1 − Φ(|z0|)]
    Rejection criterion: z0 > z_{α/2} or z0 < −z_{α/2}
Ha: (μ1 − μ2) > ∆0
    P-value: probability above z0, P = 1 − Φ(z0)
    Rejection criterion: z0 > z_α
Ha: (μ1 − μ2) < ∆0
    P-value: probability below z0, P = Φ(z0)
    Rejection criterion: z0 < −z_α

Example 1: A nutrition teacher wants to compare the food values of the nutrition and dietetics students
with those of the engineering students. She constructed a questionnaire composed of 15 items.
The teacher administered the questionnaire to 75 engineering students
and obtained a mean of 3.98, while the 40 nutrition students had a mean of 4.12. If the
population standard deviation is 0.27, what conclusion can the nutrition teacher draw about
the food values of the students? Use a 0.05 level of significance.
Solutions:
Given:
        Engineering Students    Nutrition and Dietetics Students
x̄      3.98                    4.12
σ       0.27                    0.27
n       75 students             40 students

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: There is no significant difference in the food values of the nutrition and dietetics
students with those of the engineering students. 𝜇𝐸 − 𝜇𝑁𝐷 = 0 or 𝜇𝐸 = 𝜇𝑁𝐷
Ha: There is a significant difference in the food values of the nutrition and dietetics
students with those of the engineering students. 𝜇𝐸 − 𝜇𝑁𝐷 ≠ 0 or 𝜇𝐸 ≠ 𝜇𝑁𝐷

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of a non-directional expression, it would be appropriate to
use a 2-tail test.

Step 4: Determine the tabular value for the test from the table above.
zcrit = ±1.96.

Step 5: Compute for the required statistical test.

$$z = \frac{(\bar{X}_E - \bar{X}_{ND}) - (\mu_E - \mu_{ND})}{\sqrt{\frac{\sigma_E^2}{n_E} + \frac{\sigma_{ND}^2}{n_{ND}}}} = \frac{(3.98 - 4.12) - 0}{\sqrt{\frac{0.27^2}{75} + \frac{0.27^2}{40}}}$$

Zcomp = −2.65

Step 6: Decide.

[Figure: two-tailed rejection regions, with zcrit = –1.96 and zcrit = +1.96, and zcomp = –2.65 inside the left rejection region]

Decision: Since the computed value of z is less than the lower critical value, zcomp (–2.65) < zcrit
(–1.96), it falls in the rejection region; reject Ho.

Step 7: Conclusion.
Since Ho was rejected, conclude: “There is a significant difference in the food values of the
nutrition and dietetics students with those of the engineering students.”

Example 2: A manufacturer claims that the average tensile strength of thread A exceeds the average
tensile strength of thread B by at least 12 kilograms. To test this claim, 50 pieces of each type
of thread were tested under similar conditions. Type A thread had an average tensile strength
of 86.7 kilograms with a standard deviation of 6.28 kilograms, while type B thread had an
average tensile strength of 77.8 kilograms with a standard deviation of 5.61 kilograms. Test
the manufacturer’s claim using a 0.05 level of significance.
Solutions:
Given: Thread A Thread B
𝑥̅ 86.7 kilograms 77.8 kilograms
𝜎 6.28 kilograms 5.61 kilograms
𝑛 50 pieces 50 pieces

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: The average tensile strength of thread A exceeds the average tensile strength of
thread B by at least 12 kilograms. (𝜇𝐴 − 𝜇𝐵 ) ≥ 12
Ha: The average tensile strength of thread A exceeds the average tensile strength of
thread B by less than 12 kilograms. (𝜇𝐴 − 𝜇𝐵 ) < 12

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of a directional expression (less than), it would be appropriate
to use a 1–tailed, left test.

Step 4: Determine the tabular value for the test from the table.
zcrit = −1.645.

Step 5: Compute for the required statistical test.


The quantity of interest is the difference in the mean tensile strengths of the two types of
threads. Thus, we shall use the difference in their means (μ𝐴 − μ𝐵 ) established in the
manufacturer’s claim to be 12 kilograms.

$$z = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}} = \frac{(86.7 - 77.8) - 12}{\sqrt{\frac{6.28^2}{50} + \frac{5.61^2}{50}}}$$

Zcomp = −2.60

Step 6: Decide.

[Figure: left-tailed rejection region, with zcrit = –1.645 and zcomp = –2.60 inside the rejection region]

Decision: Since the computed value of z is less than the critical value (the value falls in the
rejection region), zcomp (–2.60) < zcrit (–1.645), Ho is rejected.

Step 7: Conclusion.
Since Ho was rejected, the data do not support the manufacturer’s claim that the
average tensile strength of thread A exceeds the average tensile strength of thread B by at
least 12 kilograms.
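
Both two-sample examples above apply the same formula, with ∆0 = 0 in the first and ∆0 = 12 in the second. The following is a minimal sketch in Python (standard library only); the function name `two_sample_z` and its parameter names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """z statistic for Ho: mu1 - mu2 = delta0 with known variances."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - delta0) / standard_error

phi = NormalDist().cdf

# Example 1: engineering vs. nutrition and dietetics students (two-tailed)
z_food = two_sample_z(3.98, 4.12, 0.27, 0.27, 75, 40)
p_food = 2 * (1 - phi(abs(z_food)))

# Example 2: thread A vs. thread B, hypothesized difference 12 kg (left-tailed)
z_thread = two_sample_z(86.7, 77.8, 6.28, 5.61, 50, 50, delta0=12)
p_thread = phi(z_thread)

print(round(z_food, 2), round(p_food, 4))
print(round(z_thread, 2), round(p_thread, 4))
```

In both cases the P-value falls below 0.05, which agrees with the decisions reached by comparing zcomp with zcrit.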

Z-TEST FOR TWO PROPORTIONS: INFERENCE ON TWO POPULATION PROPORTIONS


This module discusses hypothesis tests of the difference, ratio, or odds ratio of two independent
proportions. The test statistics analyzed by this procedure assume that the difference between the two proportions
is zero or their ratio is one under the null hypothesis.

For example, suppose you want to compare two methods for treating cancer. Your experimental design
might be as follows. Select a sample of patients and randomly assign half to one method and half to the other.
After five years, determine the proportion surviving in each group and test whether the difference in the
proportions is significantly different from zero.

Suppose you have two populations from which dichotomous (binary) responses will be recorded. The
probability (or risk) of obtaining the event of interest in population 1 (the treatment group) is p1 and in population
2 (the control group) is p2. The corresponding failure proportions are given by q1 = 1 − p1 and q2 = 1 – p2.

Test Procedure for the Difference between Two Proportions
Null Hypothesis: Ho: p1 − p2 = δ0
Test Statistic:
$$z = \frac{(\hat{P}_1 - \hat{P}_2) - \delta_0}{\sqrt{\frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}}}$$
where:
P̂1 and P̂2 are sample proportions
p1 and p2 are population proportions
δ0 is the hypothesized difference in the population proportions (p1 − p2)
q1 = 1 − p1 and q2 = 1 − p2
n1 and n2 are the number of cases or observations (sample sizes) in each group

Alternative Hypothesis (Ha), P-Value, and Rejection Criterion for Fixed-Level Tests

Ha: p1 − p2 ≠ δ0
    P-value: probability above |z0| and probability below −|z0|, P = 2[1 − Φ(|z0|)]
    Rejection criterion: z0 > z_{α/2} or z0 < −z_{α/2}
Ha: p1 − p2 > δ0
    P-value: probability above z0, P = 1 − Φ(z0)
    Rejection criterion: z0 > z_α
Ha: p1 − p2 < δ0
    P-value: probability below z0, P = Φ(z0)
    Rejection criterion: z0 < −z_α

Example 1: A vote is to be taken among the residents of a town and the surrounding county to determine
whether a proposed chemical plant should be constructed. The construction site is within the
town limits, and for this reason many voters in the county believe that the proposal will pass
because of the large proportion of town voters who favor the construction. To determine if
there is a significant difference in the proportions of town voters and county voters favoring
the proposal, a poll is taken. If 120 of 200 town voters favor the proposal and 240 of 500
county residents favor it, would you agree that the proportion of town voters favoring the
proposal is higher than the proportion of county voters? Use an α = 0.05 level of significance.
Solutions:
Given: Let T and C denote town voters and county voters, respectively.
p̂T = 120/200
p̂C = 240/500
𝑛 𝑇 = 200
𝑛𝐶 = 500
𝛼 = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: There is no significant difference in the proportion of town voters favoring the
proposal and the proportion of county voters. (pT = pC )
Ha: The proportion of town voters favoring the proposal is higher than the proportion
of county voters. (pT > pC )

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of a directional expression “greater than”, it would be
appropriate to use a 1–tail, right test.

Step 4: Determine the tabular value for the test from the table above.
zcrit = +1.645.

Step 5: Compute for the required statistical test.


It is of interest to determine the significant difference in the proportions of town voters and
county voters favoring the proposal.
$$z = \frac{(\hat{P}_T - \hat{P}_C) - (p_T - p_C)}{\sqrt{\frac{p_Tq_T}{n_T} + \frac{p_Cq_C}{n_C}}} = \frac{\frac{120}{200} - \frac{240}{500}}{\sqrt{\frac{\left(\frac{120}{200}\right)\left(\frac{80}{200}\right)}{200} + \frac{\left(\frac{240}{500}\right)\left(\frac{260}{500}\right)}{500}}}$$

Zcomp = 2.91


Step 6: Decide.

[Figure: right-tailed rejection region, with zcrit = +1.645 and zcomp = +2.91 inside the rejection region]

Decision: Since the computed value of z is greater than the critical value (it lies within the
rejection region), zcomp (+2.91) > zcrit (+1.645), reject Ho.

Step 7: Conclusion.
Since Ho was rejected, agree with the claim that the proportion of town voters favoring the
proposal is higher than the proportion of county voters.

Example 2: Two hundred fifty AIDS victims during the year 2000 had only 1% chance of survival. After
3 years, a new medication was found and tested. A total of 500 AIDS victims have been
treated and 15 have survived. Does this result show that the new medication is more
successful than the old one? Use the 0.05 level of significance.
Solutions:
Given: Let N and O denote new medication and old medication, respectively.
p𝑂 = 0.01
𝑛𝑂 = 250
p̂N = 15/500
𝑛𝑁 = 500
𝛼 = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: There is no significant difference in the effectivity of the new and old medications.
(p𝑁 = p𝑂 )
Ha: The new medication is more successful than the old one. (p𝑁 > p𝑂 )

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
Since Ha is expressed in terms of a directional expression (greater than), it would be
appropriate to use a 1–tailed, right test.

Step 4: Determine the tabular value for the test from the table.
zcrit = +1.645.

Step 5: Compute for the required statistical test.


$$z = \frac{(\hat{P}_N - \hat{P}_O) - (p_N - p_O)}{\sqrt{\frac{p_Nq_N}{n_N} + \frac{p_Oq_O}{n_O}}} = \frac{\frac{15}{500} - 0.01}{\sqrt{\frac{\left(\frac{15}{500}\right)\left(\frac{485}{500}\right)}{500} + \frac{(0.01)(0.99)}{250}}}$$

Zcomp = 2.02

Step 6: Decide.

[Figure: right-tailed rejection region, with zcrit = +1.645 and zcomp = +2.02 inside the rejection region]

Decision: Since the computed value of z is greater than the critical value, zcomp (+2.02) > zcrit
(+1.645), Ho is rejected.

Step 7: Conclusion.
Since Ho was rejected, conclude that the new medication is more successful than the old one.
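
The two worked examples above use the unpooled standard error from the test procedure, with the sample proportions substituted for p1 and p2. The sketch below follows that same formula; the function name `two_proportion_z` is an illustrative assumption. Note that many textbooks instead use a pooled proportion when Ho states p1 = p2, which gives a slightly different z; that variant is not shown here.

```python
from math import sqrt

def two_proportion_z(p1_hat, n1, p2_hat, n2, delta0=0.0):
    """z statistic for Ho: p1 - p2 = delta0 (unpooled standard error)."""
    standard_error = sqrt(p1_hat * (1 - p1_hat) / n1
                          + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - delta0) / standard_error

z_crit = 1.645  # right-tailed test at alpha = 0.05

# Example 1: town voters (120 of 200) vs. county voters (240 of 500)
z_vote = two_proportion_z(120 / 200, 200, 240 / 500, 500)

# Example 2: new medication (15 of 500) vs. old medication (0.01, n = 250)
z_med = two_proportion_z(15 / 500, 500, 0.01, 250)

for label, z in (("voters", z_vote), ("medication", z_med)):
    decision = "reject Ho" if z > z_crit else "fail to reject Ho"
    print(label, round(z, 2), decision)
```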

Activity 05:
Perform hypothesis testing to the following problems. Use a 0.05 level of significance.

1. A group of biology students wish to determine whether an insect population found only in one location of a
forest belongs to a certain species. The only morphological characteristic which appeared to differ from that of the
known members of the species was wing length. The mean wing length of the species was 15.4 mm with a
standard deviation of 2.3 mm. The students measured the wing length of 50 insects and had a mean of 17.4
mm. Can the students conclude that the insects are of a different species?

2. A study at the University of Colorado at Boulder shows that running increases the percent resting metabolic
rate (RMR) in older women. The average RMR of 50 elderly women runners was 34.0% higher than the
average RMR of 50 sedentary elderly women, and the population standard deviations were reported to be
10.5 and 10.2%, respectively. Was there a significant increase in RMR of the women runners over the
sedentary women?

3. At a certain college, it is estimated that at most 25% of the students ride bicycles to class. Does this seem to
be a valid estimate if, in a random sample of 90 college students, 28 are found to ride bicycles to class?

4. A 2003 New York Times/CBS News poll sampled 523 adults who were planning a vacation during the next
six months and found that 141 were expecting to travel by airplane (New York Times News Service, March
2, 2003). A similar survey question in a May 1993 New York Times/CBS News poll found that of 477 adults
who were planning a vacation in the next six months, 81 were expecting to travel by airplane. Has a
significant change occurred in the population proportion planning to travel by airplane over the 10-year
period?


Hypothesis testing means finding out if the mean difference is statistically significant or not. A t-test or
a z-test may be used for the particular purpose.

Z-TEST VERSUS T-TEST


Basis: Basic Definition
    Z-test: The z-test is a kind of hypothesis test which ascertains if the averages of two datasets are different from each other when the standard deviation or variance is given.
    t-test: The t-test is a kind of parametric test that is applied to identify how the averages of two sets of data differ from each other when the standard deviation or variance is not given.
Basis: Population Variance
    Z-test: The population variance or standard deviation is known.
    t-test: The population variance or standard deviation is unknown.
Basis: Sample Size
    Z-test: The sample size is large (n ≥ 30).
    t-test: The sample size is small.
Basis: Key Assumption
    Z-test: All data points are independent. Normal distribution for Z, with an average of zero and variance = 1.
    t-test: All data points are independent. Sample values are to be recorded and taken accurately.
Basis: Based on the Type of Distribution
    Z-test: Normal distribution.
    t-test: Student-t distribution.
Basis: Test on the Mean of a Normal Distribution (One-Sample)
    Z-test: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
    t-test: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
Basis: Inference on the Difference in Means of Two Normal Distributions (Two-Sample)
    Z-test: $z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
    t-test: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$

THE STUDENT T-TEST


The t-test tells you how significant the differences between groups are; in other words, it lets you know
if those differences (measured in means/averages) could have happened by chance.

A very simple example: Let’s say you have a cold and you try a naturopathic remedy. Your cold lasts a couple of
days. The next time you have a cold, you buy an over-the-counter pharmaceutical and the cold lasts a week. You
survey your friends and they all tell you that their colds were of a shorter duration (an average of 3 days) when
they took the naturopathic remedy. What you really want to know is, are these results repeatable? A t-test can tell
you by comparing the means of the two groups and letting you know the probability of those results happening
by chance.

Another example: Student’s T-tests can be used in real life to compare means. For example, a drug company may
want to test a new cancer drug to find out if it improves life expectancy. In an experiment, there’s always a control
group (a group who are given a placebo, or “sugar pill”). The control group may show an average life expectancy
of +5 years, while the group taking the new drug might have a life expectancy of +6 years. It would seem that the
drug might work. But it could be due to a fluke. To test this, researchers would use a Student’s t-test to find out if
the results are repeatable for an entire population.

THE T-SCORE
The t-score is a ratio between the difference between two groups and the difference within the groups.
The larger the t-score, the more difference there is between groups. The smaller the t-score, the more similarity
there is between groups. A t-score of 3 means that the groups are three times as different from each other as they
are within each other. When you run a t-test, the bigger the t-value, the more likely it is that the results are
repeatable.
A large t-score tells you that the groups are different.
A small t-score tells you that the groups are similar.


REJECTION AND NON-REJECTION REGIONS


Depending on the type of statistical test to be used (based on how the alternative hypothesis Ha is
expressed), the following illustrations may be used as a guide to whether the null hypothesis (Ho) is to be
rejected or not.
[Figure: rejection and non-rejection regions under the t distribution for the 2-tailed test (critical values –to and +to), the 1-tailed left test (–to), and the 1-tailed right test (+to)]

T-VALUES AND P-VALUES


Every t-value has a p-value to go with it. A p-value is the probability of obtaining results at least as
extreme as those in your sample data by chance alone, that is, if the null hypothesis were true. P-values range
from 0% to 100% and are usually written as decimals; for example, a p-value of 5% is 0.05. Low p-values indicate
that your data are unlikely to have occurred by chance. For example, a p-value of 0.01 means there is only a 1%
probability that results this extreme would happen by chance when Ho is true. In most cases, a p-value below
0.05 (5%) is taken to mean the result is statistically significant.

CALCULATING THE STATISTIC / TEST TYPES


There are three main types of t-test:
• A One sample t-test tests the mean of a single group against a known mean.
• An Independent Samples t-test compares the means for two groups.
• A Paired sample t-test compares means from the same group at different times (say, one year apart).

TOPIC 2: t-TEST FOR INDEPENDENT OR UNCORRELATED SAMPLES

A. ONE SAMPLE T-TEST


One-sample t-test tests the mean of a single group against a known population mean.

The one-sample t-test determines whether the sample mean is statistically different from a known or
hypothesized population mean. The One Sample t Test is a parametric test. This test is also known as single-
sample t-test. In a one-sample t-test, the test variable is compared against a "test value", which is a known or
hypothesized value of the mean in the population.

The one-sample t-test is commonly used to test the following:


• Statistical difference between a sample mean and a known or hypothesized value of the mean in the
population.
• Statistical difference between the sample mean and the sample midpoint of the test variable.
• Statistical difference between the sample mean of the test variable and chance.
    • This approach involves first calculating the chance level on the test variable. The chance level
    is then used as the test value against which the sample mean of the test variable is compared.
• Statistical difference between a change score and zero.
    • This approach involves creating a change score from two variables, and then comparing the
    mean change score to zero, which will indicate whether any change occurred between the two
    time points for the original measures. If the mean change score is not significantly different
    from zero, no significant change occurred.

Note: The one-sample t-test can only compare a single sample mean to a specified constant. It cannot compare
sample means between two or more groups. If you wish to compare the means of multiple groups to each other,
you will likely want to run an Independent Samples t-test (to compare the means of two groups) or a One-Way
ANOVA (to compare the means of two or more groups).

T-TEST: TEST ON THE MEAN OF A NORMAL DISTRIBUTION, VARIANCE UNKNOWN (ONE-SAMPLE)

Test Procedure for the One-Sample t-Test

Null Hypothesis: Ho: μ = μo
Test Statistic (the computed value is denoted t0):

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where:
x̄ = sample mean
μ = population mean under Ho (μo)
s = sample standard deviation
n = sample size/number of observations

Degrees of freedom: df = n − 1

Alternative Hypothesis, Ha    P-Value                                              Rejection Criterion for Fixed-Level Tests
μ ≠ μo                        Probability above |t0| and probability below –|t0|   $t_0 > t_{\alpha/2,\,n-1}$ or $t_0 < -t_{\alpha/2,\,n-1}$
μ > μo                        Probability above t0                                 $t_0 > t_{\alpha,\,n-1}$
μ < μo                        Probability below t0                                 $t_0 < -t_{\alpha,\,n-1}$

The P-values and critical regions for these situations are shown in the figure above.

Example 1: Joan’s Nursery specializes in custom-designed landscaping for residential areas. The
estimated labor cost associated with a particular landscaping proposal is based on the number
of plantings of trees, shrubs, and so on to be used for the project. For cost-estimating
purposes, managers use two hours of labor time for the planting of a medium-sized tree.
Actual times from a sample of 10 plantings during the past month follow (times in hours).
1.7 1.5 2.6 2.2 2.4
2.3 2.6 3.0 1.4 2.3

Test to see whether the mean tree-planting time is two hours.


Solutions:
Given: 𝜇 = 2.0
n = 10
𝑥̅ = 2.2
S = 0.52

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: The mean tree-planting time is two hours.
𝜇 = 2.0
Ha: The mean tree-planting time is not two hours.
𝜇 ≠ 2.0

Step 2: Set the level of significance.


𝛼 = 0.05


Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
2–tail

Step 4: Determine the tabular value for the test from the table.

tcrit(0.05, 9) = ±2.262.

Step 5: Compute for the required statistical test.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{2.2 - 2.0}{0.52 / \sqrt{10}}$$

tcomp = +1.2163

Step 6: Decide.

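[Figure: two-tailed rejection regions below tcrit = –2.262 and above tcrit = +2.262; tcomp = +1.2163 falls in the non-rejection region.]

Decision: Since |tcomp| < tcrit (1.2163 < 2.262), do not reject Ho.

Step 7: Conclusion.
Since Ho was not rejected, there is no sufficient evidence to conclude that the mean tree-planting time differs from two hours.
“The sample data are consistent with a mean tree-planting time of two hours.”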
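
The computation in Example 1 can be checked from the raw planting times. This is a minimal sketch using the Python standard library; the function name `one_sample_t` is an illustrative assumption. Because it uses the unrounded sample standard deviation (about 0.5164 rather than 0.52), it gives a t of roughly 1.22 instead of 1.2163; the decision is unchanged.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t, df) for testing Ho: mu = mu0 when the variance is unknown."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Tree-planting times (hours) from Example 1
times = [1.7, 1.5, 2.6, 2.2, 2.4, 2.3, 2.6, 3.0, 1.4, 2.3]
t, df = one_sample_t(times, 2.0)
print(round(t, 4), df)  # t is about 1.22 with 9 degrees of freedom
```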

Example 2: The Edison Electric Institute has published figures on the number of kilowatt hours used annually
by various home appliances. It is claimed that a vacuum cleaner uses an average of 46 kilowatt
hours per year. If a random sample of 12 homes included in a planned study indicates that vacuum
cleaners use an average of 42 kilowatt hours per year with a standard deviation of 11.9 kilowatt
hours, does this suggest at the 0.05 level of significance that vacuum cleaners use, on average,
less than 46 kilowatt hours annually? Assume the population of kilowatt hours to be normal.

Solutions:
Given: μ = 46 kilowatt hours
n = 12 homes
x̄ = 42 kilowatt hours
s = 11.9 kilowatt hours
α = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: Vacuum cleaners use, on average, at least 46 kilowatt hours annually.
𝜇 ≥ 46
Ha: Vacuum cleaners use, on average, less than 46 kilowatt hours annually.
𝜇 < 46

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
1–tail, left

Step 4: Determine the tabular value for the test from the table.

tcrit(0.05, 11) = –1.796.

Step 5: Compute for the required statistical test.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{42 - 46}{11.9 / \sqrt{12}}$$

tcomp = –1.1644

Step 6: Decide.

[Figure: left-tailed rejection region below tcrit = –1.796; tcomp = –1.1644 falls in the non-rejection region.]

Decision: Since tcomp > tcrit (–1.1644 > –1.796), tcomp does not fall in the rejection region; do not reject Ho.


Step 7: Conclusion.
“There is no sufficient evidence to conclude that the vacuum cleaners use, on average, less than
46 kilowatt hours annually.”
That is, the data do not contradict the claim that vacuum cleaners use, on average, at least 46 kilowatt
hours annually.

Example 3: A perfume company claims that the best-selling perfume contains at most 25% alcohol. Twenty
bottles were selected and found to have a mean of 29.7% and a standard deviation of 4.8%. Test
the claim of the perfume company at the 0.05 level of significance.
Solutions:
Given: 𝜇 = 25%
n = 20
𝑥̅ = 29.7%
S = 4.8%
𝛼 = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: The best-selling perfume contains at most 25% alcohol.
𝜇 ≤ 25
Ha: The best-selling perfume contains more than 25% alcohol.
𝜇 > 25

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
1–tail, right

Step 4: Determine the tabular value for the test from the table.

tcrit(0.05, 19) = +1.729.

Step 5: Compute for the required statistical test.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{29.7 - 25}{4.8 / \sqrt{20}}$$

tcomp = +4.3790

Step 6: Decide.

[Figure: right-tailed rejection region above tcrit = +1.729; tcomp = +4.3790 falls in the rejection region.]

Decision: Since tcomp > tcrit, reject Ho.

Step 7: Conclusion.
Since Ho was rejected, there is sufficient evidence at the 0.05 level of significance to reject the perfume
company’s claim that the best-selling perfume contains at most 25% alcohol.
Conclude Ha: “The best-selling perfume contains more than 25% alcohol.”


B. AN INDEPENDENT SAMPLES T-TEST


The Independent Samples t-Test compares the means of two independent groups in order to determine
whether there is statistical evidence that the associated population means are significantly different. The
Independent Samples t Test is a parametric test. This test is also known as:
Independent t-Test Two-Sample t-Test
Independent Measures t-Test Uncorrelated Scores t-Test
Independent Two-sample t-Test Unpaired t-Test
Student t-Test Unrelated t-Test

The variables used in this test are known as:


• Dependent variable, or test variable
• Independent variable, or grouping variable

The Independent Samples t-Test is commonly used to test the following:


• Statistical differences between the means of two groups
• Statistical differences between the means of two interventions
• Statistical differences between the means of two change scores

Note: The Independent Samples t-Test can only compare the means for two (and only two) groups. It cannot make
comparisons among more than two groups. If you wish to compare the means across more than two groups, you
will likely want to run an ANOVA.

The t-test can also be used to compare two sample means by determining whether the samples
were obtained from normal populations with the same means. However, certain assumptions are required.
1. The populations must be at least approximately normally distributed.
2. The samples must be independent.
3. The population variances must be equal. Tests for equality of variances should be carried out beforehand.

INFERENCE ON THE DIFFERENCE IN MEANS OF TWO NORMAL DISTRIBUTIONS, VARIANCES UNKNOWN

Test Procedure for the Independent Samples t-Test

Null Hypothesis: Ho: (μ1 − μ2) = ∆0
Test Statistic (the computed value is denoted t0):

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - \Delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where:
X̄1 and X̄2 are sample means
μ1 and μ2 are population means
∆0 is the hypothesized difference in population means (μ1 − μ2)
s1 and s2 are sample standard deviations
n1 and n2 are the sample sizes (number of cases or observations) in each group

Degrees of freedom: df = n1 + n2 − 2

Alternative Hypothesis, Ha    P-Value                                              Rejection Criterion for Fixed-Level Tests
(μ1 − μ2) ≠ ∆0                Probability above |t0| and probability below –|t0|   $t_0 > t_{\alpha/2,\,df}$ or $t_0 < -t_{\alpha/2,\,df}$
(μ1 − μ2) > ∆0                Probability above t0                                 $t_0 > t_{\alpha,\,df}$
(μ1 − μ2) < ∆0                Probability below t0                                 $t_0 < -t_{\alpha,\,df}$


Example 1: An experiment was performed to compare the abrasive wear of two different laminated
materials. Twelve pieces of material 1 were tested by exposing each piece to a machine
measuring wear. Ten pieces of material 2 were similarly tested. In each case, the depth of wear
was observed. The samples of material 1 gave an average (coded) wear of 85 units with a
sample standard deviation of 4, while the samples of material 2 gave an average of 81 with a
sample standard deviation of 5. Can we conclude at the 0.05 level of significance that the
abrasive wear of material 1 exceeds that of material 2 by less than 2 units?
Solutions:
Given:      Material 1    Material 2
x̄           85            81
n           12            10
s           4             5

𝛼 = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: The abrasive wear of material 1 exceeds that of material 2 by at least 2 units.
(𝜇𝑀1 − 𝜇𝑀2 ) ≥ 2
Ha: The abrasive wear of material 1 exceeds that of material 2 by less than 2 units.
(𝜇𝑀1 − 𝜇𝑀2 ) < 2

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
1–tail, left

Step 4: Determine the tabular value for the test from the table.

tcrit(0.05, 20) = –1.725.

Step 5: Compute for the required statistical test.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(85 - 81) - 2}{\sqrt{\dfrac{4^2}{12} + \dfrac{5^2}{10}}}$$

tcomp = +1.0215

Step 6: Decide.

[Figure: left-tailed rejection region below tcrit = –1.725; tcomp = +1.0215 falls in the non-rejection region.]

Decision: Since tcomp > tcrit (+1.0215 > –1.725), tcomp does not fall in the rejection region; do not reject Ho.

Step 7: Conclusion.
Since Ho was not rejected, there is no sufficient evidence to conclude that the abrasive wear of material 1
exceeds that of material 2 by less than 2 units.
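
The test statistic used in this section needs only the summary figures of each group. The sketch below is an illustration, with the hypothetical function name `two_sample_t`; it follows the formula as given in the test procedure above (separate sample variances in the standard error, df = n1 + n2 − 2) and applies it to the abrasive-wear figures.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2, delta0=0.0):
    """Return (t, df) for Ho: mu1 - mu2 = delta0, using summary statistics."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    t = ((mean1 - mean2) - delta0) / se
    return t, n1 + n2 - 2


# Abrasive wear: material 1 vs material 2, hypothesized difference of 2 units
t, df = two_sample_t(85, 4, 12, 81, 5, 10, delta0=2)
print(round(t, 4), df)  # t is about 1.02 with 20 degrees of freedom
```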

Example 2: To find out whether a new drug will reduce the spread of cancer, 9 mice which have all reached
an advanced stage of the disease are selected. Five mice received the treatment and four did not.
The survival periods, in months from the time that the experiment commenced, are as follows:

Treatment 2.4 5.3 1.4 4.6 0.9


No Treatment 1.9 0.8 2.8 3.7

At a 0.05 level of significance is there evidence to say that the new drug is effective?
Solutions:
Given:      Treatment    No Treatment
x̄           2.92         2.30
n           5            4
s           1.95         1.24

𝛼 = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: There is no significant difference in the mean survival periods of the treated and not
treated mice.
𝜇𝑇 = 𝜇𝑁𝑇

Ha: The new drug is effective:


The mean survival period of the treated mice is greater than the mean survival period
of the not treated mice.
𝜇𝑇 > 𝜇𝑁𝑇
Step 2: Set the level of significance.
𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
1–tail, right


Step 4: Determine the tabular value for the test from the table.

tcrit(0.05, 7) = +1.895.

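Step 5: Compute for the required statistical test.

$$t = \frac{(\bar{x}_T - \bar{x}_{NT}) - (\mu_T - \mu_{NT})}{\sqrt{\dfrac{S_T^2}{n_T} + \dfrac{S_{NT}^2}{n_{NT}}}} = \frac{(2.92 - 2.30) - 0}{\sqrt{\dfrac{1.95^2}{5} + \dfrac{1.24^2}{4}}}$$

tcomp = +0.5794

Step 6: Decide.

[Figure: right-tailed rejection region above tcrit = +1.895; tcomp = +0.5794 falls in the non-rejection region.]

Decision: Since tcomp < tcrit, do not reject Ho.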

Step 7: Conclusion.
Since Ho was not rejected, there is no sufficient evidence to say that the new drug is effective.
That is, we can say that
“There is no significant difference in the mean survival periods of the treated and not treated mice.”

Example 3: Engineers at a large automobile manufacturing company are trying to decide whether to purchase
brand A or brand B tires for the company’s new models. To help them arrive at a decision, an
experiment is conducted using 50 of each brand. The tires are run until they wear out. The results
are as follows:
Brand A: x̄1 = 37,900 kilometers, s1 = 5,100 kilometers.
Brand B: x̄2 = 39,800 kilometers, s2 = 5,900 kilometers.

Test the hypothesis that there is no difference in the average wear of the two brands of tires.
Assume the populations to be approximately normally distributed.
Solutions:
Given:      Brand A    Brand B
x̄           37,900     39,800
n           50         50
s           5,100      5,900

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: There is no significant difference in the average wear of the two brands of tires.
𝜇𝐴 = 𝜇𝐵

Ha: There is a significant difference in the average wear of the two brands of tires.
𝜇𝐴 ≠ 𝜇𝐵

Step 2: Set the level of significance.


𝛼 = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
2–tail

Step 4: Determine the tabular value for the test from the table.

tcrit(0.05, ∞) = ±1.960

Step 5: Compute for the required statistical test.

$$t = \frac{(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)}{\sqrt{\dfrac{S_A^2}{n_A} + \dfrac{S_B^2}{n_B}}} = \frac{(37{,}900 - 39{,}800) - 0}{\sqrt{\dfrac{5{,}100^2}{50} + \dfrac{5{,}900^2}{50}}}$$

tcomp = –1.7227

Step 6: Decide.

[Figure: two-tailed rejection regions below tcrit = –1.96 and above tcrit = +1.96; tcomp = –1.7227 falls in the non-rejection region.]

Decision: Since |tcomp| < tcrit (1.7227 < 1.96), do not reject Ho.

Step 7: Conclusion.
Since Ho was not rejected, there is no sufficient evidence of a difference:
“There is no significant difference in the average wear of the two brands of tires.”

TOPIC 3: t-TEST FOR CORRELATED SAMPLES

C. PAIRED SAMPLE T-TEST


A paired sample t-test is used to compare two population means where you have two samples in which
observations in one sample can be paired with observations in the other sample. Examples of where this might
occur are:
• Before-and-after observations on the same subjects (e.g. students’ diagnostic test results before and after
a particular module or course).
• A comparison of two different methods of measurements or two different treatments where the
measurements/treatments are applied to the same subjects (e.g. blood pressure measurements using a
stethoscope and a dynamap).

Test Procedure for the Paired t-Test

Null Hypothesis: Ho: μD = ∆0
Test Statistic (the computed value is denoted t0):

$$t = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$$

where:
d̄ = mean difference between two observations on each pair (di = yi − xi)
sd = standard deviation of the differences
n = sample size/number of pairs of observations

Degrees of freedom: df = n − 1

Alternative Hypothesis, Ha    P-Value                                              Rejection Criterion for Fixed-Level Tests
μD ≠ ∆0                       Probability above |t0| and probability below –|t0|   $t_0 > t_{\alpha/2,\,n-1}$ or $t_0 < -t_{\alpha/2,\,n-1}$
μD > ∆0                       Probability above t0                                 $t_0 > t_{\alpha,\,n-1}$
μD < ∆0                       Probability below t0                                 $t_0 < -t_{\alpha,\,n-1}$

Example: Five samples of a ferrous-type substance were used to determine if there is a difference between
a laboratory chemical analysis and an X-ray fluorescence analysis of the iron content. Each
sample was split into two subsamples and the two types of analysis were applied. Following are
the coded data showing the iron content analysis:
                      Samples
Analysis      1      2      3      4      5
X-ray         2.0    2.0    2.3    2.1    2.4
Chemical      2.2    1.9    2.5    2.3    2.4


Assuming that the populations are normal, test at the 0.05 level of significance whether the two
methods of analysis give, on the average, the same result.

Solutions:
Given: di = (X-ray result) − (Chemical result) for each sample

Sample    1       2       3       4       5
di        –0.2    +0.1    –0.2    –0.2    0.0

d̄ = –0.10
sd = 0.1414
n = 5
α = 0.05

Step 1: Formulate a null hypothesis (Ho) and the alternative hypothesis (Ha).
Ho: There is no significant difference between the mean results of the two methods of analysis.
μD = 0

Ha: There is a significant difference between the mean results of the two methods of analysis.
μD ≠ 0

Step 2: Set the level of significance.

α = 0.05

Step 3: Identify the type of statistical test as either one-tailed test or two-tailed test.
2–tail

Step 4: Determine the tabular value for the test from the table.

tcrit(0.05, 4) = ±2.776

Step 5: Compute for the required statistical test.

$$t = \frac{\bar{d} - \Delta_0}{s_d / \sqrt{n}} = \frac{-0.10 - 0}{0.1414 / \sqrt{5}}$$

tcomp = –1.581

Step 6: Decide.

Decision: Since |tcomp| < tcrit (1.581 < 2.776), do not reject Ho.

Step 7: Conclusion.
Since Ho was not rejected, there is no sufficient evidence to conclude that the two methods of analysis
give different results on the average.
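
The paired statistic for the iron-content data in this example can be computed directly from the two rows of readings. This is a minimal sketch using the Python standard library; the function name `paired_t` is an illustrative assumption, and the differences are taken as X-ray minus Chemical (reversing the order only changes the sign of t).

```python
import math
import statistics


def paired_t(first, second, delta0=0.0):
    """Return (t, df) for Ho: mean difference = delta0, using paired observations."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # standard deviation of the differences
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return t, n - 1


xray = [2.0, 2.0, 2.3, 2.1, 2.4]
chemical = [2.2, 1.9, 2.5, 2.3, 2.4]
t, df = paired_t(xray, chemical)
print(round(t, 3), df)  # t is about -1.58 with 4 degrees of freedom
```

With 4 degrees of freedom the two-tailed tabular value at α = 0.05 is ±2.776, so a t of about –1.58 does not fall in the rejection region.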


ACTIVITY 06: t-TEST


1. According to a dietary study, high sodium intake may be related to ulcers, stomach cancer, and migraine
headaches. The human requirement for salt is only 220 milligrams per day, which is surpassed in most single
servings of ready-to-eat cereals. If a random sample of 20 similar servings of a certain cereal has a mean
sodium content of 244 milligrams and a standard deviation of 24.5 milligrams, does this suggest at the 0.05
level of significance that the average sodium content for a single serving of such cereal is greater than 220
milligrams? Assume the distribution of sodium contents to be normal.

2. The College Board provided comparisons of Scholastic Aptitude Test (SAT) scores based on the highest level
of education attained by the test taker’s parents. A research hypothesis was that students whose parents had
attained a higher level of education would on average score higher on the SAT. During 2003, the overall mean
SAT verbal score was 507 (The World Almanac, 2004). SAT verbal scores for independent samples of students
follow. The first sample shows the SAT verbal test scores for students whose parents are college graduates
with a bachelor’s degree. The second sample shows the SAT verbal test scores for students whose parents are
high school graduates but do not have a college degree.

Student’s Parents
College Grads High School Grads
485 487 442 492
534 533 580 478
650 526 479 425
554 410 486 485
550 515 528 390
572 578 524 535
497 448
592 469

At 0.05 level of significance, determine whether the sample data support the hypothesis that students show a
higher mean verbal score on the SAT if their parents attained a higher level of education.

3. Suppose a sample of 20 students were given a diagnostic test before studying a particular module and then
took the same test after completing the module. We want to find out if, in general, our teaching leads to
improvements in students’ knowledge/skills (i.e. test scores). We can use the results from our sample of
students to draw conclusions about the impact of this module in general.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
x 18 21 16 22 19 24 17 21 23 18 14 16 16 19 18 20 12 22 15 17
y 22 25 17 24 16 29 20 23 19 20 15 15 18 26 18 24 18 25 19 16
x: test score before the module; y: test score after the module


TOPIC 4: ONE-WAY ANALYSIS OF VARIANCE (ONE-WAY ANOVA)

SINGLE FACTOR EXPERIMENTS


RANDOMIZED COMPLETE BLOCK DESIGN FOR ONE-FACTOR EXPERIMENT
The randomized block design is an extension of the paired t-test to situations where the factor of interest
has more than two levels; that is, more than two treatments must be compared.

For example, suppose that three methods could be used to evaluate the strength readings on steel plate
girders. We may think of these as three treatments, say t1, t2, and t3. If we use four girders as the experimental
units, a randomized complete block design would appear as shown below:

Block 1 Block 2 Block 3 Block 4


t1 t1 t1 t1
t2 t2 t2 t2
t3 t3 t3 t3

The design is called a randomized complete block design because each block is large enough to hold all
the treatments and because the actual assignment of each of the three treatments within each block is done
randomly.

Once the experiment has been conducted, the data are recorded in a table, such as is shown below

Treatments Block (Girder)


(Method) 1 2 3 4
1 𝑦11 𝑦12 𝑦13 𝑦14
2 𝑦21 𝑦22 𝑦23 𝑦24
3 𝑦31 𝑦32 𝑦33 𝑦34

The observations in this table, say yij, represent the response obtained when method i is used on girder j.

The general procedure for a randomized complete block design consists of selecting b blocks and running
a complete replicate of the experiment in each block. The data that result from running a randomized complete
block design for investigating a single factor with a levels and b blocks are shown below.

                           Blocks
Treatments    1      2      ⋯      b      Totals    Averages
1             y11    y12    ⋯      y1b    y1∙       ȳ1∙
2             y21    y22    ⋯      y2b    y2∙       ȳ2∙
⋮             ⋮      ⋮             ⋮      ⋮         ⋮
a             ya1    ya2    ⋯      yab    ya∙       ȳa∙
Totals        y∙1    y∙2    ⋯      y∙b    y∙∙
Averages      ȳ∙1    ȳ∙2    ⋯      ȳ∙b              ȳ∙∙

There will be a observations (one per factor level) in each block, and the order in which these
observations are run is randomly assigned within the block.

Example: A manufacturer of paper used for making grocery bags is interested in improving the
tensile strength of the product. Product engineering thinks that tensile strength is a
function of the hardwood concentration in the pulp and that the range of hardwood
concentrations of practical interest is between 5 and 20%. A team of engineers responsible
for the study decides to investigate four levels of hardwood concentration: 5%, 10%, 15%,
and 20%. They decide to make up six test specimens at each concentration level, using a
pilot plant. All 24 specimens are tested on a laboratory tensile tester, in random order. The
data from this experiment are shown below:

Hardwood Concentration (%)


Observations 5 10 15 20 Totals
1 7 12 14 19
2 8 17 18 19
3 15 13 19 22
4 11 18 17 23
5 9 19 16 18
6 10 15 18 20
Totals 60 94 102 121 377
Averages 10 15.67 17 20.17 15.71

This is an example of a completely randomized single-factor experiment with four levels of the factor.
The levels of the factor are sometimes called treatments, and each treatment has six observations or replicates.
The role of randomization in this experiment is extremely important. By randomizing the order of the 24 runs,
the effect of any nuisance variable that may influence the observed tensile strength is approximately balanced out.
For example, suppose that there is a warm-up effect on the tensile testing machine; that is, the longer the machine
is on, the greater the observed tensile strength. If all 24 runs are made in order of increasing hardwood
concentration (that is, all six 5% concentration specimens are tested first, followed by all six 10% concentration
specimens, etc.), any observed differences in tensile strength could also be due to the warm-up effect.
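
A randomized run order for the 24 specimens can be generated in a few lines. The sketch below is only an illustration of the idea; the function name `randomized_run_order` and the use of a seed for reproducibility are assumptions, not part of the original experiment.

```python
import random


def randomized_run_order(levels, replicates, seed=None):
    """Return every (level, replicate) pair in a random testing order."""
    runs = [(level, rep) for level in levels for rep in range(1, replicates + 1)]
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    rng.shuffle(runs)
    return runs


# Four hardwood concentrations (%), six specimens each
order = randomized_run_order([5, 10, 15, 20], 6, seed=1)
for position, (level, rep) in enumerate(order, start=1):
    print(position, f"{level}% specimen {rep}")
```

Because every specimen is equally likely to land at any position, a warm-up effect of the testing machine is spread across all four concentrations instead of being confounded with one of them.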

ANALYSIS OF VARIANCE (ANOVA)


Simple analysis of variance is commonly called ANOVA. It is an extension of the t-test. Both ANOVA and
the t-test are used to test the significance of the difference between means: the t-test compares two
groups, whereas ANOVA can also test the significance of the difference among several groups.

The analysis of variance (ANOVA) F-test is a method for partitioning the observed variation into
different parts, each part assignable to a known source, cause, or factor. The ANOVA was developed by R.A.
Fisher and reported by him in 1923. Simply stated, it is used when we wish to test the significance of the difference
between two or more means obtained from independent samples.

The F-test is a parametric test; it carries the usual assumptions of a parametric test:
1. Random selection of subjects from normal populations with equal variances;
2. Samples or groups are independent; and
3. Data being analyzed must be at least interval-level.

BASIC STEPS IN PERFORMING ANOVA


1. State the null and alternative hypotheses.
2. Calculate the appropriate test statistic.
3. Obtain the critical value for the F-distribution.
4. Construct the ANOVA Summary Table.
5. State the decision rule. The null hypothesis is rejected if the computed value of F is greater than the critical
value: Fcomp > Fcrit.
6. Determine the p-value for your conclusion.
7. Interpretation.


UNDERSTANDING ANALYSIS OF VARIANCE


Summary Table: The Analysis of Variance (ANOVA) for a Single-Factor Experiment, Fixed-Effects
Model

Source of Variation    Sum of Squares    df       Mean Square    F-Ratio
Between                SSB               k – 1    MSB            F
Within                 SSW               N – k    MSW
Total                  SST               N – 1

FORMULA

$$SS_T = \sum X^2 - \frac{\left(\sum X\right)^2}{N}$$

$$SS_B = \sum_{i=1}^{k} \frac{\left(\sum X_i\right)^2}{n_i} - \frac{\left(\sum X\right)^2}{N}$$

$$SS_W = SS_T - SS_B$$

$$df_b = k - 1, \qquad df_w = N - k, \qquad df_t = N - 1$$

$$MS_B = \frac{SS_B}{df_b}, \qquad MS_W = \frac{SS_W}{df_w}, \qquad F = \frac{MS_B}{MS_W}$$

where:
ΣX² = sum of all the squared scores
ΣX = sum of all the scores
N = total number of scores
ΣXi = sum of the scores in group i
ni = number of scores in group i
k = the number of groups
MSB = mean square between groups; dfb = degrees of freedom between groups
MSW = mean square within groups; dfw = degrees of freedom within groups

ONE-WAY ANOVA IN EXCEL


Preliminary step:
Make sure that the “Analysis ToolPak” is installed.
Under “Tools”, is the option “Data Analysis” present?
If yes – ToolPak is installed.
If no – select “Add-ins.” Check the boxes entitled “Analysis ToolPak” and “Analysis ToolPak – VBA”
and click “OK”. This will install the “Data Analysis ToolPak.”

(1.) Under “Tools” select “Data Analysis.” In the window that appears select “Anova: Single Factor” and click
“OK.”
(2.) Using your mouse, highlight the cells containing the data.
(3.) Select “Columns” if each treatment is its own column or “Rows” if each treatment is its own row.
(4.) Set your level of significance. (The default is 5% or 0.05.)
(5.) Click “OK” and the ANOVA output will appear on a new worksheet.


Example: You wish to test the effects of a number of experimental treatments (Counseling Approaches):
group counseling, peer counseling, and individual counseling on the self-concept of students.
In this case, the independent variable, counseling approaches, has three levels. Necessarily,
there should be three groups randomly selected from the school population which will be
exposed to the three different counseling approaches. The dependent variable, self-concept,
may be measured through a standardized self-concept instrument which yields interval scores
for the subject.

Counseling Approach Self-Concept of Students


Group (G) 78 46 41 50 69 82
Peer (P) 77 83 97 69 79 87
Individual (I) 78 91 97 82 85 77

Step 1: Statement of the null and alternative hypotheses.


Ho: There is no significant difference in the self-concept among the three groups of students
exposed to different counseling approaches.
𝜇𝐺 = 𝜇𝑃 = 𝜇𝐼
Ha: There is a significant difference in the self-concept among the three groups of students exposed
to different counseling approaches.
There is an effect of the counseling approach on the self-concept of students.

Step 2: Calculate the test statistics (Raw Score Method).


a. Tabulate the raw scores.
b. Compute for the sums of N for each group, the sums of raw scores X, and the sums of squared
scores X².
ΣN = 6 + 6 + 6 = 18
ΣX = 366 + 492 + 510 = 1,368
ΣX² = 23,866 + 40,798 + 43,652 = 108,316

                  Group (G)    Peer (P)    Individual (I)    Total
                  78           77          78
                  46           83          91
                  41           97          97
                  50           69          82
                  69           79          85
                  82           87          77
Sums              366          492         510               1,368
Sum of Squares    23,866       40,798      43,652            108,316
N                 6            6           6                 18
Means             61           82          85

c. Compute Sums of Squares

SST (Sums of Squares for the Total Variability)

$$SS_T = \sum X^2 - \frac{(\sum X)^2}{\sum N} = 108{,}316 - \frac{(1{,}368)^2}{18} = 4{,}348$$

SSB (Sums of Squares for Between Group Variability)

$$SS_B = \sum \frac{(\sum X_i)^2}{N_i} - \frac{(\sum X)^2}{\sum N} = \frac{366^2}{6} + \frac{492^2}{6} + \frac{510^2}{6} - \frac{1{,}368^2}{18} = 106{,}020 - 103{,}968 = 2{,}052$$

SSW (Sums of Squares for Within Group Variability)

$$SS_W = SS_T - SS_B = 4{,}348 - 2{,}052 = 2{,}296$$

d. Determine the degrees of freedom.

Between groups: dFB = K − 1 = 3 − 1 = 2
Within groups: dFW = N − K = 18 − 3 = 15

e. Find the Mean Squares (MS)

$$MS_B = \frac{SS_B}{dF_B} = \frac{2{,}052}{2} = 1{,}026$$

$$MS_W = \frac{SS_W}{dF_W} = \frac{2{,}296}{15} = 153.07$$

f. Compute for the F-ratio.

$$F = \frac{MS_B}{MS_W} = \frac{1{,}026}{153.07} = 6.70$$

Step 3: Construct ANOVA Summary Table.


Source of Variation    Sums of Squares    dF    Mean of Squares    F-Ratio
Between Groups         2,052              2     1,026              6.70
Within Groups          2,296              15    153.07
Total                  4,348              17

Step 4: Determine the critical value of F.


At =0.05, Fcrit(2,15) = 3.68.

Step 5: State your decision rule.



Since the computed value of F (Fcomp=6.70) is greater than the critical value of F (Fcrit=3.68),
Fcomp>Fcrit, reject the null hypothesis.

Step 6: Determine the p-value for your conclusion.


p<0.05

Step 7: Interpretation:
The significant F-ratio leads to the rejection of the null hypothesis. We may now accept the
alternative hypothesis that there is an effect of the counseling approaches on the self-concept
of the students. The group means show that the group counseling sample registered
the lowest mean (61) and the individual counseling sample the highest (85). At this point, the data
suggest that the individual counseling approach is more effective than the group counseling
approach in enhancing the self-concepts of the students, although a multiple comparisons test is
needed to confirm which pairs of means differ significantly. The data also reveal that there is only
a small difference between the means of group P and group I.
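
The raw score method used in this example translates directly into code. The following is a minimal sketch under the formulas of this topic; the function name `one_way_anova` and the dictionary keys are illustrative assumptions.

```python
def one_way_anova(groups):
    """One-way ANOVA by the raw score method; groups is a list of lists of scores."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    k = len(groups)
    correction = sum(all_scores) ** 2 / n_total          # (sum X)^2 / N
    ss_total = sum(x * x for x in all_scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_between / df_b, ss_within / df_w
    return {
        "SST": ss_total, "SSB": ss_between, "SSW": ss_within,
        "df_b": df_b, "df_w": df_w,
        "MSB": ms_b, "MSW": ms_w, "F": ms_b / ms_w,
    }


# Self-concept scores under the three counseling approaches
group = [78, 46, 41, 50, 69, 82]
peer = [77, 83, 97, 69, 79, 87]
individual = [78, 91, 97, 82, 85, 77]

result = one_way_anova([group, peer, individual])
print(result["SSB"], result["SSW"], round(result["F"], 2))  # F is about 6.70
```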

TOPIC 5: MULTIPLE COMPARISONS FOLLOWING THE ANOVA

The Analysis of Variance (ANOVA) test has long been an important tool for researchers conducting
studies on multiple experimental groups and one or more control groups. It is a powerful procedure for
testing the homogeneity of a set of means. However, if we reject the null hypothesis and accept the
stated alternative (that the means are not all equal), we still do not know which of the population means are equal
and which are different. ANOVA cannot provide detailed information on differences among the various study
groups, or on complex combinations of study groups. Methods for investigating this issue are called multiple
comparisons methods.

To fully understand group differences in an ANOVA, researchers must conduct tests of the differences
between particular pairs of experimental and control groups. Tests conducted on subsets of data tested previously
in another analysis are called post hoc tests. "Post hoc" (Latin for “after this”) refers to analyses carried out
after the experimental results are in. The class of post hoc tests that provides this type of detailed information
for ANOVA results is called "multiple comparison analysis" tests. Different multiple comparison analyses have specific
uses, advantages, and disadvantages. Some are best used for testing theory while others are useful in generating
new theory. Selection of the appropriate post hoc test will provide researchers with the most detailed information
while limiting Type 1 errors due to alpha inflation.
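
To make alpha inflation concrete, the following minimal sketch computes the chance of at least one Type I error when several comparisons are each run at the same α. It assumes, for illustration only, that the comparisons are independent (pairwise comparisons on the same groups are not strictly independent, so the figures are approximate). The function name is ours and does not come from any library.

```python
from math import comb


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# The number of pairwise comparisons among k group means is "k choose 2".
for k in (3, 4, 5, 6):
    n_pairs = comb(k, 2)
    rate = familywise_error_rate(0.05, n_pairs)
    print(f"{k} groups -> {n_pairs} pairwise tests, familywise error rate = {rate:.3f}")
```

Even with only three groups the familywise rate is already well above the nominal 0.05, which is why the choice of post hoc test matters.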

THE FISHER LEAST SIGNIFICANT DIFFERENCE (LSD)


• Fisher developed the least significant difference test in 1935. It is used only when the ANOVA F-test has
rejected the null hypothesis. The LSD is the smallest difference between two means that would be declared
significant if a t-test had been run on those two means alone (as opposed to all of the groups together).
This enables you to make direct comparisons between two means from two individual groups. Any absolute
difference larger than the LSD is considered a significant result.
• The Fisher LSD is a tool to identify which pairs of means are statistically different. It is essentially the
same as Duncan’s MRT, but uses t-values instead of Q values.

Tip: The LSD will only make sense if you have a significant result from ANOVA (i.e. if you reject the null
hypothesis). Therefore, you shouldn’t run the test if you do not get a significant result from ANOVA.

The formula for the least significant difference is:

$$LSD_{A,B} = t_{crit,\,df_w}\sqrt{MS_w\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}$$

where:
$t_{crit,\,df_w}$ = critical value from the t-distribution table, read at $df_w$ (the within-groups degrees of freedom)
$MS_w$ = mean square within, obtained from the results of your ANOVA test
$n_A$, $n_B$ = number of scores used to calculate each of the two means

General Steps:
1. Assuming a two-sided alternative hypothesis, find tcrit.
2. Compute the LSD.
3. Calculate the absolute difference between the means of the two groups, $|\bar{x}_A - \bar{x}_B|$.
4. If $|\bar{x}_A - \bar{x}_B| \geq LSD$, reject the null hypothesis of equal means for that pair. The pair of means
$\bar{x}_A$ and $\bar{x}_B$ is declared significantly different.

Example: You wish to test the effects of a number of experimental treatments (Counseling Approaches): group
counseling, peer counseling, and individual counseling on the self-concept of students. In this case, the
independent variable, counseling approaches, has three levels. Necessarily, there should be three
groups randomly selected from the school population which will be exposed to the three different
counseling approaches. The dependent variable, self-concept, may be measured through a standardized
self-concept instrument which yields interval scores for the subject.

Counseling Approach Self-Concept of Students


Group 78 46 41 50 69 82
Peer 77 83 97 69 79 87
Individual 78 91 97 82 85 77

Summary of Means:
Counselling Approaches    Mean
Group                     61
Peer                      82
Individual                85
F-test                    Reject Ho.

ANOVA Summary Table

Source of Variation    Sums of Squares    dF    Mean of Squares    F-Ratio
Between Groups         2,052              2     1,026.00           6.70
Within Groups          2,296              15    153.07
Total                  4,348              17
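
The entries of the summary table can be checked directly from the raw scores. The sketch below is one way to do so using only the Python standard library; the function and variable names are illustrative and are not part of the handout.

```python
# Self-concept scores for the three counseling approaches (from the data table above).
scores = {
    "Group": [78, 46, 41, 50, 69, 82],
    "Peer": [77, 83, 97, 69, 79, 87],
    "Individual": [78, 91, 97, 82, 85, 77],
}


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F) for a dict of samples."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Between: squared distance of each group mean from the grand mean, weighted by n.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Within: squared distance of each score from its own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(scores)
print(f"Between: SS = {ss_b:.0f}, df = {df_b}, MS = {ss_b / df_b:.2f}")
print(f"Within:  SS = {ss_w:.0f}, df = {df_w}, MS = {ss_w / df_w:.2f}")
print(f"F = {f_ratio:.2f}")
```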

After performing Analysis of Variance, the null hypothesis was rejected.


It was concluded that there is a significant difference in the self-concept among the three groups
of students exposed to different counseling approaches. That means at least two of the means
are significantly different.

Post-hoc test using Fisher’s Least Significant Difference Test will be performed to identify
which pairs of means are statistically different.

Step 1: Assuming a two-sided alternative hypothesis, find the tcrit.


tcrit (0.05, 15) = 2.131

Step 2: Compute the Mean Difference and the LSD.

• The absolute difference between the means of the two groups is
$$MD = |\bar{x}_A - \bar{x}_B|$$
• Fisher’s Least Significant Difference is
$$LSD_{A,B} = t_{crit,\,df_w}\sqrt{MS_w\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}$$

Pairwise Comparisons    Mean Diff (MD)        LSD
|G–P|                   |61 − 82| = 21        2.131 × √(153.07 (1/6 + 1/6)) = 15.2217
|G–I|                   |61 − 85| = 24        2.131 × √(153.07 (1/6 + 1/6)) = 15.2217
|P–I|                   |82 − 85| = 3         2.131 × √(153.07 (1/6 + 1/6)) = 15.2217

Step 3: If $|\bar{x}_A - \bar{x}_B| \geq LSD$, reject the null hypothesis of equal means for that pair. The pair of
means $\bar{x}_A$ and $\bar{x}_B$ is declared significantly different.

Pairwise Comparisons    MD    LSD        Remarks
|G–P|                   21    15.2217    G and P are significantly different.
|G–I|                   24    15.2217    G and I are significantly different.
|P–I|                   3     15.2217    P and I are not significantly different.
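
The three comparisons above can be reproduced with a short script. This is a minimal sketch that takes the group means, the mean square within (2,296 / 15) and the tabled t critical value (2.131) as given in the example; the helper name `least_significant_difference` is ours.

```python
from itertools import combinations
from math import sqrt

means = {"Group": 61.0, "Peer": 82.0, "Individual": 85.0}
sizes = {"Group": 6, "Peer": 6, "Individual": 6}
ms_within = 2296 / 15   # SS within / df within, from the ANOVA summary table
t_crit = 2.131          # two-sided t critical value, alpha = 0.05, df = 15 (t table)


def least_significant_difference(t_critical, ms_w, n_a, n_b):
    """Smallest absolute mean difference declared significant for one pair."""
    return t_critical * sqrt(ms_w * (1 / n_a + 1 / n_b))


lsd_results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    lsd = least_significant_difference(t_crit, ms_within, sizes[a], sizes[b])
    lsd_results[(a, b)] = (diff, lsd, diff >= lsd)
    verdict = "significantly different" if diff >= lsd else "not significantly different"
    print(f"{a} vs {b}: MD = {diff:.0f}, LSD = {lsd:.4f} -> {verdict}")
```

Because the function takes both group sizes, the same sketch also covers unequal sample sizes, in which case each pair gets its own LSD.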

Step 4: Set up the Descriptives (Means)


Fisher’s LSD identified which pairs of means are significantly different. The results can
now be included in the descriptive table. Means that share the same letter are not significantly different.

Counselling Approaches    Mean
Group                     61 b
Peer                      82 a
Individual                85 a
F-test                    Reject Ho.

TUKEY’S HONEST SIGNIFICANT DIFFERENCE TEST (TUKEY HSD)


• The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the Honest
Significant Difference, a number that represents the distance between groups, to compare
every mean with every other mean.
• The Tukey Test (or Tukey procedure), also called Tukey’s Honest Significant Difference test, is a post-
hoc test based on the studentized range distribution.
• After you have run an ANOVA and found significant results, then you can run Tukey’s HSD to find out
which specific group means (compared with each other) are different. The test compares all possible
pairs of means.

Assumptions for the test


• Observations are independent within and among groups.
• The groups for each mean in the test are normally distributed.
• There is equal within-group variance across the groups associated with each mean in the test
(homogeneity of variance).

To test all pairwise comparisons among means using the Tukey HSD, calculate HSD for each pair of means using
the following formula:

$$HSD = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{MS_w}{n}}}$$

where:
$\bar{x}_A - \bar{x}_B$ is the difference between the pair of means ($\bar{x}_A$ should be larger than $\bar{x}_B$)
$MS_w$ is the Mean Square Within
$n$ is the number of observations in each group or treatment

Tukey-Kramer Method
If you have unequal sample sizes, you have to calculate the estimated standard error separately for each
pairwise comparison. This is called the Tukey-Kramer Method.

$$HSD = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{MS_w}{2}\left(\dfrac{1}{n_A}+\dfrac{1}{n_B}\right)}}$$

General Steps:
1. Perform the ANOVA test. If your F value is significant, you can run the post hoc test.
2. Calculate the HSD statistic for the Tukey test using the formula.
3. Find the critical value in Tukey’s critical value table.
4. If the calculated value of HSD is greater than the critical value, the null hypothesis is rejected. This
implies that the two means are significantly different.
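
As a rough illustration of these steps, the sketch below applies the equal-n HSD statistic to the counseling example (means 61, 82, 85; MSw = 2,296 / 15; n = 6). The critical value 3.67 is an assumption: it is the studentized range value we read from a standard table for α = 0.05, 3 means and 15 degrees of freedom, and should be checked against the table you are using. The function name is ours.

```python
from itertools import combinations
from math import sqrt

means = {"Group": 61.0, "Peer": 82.0, "Individual": 85.0}
n_per_group = 6
ms_within = 2296 / 15   # mean square within from the ANOVA summary table
q_crit = 3.67           # ASSUMED table value: alpha = 0.05, 3 means, df = 15


def tukey_statistic(mean_a, mean_b, ms_w, n):
    """HSD statistic for one pair of means when all groups have n observations."""
    # abs() makes the result independent of which mean is listed first.
    return abs(mean_a - mean_b) / sqrt(ms_w / n)


tukey_results = {}
for a, b in combinations(means, 2):
    q = tukey_statistic(means[a], means[b], ms_within, n_per_group)
    tukey_results[(a, b)] = (q, q > q_crit)
    verdict = "significantly different" if q > q_crit else "not significantly different"
    print(f"{a} vs {b}: HSD statistic = {q:.3f} -> {verdict}")
```

Under these assumptions the pattern matches the LSD results for this data set: Group differs from both Peer and Individual, while Peer and Individual do not differ.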

SCHEFFÉ’S METHOD
• The Scheffé test (also called Scheffé’s procedure or Scheffé’s method) is a post-hoc test used in Analysis
of Variance. It is named for the American statistician Henry Scheffé. After you have run ANOVA and obtained
a significant F-statistic (i.e. you have rejected the null hypothesis that the means are the same), you
run Scheffé’s test to find out which pairs of means are significantly different.
• It is used when you want to look at post-hoc comparisons in general (as opposed to just pairwise
comparisons). Scheffé’s method controls the overall confidence level. It is customarily used with unequal
sample sizes.
• Of the three mean-comparison tests described so far (the other two are Fisher’s LSD and Tukey’s HSD),
the Scheffé test is the most flexible, but it is also the test with the lowest statistical power. Deciding
which test to run largely depends on what comparisons you’re interested in:
o If you only want to make pairwise comparisons, run the Tukey procedure because it will have a
narrower confidence interval.
o If you want to compare all possible simple and complex combinations of means, run the Scheffé test as
it will have a narrower confidence interval.

Note: Only perform this test if you have rejected the null hypothesis in an ANOVA test, indicating that the means
are not all the same. Otherwise, there is no evidence that any of the means differ, so there is no point in
running this test.
Reject the null hypothesis for a given comparison if its Scheffé test statistic is greater than the critical value.

General Steps:
1. Calculate the absolute values of the pairwise differences between sample means, $|\bar{x}_A - \bar{x}_B|$.
2. Use the following formula to find a set of Scheffé values:
$$F_S = \sqrt{df_B \cdot F \cdot MS_W\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}$$
where:
$df_B$ is the between-samples degrees of freedom.
$F$ is the critical F-value at the chosen alpha level with ($df_B$, $df_W$) degrees of freedom (the same critical value used in the ANOVA).
$MS_W$ is the mean square error (MS within groups from ANOVA).
3. For every pair with $|\bar{x}_A - \bar{x}_B| \geq F_S$, the null hypothesis is rejected. The difference is
statistically significant at your chosen alpha level.
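
The steps above can be sketched for the counseling example as follows. The sketch uses the critical F value (3.68 at α = 0.05 with 2 and 15 degrees of freedom, as quoted in the one-way ANOVA example), which is the standard form of Scheffé's criterion. The function name is illustrative only.

```python
from itertools import combinations
from math import sqrt

means = {"Group": 61.0, "Peer": 82.0, "Individual": 85.0}
sizes = {"Group": 6, "Peer": 6, "Individual": 6}
ms_within = 2296 / 15   # mean square within from the ANOVA summary table
df_between = 2          # number of groups minus 1
f_crit = 3.68           # critical F at alpha = 0.05 with (2, 15) df, from the F table


def scheffe_value(df_b, f_critical, ms_w, n_a, n_b):
    """Smallest absolute mean difference declared significant by Scheffe's method."""
    return sqrt(df_b * f_critical * ms_w * (1 / n_a + 1 / n_b))


scheffe_results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    fs = scheffe_value(df_between, f_crit, ms_within, sizes[a], sizes[b])
    scheffe_results[(a, b)] = (diff, fs, diff >= fs)
    verdict = "significantly different" if diff >= fs else "not significantly different"
    print(f"{a} vs {b}: |difference| = {diff:.0f}, Scheffe value = {fs:.2f} -> {verdict}")
```

The Scheffé value here is larger than the LSD computed for the same data, which reflects the lower power noted above.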

DUNNETT’S CORRECTION
• Dunnett’s Test (also called Dunnett’s Method or Dunnett’s Multiple Comparison) compares means from
several experimental groups against a control group mean to see if there is a difference.
• This post-hoc test is used to compare means, but unlike Tukey, it compares every mean to a control mean.
• One fixed “control” group is compared to all of the other samples, so it should only be used when you
have a control group. If you don’t have a control group, use Tukey’s Test.

As Dunnett’s test compares two groups at a time, it acts similarly to a t-test. The following formula gives you
a value that you can use to compare mean differences. The formula is:
$$D_{Dunnett} = t_{Dunnett}\sqrt{\frac{2\,MS_W}{n}}$$
Steps:
1. Look up the tDunnett critical value in the Dunnett critical value table (referred to α, n, and dfw) and
compute DDunnett.
2. If the absolute difference between the control group mean and an experimental group mean is greater than
DDunnett, then that difference is significant.

OTHER POST HOC TESTS


Post hoc tests Descriptions
Duncan’s Multiple Range Test (DMRT)
Duncan’s Multiple Range Test (DMRT) is a post hoc test to measure specific differences between pairs of means.
It was originally designed by David B. Duncan as a higher-power alternative to Newman–Keuls.
DMRT is more useful than the LSD when a larger number of pairs of means is being compared, especially when
those values are in a table. DMRT tends to require larger differences between means than the LSD, which
guards against Type I error.
Duncan’s Multiple Range Test will identify the pairs of means (from at least three) that differ.
The MRT is similar to the LSD, but instead of a t-value, a Q value is used.

This procedure is also based on the general notion of the studentized range. The range of any subset of p
sample means must exceed a certain value before any of the p means are found to be different. This value is
called the least significant range for the p means and is denoted by Rp, where
$$R_p = r_p\sqrt{\frac{MS_w}{n}}$$
The values of the quantity rp, called the least significant studentized range, depend on the desired level of
significance and the number of degrees of freedom of the mean square error. These values may be obtained
from the Q table.

Dunn’s Multiple Comparison Test
Dunn’s Test can be used to pinpoint which specific means are significantly different from the others. Dunn’s
Multiple Comparison Test is a post hoc non-parametric test (a “distribution free” test that doesn’t assume
your data comes from a particular distribution). It is one of the least powerful of the multiple comparisons
tests and can be a very conservative test, especially for larger numbers of comparisons.
Dunn vs. Tukey and Dunnett
Dunn’s test is an alternative to the Tukey test when you only want to test for differences in a small subset
of all possible pairs. For larger numbers of pairwise comparisons, use Tukey’s instead. Use Dunn’s when you
choose to test a specific number of comparisons before you run the ANOVA and when you are not comparing to
controls. If you are comparing to a control group, use the Dunnett test instead.

Bonferroni Procedure (Bonferroni Correction)
This multiple-comparison post-hoc correction is used when you are performing many independent or dependent
statistical tests at the same time. The problem with running many simultaneous tests is that the probability
of obtaining at least one significant result by chance increases with each test run. This post-hoc test sets
the significance cut-off at α/n, where n is the number of tests. For example, if you are running 20
simultaneous tests at α=0.05, the corrected cut-off would be 0.0025. The Bonferroni procedure does suffer
from a loss of power. This is due to several reasons, including the fact that Type II error rates are high
for each test. In other words, it overcorrects for Type I errors.
The ordinary Bonferroni method is sometimes viewed as too conservative.

Holm-Bonferroni Procedure
Holm’s sequential Bonferroni post-hoc test is a less strict correction for multiple comparisons.

Newman-Keuls
Like Tukey’s, this post-hoc test identifies sample means that are different from each other. Newman-Keuls
uses different critical values for comparing different pairs of means. Therefore, it is more likely than
Tukey’s to find significant differences.

Rodger’s Method
Considered by some to be the most powerful post-hoc test for detecting differences among groups. This test
protects against loss of statistical power as the degrees of freedom increase.

Benjamini-Hochberg (BH) procedure
If you perform a very large number of tests, one or more of the tests are likely to have a significant result
purely by chance. This post-hoc procedure controls the false discovery rate.
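
To show how the Bonferroni, Holm-Bonferroni and Benjamini-Hochberg procedures differ in strictness, here is a minimal sketch that applies each to the same list of p-values. The p-values are hypothetical and chosen only for illustration; the function names are ours rather than those of any statistics package.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Step-down Holm procedure: compare sorted p-values to alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up BH procedure: reject every p-value up to the largest rank k with p(k) <= (k/m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            largest_rank = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= largest_rank
    return reject


# Hypothetical p-values from five comparisons (illustration only).
p_values = [0.001, 0.012, 0.020, 0.041, 0.300]
print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
print("BH:        ", benjamini_hochberg(p_values))
```

On these hypothetical inputs Bonferroni rejects the fewest hypotheses and Benjamini-Hochberg the most, in line with the descriptions in the table.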

ACTIVITY 07: ONE-FACTOR ANOVA. Perform One-way ANOVA, and if found to have significant difference,
perform post-hoc test.
1. Auditors must make judgments about various aspects of an audit on the basis of their own direct
experience, indirect experience, or a combination of the two. In a study, auditors were asked to make
judgments about the frequency of errors to be found in an audit. The judgments by the auditors were then
compared to the actual results. Suppose the following data were obtained from a similar study; lower
scores indicate better judgments.

Direct Indirect Combination


17.0 16.6 25.2
18.5 22.2 24.0
15.8 20.5 21.5
18.2 18.3 26.8
20.2 24.2 27.5
16.0 19.8 25.8
13.3 21.2 24.2

Use α = 0.05 to test to see whether the basis for the judgment affects the quality of the judgment. What
is your conclusion?

2. A researcher used different laboratory tests in an experiment involving volunteer patients, with the fast
results, in hours, given below. Test the hypothesis that the different laboratory test results have the same
mean at the 0.01 level of significance.

Test A 5 7 10 8 6 3
Test B 8 7 5 5 8
Test C 10 8 9 7 10 8 11 13

TOPIC 6: TWO-WAY ANALYSIS OF VARIANCE (TWO-WAY ANOVA)

The two-way ANOVA compares the mean differences between groups that have been split on two
independent variables (called factors). The primary purpose of a two-way ANOVA is to understand if there is an
interaction between the two independent variables on the dependent variable. For example, you could use a two-
way ANOVA to understand whether there is an interaction between gender and educational level on test anxiety
amongst university students, where gender (males/females) and education level (undergraduate/postgraduate) are
your independent variables, and test anxiety is your dependent variable. Alternately, you may want to determine
whether there is an interaction between physical activity level and gender on blood cholesterol concentration in
children, where physical activity (low/moderate/high) and gender (male/female) are your independent variables,
and cholesterol concentration is your dependent variable.

The interaction term in a two-way ANOVA informs you whether the effect of one of your independent
variables on the dependent variable is the same for all values of your other independent variable (and vice versa).
For example, is the effect of gender (male/female) on test anxiety influenced by educational level
(undergraduate/postgraduate)? Additionally, if a statistically significant interaction is found, you need to
determine whether there are any "simple main effects", and if there are, what these effects are (we discuss this
later in our guide).

Assumptions:
1. The dependent variable should be measured at the continuous level (i.e., it is an interval or ratio variable).
2. The two independent variables should each consist of two or more categorical, independent groups.
3. Independence of observations, which means that there is no relationship between the observations in
each group or between the groups themselves.
4. No significant outliers.
5. The dependent variable should be approximately normally distributed for each combination of the groups of
the two independent variables.
6. Variances for each combination of the groups of the two independent variables are homogeneous.

The simplest type of factorial experiment involves only two factors, say A and B. There are a levels of
factor A and b levels of factor B. This two-factor factorial is shown below:

                                      Factor B
Factor A    1                       2                       ⋯    b                       Totals    Averages
1           y111, y112, ⋯, y11n     y121, y122, ⋯, y12n     ⋯    y1b1, y1b2, ⋯, y1bn     y1∙∙      ȳ1∙∙
2           y211, y212, ⋯, y21n     y221, y222, ⋯, y22n     ⋯    y2b1, y2b2, ⋯, y2bn     y2∙∙      ȳ2∙∙
⋮           ⋮                       ⋮                            ⋮                       ⋮         ⋮
a           ya11, ya12, ⋯, ya1n     ya21, ya22, ⋯, ya2n     ⋯    yab1, yab2, ⋯, yabn     ya∙∙      ȳa∙∙
Totals      y∙1∙                    y∙2∙                    ⋯    y∙b∙                    y∙∙∙
Averages    ȳ∙1∙                    ȳ∙2∙                    ⋯    ȳ∙b∙                              ȳ∙∙∙

The experiment has n replicates, and each replicate contains all ab treatment combinations. The
observation in the ijth cell for the kth replicate is denoted by yijk. In performing the experiment, the abn observations
would be run in random order. Thus, like the single-factor experiment, the two-factor factorial is a completely
randomized design and the analysis of variance (ANOVA) will be used to test its hypotheses. Since there are two
factors in the experiment, the test procedure is sometimes called the two-way analysis of variance.

STATISTICAL ANALYSIS OF THE FIXED-EFFECTS MODEL


The analysis of variance can be used to test hypotheses about the main factor effects of A and B and the
AB interaction.

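Notations for Totals and Means

$$y_{i\cdot\cdot} = \sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{i\cdot\cdot} = \frac{y_{i\cdot\cdot}}{bn}, \qquad i = 1,2,\cdots,a$$

$$y_{\cdot j\cdot} = \sum_{i=1}^{a}\sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{\cdot j\cdot} = \frac{y_{\cdot j\cdot}}{an}, \qquad j = 1,2,\cdots,b$$

$$y_{ij\cdot} = \sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{ij\cdot} = \frac{y_{ij\cdot}}{n}, \qquad i = 1,2,\cdots,a; \quad j = 1,2,\cdots,b$$

$$y_{\cdot\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}, \qquad \bar{y}_{\cdot\cdot\cdot} = \frac{y_{\cdot\cdot\cdot}}{abn}$$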

The hypotheses that we will test are as follows:


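1. Ho: $\tau_1 = \tau_2 = \cdots = \tau_a = 0$ (No main effect of factor A)
Ha: at least one $\tau_i \neq 0$
2. Ho: $\beta_1 = \beta_2 = \cdots = \beta_b = 0$ (No main effect of factor B)
Ha: at least one $\beta_j \neq 0$
3. Ho: $(\tau\beta)_{11} = (\tau\beta)_{12} = \cdots = (\tau\beta)_{ab} = 0$ (No interaction of A and B)
Ha: at least one $(\tau\beta)_{ij} \neq 0$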

As before, the ANOVA tests these hypotheses by decomposing the total variability in the data into
component parts and then comparing the various elements in this decomposition.

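Computing Formulas for ANOVA: Two Factors

$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{\cdot\cdot\cdot}^2}{abn}$$

$$SS_A = \sum_{i=1}^{a}\frac{y_{i\cdot\cdot}^2}{bn} - \frac{y_{\cdot\cdot\cdot}^2}{abn}$$

$$SS_B = \sum_{j=1}^{b}\frac{y_{\cdot j\cdot}^2}{an} - \frac{y_{\cdot\cdot\cdot}^2}{abn}$$

$$SS_{AB} = \sum_{i=1}^{a}\sum_{j=1}^{b}\frac{y_{ij\cdot}^2}{n} - \frac{y_{\cdot\cdot\cdot}^2}{abn} - SS_A - SS_B$$

$$SS_E = SS_T - SS_{AB} - SS_A - SS_B$$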

ANOVA Table for a Two-Factor Factorial (Fixed-Effects Model)

Source of Variation    Sums of Squares    df                Mean Square                       F-ratio
A Treatments           SSA                a − 1             MSA = SSA / (a − 1)               MSA / MSE
B Treatments           SSB                b − 1             MSB = SSB / (b − 1)               MSB / MSE
Interaction            SSAB               (a − 1)(b − 1)    MSAB = SSAB / [(a − 1)(b − 1)]    MSAB / MSE
Error                  SSE                ab(n − 1)         MSE = SSE / [ab(n − 1)]
Total                  SST                abn − 1

Example: To illustrate, a researcher wishes to investigate the effects of outreach activities (EOA) and
socio-economic status (SES) on the social responsibility of teachers. There are two
independent variables: outreach activities and SES. The teachers may be classified into two
groups, one exposed to outreach activities and the other group not exposed to the same. This
independent variable, therefore, has two levels: with exposure and without exposure. The
SES factor has three levels: High SES, Average SES, and Low SES. If the criterion variable
(social responsibility) is of interval type (i.e., the instrument yields score points for the
subjects), then two-way Analysis of Variance may be applied to the data. The factorial design
is known as a 2x3 ANOVA.

Socio-Economic Status    Exposure to Outreach Activities
                         With exposure    Without exposure
High                     8, 4, 3          4, 4, 3
Average                  20, 16, 15       20, 11, 22
Low                      15, 18, 2        8, 5, 10

Steps in Applying the Two-Way ANOVA


Step 1: Statement of the null and alternative hypotheses.

Ho: There is no significant effect of the exposure to outreach activities on the social responsibility
of the teachers. (main effect of factor A)
There is no significant effect of the socio-economic status on the social responsibility of the
teachers. (main effect of factor B)
There is no interaction effect of the exposure to outreach activities and socio-economic
status on the social responsibility of teachers. (AxB interaction)

Ha: There is a significant effect of the exposure to outreach activities on the social responsibility
of the teachers.
There is a significant effect of the socio-economic status on the social responsibility of the
teachers.
There is an effect of the interaction of the exposure to outreach activities and socio-economic
status on the social responsibility of teachers.

Step 2: Calculate the test statistics (Raw Score Method).


a. Find the sum of each cell, marginal totals and the grand total.
b. Compute for the squares of scores, and the sum of squared raw scores.

Socio-Economic Status           Exposure to Outreach Activities
                                With exposure    Without exposure    Totals
High       ΣX                   15               11                  26
           ΣX²                  89               41                  130
Average    ΣX                   51               53                  104
           ΣX²                  881              1,005               1,886
Low        ΣX                   35               23                  58
           ΣX²                  553              189                 742
Totals     (ΣX)T                101              87                  188
           (ΣX²)T               1,523            1,235               2,758

c. Compute the sums of squares. Results are shown to two decimal places; the arithmetic is carried out on
unrounded values. $N_t$ denotes the total number of scores (18).

SST (Sum of Squares for the Total Variability)
$$SS_T = \Sigma X^2 - \frac{(\Sigma X)^2}{N_t}$$
$$SS_T = 2{,}758 - \frac{(188)^2}{18} \approx 794.44$$

SSB (Sum of Squares for Between-Group Variability, taken over the six cells)
$$SS_B = \sum_i \frac{(\Sigma X_i)^2}{N_i} - \frac{(\Sigma X)^2}{N_t}$$
$$SS_B = \frac{15^2 + 51^2 + 35^2 + 11^2 + 53^2 + 23^2}{3} - \frac{188^2}{18} = \frac{7{,}510}{3} - \frac{35{,}344}{18} \approx 539.78$$

SSW (Sum of Squares for Within-Group Variability)
$$SS_W = SS_T - SS_B$$
$$SS_W = 794.44 - 539.78 \approx 254.67$$

d. Partition the between-group variability (SSB) into three independent components:

SSEOA (variability due to Factor A: exposure to outreach activities)
$$SS_{EOA} = \sum_i \frac{(\Sigma X_{A_i})^2}{N_{A_i}} - \frac{(\Sigma X_t)^2}{N_t}$$
$$SS_{EOA} = \frac{(101)^2 + (87)^2}{9} - \frac{(188)^2}{18} = \frac{17{,}770}{9} - \frac{35{,}344}{18} \approx 10.89$$

SSSES (variability due to Factor B: socio-economic status)
$$SS_{SES} = \sum_j \frac{(\Sigma X_{B_j})^2}{N_{B_j}} - \frac{(\Sigma X_t)^2}{N_t}$$
$$SS_{SES} = \frac{(26)^2 + (104)^2 + (58)^2}{6} - \frac{(188)^2}{18} = 2{,}476 - \frac{35{,}344}{18} \approx 512.44$$

SSAxB (variability due to the interaction of A and B)
$$SS_{AxB} = SS_B - SS_{EOA} - SS_{SES}$$
$$SS_{AxB} = 539.78 - 10.89 - 512.44 \approx 16.44$$

e. Determine the degrees of freedom (A = 2 levels of exposure, B = 3 levels of SES).
$$dF_{EOA} = A - 1 = 2 - 1 = 1$$
$$dF_{SES} = B - 1 = 3 - 1 = 2$$
$$dF_{AxB} = (A - 1)(B - 1) = 1 \times 2 = 2$$
$$dF_W = N_t - A \times B = 18 - 6 = 12$$

f. Compute the Mean Squares (MS) by dividing each SS by its df.
$$MS_{EOA} = \frac{SS_{EOA}}{dF_{EOA}} = \frac{10.89}{1} = 10.89$$
$$MS_{SES} = \frac{SS_{SES}}{dF_{SES}} = \frac{512.44}{2} = 256.22$$
$$MS_{AxB} = \frac{SS_{AxB}}{dF_{AxB}} = \frac{16.44}{2} = 8.22$$
$$MS_W = \frac{SS_W}{dF_W} = \frac{254.67}{12} \approx 21.22$$

g. Compute the F-ratios by dividing the MS for A, B, and AxB by MSW.
$$F_{EOA} = \frac{MS_{EOA}}{MS_W} = \frac{10.89}{21.22} \approx 0.513$$
$$F_{SES} = \frac{MS_{SES}}{MS_W} = \frac{256.22}{21.22} \approx 12.07$$
$$F_{AxB} = \frac{MS_{AxB}}{MS_W} = \frac{8.22}{21.22} \approx 0.387$$

Step 3: Determine the critical values of F.

At α = 0.05,
Factor A: Fcrit(1,12) = 4.75
Factor B: Fcrit(2,12) = 3.89
Factor AxB: Fcrit(2,12) = 3.89
At α = 0.01, Fcrit(2,12) = 6.93.

Step 4: Construct ANOVA Summary Table.

Source of Variation                Sums of Squares    dF    Mean of Squares    F-Ratio
Exposure to Outreach Activities    10.89              1     10.89              0.513
Socio-Economic Status              512.44             2     256.22             12.07**
Interaction                        16.44              2     8.22               0.387
Within Groups (Error)              254.67             12    21.22
Total                              794.44             17
**Significant at the 0.01 level.

Step 5: Determine the significance of the computed F-ratios with dF associated with the
numerator and denominator of each F.
• Exposure to outreach activities: F = 0.513 < Fcrit(1,12) = 4.75, so the null hypothesis is not rejected.
• Socio-economic status: F = 12.07 > Fcrit(2,12) = 3.89 (and also > 6.93 at α = 0.01), so the null hypothesis is rejected.
• Interaction: F = 0.387 < Fcrit(2,12) = 3.89, so the null hypothesis is not rejected.

Step 6: Determine the p-values.
• Exposure to outreach activities: p > 0.05
• Socio-economic status: p < 0.01
• Interaction: p > 0.05

Step 7: Interpretation:
• There is no significant effect of the exposure to outreach activities on the social
responsibility of the teachers (F(1,12) = 0.513, p > 0.05).
• The socio-economic status, however, is found to have a significant effect, with the
Average SES group showing the highest level of social responsibility ($\bar{x} = 17.33$) among the
three groups (F(2,12) = 12.07, p < 0.01).
• Lastly, there is no effect of the interaction of the exposure to outreach activities and
socio-economic status on the social responsibility of teachers (F(2,12) = 0.387, p > 0.05).
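
The raw-score method used in this example can be re-run in a few lines of code. The sketch below assumes a balanced design (the same number of scores in every cell) and works on unrounded values throughout, so its output may differ in the last digits from hand calculations that round intermediate results. The function and key names are ours and are not part of the handout.

```python
# Scores keyed by (exposure level, SES level); three teachers per cell.
cells = {
    ("With", "High"): [8, 4, 3],
    ("With", "Average"): [20, 16, 15],
    ("With", "Low"): [15, 18, 2],
    ("Without", "High"): [4, 4, 3],
    ("Without", "Average"): [20, 11, 22],
    ("Without", "Low"): [8, 5, 10],
}


def two_way_anova(cells):
    """Balanced two-factor ANOVA using the raw-score computing formulas.

    Keys of `cells` are (level of factor A, level of factor B); every cell
    must hold the same number of scores.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))        # scores per cell
    scores = [x for cell in cells.values() for x in cell]
    n_total = len(scores)
    correction = sum(scores) ** 2 / n_total    # (sum of X)^2 / N_t

    ss = {"total": sum(x * x for x in scores) - correction}
    ss_cells = sum(sum(cell) ** 2 / n for cell in cells.values()) - correction
    ss["A"] = sum(
        sum(sum(cells[(a, b)]) for b in b_levels) ** 2 / (n * len(b_levels))
        for a in a_levels
    ) - correction
    ss["B"] = sum(
        sum(sum(cells[(a, b)]) for a in a_levels) ** 2 / (n * len(a_levels))
        for b in b_levels
    ) - correction
    ss["AxB"] = ss_cells - ss["A"] - ss["B"]   # what is left of the cell variability
    ss["within"] = ss["total"] - ss_cells

    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "AxB": (len(a_levels) - 1) * (len(b_levels) - 1),
        "within": n_total - len(a_levels) * len(b_levels),
    }
    ms = {source: ss[source] / df[source] for source in df}
    f_ratio = {source: ms[source] / ms["within"] for source in ("A", "B", "AxB")}
    return ss, df, ms, f_ratio


ss, df, ms, f_ratio = two_way_anova(cells)
for source in ("A", "B", "AxB", "within"):
    f_text = f"{f_ratio[source]:.3f}" if source in f_ratio else ""
    print(f"{source:7s} SS = {ss[source]:7.2f}  df = {df[source]:2d}  MS = {ms[source]:7.2f}  F = {f_text}")
print(f"total   SS = {ss['total']:7.2f}  df = {sum(df.values()):2d}")
```

Here factor A is exposure to outreach activities and factor B is socio-economic status, matching the hypotheses stated in Step 1.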
