Chapter Five

Project 5

Uploaded by

Mõ Hãzàrd
Copyright
© All Rights Reserved
Chapter Five

Analysis of Variance
Analysis of Variance (ANOVA)
• ANOVA is a procedure for testing the hypothesis that several populations have the same mean (that is, for testing the equality of several means).
• When testing for differences in the means of more than two populations, we usually do not consider all combinations of two populations at a time and test each pair for a difference, for two reasons:
1. Such an approach would require several tests rather than just one.
2. If each individual test were conducted at a level of significance of, say, α = 0.05, then the overall level of significance would be higher than 0.05.
For example, testing $H_0$: μ1 = μ2 = μ3 pair by pair takes three tests. If each test uses α = 0.05 and the tests are treated as independent, the overall probability of rejecting a true $H_0$ is $1 - 0.95^3 \approx 0.143$.
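The arithmetic behind the 0.143 figure can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the calculation assumes the pairwise tests are independent, which is a simplification.

```python
from math import comb


def pairwise_test_count(num_populations: int) -> int:
    """Number of two-at-a-time comparisons among the populations."""
    return comb(num_populations, 2)


def overall_significance(alpha: float, num_tests: int) -> float:
    """Chance of rejecting at least one true hypothesis across the tests.

    Assumes the tests are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - alpha) ** num_tests


tests_needed = pairwise_test_count(3)           # three populations -> 3 pairs
overall = overall_significance(0.05, tests_needed)
print(tests_needed, round(overall, 3))          # 3 0.143
```

The same functions show how quickly the overall level grows as more populations are compared, since the number of pairs grows faster than the number of populations.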
• Thus, we need to test simultaneously for differences among the means of all the populations, so that the joint level of significance of the test is α.
• To perform this test, the F-distribution and the ANOVA method are applied.

In order to use ANOVA, we assume the following:


1. All the samples were randomly selected and are independent of one another.
2. The populations are normally distributed. If, however, the sample sizes are large enough, the assumption of normality is not needed.
3. All the population variances are equal.

• ANOVA is based on a comparison of two different estimates of the variance, σ², of the overall population:
1. The variance obtained by calculating the variation within the samples themselves, the Mean Square Within (MSW).
2. The variance obtained by calculating the variation among the sample means, the Mean Square Between (MSB).
• Since both are estimates of σ², they should be approximately equal in value when $H_0$ is true. If $H_0$ is not true, the two estimates will differ considerably.
• The three steps in ANOVA are:
1. Determine one estimate of the population variance from the variation among the sample means.
2. Determine a second estimate of the population variance from the variation within the samples.
3. Compare these two estimates. If they are approximately equal in value, do not reject $H_0$.
Calculating the Variance among the Sample Means – MSB
• The variance among the sample means is called the Between Column Variance or Mean Square Between (MSB).
• The sample variance is
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
• Because we are now working with sample means and the grand mean, substitute $\bar{X}$ for $X$, $\bar{\bar{X}}$ for $\bar{X}$, and $K$ (the number of samples) for $n$. The variance among the sample means is then
$$S_{\bar{X}}^2 = \frac{\sum (\bar{X} - \bar{\bar{X}})^2}{K - 1}$$
• In the sampling distribution of the mean, the standard error of the mean is $\sigma_{\bar{X}} = \sigma / \sqrt{n}$. Cross-multiplying the terms gives $\sigma = \sigma_{\bar{X}} \sqrt{n}$, and squaring both sides gives
$$\sigma^2 = \sigma_{\bar{X}}^2 \cdot n$$
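• In ANOVA, we do not have all the information needed to use the above equation to find σ². Specifically, we do not know $\sigma_{\bar{X}}^2$. However, we can calculate the variance among the sample means, $S_{\bar{X}}^2$, using
$$S_{\bar{X}}^2 = \frac{\sum (\bar{X} - \bar{\bar{X}})^2}{K - 1}$$
• So, substituting $S_{\bar{X}}^2$ for $\sigma_{\bar{X}}^2$, an estimate of the population variance is
$$\hat{\sigma}^2 = S_{\bar{X}}^2 \cdot n = \frac{\sum n (\bar{X} - \bar{\bar{X}})^2}{K - 1} = \frac{n \sum (\bar{X} - \bar{\bar{X}})^2}{K - 1}, \quad \text{if } n_1, n_2, \ldots, n_K \text{ are equal.}$$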

Which sample size to use?
• n represents the sample size, but which sample size should we use when different samples have different sizes?
• We solve this problem by multiplying each $(\bar{X}_j - \bar{\bar{X}})^2$ by its own $n_j$, so that the estimate becomes:
$$MSB = \hat{\sigma}^2 = \frac{\sum n_j (\bar{X}_j - \bar{\bar{X}})^2}{K - 1}$$
• Where:
$\hat{\sigma}^2$ = first estimate of the population variance, based on the variation among the sample means (the Between Column Variance, MSB)
$n_j$ = the size of the jth sample
$\bar{X}_j$ = the sample mean of the jth sample
$\bar{\bar{X}}$ = the grand mean
K = the number of samples
K − 1 = the degrees of freedom associated with SSB (the sum of squares between, which is the numerator of MSB), ν1
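A minimal Python sketch of the MSB formula follows. The function name and all the numbers are hypothetical and chosen only for illustration. It checks that the weighted form agrees with the shortcut n · S²(sample means) when every sample has the same size, and it still works when the sizes differ.

```python
import statistics


def mean_square_between(sizes, means):
    """MSB: sum of n_j * (mean_j - grand mean)^2, divided by K - 1."""
    k = len(sizes)
    n_total = sum(sizes)
    # The grand mean weights each sample mean by its sample size.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    return ssb / (k - 1)


# Hypothetical equal-size case: three samples of 4 items each.
equal_means = [10.0, 12.0, 17.0]
msb_equal = mean_square_between([4, 4, 4], equal_means)
shortcut = 4 * statistics.variance(equal_means)   # n times variance of the means

# Hypothetical unequal-size case: samples of 3 and 4 items.
msb_unequal = mean_square_between([3, 4], [2.0, 3.5])

print(msb_equal, shortcut, msb_unequal)
```

With unequal sizes the shortcut has no single n to use, which is why the general formula carries n_j inside the sum.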
Calculating the Variance within the Samples (MSW)
➢ It is based on the variation of the sample observations within each sample.
➢ It is called the Within Column Variance or Mean Square Within (MSW). The sample variance for each sample is calculated as
$$S_j^2 = \frac{\sum (X - \bar{X}_j)^2}{n_j - 1}$$
• Since we have assumed that the variances of the populations from which the samples were drawn are equal, we could use any one of the sample variances as the second estimate of the population variance. Statistically, however, we get a better estimate of the population variance by using a weighted average of all the sample variances.

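• The general formula for this second estimate of σ² is:
$$MSW = \hat{\sigma}^2 = \frac{\sum_{j=1}^{k} (n_j - 1) S_j^2}{n_T - k}$$
If $n_1, n_2, \ldots, n_k$ are all equal to $n$, this reduces to
$$MSW = \hat{\sigma}^2 = \frac{(n - 1) \sum_{j=1}^{k} S_j^2}{k (n - 1)} = \frac{\sum_{j=1}^{k} S_j^2}{k}$$
• Where:
$\hat{\sigma}^2$ = second estimate of the population variance, based on the variation within the samples (the Within Column Variance, MSW)
$n_j$ = the size of the jth sample
$n_j - 1$ = the degrees of freedom in each sample
$n_T - k$ = the degrees of freedom associated with SSW (the sum of squares within, which is the numerator of MSW)
$S_j^2$ = the sample variance of the jth sample
k = the number of samples (written K in the MSB formula)
$n_T = \sum n_j$ = the total sample size = $n_1 + n_2 + \ldots + n_k$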
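The weighted-average idea can be sketched in Python as follows. The function name and the numbers are hypothetical, used only to illustrate the formula. With equal sample sizes the result is the plain average of the sample variances, and with unequal sizes each variance is weighted by its degrees of freedom.

```python
def mean_square_within(sizes, variances):
    """MSW: sum of (n_j - 1) * S_j^2, divided by n_T - k."""
    k = len(sizes)
    n_total = sum(sizes)
    ssw = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    return ssw / (n_total - k)


# Hypothetical equal-size case: three samples of 5 items each.
equal_variances = [2.0, 4.0, 6.0]
msw_equal = mean_square_within([5, 5, 5], equal_variances)
simple_average = sum(equal_variances) / len(equal_variances)

# Hypothetical unequal-size case: samples of 3 and 4 items.
msw_unequal = mean_square_within([3, 4], [1.0, 5.0 / 3.0])

print(msw_equal, simple_average, msw_unequal)
```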
• The estimate of the population variance based on the variation between the sample means (MSB) is somewhat suspect, because it rests on the notion that all the populations have the same mean. That is, MSB is a good estimate of σ² only if $H_0$ is true and all the population means are equal: μ1 = μ2 = μ3 = … = μk.
• If the unknown population means are not equal, and perhaps are radically different from one another, then the sample means ($\bar{X}_j$) will most likely be radically different from each other too. This difference will have a marked effect on MSB: the $\bar{X}_j$ values will vary a great deal and the $(\bar{X}_j - \bar{\bar{X}})^2$ terms will be large. Thus, if the population means are not all equal, the MSB estimate will be large relative to the MSW estimate. Conversely, if MSB is large relative to MSW, the hypothesis that all the population means are equal is not likely to be true.
• The important question is, of course, how large is "large"? And how do we measure the relative sizes of the two variance estimates? The answer to both questions is given by the F-distribution.
• If k samples of $n_j$ (j = 1, 2, …, k) items each are taken from k normal populations that have equal variances and for which the hypothesis $H_0$: μ1 = μ2 = … = μk is true, then the ratio of MSB to MSW is an F-value that follows an F probability distribution:
$$F = \frac{MSB}{MSW}$$
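One way to see what "follows an F probability distribution" means in practice is a small simulation, sketched below in Python. Everything in it is an assumption made for illustration: the function names, the population mean of 50 and standard deviation of 10, the seed, and the number of trials. The cutoff 3.68 is the tabulated F value for α = 0.05 with 2 and 15 degrees of freedom, which matches k = 3 samples of 6 items each. When $H_0$ is true, roughly 5% of the simulated F ratios should exceed it.

```python
import random


def f_ratio(samples):
    """F = MSB / MSW for a list of samples (each a list of numbers)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def rejection_rate(num_trials, k, n, critical_value, seed=0):
    """Share of simulated F ratios above the cutoff when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_trials):
        # Every sample comes from the same normal population, so H0 holds.
        samples = [[rng.gauss(50.0, 10.0) for _ in range(n)] for _ in range(k)]
        if f_ratio(samples) > critical_value:
            rejections += 1
    return rejections / num_trials


rate = rejection_rate(num_trials=5000, k=3, n=6, critical_value=3.68)
print(f"share of F ratios above 3.68: {rate:.3f}")
```

The simulated share is only an approximation of α and will vary slightly with the seed and the number of trials.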

THE F-DISTRIBUTION
• Characteristics of the F-distribution
1. It is a continuous probability distribution.
2. It is unimodal.
3. It has two parameters, a pair of degrees of freedom, ν1 and ν2:
ν1 = the number of degrees of freedom in the numerator of the F-ratio; ν1 = k – 1
ν2 = the number of degrees of freedom in the denominator of the F-ratio; ν2 = nT – k
4. It is a positively skewed distribution, and it tends to become more symmetrical as the degrees of freedom in the numerator and denominator increase.
5. The mean of an F-distribution is $\frac{\nu_2}{\nu_2 - 2}$ for ν2 > 2, and the standard deviation is $\sqrt{\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}}$ for ν2 > 4.
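The mean and standard deviation in item 5 can be evaluated with a short Python sketch. The function names are hypothetical. The standard deviation is computed here as the square root of the F-distribution's variance, 2ν2²(ν1 + ν2 − 2) / (ν1(ν2 − 2)²(ν2 − 4)).

```python
import math


def f_mean(nu2: float) -> float:
    """Mean of an F-distribution; defined only for nu2 > 2."""
    if nu2 <= 2:
        raise ValueError("the mean requires nu2 > 2")
    return nu2 / (nu2 - 2)


def f_std_dev(nu1: float, nu2: float) -> float:
    """Standard deviation of an F-distribution; defined only for nu2 > 4."""
    if nu2 <= 4:
        raise ValueError("the standard deviation requires nu2 > 4")
    variance = (2 * nu2 ** 2 * (nu1 + nu2 - 2)) / (nu1 * (nu2 - 2) ** 2 * (nu2 - 4))
    return math.sqrt(variance)


# Degrees of freedom for k = 3 samples and nT = 18 observations in total.
nu1, nu2 = 3 - 1, 18 - 3
print(round(f_mean(nu2), 4), round(f_std_dev(nu1, nu2), 4))
```

Note that the mean depends only on ν2 and approaches 1 as ν2 grows, which fits the idea that MSB and MSW estimate the same variance when $H_0$ is true.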
Example 1

The training director of a company is trying to evaluate three different methods of training new employees. The first method assigns each new employee to an experienced employee for individual help in the factory. The second method puts all new employees in a training room separate from the factory, and the third method uses training films and programmed learning materials. The training director chooses 18 new employees, assigns them at random to the three training methods, and records their daily production after they complete the programs. Below are the productivity measures for the individuals trained by each method.

            Method 1   Method 2   Method 3
            45         59         41
            40         43         37
            50         47         43
            39         51         40
            53         39         52
            44         49         37
Total       271        288        250
Mean        45.17      48.00      41.67
Variance    30.17      47.60      31.07

Grand mean: $\bar{\bar{X}} = 44.94$

At the 0.05 level of significance, do the three training methods lead to different levels of productivity?
Solution
1. $H_0$: μ1 = μ2 = μ3
$H_a$: μ1, μ2, and μ3 are not all equal
2. α = 0.05
ν1 = K – 1 = 3 – 1 = 2 and ν2 = nT – k = 18 – 3 = 15
$F_{0.05,\,2,\,15} = 3.68$. Reject $H_0$ if the sample F > 3.68.
3. Sample F
$$MSB = \frac{\sum n_j (\bar{X}_j - \bar{\bar{X}})^2}{K - 1} = \frac{6\left[(45.17 - 44.94)^2 + (48.00 - 44.94)^2 + (41.67 - 44.94)^2\right]}{3 - 1} = \frac{120.66}{2} = 60.33$$
$$MSW = \frac{\sum (n_j - 1) S_j^2}{n_T - K} = \frac{5(30.17 + 47.60 + 31.07)}{15} = \frac{108.84}{3} = 36.28$$
$$F = \frac{MSB}{MSW} = \frac{60.33}{36.28} = 1.663$$
4. Since 1.663 < 3.68, do not reject $H_0$. At the 0.05 level of significance, the data do not show a difference in the effects of the three training methods on employee productivity.
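The hand calculation for Example 1 can be cross-checked from the raw data with a short Python sketch. The function and variable names are hypothetical, and the critical value 3.68 is copied from the F table rather than computed. Because the script does not round the means and variances along the way, its F value (about 1.66) can differ from the hand result in the third decimal place.

```python
def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    msb, msw = ssb / df_between, ssw / df_within
    return {"df": (df_between, df_within), "msb": msb, "msw": msw, "f": msb / msw}


# Productivity data from Example 1.
method_1 = [45, 40, 50, 39, 53, 44]
method_2 = [59, 43, 47, 51, 39, 49]
method_3 = [41, 37, 43, 40, 52, 37]

CRITICAL_F = 3.68  # tabulated F for alpha = 0.05 with 2 and 15 degrees of freedom

result = one_way_anova([method_1, method_2, method_3])
reject_h0 = result["f"] > CRITICAL_F
print(f"df = {result['df']}, MSB = {result['msb']:.2f}, "
      f"MSW = {result['msw']:.2f}, F = {result['f']:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```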
Example 2
A department store chain is considering building a new store at one of four different sites. One of the important factors in the decision is the annual household income of the residents of the four areas. Suppose that, in a preliminary study, various residents in each area are asked what their annual household incomes are. The results are shown in the table below. Is there sufficient evidence to conclude that differences exist in the average annual household incomes among the four communities? Use α = 0.01.

            Area 1     Area 2     Area 3     Area 4
            25         32         27         18
            27         35         32         23
            31         30         48         29
            17         46         25         26
            29         32         20         42
            30         22         12
                       19         18
                       51
                       27
Total       159        294        182        138
n_j         6          9          7          5
Mean        26.50      32.67      26.00      27.60
Variance    26.30      107.50     136.33     81.30

Grand mean: $\bar{\bar{X}} = 28.63$
Solution
1. $H_0$: μ1 = μ2 = μ3 = μ4
$H_a$: μ1, μ2, μ3, and μ4 are not all equal
2. α = 0.01
ν1 = K – 1 = 4 – 1 = 3 and ν2 = nT – k = 27 – 4 = 23
$F_{0.01,\,3,\,23} = 4.76$. Reject $H_0$ if the sample F > 4.76.
3. Sample F
$$MSB = \frac{\sum n_j (\bar{X}_j - \bar{\bar{X}})^2}{K - 1} = \frac{6(26.50 - 28.63)^2 + 9(32.67 - 28.63)^2 + 7(26.00 - 28.63)^2 + 5(27.60 - 28.63)^2}{4 - 1} = \frac{227.84}{3} = 75.95$$
$$MSW = \frac{\sum (n_j - 1) S_j^2}{n_T - K} = \frac{5(26.3) + 8(107.5) + 6(136.33) + 4(81.3)}{27 - 4} = \frac{2134.68}{23} = 92.81$$
$$F = \frac{MSB}{MSW} = \frac{75.95}{92.81} = 0.82$$
4. Since 0.82 < 4.76, do not reject $H_0$. At the 0.01 level of significance, there is not sufficient evidence to conclude that differences exist in the average annual household incomes among the four communities.
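Example 2 has unequal sample sizes, so it is a useful check of the weighted formulas. The Python sketch below works only from the summary statistics quoted in the example (sample sizes, means, and variances) rather than from the raw incomes. The names are hypothetical, the critical value 4.76 is copied from the F table, and the grand mean is recomputed from the rounded means, so the last digit may differ slightly from the hand calculation.

```python
# Summary statistics quoted in Example 2 (Areas 1 to 4).
sizes = [6, 9, 7, 5]
means = [26.50, 32.67, 26.00, 27.60]
variances = [26.30, 107.50, 136.33, 81.30]

CRITICAL_F = 4.76  # tabulated F for alpha = 0.01 with 3 and 23 degrees of freedom

k = len(sizes)
n_total = sum(sizes)

# Grand mean: each sample mean weighted by its sample size.
grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

# MSB weights each squared deviation by its own n_j.
msb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / (k - 1)

# MSW weights each sample variance by its degrees of freedom, n_j - 1.
msw = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)

f_value = msb / msw
reject_h0 = f_value > CRITICAL_F

print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_value:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```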
