Notes for Chapter 4
Definition
Analysis of Variance (ANOVA): Definition and Purpose
ANOVA works by partitioning the total variance observed in the data into two
components: the between-group variance and the within-group variance.
The core idea is to compare the between-group variance to the within-group variance.
If the between-group variance is significantly larger than the within-group variance,
this suggests that the group means are not all equal and that at least one group
differs from the others.
The statistical test used in ANOVA is the F-test, which produces an F-statistic. This
statistic is then compared to a critical value from the F-distribution to determine
statistical significance.
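As a rough illustration of the partition and the F-statistic described above, here is a minimal Python sketch. The function name `f_statistic` and the toy scores are hypothetical and only serve to show the arithmetic; they are not data from this chapter.

```python
from statistics import mean

def f_statistic(groups):
    """Return (ss_between, ss_within, F) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-group sum of squares: group means vs. grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: each score vs. its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    # F is the ratio of the two mean squares (sum of squares / degrees of freedom)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f

# Hypothetical toy data: three groups of three scores each
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, f = f_statistic(toy)
print(ss_b, ss_w, f)
```

A large F means the group means are spread out far more than the scatter inside the groups would suggest; whether it is large enough is judged against the F-distribution with (df_between, df_within) degrees of freedom.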
Types of ANOVA
Two-way ANOVA: Examines the effect of two independent variables and can
also test for interactions between them.
Example Applications
Medical research: Comparing the effectiveness of different treatments across
patient groups.
Interpretation
If the ANOVA test indicates a statistically significant difference, not all group means
are equal. However, ANOVA does not specify which groups differ from each other;
further post-hoc tests (such as Tukey's HSD) are needed to identify specific group
differences.
Historical Context
ANOVA was developed by Ronald Fisher in 1918 and became widely used after being
featured in his 1925 book, "Statistical Methods for Research Workers". It is considered
an extension of the t-test to more than two groups.
Summary Table
Developed by: Ronald Fisher (1918)
ANOVA is a foundational tool in statistics for comparing multiple group means and
understanding whether observed differences are likely due to chance or reflect real
effects.
Explanation of the Slide
Key Points:
Research Question:
"Which advertising theme results in the highest preference for the new VW
Roadster?"
This asks which of the three themes leads to the highest average preference
score among respondents.
Treatments:
The three advertising themes (Sport, Family, Environment) are considered
different "treatments" or experimental conditions.
Preference Measurement:
Participants rate their preference for the car on a 10-point scale after seeing
one of the ad themes. The bar chart shows the average preference score for each
theme, with error bars indicating variability (e.g., standard error).
Statistical Comparison:
The red text ("How to measure differences across treatments? Several t-Tests?")
highlights a common statistical challenge. While you could compare each pair of
themes using multiple t-tests, this increases the risk of Type I error (false
positives).
Interpretation of Results:
The chart shows that the "Sport" theme has the highest average preference,
followed by "Environment," with "Family" lowest. The notation "p < .05"
indicates that at least one of the differences between group means is
statistically significant (if all group means were truly equal, differences at
least this large would occur less than 5% of the time).
Summary:
This slide uses the VW Roadster ad example to illustrate why ANOVA is used when
comparing more than two groups: it avoids the pitfalls of multiple t-tests and tests in
a single step whether the advertising themes differ in average preference.
Alpha-error
• Statistical tests (e.g., t-test) are calculated using a certain significance level (α
level)
• With 3 groups, 3 t-tests are required to compare the groups with each other (1-2, 1-3,
and 2-3)
• If each test uses a 5% significance level, the following applies to each test:
the probability of incorrectly rejecting the null hypothesis (Type I error) is 5%
• Thus, the probability of no Type I error (complementary probability) is 95%. If the 3
tests are independent of each other, the probabilities can be multiplied:
0.95 × 0.95 × 0.95 ≈ 0.857, so the probability of at least one Type I error rises to
about 14.3%
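The multiplication argument can be sketched in a few lines of Python. This assumes, as the notes do, that the pairwise tests are independent; the function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise tests and the chance of at least one Type I error,
    assuming the tests are independent."""
    n_tests = comb(n_groups, 2)            # e.g. 3 groups -> pairs 1-2, 1-3, 2-3
    p_no_error = (1 - alpha) ** n_tests    # all tests avoid a false positive
    return n_tests, 1 - p_no_error

n_tests, fwer = familywise_error(3)
print(n_tests, round(fwer, 4))  # 3 0.1426
```

With more groups the number of pairs grows quickly (5 groups already need 10 tests), which is why a single ANOVA is preferred over a series of t-tests.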
The p-value from your statistical test is compared to alpha. If p < α, you reject the
null hypothesis, but when the null hypothesis is actually true there is still a
probability of α that the test rejects it (a false positive).
What Is This Slide About?
This slide shows a simple example of how to use ANOVA (Analysis of Variance) to
compare people’s preferences for the new VW Roadster after seeing three different
types of advertisements: Sport, Family, and Environment.
Respondent Preference: Each column lists how much each person liked the
car after seeing one of the ad types. The numbers are their scores (out of 10).
o For example, after seeing the Sport ad, R06 gave a score of 10, R11 gave
a 9, etc.
Mean Y_bar: This is the overall average score, combining all groups.
o Add all scores (45+10+20 = 75), divide by total number of people (15), so
the mean is 5.
The Sport ad group has the highest average preference (9 out of 10), meaning
people liked the car most after seeing the Sport ad.
The Family ad group has the lowest average preference (2 out of 10).
These averages are compared using ANOVA to see if the differences are
statistically significant (real differences, not just due to random chance).
If the ANOVA test says the difference is significant, we can confidently say that
the type of ad does affect how much people like the car.
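The averages quoted above can be checked from the group totals. This is a minimal sketch, assuming the 15 respondents are split evenly (5 per ad theme), which is consistent with the totals 45 and 10 and the means 9 and 2. The Environment mean is derived from its total of 20 and is not stated in these notes.

```python
# Group totals as given in the notes (45 + 10 + 20 = 75)
group_totals = {"Sport": 45, "Family": 10, "Environment": 20}
n_per_group = 5  # ASSUMPTION: 15 respondents split evenly across 3 themes

group_means = {theme: total / n_per_group for theme, total in group_totals.items()}
grand_mean = sum(group_totals.values()) / (n_per_group * len(group_totals))

# Between-group sum of squares: how far each group mean lies from the grand mean
ss_between = sum(n_per_group * (m - grand_mean) ** 2 for m in group_means.values())

print(group_means)   # {'Sport': 9.0, 'Family': 2.0, 'Environment': 4.0}
print(grand_mean)    # 5.0
print(ss_between)    # 130.0
```

The within-group sum of squares, and therefore the F-statistic, cannot be computed here because the individual scores are not reproduced in these notes.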
Interpretation
Possible statement:
The significance test only states that at least one experimental group differs
significantly from at least one other, i.e., X1 = X2 = X3 is not true. It makes no
statement about which pairs of groups differ.
Solution
ANOVA - Example
• Viagra conquered the black market in the belief that it makes men better lovers
• DV: "objective measure of libido", which was measured over the course of a week
Grand Mean: The average across all scores from all groups (3.467).
Grand SD (Standard Deviation): The overall spread of all scores (1.767).
Grand Variance: The overall variance of all scores (3.124).
High Dose group has the highest average score (5.00), suggesting the
strongest effect.
Placebo group has the lowest average (2.20), as expected if the drug works.
The data shows that higher doses lead to higher average scores.
The ANOVA test indicates that these differences are statistically significant
(p = 0.025), i.e., unlikely to be due to chance alone.
Conclusion: The dose of Viagra has a significant effect on the measured
outcome (e.g., libido).
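The descriptive values and the test result can be reproduced by hand. The raw scores in this sketch are an assumption: they are not listed in these notes and were chosen because they reproduce the reported grand mean (3.467), grand SD (1.767), grand variance (3.124), and the group means 2.20 and 5.00. The closed-form p-value is a shortcut that only holds when there are exactly 3 groups (2 numerator degrees of freedom).

```python
from statistics import mean, stdev, variance

# ASSUMPTION: illustrative raw scores, not taken from these notes
groups = {
    "Placebo":   [3, 2, 1, 1, 4],
    "Low Dose":  [5, 2, 4, 2, 3],
    "High Dose": [7, 4, 5, 3, 6],
}
scores = [x for g in groups.values() for x in g]

grand_mean = mean(scores)
grand_var = variance(scores)   # sample variance (divides by N - 1)
grand_sd = stdev(scores)

# Partition the variation into between-group and within-group parts
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
df_between = len(groups) - 1
df_within = len(scores) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Upper-tail probability of F; this closed form is valid only for df_between == 2
assert df_between == 2
p_value = (1 + df_between * f_stat / df_within) ** (-df_within / 2)

print(round(grand_mean, 3), round(grand_sd, 3), round(grand_var, 3))
print(round(f_stat, 2), round(p_value, 3))
```

Under these assumed scores the result is F(2, 12) ≈ 5.12 with p ≈ 0.025, which agrees with the p-value quoted for the ANOVA table in these notes.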
Post-Hoc-Tests
What Are Post-Hoc Tests?
Post-hoc tests are additional statistical tests you perform after an ANOVA shows that
there are significant differences between group means. The word "post-hoc" means
"after this" in Latin.
ANOVA tells you that at least one group is different from the others, but it does
not tell you which groups are different.
Post-hoc tests help you find out exactly which pairs of groups (e.g., Placebo
vs. Low Dose, Low Dose vs. High Dose, Placebo vs. High Dose) are significantly
different from each other.
They adjust for the fact that you are making multiple comparisons, which helps
control the risk of making a Type I error (false positive).
Scheffé’s test: Very conservative, used when you want to be extra careful
about false positives.
From the ANOVA table, you know there is a significant difference between the Placebo,
Low Dose, and High Dose groups (p = 0.025).
A post-hoc test would tell you, for example:
Simple Summary
This table shows the results of post-hoc tests (specifically, Tukey’s HSD), which
compare each pair of groups to find out exactly where the differences are.
Mean Difference (I-J): The difference in average scores between each pair.
Interpretation:
The only significant difference is between the Placebo and High Dose groups
(p = 0.021), meaning High Dose significantly increases libido compared to
Placebo.
All other comparisons (Placebo vs. Low Dose, Low Dose vs. High Dose)
are not statistically significant (p > 0.05).
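The "Mean Difference (I-J)" column is simply the difference between two group means. A minimal sketch follows; the Placebo and High Dose means are those reported in these notes, while the Low Dose mean of 3.20 is an assumption derived from the grand mean of 3.467 under equal group sizes. The sketch does not compute the Tukey-adjusted p-values.

```python
from itertools import combinations

group_means = {
    "Placebo": 2.20,    # reported in the notes
    "Low Dose": 3.20,   # ASSUMPTION: derived as 3 * 3.467 - 2.20 - 5.00 (equal n)
    "High Dose": 5.00,  # reported in the notes
}

# One comparison per unordered pair of groups: mean of I minus mean of J
mean_differences = {
    (i, j): round(group_means[i] - group_means[j], 2)
    for i, j in combinations(group_means, 2)
}

for (i, j), diff in mean_differences.items():
    print(f"{i} vs. {j}: {diff}")
```

The largest gap (Placebo vs. High Dose, 2.8 points) is the one the post-hoc test flags as significant; the smaller gaps of 1.0 and 1.8 points are not.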
Alcohol does not increase attractiveness ratings for males; in fact, after 4 pints,
males rate attractiveness much lower.
Females’ ratings are less affected by alcohol, with only a slight decrease after 4
pints.
Males’ ratings become much more inconsistent (higher variance) after drinking.
The so-called "beer-goggles effect" (the idea that people seem more attractive
after drinking) is not supported by this data for males; it may even be the
opposite.
Drinking 4 pints of alcohol leads to a much lower attractiveness
rating compared to drinking none or just 2 pints.
The "beer-goggles effect" (the idea that people seem more attractive after
drinking) is not supported; in fact, heavy drinking (4 pints) makes people rate
others as less attractive.
Task 2: Please work with “Teach.sav”
• Indifference (indifferent)