Power and Effect Size
→ p ≤ alpha: reject H0 (different distribution).
→ p > alpha: fail to reject H0 (same distribution).
→ Significance level (alpha): the boundary for specifying a statistically significant finding when interpreting the p-value.
→ We can see that the p-value is just a probability, and that in actuality the result may be different: the test could be wrong. Given the p-value, we could make an error in our interpretation.
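As a concrete illustration of this decision rule, the sketch below runs a two-sample t-test on made-up data and compares the p-value to alpha. The data, the choice of test, and alpha = 0.05 are all illustrative assumptions, not part of the original notes.

```python
# A minimal sketch of the p-value decision rule (all values illustrative).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 30)   # hypothetical sample A
b = rng.normal(0.5, 1.0, 30)   # hypothetical sample B, with a shifted mean

alpha = 0.05                   # significance level
stat, p = ttest_ind(a, b)      # two-sample t-test
if p <= alpha:
    print(f"p = {p:.4f} <= alpha: reject H0 (likely different distributions)")
else:
    print(f"p = {p:.4f} > alpha: fail to reject H0 (same distribution plausible)")
```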
2.0 TYPES OF ERROR

TYPE I ERROR

→ Rejecting the null hypothesis when there is in fact no significant effect (a false positive). The p-value is optimistically small.

TYPE II ERROR

→ Failing to reject the null hypothesis when there is a significant effect (a false negative). The p-value is pessimistically large.

→ Statistical power is the probability that the test rejects the null hypothesis when the alternative hypothesis is true.
→ Power is the probability that a test of significance will pick up on an effect that is present; that is, that it will detect a deviation from the null hypothesis, should such a deviation exist.
→ Mathematically, power = 1 - beta, where beta is the probability of a Type II error. The power of a hypothesis test lies between 0 and 1; if the power is close to 1, the test is very good at detecting a false null hypothesis. Beta is commonly set at 0.2, but may be set smaller by the researchers.
→ Consequently, power is typically at least 0.8, but may be higher. Power lower than 0.8, while not impossible, would usually be considered too low for most areas of research.
→ When interpreting statistical power, we seek experimental setups that have high statistical power.
→ Low statistical power: large risk of committing a Type II error, i.e. a false negative.
→ High statistical power: small risk of committing a Type II error.
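To make power = 1 - beta concrete, here is a minimal sketch for a one-sided, one-sample z-test; the effect size, sample size, and alpha below are illustrative assumptions.

```python
# A minimal sketch of power = 1 - beta for a one-sided one-sample z-test
# (true effect of 0.5 standard deviations, n = 50, alpha = 0.05; all illustrative).
from scipy.stats import norm

alpha, effect, n = 0.05, 0.5, 50
z_crit = norm.ppf(1 - alpha)           # critical value of the test statistic under H0
se = 1 / n ** 0.5                      # standard error of the standardized sample mean
beta = norm.cdf(z_crit - effect / se)  # P(fail to reject H0 | H1 is true)
print(f"beta = {beta:.3f}, power = 1 - beta = {1 - beta:.3f}")
```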
→ All four variables of a power analysis (effect size, sample size, significance level, and statistical power) are related. For example, a larger sample size can make an effect easier to detect, and the statistical power of a test can be increased by increasing the significance level.
→ A power analysis involves estimating one of these four parameters given values for the other three. This is a powerful tool in both the design and the analysis of experiments that we wish to interpret using statistical hypothesis tests.
→ For example, the statistical power can be estimated given an effect size, sample size, and significance level. Alternatively, the required sample size can be estimated for different desired levels of significance.
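As a sketch of the first case, estimating power from the other three parameters, statsmodels provides power calculators; the two-sample t-test setting and all numbers below are illustrative assumptions.

```python
# Estimate statistical power given effect size, per-group sample size, and alpha,
# for an independent two-sample t-test (all values illustrative).
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().solve_power(effect_size=0.5, nobs1=40, alpha=0.05)
print(f"power = {power:.3f}")  # roughly 0.60 for these inputs
```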
→ Perhaps the most common use of a power analysis is in the estimation of the minimum sample size required for an experiment.
→ As a practitioner, we can start with sensible defaults for some parameters, such as a significance level of 0.05 and a power level of 0.80. We can then estimate a desirable minimum effect size, specific to the experiment being performed. A power analysis can then be used to estimate the minimum sample size required.
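Continuing that sketch with the defaults just mentioned (alpha = 0.05, power = 0.80) and an assumed minimum effect size of d = 0.5, leaving the sample size unspecified makes solve_power solve for it:

```python
# Estimate the minimum per-group sample size given effect size, alpha, and power.
from statsmodels.stats.power import TTestIndPower

n1 = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"minimum sample size per group: {n1:.1f}")  # about 63.8, so round up to 64
```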
→ In addition, multiple power analyses can be performed to provide a curve of one parameter against another, such as the change in the size of an effect in an experiment given changes to the sample size. More elaborate plots can be created by varying three of the parameters. This is a useful tool for experimental design.
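For example, a power curve of this kind can be drawn with statsmodels' plot_power; the grid of sample sizes and the three effect sizes below are illustrative assumptions.

```python
# Plot power against per-group sample size for several assumed effect sizes.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.power import TTestIndPower

TTestIndPower().plot_power(dep_var='nobs',
                           nobs=np.arange(5, 100),
                           effect_size=np.array([0.2, 0.5, 0.8]),
                           alpha=0.05)
plt.show()
```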
3.0 EFFECT SIZE

→ Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude: not just, does a treatment affect people, but how much does it affect them.
→ The p-value is not enough. A lower p-value is sometimes interpreted as meaning there is a stronger relationship between two variables. However, statistical significance only means that the observed result would be unlikely (less than 5%, with alpha = 0.05) if the null hypothesis were true.
→ Therefore, a significant p-value tells us that an intervention works, whereas an effect size tells us how much it works.
→ It can be argued that emphasizing the size of the effect promotes a more scientific approach, as, unlike significance tests, effect size is independent of sample size.
→ Unlike a p-value, effect sizes can be used to quantitatively compare the results of studies done in different settings. They are widely used in meta-analysis.

HOW TO CALCULATE AND INTERPRET EFFECT SIZES

→ Effect sizes measure either the size of associations between variables or the size of differences between group means.

COHEN'S D

→ Cohen's d is an appropriate effect size for the comparison between two means. It can be used, for example, to accompany the reporting of t-test and ANOVA results. It is also widely used in meta-analysis.
→ Cohen suggested that d = 0.2 be considered a 'small' effect size, 0.5 a 'medium' effect size, and 0.8 a 'large' effect size. This means that if the difference between two groups' means is less than 0.2 standard deviations, the difference is negligible, even if it is statistically significant.
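A minimal sketch of computing Cohen's d for two independent samples, using the pooled standard deviation; the two made-up samples below are illustrative.

```python
# Cohen's d for two independent samples: mean difference / pooled std. dev.
import numpy as np

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    # Pooled variance from the two sample variances (ddof=1 for sample variance).
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) \
                 / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
treatment = rng.normal(10.5, 2.0, size=50)  # hypothetical treatment group
control = rng.normal(10.0, 2.0, size=50)    # hypothetical control group
# True (population) d here is 0.5 / 2.0 = 0.25, a 'small' effect by Cohen's guide.
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```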
PEARSON R CORRELATION
ODDS RATIO