Power Analysis
• The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true
effect present to detect.
• Power can be calculated and reported for a completed experiment to comment on the confidence
one might have in the conclusions drawn from the results of the study.
• It can also be used as a tool to estimate the number of observations or sample size required in
order to detect an effect in an experiment.
• A power analysis can be used to estimate the minimum sample size required for an experiment,
given a desired significance level, effect size, and statistical power.
Statistical Hypothesis Testing
• A statistical hypothesis test makes an assumption about the outcome, called the null hypothesis.
• For example, the null hypothesis for the Pearson’s correlation test is that there is no relationship between two
variables. The null hypothesis for the Student’s t test is that there is no difference between the means of two
populations.
• The test is often interpreted using a p-value, which is the probability of observing a result at least as
extreme as the one obtained, given that the null hypothesis is true. It is not the probability that the null
hypothesis is true given the result, which is a common misinterpretation.
• p-value (p): Probability of obtaining a result equal to or more extreme than was observed in the data.
• To interpret the p-value of a significance test, you must specify a significance level, often denoted by the
Greek lowercase letter alpha (α). A common value for the significance level is 5%, written as 0.05.
Statistical Hypothesis Testing
• The p-value is interpreted in the context of the chosen significance level. A result of a significance
test is claimed to be “statistically significant” if the p-value is less than the significance level. This
means that the null hypothesis (that there is no effect) is rejected.
• Where:
• Significance level (alpha): Boundary for specifying a statistically significant finding when
interpreting the p-value.
• We can see that the p-value is just a probability and that in actuality the result may be different.
The test could be wrong. Given the p-value, we could make an error in our interpretation.
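The decision rule above can be sketched in a few lines. This is a minimal illustration, assuming a two-sided z-test with a standard normal test statistic; the observed value `z_observed = 2.1` is a hypothetical number chosen for the example, not a result from any study.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability of a standard normal |Z| at least as extreme as |z|."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


alpha = 0.05          # significance level chosen before looking at the data
z_observed = 2.1      # hypothetical test statistic, for illustration only

p = two_sided_p_value(z_observed)
significant = p < alpha  # reject the null only if p falls below alpha

print(f"p-value = {p:.4f}, statistically significant at alpha={alpha}: {significant}")
```

The comparison `p < alpha` is the whole decision; the p-value itself says nothing about how large or important the effect is.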
Types of errors
• There are two types of errors; they are:
• Type I Error. Rejecting the null hypothesis when there is in fact no effect (false positive).
The p-value is optimistically small.
• Type II Error. Failing to reject the null hypothesis when there is a real effect (false negative).
The p-value is pessimistically large.
• In this context, we can think of the significance level as the probability of rejecting the null
hypothesis if it were true. That is the probability of making a Type I Error or a false positive.
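The link between the significance level and the Type I error rate can be checked by simulation. The sketch below assumes a one-sample two-sided z-test on normally distributed data with known standard deviation 1; the function names, sample size, trial count, and seed are all illustrative choices. Every sample is drawn with the null hypothesis true, so each rejection is a false positive.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: population mean == mu0, with sigma known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, n=30, trials=5000, seed=1):
    """Fraction of experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data generated under the null: true mean is exactly 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed false positive rate: {rate:.3f} (alpha = 0.05)")
```

The observed rate should land close to alpha, with some simulation noise.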
What Is Statistical Power?
• Statistical power, or the power of a hypothesis test is the probability that the test correctly
rejects the null hypothesis.
• That is, the probability of a true positive result. It is only relevant when the null hypothesis
is false.
• … statistical power is the probability that a test will correctly reject a false null
hypothesis. Statistical power has relevance only when the null is false.
• The higher the statistical power for a given experiment, the lower the probability of
making a Type II (false negative) error. That is, the higher the probability of detecting an
effect when there is an effect. In fact, the power is precisely the complement of the probability
of a Type II error: power = 1 − β, where β is the Type II error rate.
• More intuitively, the statistical power can be thought of as the probability of accepting an alternative
hypothesis, when the alternative hypothesis is true.
• When interpreting statistical power, we seek experimental setups that have high statistical power.
• Low Statistical Power: Large risk of committing Type II errors, i.e. false negatives.
• High Statistical Power: Small risk of committing Type II errors.
• Experimental results with too low statistical power will lead to invalid conclusions about the meaning of the
results. Therefore a minimum level of statistical power must be sought.
• It is common to design experiments with a statistical power of 80% or better, e.g. 0.80. This means a 20%
probability of encountering a Type II error. This differs from the 5% probability of encountering a Type I error
under the standard value for the significance level.
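The relationship power = 1 − β can be illustrated numerically. The sketch below assumes a one-sample two-sided z-test with known standard deviation 1, a true standardized effect of 0.5, and 32 observations; these numbers, the function names, and the seed are illustrative assumptions. It compares an analytic power value with the Type II error rate observed in simulation.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def analytic_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (sigma = 1) for a true effect."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * n ** 0.5  # mean of the z statistic under the alternative
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def simulated_type_ii_rate(effect_size, n, alpha=0.05, trials=5000, seed=7):
    """Fraction of experiments that fail to reject H0 although the effect is real."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    misses = 0
    for _ in range(trials):
        # Data generated under the alternative: true mean equals effect_size.
        sample = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5
        if abs(z) <= z_crit:
            misses += 1  # false negative
    return misses / trials


power = analytic_power(effect_size=0.5, n=32)
beta = simulated_type_ii_rate(effect_size=0.5, n=32)
print(f"Analytic power: {power:.3f}")
print(f"Simulated Type II rate: {beta:.3f}, so simulated power: {1.0 - beta:.3f}")
```

With these settings the analytic power sits near the conventional 0.80 target, and the simulated power should agree up to simulation noise.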
Power Analysis
Statistical power is one piece in a puzzle that has four related parts; they are:
• Effect Size. The quantified magnitude of a result present in the population. Effect size is
calculated using a specific statistical measure, such as Pearson’s correlation coefficient for the
relationship between variables or Cohen’s d for the difference between groups.
• Sample Size. The number of observations in the sample.
• Significance. The significance level used in the statistical test, e.g. alpha. Often set to 5% or 0.05.
• Statistical Power. The probability of rejecting the null hypothesis when it is false.
• All four variables are related. For example, a larger sample size can make an effect easier to
detect, and the statistical power can be increased in a test by increasing the significance level.
• A power analysis involves estimating one of these four parameters given values for three other
parameters. This is a powerful tool in both the design and in the analysis of experiments that we
wish to interpret using statistical hypothesis tests.
• For example, the statistical power can be estimated given an effect size, sample size and
significance level. Alternatively, the required sample size can be estimated for different desired
significance levels, given an effect size and a target power.
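Solving for one of the four parameters given the other three can be sketched as follows. This is a rough illustration that assumes a two-sided comparison of two independent group means with equal group sizes and uses a normal approximation in place of the t distribution, so it tends to give slightly smaller sample sizes than a t-based calculation. The function names and the example effect sizes (Cohen's d of 0.8 and 0.5) are hypothetical choices for the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power given effect size (Cohen's d), group size, and alpha."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * math.sqrt(n_per_group / 2.0)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate minimum group size given effect size, alpha, and target power."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


# Solve for sample size, then check the power that size achieves.
for d in (0.8, 0.5):
    n = sample_size_per_group(d)
    print(f"d={d}: n per group ~ {n}, approximate power = {power_two_sample(d, n):.3f}")

# Solve for sample size at a stricter significance level.
print("d=0.5, alpha=0.01:", sample_size_per_group(0.5, alpha=0.01))
```

Smaller effects and stricter significance levels both push the required sample size up, which is the trade-off the four-parameter relationship describes.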
References
• https://machinelearningmastery.com/statistical-power-and-power-analysis-in-python/
• https://scholar.harvard.edu/files/pbalan/files/balan_slides_week5_ho.pdf
• https://www.utstat.toronto.edu/~brunner/oldclass/378f16/readings/CohenPower.pdf
• https://www.spotfire.com/glossary/what-is-power-analysis
• A power of 0.8 (which can also be specified as 80%) is conventionally
considered adequate for a study.
• The sample size should be large enough to keep the Type I error rate
as low as 0.05 or 0.01 while achieving a power as high as 0.8 or 0.9.
Power and Sample Size Determination
• https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/bs704_power_print.html