
A.Y. 2024 – 2025 | FIRST SEMESTER

POWER AND EFFECT SIZE

1.0 STATISTICAL POWER

→ Statistical power, or the power of a hypothesis test, is the probability that the test correctly rejects the null hypothesis.

→ The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true effect present to detect.

→ That is, it is the probability of a true positive result. It is only useful when the null hypothesis is rejected.

STATISTICAL HYPOTHESIS TESTING

→ A statistical hypothesis test makes an assumption about the outcome, called the null hypothesis.

→ p-value (p): the probability of obtaining a result equal to or more extreme than was observed in the data.

→ In interpreting the p-value of a significance test, you must specify a significance level, often referred to by the Greek lowercase letter alpha (α). A common value for the significance level is 5%, written as 0.05.

→ The p-value is interpreted in the context of the chosen significance level. A result of a significance test is claimed to be “statistically significant” if the p-value is less than the significance level. This means that the null hypothesis (that there is no effect) is rejected.

→ p ≤ alpha: reject H0 (different distribution).

→ p > alpha: fail to reject H0 (same distribution).

→ Significance level (alpha): the boundary for specifying a statistically significant finding when interpreting the p-value.

→ We can see that the p-value is just a probability, and that in actuality the result may be different. The test could be wrong. Given the p-value, we could make an error in our interpretation.

→ The higher the statistical power for a given experiment, the lower the probability of making a Type II (false negative) error; that is, the higher the probability of detecting an effect when there is an effect. In fact, power is the complement of the probability of a Type II error (power = 1 − beta).

→ More intuitively, statistical power can be thought of as the probability of accepting the alternative hypothesis when the alternative hypothesis is true.

→ Power is the probability of rejecting the null hypothesis when, in fact, it is false.

→ Power is the probability of making a correct decision (to reject the null hypothesis) when the null hypothesis is false.

→ Power is the probability that a test of significance will pick up on an effect that is present.

→ Power is the probability that a test of significance will detect a deviation from the null hypothesis, should such a deviation exist.

→ Mathematically, power is 1 − beta. The power of a hypothesis test is between 0 and 1; if the power is close to 1, the hypothesis test is very good at detecting a false null hypothesis. Beta is commonly set at 0.2, but may be set by the researchers to be smaller.
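The decision rule above (p ≤ alpha: reject H0; p > alpha: fail to reject) can be sketched in code. This is an illustrative sketch, not from the reviewer: it assumes a simple two-sample z-test with a known standard deviation, and the function name and toy data are made up for the example.

```python
import math
from statistics import NormalDist, mean

def two_sample_z_p(x, y, sigma):
    """Two-sided p-value for a two-sample z-test with known sigma."""
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))   # standard error of the mean difference
    z = (mean(x) - mean(y)) / se                      # test statistic
    return 2 * (1 - NormalDist().cdf(abs(z)))         # two-sided p-value

alpha = 0.05                                          # significance level
x = [2.1, 1.9, 2.4, 2.2, 2.0, 2.3]                    # toy "treatment" scores
y = [1.2, 1.0, 1.4, 1.1, 1.3, 1.2]                    # toy "control" scores
p = two_sample_z_p(x, y, sigma=0.2)

if p <= alpha:
    print(f"p = {p:.4f} <= {alpha}: reject H0 (different distributions)")
else:
    print(f"p = {p:.4f} > {alpha}: fail to reject H0 (same distribution)")
```

The same `two_sample_z_p` helper is reused in the later simulation sketches.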
2.0 TYPES OF ERROR

TYPE I ERROR

→ Rejecting the null hypothesis when there is in fact no significant effect (false positive). The p-value is optimistically small.

TYPE II ERROR

→ Not rejecting the null hypothesis when there is a significant effect (false negative). The p-value is pessimistically large.

→ Consequently, power may be as low as 0.8, but may be higher. Powers lower than 0.8, while not impossible, would typically be considered too low for most areas of research.

→ When interpreting statistical power, we seek experimental setups that have high statistical power.

→ Low statistical power: large risk of committing Type II errors, i.e. false negatives.

→ High statistical power: small risk of committing Type II errors.
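The Type I error rate can be checked by simulation: when the null hypothesis is actually true, a test at alpha = 0.05 should falsely reject about 5% of the time. A minimal stdlib-Python sketch (the helper name and simulation settings are mine, not from the reviewer):

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)

def z_test_p(x, y, sigma):
    """Two-sided p-value for a two-sample z-test with known sigma."""
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha, sigma, n, trials = 0.05, 1.0, 30, 2000
# Simulate experiments where H0 is TRUE: both groups share the same mean,
# so every rejection is a false positive (Type I error).
type_i = sum(
    z_test_p([random.gauss(0, sigma) for _ in range(n)],
             [random.gauss(0, sigma) for _ in range(n)], sigma) <= alpha
    for _ in range(trials)
)
rate = type_i / trials
print(f"Observed Type I error rate: {rate:.3f} (expected about {alpha})")
```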

→ Power is increased when a researcher increases the sample size, as well as when a researcher increases effect sizes and significance levels. Other variables also influence power, including the variance (σ²).
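The claim that power rises with sample size can be illustrated by Monte Carlo simulation: draw many experiments in which a real effect is present, and count how often the test rejects H0. A stdlib-Python sketch under assumed values (a true effect of 0.5σ, alpha = 0.05; the function names are mine):

```python
import math
import random
from statistics import NormalDist, mean

random.seed(7)

def z_test_p(x, y, sigma):
    """Two-sided p-value for a two-sample z-test with known sigma."""
    se = sigma * math.sqrt(1 / len(x) + 1 / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def estimated_power(n, effect, sigma=1.0, alpha=0.05, trials=1000):
    """Fraction of simulated experiments (true effect present) that reject H0."""
    hits = sum(
        z_test_p([random.gauss(effect, sigma) for _ in range(n)],
                 [random.gauss(0.0, sigma) for _ in range(n)], sigma) <= alpha
        for _ in range(trials)
    )
    return hits / trials

for n in (10, 30, 100):
    print(f"n = {n:>3} per group -> estimated power = {estimated_power(n, effect=0.5):.2f}")
```

With these settings the estimated power climbs toward 1 as n grows, which is exactly the sample-size effect described above.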

→ In reality, a researcher wants both Type I and Type II errors to be small. In terms of significance level and power, Weiss says this means we want a small significance level (close to 0) and a large power (close to 1).

TSU PSYCHOLOGY DEPARTMENT | PSY 103 REVIEWER mmelm ⋆。°✩ 1



POWER ANALYSIS

→ Statistical power is one piece in a puzzle that has four related parts:

→ Effect Size. The quantified magnitude of a result present in the population. Effect size is calculated using a specific statistical measure, such as Pearson’s correlation coefficient for the relationship between variables or Cohen’s d for the difference between groups.

→ Sample Size. The number of observations in the sample.

→ Significance. The significance level used in the statistical test, e.g. alpha. Often set to 5% or 0.05.

→ Statistical Power. The probability of accepting the alternative hypothesis if it is true.

→ All four variables are related. For example, a larger sample size can make an effect easier to detect, and the statistical power can be increased in a test by increasing the significance level.

→ A power analysis involves estimating one of these four parameters given values for the three other parameters. This is a powerful tool in both the design and the analysis of experiments that we wish to interpret using statistical hypothesis tests.

→ For example, the statistical power can be estimated given an effect size, sample size and significance level. Alternatively, the sample size can be estimated given different desired levels of significance.

→ Perhaps the most common use of a power analysis is in the estimation of the minimum sample size required for an experiment.

→ As a practitioner, we can start with sensible defaults for some parameters, such as a significance level of 0.05 and a power level of 0.80. We can then estimate a desirable minimum effect size, specific to the experiment being performed. A power analysis can then be used to estimate the minimum sample size required.

→ In addition, multiple power analyses can be performed to provide a curve of one parameter against another, such as the change in the size of an effect in an experiment given changes to the sample size. More elaborate plots can be created varying three of the parameters. This is a useful tool for experimental design.

3.0 EFFECT SIZE

→ Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude – not just, does a treatment affect people, but how much does it affect them.

→ Effect size is a quantitative measure of the magnitude of the experimental effect. The larger the effect size, the stronger the relationship between two variables.

→ You can look at the effect size when comparing any two groups to see how substantially different they are. Typically, research studies will comprise an experimental group and a control group. The experimental group receives an intervention or treatment which is expected to affect a specific outcome.

→ For example, we might want to know the effect of a therapy on treating depression. The effect size value will show us whether the therapy has had a small, medium or large effect on depression.

WHY REPORT EFFECT SIZES?

→ The p-value is not enough.

→ A lower p-value is sometimes interpreted as meaning there is a stronger relationship between two variables. However, statistical significance only means that it is unlikely that the null hypothesis is true (less than 5%).

→ Therefore, a significant p-value tells us that an intervention works, whereas an effect size tells us how much it works.

→ It can be argued that emphasizing the size of the effect promotes a more scientific approach, as unlike significance tests, effect size is independent of sample size.

→ To compare the results of studies done in different settings.

→ Unlike a p-value, effect sizes can be used to quantitatively compare the results of studies done in different settings. Effect sizes are widely used in meta-analysis.

HOW TO CALCULATE AND INTERPRET EFFECT SIZES

→ Effect sizes either measure the sizes of associations between variables or the sizes of differences between group means.

COHEN'S D

→ Cohen's d is an appropriate effect size for the comparison between two means. It can be used, for example, to accompany the reporting of t-test and ANOVA results. It is also widely used in meta-analysis.

→ Cohen suggested that d = 0.2 be considered a 'small' effect size, 0.5 a 'medium' effect size and 0.8 a 'large' effect size. This means that if the difference between two groups' means is less than 0.2 standard deviations, the difference is negligible, even if it is statistically significant.
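Tying the power-analysis workflow to Cohen's d: a common normal-approximation formula for the minimum sample size per group in a two-sample comparison is n ≈ 2·((z₁₋α/₂ + z₁₋β) / d)². The sketch below is mine, not the reviewer's: the function names are made up, and `cohens_d` uses the equal-variance pooled-SD convention √((s₁² + s₂²)/2), which is only one of several conventions in use.

```python
import math
from statistics import NormalDist, mean, stdev

def cohens_d(x, y):
    """Cohen's d with the pooled-SD convention sqrt((s1^2 + s2^2) / 2)."""
    pooled = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation minimum sample size per group, two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A 'medium' effect (d = 0.5) at the conventional defaults:
print("n per group for d = 0.5:", n_per_group(0.5))   # -> 63
```

Smaller target effect sizes drive the required sample size up quadratically, which is why estimating a realistic minimum effect size matters so much in study design.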


PEARSON R CORRELATION

→ This effect size parameter summarizes the strength of a bivariate relationship. The value of the Pearson r correlation effect size varies between -1 (a perfect negative correlation) and +1 (a perfect positive correlation).

→ According to Cohen (1988, 1992), the effect size is low if the value of r varies around 0.1, medium if r varies around 0.3, and large if r varies around 0.5 or more.

GLASS’S Δ METHOD OF EFFECT SIZE

→ This method is similar to Cohen's d, but the standard deviation of the second (control) group alone is used to standardize the mean difference. Mathematically the formula can be written as Δ = (M₁ − M₂) / SD₂, where M₁ and M₂ are the group means and SD₂ is the control group's standard deviation.
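A minimal sketch of Glass's Δ, standardizing by the control group's standard deviation only (the function name and toy data are mine):

```python
from statistics import mean, stdev

def glass_delta(treatment, control):
    """Glass's delta: mean difference divided by the CONTROL group's SD."""
    return (mean(treatment) - mean(control)) / stdev(control)

treatment = [12.0, 14.0, 13.0, 15.0]   # toy post-treatment scores
control = [10.0, 11.0, 9.0, 10.0]      # toy control scores
print(f"Glass's delta = {glass_delta(treatment, control):.2f}")
```

Using only the control SD is useful when the treatment itself may change the spread of scores.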

HEDGES’ G METHOD OF EFFECT SIZE

→ This method is a modification of Cohen’s d that standardizes the mean difference by a pooled standard deviation weighted by sample size. Hedges’ g can be written mathematically as g = (M₁ − M₂) / SD*pooled, where SD*pooled = √(((n₁ − 1)SD₁² + (n₂ − 1)SD₂²) / (n₁ + n₂ − 2)).
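A sketch of the pooled-SD form above (function name is mine; note that some definitions of Hedges' g additionally multiply by a small-sample bias-correction factor, which is omitted here):

```python
import math
from statistics import mean, stdev

def hedges_g(x, y):
    """Hedges' g: mean difference over the (n-1)-weighted pooled SD."""
    n1, n2 = len(x), len(y)
    pooled = math.sqrt(((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2)
                       / (n1 + n2 - 2))
    return (mean(x) - mean(y)) / pooled

print(f"g = {hedges_g([6.0, 7.0, 8.0], [4.0, 5.0, 6.0]):.2f}")
```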

COHEN’S f² METHOD OF EFFECT SIZE

→ Cohen’s f² measures the effect size when we use methods like ANOVA and multiple regression. For multiple regression, Cohen’s f² is defined as f² = R² / (1 − R²), where R² is the squared multiple correlation (the proportion of variance explained by the model).

CRAMER’S Φ OR CRAMER’S V METHOD OF EFFECT SIZE

→ The chi-square statistic is used to measure effect size for nominal data. When a variable has two categories, Cramer’s phi (φ) is the best statistic to use. When there are more than two categories, Cramer’s V will give the best result for nominal data.
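A sketch computing Cramer's V from a contingency table using the standard formula V = √(χ² / (n·(k − 1))), where k is the smaller table dimension; for a 2×2 table this reduces to Cramer's φ. The function name and counts are made up for the example:

```python
import math

def cramers_v(table):
    """Cramer's V from a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table)) for j in range(len(table[0]))
    )
    k = min(len(table), len(table[0]))      # smaller table dimension
    return math.sqrt(chi2 / (n * (k - 1)))  # equals Cramer's phi for 2x2

# Toy 2x3 table: rows = groups, columns = response categories
table = [[30, 10, 10], [10, 20, 20]]
print(f"Cramer's V = {cramers_v(table):.2f}")
```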

ODDS RATIO

→ The odds ratio is the odds of success in the treatment group relative to the odds of success in the control group. This method is used when the data are binary, for example a 2×2 table of success/failure counts in the treatment and control groups.
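As a sketch, with hypothetical 2×2 counts (the function name and numbers are made up for the example):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
             success  failure
    treat       a        b
    control     c        d
    """
    return (a / b) / (c / d)   # treatment odds over control odds

# Hypothetical counts: 40/10 success/failure in treatment, 20/30 in control.
print(f"Odds ratio = {odds_ratio(40, 10, 20, 30):.1f}")   # -> 6.0
```

An odds ratio of 1 means the treatment and control groups have equal odds of success; values above 1 favor the treatment group.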

