Errors in Hypothesis Testing
Pamelah N. Kihembo
• Hypothesis tests use sample data to make inferences about the properties
of a population. You gain tremendous benefits by working with random
samples because it is usually impossible to measure the entire population.
• However, there are tradeoffs when you use samples. The samples we use
are typically a minuscule percentage of the entire population. Consequently,
they occasionally misrepresent the population severely enough to cause
hypothesis tests to make errors.
• The sample data must provide sufficient evidence to reject the
null hypothesis and conclude that the effect exists in the population.
Ideally, a hypothesis test fails to reject the null hypothesis when the effect is
not present in the population, and it rejects the null hypothesis when the
effect exists.
Type I Error
• When you see a p-value that is less than your significance level, you
get excited because your results are statistically significant. However,
it could be a type I error. The supposed effect might not exist in the
population. Again, there is usually no warning when this occurs.
• Even though we don’t know for sure which studies have false positive
results, we do know their rate of occurrence. The rate of occurrence
for Type I errors equals the significance level of the hypothesis test,
which is also known as alpha (α).
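To make the point above concrete, here is a minimal simulation sketch (not from the original slides; numpy and scipy are assumed to be available). When the null hypothesis is true, the long-run fraction of "statistically significant" results settles near the chosen alpha:

```python
# Simulation sketch: when H0 is true, the Type I error rate is about alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_studies, n_per_group = 10_000, 30

false_positives = 0
for _ in range(n_studies):
    # Both groups are drawn from the SAME population, so H0 (no difference) is true.
    a = rng.normal(loc=100, scale=15, size=n_per_group)
    b = rng.normal(loc=100, scale=15, size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:          # "significant" by chance alone = Type I error
        false_positives += 1

print(f"Type I error rate ≈ {false_positives / n_studies:.3f} (alpha = {alpha})")
```

With alpha = 0.05, the printed rate comes out close to 0.05, which is exactly the claim in the bullet above.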
Type I Error
• In some ways, the investigator’s problem is similar to that faced by a judge
judging a defendant.
• The absolute truth whether the defendant committed the crime cannot be
determined.
• Instead, the judge begins by presuming innocence, i.e. that the defendant did not
commit the crime; in our case we begin by assuming there is no association
between the independent and dependent variables.
• The judge must decide whether there is sufficient evidence to reject the
presumed innocence of the defendant;
• A judge can err, however, by convicting a defendant who is innocent, or by
failing to convict one who is actually guilty.
Cont……..
• In similar fashion, the investigator starts by presuming the null hypothesis,
or no association/difference between the predictor and outcome variables
in the population.
• Based on the data collected in his sample, the investigator uses statistical
tests to determine whether there is sufficient evidence to reject the null
hypothesis in favour of the alternative hypothesis that there is an
association in the population.
• The judge may, however, convict the defendant when he is not guilty, or may
fail to convict him when he is guilty.
• Similarly, the investigator may reject the null hypothesis when it is correct
in the population, or fail to reject it when it is false in the population.
Cont…….
• A Type I error is made when H0 (the null hypothesis) is true but is rejected.
That is, there is no association in the population, but the researcher concludes
that there is an association between the independent and dependent variables.
• When the significance level is 0.05 and the null hypothesis is true, there is
a 5% chance that the test will reject the null hypothesis incorrectly. If you
set alpha to 0.01, there is a 1% chance of a false positive. If 5% is good, then 1%
seems even better, right?
• Type I errors are caused by one thing: sampling error, i.e. a random sample that,
by chance, misrepresents the population.
Type II Error
• When you perform a hypothesis test and your p-value is greater than
your significance level, your results are not statistically significant.
That’s disappointing because your sample provides insufficient
evidence for concluding that the effect you’re studying exists in the
population.
• However, there is a chance that the effect is present in the population
even though the test results don’t support it. If that’s the case, you’ve
just experienced a Type II error. The probability of making a Type II
error is known as beta (β).
Cont……
• What causes Type II errors? Whereas Type I errors are caused by one
thing, sampling error, there are a host of possible reasons for Type II
errors: small effect sizes, small sample sizes, and high data variability
(as the sketch below illustrates).
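The following is an illustrative sketch only, not an investigator's actual procedure: it estimates β by simulation under a few assumed combinations of effect size, sample size, and standard deviation (all numbers invented for illustration), using numpy and scipy:

```python
# Simulation sketch: how the Type II error rate (beta) depends on
# effect size, sample size, and data variability.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha, n_sim = 0.05, 5_000

def beta_estimate(effect, n, sd):
    """Fraction of simulated studies that FAIL to detect a real effect."""
    misses = 0
    for _ in range(n_sim):
        a = rng.normal(0, sd, n)
        b = rng.normal(effect, sd, n)   # a true difference of `effect` exists
        _, p = stats.ttest_ind(a, b)
        if p >= alpha:                  # not significant -> Type II error
            misses += 1
    return misses / n_sim

print("small effect, n=20,  sd=1:", beta_estimate(0.2, 20, 1.0))
print("large effect, n=20,  sd=1:", beta_estimate(0.8, 20, 1.0))
print("small effect, n=200, sd=1:", beta_estimate(0.2, 200, 1.0))
print("small effect, n=20,  sd=3:", beta_estimate(0.2, 20, 3.0))
```

Smaller effects, smaller samples, and noisier data all push the estimated β upward, matching the list of causes above.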
Type II Error
• Just like a judge’s conclusion, an investigator’s conclusion may be wrong.
Sometimes, by chance alone, a sample is not representative of the population.
• Thus the results in the sample do not reflect reality in the population, and the
random error leads to an erroneous inference.
• A type I error (false-positive) occurs if an investigator rejects a null hypothesis
that is actually true in the population;
• A type II error (false-negative) occurs if the investigator fails to reject a null
hypothesis that is actually false in the population.
• Although type I and type II errors can never be avoided entirely, the investigator
can reduce their likelihood by increasing the sample size (the larger the sample,
the less likely it is to differ substantially from the population).
Cont…….
• The likelihood that a study will be able to detect an association between a
predictor variable and an outcome variable depends, of course, on the actual
magnitude of that association in the target population. If it is large it will be easy
to detect in the sample.
• Conversely, if the size of the association is small it will be difficult to detect in the
sample. Unfortunately, the investigator often does not know the actual magnitude
of the association — one of the purposes of the study is to estimate it. Instead,
the investigator must choose the size of the association that he would like to be
able to detect in the sample. This quantity is known as the effect size.
• Selecting an appropriate effect size is the most difficult aspect of sample size
planning. Sometimes the investigator can use data from other studies or pilot
tests to estimate it; when there are no such data, he must make an informed
guess about a reasonable effect size (a computational sketch follows below).
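As a rough sketch of the idea (the pilot numbers below are invented, and the statsmodels package is assumed to be installed), an investigator might estimate a standardized effect size from pilot data and then ask how many subjects per group a future study would need:

```python
# Sketch: estimate Cohen's d from pilot data, then solve for sample size.
import numpy as np
from statsmodels.stats.power import TTestIndPower

pilot_a = np.array([4.1, 5.0, 3.8, 4.6, 5.2, 4.4])   # invented pilot group A
pilot_b = np.array([4.8, 5.3, 4.2, 5.0, 5.6, 4.7])   # invented pilot group B

# Cohen's d: difference in means divided by the pooled standard deviation.
pooled_sd = np.sqrt((pilot_a.var(ddof=1) + pilot_b.var(ddof=1)) / 2)
d = (pilot_b.mean() - pilot_a.mean()) / pooled_sd

# How many subjects per group for alpha = 0.05 and power = 0.80?
n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80)
print(f"estimated effect size d = {d:.2f}, need about {n_per_group:.0f} per group")
```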
Cont….
• The choice of the effect size is always somewhat arbitrary, and considerations
of feasibility are often paramount. When the number of available subjects is
limited, the investigator may have to work backward to determine whether
the effect size that his study will be able to detect with that number of
subjects is reasonable.
• Usually in research, the investigator establishes the maximum chance
of making type I and type II errors in advance of the study.
• The probability of committing a type I error (rejecting the null
hypothesis when it is actually true) is called α (alpha); it is also known
as the level of statistical significance.
Cont…..
• If a study of mandazi eating and obesity is designed with α = 0.05, for example, then
the investigator has set 5% as the maximum chance of incorrectly rejecting the null
hypothesis (and erroneously inferring that eating mandazi is associated with obesity).
• This is the level of reasonable doubt that the investigator is willing to accept when he
uses statistical tests to analyse the data after the study is completed.
• The probability of making a type II error (failing to reject the null hypothesis when it is
actually false) is called β (beta). The quantity (1 - β) is called power: the probability of
observing an effect of a specified size or greater in the sample, if such an effect exists in
the population.
• If β is set at 0.10, then the investigator has decided that he is willing to accept a 10%
chance of missing an association of a given effect size between mandazi and obesity.
• This represents a power of 0.90, i.e., a 90% chance of finding an association of that
size.
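A hedged sketch of this calculation follows, using the standard normal-approximation formula for comparing two means. The standardized effect size d = 0.5 is an assumed value for the mandazi/obesity example, not one given in the slides (scipy is assumed for the z values):

```python
# Sketch: sample size per group for a two-sided test with alpha and beta fixed.
from scipy.stats import norm

alpha, beta = 0.05, 0.10            # power = 1 - beta = 0.90
d = 0.5                             # assumed standardized effect size (Cohen's d)

z_alpha = norm.ppf(1 - alpha / 2)   # two-sided critical value, ~1.96
z_beta = norm.ppf(1 - beta)         # ~1.28
n_per_group = 2 * ((z_alpha + z_beta) / d) ** 2

print(f"about {n_per_group:.0f} subjects per group")   # ≈ 84 for these choices
```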
Cont….
• Ideally alpha and beta errors would be set at zero, eliminating the possibility of
false-positive and false-negative results.
• In practice they are made as small as possible. Reducing them, however, usually
requires increasing the sample size.
• Sample size planning aims at choosing a sufficient number of subjects to keep alpha
and beta at acceptably low levels without making the study unnecessarily expensive
or difficult.
• Many studies set alpha at 0.05 and beta at 0.20 (a power of 0.80). These are
somewhat arbitrary values, and others are sometimes used; the conventional range
for alpha is between 0.01 and 0.10; and for beta, between 0.05 and 0.20.
• In general the investigator should choose a low value of alpha when the research
question makes it particularly important to avoid a type I (false-positive) error, and
he should choose a low value of beta when it is especially important to avoid a type
II error.
Tabular representation
                  Test Rejects Null                Test Fails to Reject Null
Null is True      Type I Error / False Positive    Correct decision / No effect
Null is False     Correct decision / Effect found  Type II Error / False Negative
• Critical values tell you how many standard deviations away from the mean
you need to go in order to reach the desired confidence level for your
confidence interval.
• There are three steps to find the critical value:
1. Choose your alpha (α) value.
2. Decide whether you need a one-tailed or a two-tailed test (or interval).
3. Look up the critical value for that alpha in the appropriate distribution table
(e.g. the z table), as sketched below.
• The alpha value is the probability threshold for statistical significance. The
most common alpha value is α = 0.05, but 0.1, 0.01, and even 0.001 are
sometimes used. It’s best to look at the research papers published in your
field to decide which alpha value to use.
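Here is a short computational sketch (assuming scipy is available) of how the two-tailed critical z value follows directly from the chosen alpha:

```python
# Sketch: two-tailed critical z values for common alpha levels.
from scipy.stats import norm

for alpha in (0.10, 0.05, 0.01):
    z_star = norm.ppf(1 - alpha / 2)   # two-tailed critical value
    print(f"alpha = {alpha:>4}: confidence level = {1 - alpha:.0%}, z* = {z_star:.3f}")
```

These are the z* values shown in the table that follows.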
Confidence values
Confidence level        90%       95%       99%
Critical value (z*)     1.645     1.960     2.576
• The confidence interval is calculated as: CI = X̄ ± Z* × (σ / √n)
Where:
• CI = the confidence interval
• X̄ = the population mean
• Z* = the critical value of the z distribution
• σ = the population standard deviation
• √n = the square root of the population size
• Example: calculating the confidence interval. In the survey of
Americans’ and Brits’ television-watching habits, we can use the
sample mean, sample standard deviation, and sample size in place of
the population mean, population standard deviation, and population
size.
• To calculate the 95% confidence interval, we can simply plug the
values into the formula, as sketched below.
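A minimal sketch of that plug-in step follows; the sample figures are invented for illustration because the survey's actual numbers are not reproduced here (scipy is assumed for the critical value):

```python
# Sketch: plug sample values into CI = X̄ ± Z* × (σ / √n).
import math
from scipy.stats import norm

x_bar = 35.0        # sample mean (assumed, e.g. hours of TV per month)
s = 5.0             # sample standard deviation (assumed)
n = 100             # sample size (assumed)

z_star = norm.ppf(0.975)            # 95% confidence -> z* ≈ 1.96
margin = z_star * s / math.sqrt(n)
print(f"95% CI = ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```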
Exercise
Draw a two-tailed curve for the confidence interval.
• A confidence interval is a range around a measurement that conveys how precise
the measurement is.
• The most commonly used in biostatistics is the 95% CI.
• With a 95 percent confidence interval, you have a 5 percent chance of being
wrong.
• With a 90 percent confidence interval, you have a 10 percent chance of being
wrong.
• The 95% CI gives a range around the mean where we expect the "true"
(population) mean to lie, using the formula: 95% CI = X̄ ± 1.96 × (σ / √n).