
Errors in Hypothesis Testing

Pamelah N. Kihembo
• Hypothesis tests use sample data to make inferences about the properties
of a population. You gain tremendous benefits by working with random
samples because it is usually impossible to measure the entire population.
• However, there are tradeoffs when you use samples. The samples we use
are typically a minuscule percentage of the entire population. Consequently,
they occasionally misrepresent the population severely enough to cause
hypothesis tests to make errors.
• The sample data must provide sufficient evidence to reject the 
null hypothesis and conclude that the effect exists in the population.
Ideally, a hypothesis test fails to reject the null hypothesis when the effect is
not present in the population, and it rejects the null hypothesis when the
effect exists.
Type I Error
• When you see a p-value that is less than your significance level, you
get excited because your results are statistically significant. However,
it could be a type I error. The supposed effect might not exist in the
population. Again, there is usually no warning when this occurs.
• Even though we don’t know for sure which studies have false positive
results, we do know their rate of occurrence. The rate of occurrence
for Type I errors equals the significance level of the hypothesis test,
which is also known as alpha (α).
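To see that rate emerge, here is a minimal simulation sketch. It assumes the numpy and scipy libraries are available, and the population mean of 50, SD of 10, and per-group size of 30 are arbitrary illustrative choices, not values from the text: when H0 is true by construction, the test rejects it at roughly the rate α.

```python
# Minimal sketch: when H0 is true, p < alpha occurs at roughly rate alpha.
# All population parameters below are arbitrary illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_studies = 10_000
false_positives = 0

for _ in range(n_studies):
    # H0 is true by construction: both samples come from the same population.
    a = rng.normal(loc=50, scale=10, size=30)
    b = rng.normal(loc=50, scale=10, size=30)
    _, p_value = stats.ttest_ind(a, b)
    if p_value < alpha:
        false_positives += 1  # rejecting a true H0: a Type I error

print(f"Observed Type I error rate: {false_positives / n_studies:.3f}")
# Prints a value close to 0.05, the significance level.
```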
Type I error
• In some ways, the investigator’s problem is similar to that faced by a judge
trying a defendant.
• The absolute truth of whether the defendant committed the crime cannot be
determined.
• Instead, the judge begins by presuming innocence: the defendant did not
commit the crime. In our case, we begin by assuming there is no association
between the independent and dependent variables.
• The judge must decide whether there is sufficient evidence to reject the
presumed innocence of the defendant;
• A judge can err, however, by convicting a defendant who is innocent, or by
failing to convict one who is actually guilty.
Cont……..
• In similar fashion, the investigator starts by presuming the null hypothesis,
or no association/difference between the predictor and outcome variables
in the population.
• Based on the data collected in his sample, the investigator uses statistical
tests to determine whether there is sufficient evidence to reject the null
hypothesis in favour of the alternative hypothesis that there is an
association in the population.
• The judge may however convict the defendant when he is not guilty or may
fail to convict him when he is guilty
• similarly, the investigator may reject the null hypothesis when it is correct
in the population or fail to reject it when it is false in the population
Cont…….
• A Type I error is made when H0 (the null hypothesis) is true but rejected;
that is, there is no association in the population, but the researcher
concludes that there is an association between the independent and dependent variables.
• When the significance level is 0.05 and the null hypothesis is true, there is
a 5% chance that the test will reject the null hypothesis incorrectly. If you
set alpha to 0.01, there is a 1% chance of a false positive. If 5% is good, then 1%
seems even better, right?
• Type I errors have a single cause: sampling error, the chance variation
that makes a random sample differ from the population it was drawn from.
Type II errors
• When you perform a hypothesis test and your p-value is greater than
your significance level, your results are not statistically significant.
That’s disappointing because your sample provides insufficient
evidence for concluding that the effect you’re studying exists in the
population.
• However, there is a chance that the effect is present in the population
even though the test results don’t support it. If that’s the case, you’ve
just experienced a Type II error. The probability of making a Type II
error is known as beta (β).
Cont……
• What causes Type II errors? Whereas Type I errors have a single cause,
sampling error, there are a host of possible reasons for Type II
errors: small effect sizes, small sample sizes, and high data variability,
as the sketch below illustrates.
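A minimal simulation sketch of those causes (it assumes numpy and scipy; the group means, SD of 10, and sample sizes are made-up illustrative values, not data from any study): with H0 false by construction, the Type II error rate β falls as either the effect size or the sample size grows.

```python
# Sketch: estimate beta (the Type II error rate) for different effect sizes
# and sample sizes. All numbers are illustrative assumptions, not real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, sigma, n_sims = 0.05, 10.0, 5_000

for effect in (2.0, 5.0):              # true difference between group means
    for n in (20, 80):                 # sample size per group
        misses = 0
        for _ in range(n_sims):
            a = rng.normal(50, sigma, n)
            b = rng.normal(50 + effect, sigma, n)  # H0 is false here
            _, p = stats.ttest_ind(a, b)
            if p >= alpha:
                misses += 1            # failing to reject a false H0
        print(f"effect={effect}, n={n}: beta ~ {misses / n_sims:.2f}")
# Larger effects and larger samples both drive beta down (and power up).
```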
Type II error
• Just like a judge’s conclusion, an investigator’s conclusion may be wrong.
Sometimes, by chance alone, a sample is not representative of the population.
• When that happens, the results in the sample do not reflect reality in the
population, and the random error leads to an erroneous inference.
• A type I error (false-positive) occurs if an investigator rejects a null hypothesis
that is actually true in the population;
• A type II error (false-negative) occurs if the investigator fails to reject a null
hypothesis that is actually false in the population.
• Although type I and type II errors can never be avoided entirely, the investigator
can reduce their likelihood by increasing the sample size (the larger the sample,
the smaller the likelihood that it will differ substantially from the population).
Cont…….
• The likelihood that a study will be able to detect an association between a
predictor variable and an outcome variable depends, of course, on the actual
magnitude of that association in the target population. If it is large it will be easy
to detect in the sample.
• Conversely, if the size of the association is small it will be difficult to detect in the
sample. Unfortunately, the investigator often does not know the actual magnitude
of the association — one of the purposes of the study is to estimate it. Instead,
the investigator must choose the size of the association that he would like to be
able to detect in the sample. This quantity is known as the effect size.
• Selecting an appropriate effect size is the most difficult aspect of sample size
planning. When there are no data with which to estimate it, the investigator can
sometimes use data from other studies or pilot tests to make an informed guess
about a reasonable effect size.
Cont….
• The choice of the effect size is always somewhat arbitrary, and considerations
of feasibility are often paramount. When the number of available subjects is
limited, the investigator may have to work backward to determine whether
the effect size that his study will be able to detect with that number of
subjects is reasonable.
• Usually in research, the investigator establishes the maximum chance
of making type I and type II errors in advance of the study.
• The probability of committing a type I error (rejecting the null
hypothesis when it is actually true) is called α (alpha) the other name
for this is the level of statistical significance.
Cont…..
• If a study of mandazi eating and obesity is designed with α = 0.05, for example, then
the investigator has set 5% as the maximum chance of incorrectly rejecting the null
hypothesis (and erroneously inferring that eating mandazi is associated with obesity).
• This is the level of reasonable doubt that the investigator is willing to accept when he
uses statistical tests to analyse the data after the study is completed.
• The probability of making a type II error (failing to reject the null hypothesis when it is
actually false) is called β (beta). The quantity (1 − β) is called power: the probability of
observing an effect in the sample, if an effect of the specified size or greater exists in
the population.
• If β is set at 0.10, then the investigator has decided that he is willing to accept a 10%
chance of missing an association of a given effect size between mandazi and obesity.
• This represents a power of 0.90, i.e., a 90% chance of finding an association of that
size.
Cont….
• Ideally alpha and beta errors would be set at zero, eliminating the possibility of
false-positive and false-negative results.
• In practice they are made as small as possible. Reducing them, however, usually
requires increasing the sample size.
• Sample size planning aims at choosing a sufficient number of subjects to keep alpha
and beta at acceptably low levels without making the study unnecessarily expensive
or difficult.
• Many studies set alpha at 0.05 and beta at 0.20 (a power of 0.80). These are
somewhat arbitrary values, and others are sometimes used; the conventional range
for alpha is between 0.01 and 0.10; and for beta, between 0.05 and 0.20.
• In general the investigator should choose a low value of alpha when the research
question makes it particularly important to avoid a type I (false-positive) error, and
he should choose a low value of beta when it is especially important to avoid a type
II error.
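As a concrete sketch of this planning step, the standard formula for comparing two means gives the required subjects per group from α, β, and the standardized effect size d (the difference in means divided by the SD). The function below uses only the Python standard library; it is one common formula, not the only sample-size method.

```python
# n per group = 2 * (z_(1-alpha/2) + z_(1-beta))^2 / d^2 for comparing two
# means with a two-sided test; d is the standardized effect size.
import math
from statistics import NormalDist

def n_per_group(alpha: float, beta: float, d: float) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)        # value for the chosen power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Conventional choices: alpha = 0.05, beta = 0.20 (power 0.80), and a
# hypothetical medium effect size d = 0.5:
print(n_per_group(0.05, 0.20, 0.5))  # about 63 subjects per group
```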
Tabular representation

                            Null is true                       Null is false
Test rejects null           Type I error (false positive)      Correct decision (effect detected)
Test fails to reject null   Correct decision (no effect)       Type II error (false negative)
Graphical representation (figure not reproduced)
Example:
• A fire alarm provides a good analogy for the types of hypothesis testing
errors. Preferably, the alarm rings when there is a fire and does not ring
in the absence of a fire. However, if the alarm rings when there is no fire,
it is a false positive, a Type I error in statistical terms. Conversely, if the fire
alarm fails to ring when there is a fire, it is a false negative, a Type II error.
Examples…….
• Examples to summarise Type I and Type II errors
• Type I error
• In a case study investigating the difference between obese and average-weight patients, the
p-value associated with the significance test is 0.0067. Therefore, the null hypothesis was
rejected, and it was concluded that nurses intend to spend less time with obese patients.
• Despite the low probability value, it is possible that the null hypothesis of no true difference
between obese and average-weight patients is true and that the large difference between
sample means occurred by chance.
• If this is the case, then the conclusion that nurses intend to spend less time with obese
patients is in error.
• This type of error is called a Type I error. More generally, a Type I error occurs when a
significance test results in the rejection of a true null hypothesis.
Example 2……………..
• Type II error
• In this type of error the researcher fails to reject a false null hypothesis.
• Unlike a Type I error, a Type II error is not really an error. When a statistical
test is not significant, it means that the data do not provide strong evidence
that the null hypothesis is false.
• Lack of significance does not support the conclusion that the null hypothesis
is true. Therefore, a researcher should not make the mistake of incorrectly
concluding that the null hypothesis is true when a statistical test is not
significant (i.e., observing nothing in our sample even though the effect exists
in the population, e.g., concluding that smoking does not cause cancer). Instead,
the researcher should consider the test inconclusive.
Confidence intervals
• When you make an estimate in statistics, whether it is a summary statistic or
a test statistic, there is always uncertainty around that estimate because the
number is based on a sample of the population you are studying.
• The confidence interval is the range of values that you expect your estimate
to fall between a certain percentage of the time if you run your experiment
again or re-sample the population in the same way.
• The confidence level is the percentage of times you expect to reproduce an
estimate between the upper and lower bounds of the confidence interval, and
is set by the alpha value.
Cont…
• A confidence interval is the mean of your estimate plus and minus the
variation in that estimate.
• Confidence, in statistics, is another way to describe probability. For
example, if you construct a confidence interval with a 95% confidence
level, you are confident that 95 out of 100 times the estimate will fall
between the upper and lower values specified by the confidence
interval.
• Your desired confidence level is usually one minus the alpha (α) value
 you used in your statistical test:
Cont….
Confidence level = 1 − α
• So if you use an alpha value of 0.05 for statistical significance, then your confidence level
would be 1 − 0.05 = 0.95, or 95%.
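A small simulation sketch of what that 95% means in practice (numpy assumed; the population mean of 35 and SD of 8 are invented values): intervals built this way contain the true mean in roughly 95% of repeated samples.

```python
# Sketch: repeated sampling shows that about 95% of the 95% CIs constructed
# below cover the true population mean. Population values are assumptions.
import numpy as np

rng = np.random.default_rng(7)
true_mean, sigma, n = 35.0, 8.0, 100
covered = 0

for _ in range(10_000):
    sample = rng.normal(true_mean, sigma, n)
    se = sigma / np.sqrt(n)                       # standard error of the mean
    lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    if lo <= true_mean <= hi:
        covered += 1

print(f"Coverage: {covered / 10_000:.3f}")  # close to 0.95
```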
Example: Variation around an estimate. You survey 100 Brits and 100 Americans about their
television-watching habits, and find that both groups watch an average of 35 hours of television
per week.
• However, the British people surveyed had a wide variation in the number of hours watched,
while the Americans all watched similar amounts: the Brits watched up to 90 hours of TV a
week, while the Americans topped out at around 60 hours.
• Even though both groups have the same point estimate (average number of hours watched),
the British estimate will have a wider confidence interval than the American estimate because
there is more variation in the data.
Cont.....
Point estimate
• The point estimate of your confidence interval will be whatever statistical
estimate you are making (e.g., population mean, the difference between
population means, proportions, variation among groups).
• Example: Point estimate: In the TV-watching example, the point estimate
is the mean number of hours watched: 35.
Finding the critical value

• Critical values tell you how many standard deviations away from the mean
you need to go in order to reach the desired confidence level for your
confidence interval.
• There are three steps to find the critical value.
• Choose your alpha (α) value.
• The alpha value is the probability threshold for statistical significance. The
most common alpha value is 0.05, but 0.1, 0.01, and even 0.001 are
sometimes used. It’s best to look at the research papers published in your
field to decide which alpha value to use.
Confidence values

Confidence level          90%     95%     99%
alpha for one-tailed CI   0.1     0.05    0.01
alpha for two-tailed CI   0.05    0.025   0.005
z statistic               1.64    1.96    2.57
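The z statistics in the table can be reproduced directly from alpha. A sketch using only the Python standard library (note the exact 99% value is 2.5758, which the table rounds down to 2.57):

```python
# Sketch: derive the two-tailed z* critical value from the confidence level.
from statistics import NormalDist

for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf                        # total alpha, split between tails
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{conf:.0%} confidence: z* = {z_star:.2f}")
# 90% confidence: z* = 1.64
# 95% confidence: z* = 1.96
# 99% confidence: z* = 2.58
```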


Cont….
• Confidence intervals are used when, instead of simply wanting to know the mean value of a
sample, we want a range that is likely to contain the true population mean.
• Example 1: We may be interested in the increase in mean weight of malnourished children
after feeding them a nutritious diet. From the sample of children in the experiment, we can
work out the mean change in their weight. But this mean will only be for that particular
sample. If we took another group of patients we would not expect to get exactly the same
value, because chance also affects the change in weight.
• Example 2: Suppose the mean weight among sixteen P.7 school children is 30 kg; this
sample mean is called a point estimate, and it is used to estimate the corresponding
population mean.
• If repeated samples are taken from the same population there is no reason to assume that
the population mean will be exactly equal to the sample mean.
Confidence interval for the mean of normally-distributed data
Normally-distributed data forms a bell shape when plotted on a graph, with the sample mean
in the middle and the rest of the data spread fairly evenly on either side of it.
The confidence interval for data which follows a standard normal distribution is:

CI = X̄ ± Z* × (σ/√n)

Where:
• CI = the confidence interval
• X̄ = the sample mean
• Z* = the critical value of the z distribution
• σ = the population standard deviation
• n = the sample size
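A small helper implementing this formula (a sketch; z_confidence_interval is a name chosen here, and treating a sample SD as σ is only a reasonable approximation for large samples):

```python
# CI = X-bar +/- Z* * sigma/sqrt(n), using the standard library's normal distribution.
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean: float, sd: float, n: int, conf: float = 0.95):
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-tailed critical value
    margin = z_star * sd / sqrt(n)                     # margin of error
    return mean - margin, mean + margin
```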
• Example: Calculating the confidence interval. In the survey of
Americans’ and Brits’ television watching habits, we can use the
sample mean, sample standard deviation, and sample size in place of
the population mean, population standard deviation, and population
size.
• To calculate the 95% confidence interval, we can simply plug the
values into the formula, as in the sketch below.
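A usage sketch with the helper defined above. The slides do not give the two groups' standard deviations, so the SDs below (15 hours for the Brits, 5 for the Americans) are made-up values chosen only to show the wider British interval:

```python
# SDs are hypothetical; only the means (35) and sample sizes (100) come from the example.
print(z_confidence_interval(mean=35, sd=15, n=100))  # Brits: about (32.1, 37.9)
print(z_confidence_interval(mean=35, sd=5, n=100))   # Americans: about (34.0, 36.0)
```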
Exercise
Draw a two-tailed curve for the confidence interval.
• A confidence interval is a range around a measurement that conveys how precise
the measurement is.
• The most commonly used in biostatistics is the 95% CI.
• With a 95 percent confidence interval, you have a 5 percent chance of being
wrong.
• With a 90 percent confidence interval, you have a 10 percent chance of being
wrong.
• The 95% CI gives a range around the mean where we expect the "true"
(population) mean to lie, using the formula:

95% CI = sample mean ± 2 × SD/√n


Cont….
• The above interval is called the 95% confidence interval (CI) for a
population mean.
• Note that the width of the CI depends on the sample size and on the
variation of data values.
• A very wide interval may indicate that more data should be collected
before anything very definite can be said about the population mean.
Cont…..
• Example 1:
• Suppose the mean weight of 16 children in P.7 is 30 kg, with an SD of 5 kg. What is the 95%
CI for the population mean?
• 95% CI = sample mean ± 2 × SD/√n
         = 30 ± 2(5)/√16
         = 30 ± 10/4
         = 30 ± 2.5 kg
• 95% CI = 30 kg (27.5 to 32.5)
• The above result tells us that we are 95% confident that the range 27.5 to 32.5 (i.e., 30 ± 2.5)
contains the true mean weight of the P.7 children.
• We can also say that if we took repeated samples of P.7 children, 95 times out of 100 an
interval calculated this way would contain the true mean weight.
• A CI tells you how stable the estimate is. A stable estimate is one that would be close
to the same value if the survey were repeated.
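The arithmetic of the example above as a runnable sketch, using the slide's "± 2 × SD/√n" approximation (the 2 stands in for the exact critical value 1.96):

```python
# Reproduce the P.7 example: mean 30 kg, SD 5 kg, n = 16.
from math import sqrt

mean, sd, n = 30.0, 5.0, 16
margin = 2 * sd / sqrt(n)   # 2 * 5 / 4 = 2.5 kg
print(f"95% CI = {mean} +/- {margin} kg -> ({mean - margin} to {mean + margin})")
# 95% CI = 30.0 +/- 2.5 kg -> (27.5 to 32.5)
```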
Cont…
Advantages of using confidence intervals:
• Confidence intervals remind us that study estimates have variability.
• They provide the same information as a statistical test, and more.
• They show the role of sample size in estimation; i.e., large sample
sizes usually give narrow confidence limits while small sample sizes
give wide confidence limits.
Two-sided Z test curve (figure not reproduced)
Example with two sample means
• Patients with malnutrition were randomised to see either Nutritionist X
or Nutritionist Y. Nutritionist X ended up seeing 200 patients while
Nutritionist Y saw 176 patients, as shown in the table below. Using the p-values
and the results of the CIs, identify and explain the sets of data that
show a significant difference between the two Nutritionists. (Hint: state
the null hypothesis for each.)
Item                                  Nutritionist X   Nutritionist Y   P value   95% CI for Nutritionist X and Y
                                      (n=200)          (n=176)
Patients satisfied with               186 (93)         168 (95)         0.38      X: 90% to 97%
consultation, n (%)                                                               Y: 82% to 96%
Mean (SD) consultation                16 (3.1)         6 (2.8)          0.001     X: 3.0 to 4.2
length (minutes)                                                                  Y: 2.0 to 2.9
Patients getting a                    65 (33)          67 (38)          0.28      X: 30 to 40
prescription, n (%)                                                               Y: 35 to 40
Mean (SD) number of                   3.58 (1.3)       3.61 (1.3)       0.82      X: 1.0 to 1.5
days off work                                                                     Y: 1.0 to 1.6
Patients needing a follow-up          68 (34)          78 (44)          0.004     X: 30 to 37
appointment, n (%)                                                                Y: 40 to 46
Hypothesis 1: There is no difference in the number of patients satisfied with
consultation between the two Nutritionists (X, Y)
• Ho: μ1 = μ2
• With a p-value of 0.38, we conclude that there is insufficient evidence
against the null hypothesis, since the p-value is above 0.05. Therefore,
we do not reject the null hypothesis and conclude that there is no
difference in satisfaction between the two nutritionists.
• With respect to the CIs (95% CI = 90% to 97% for X; 95% CI = 82% to 96% for Y),
since there is an overlap between the two confidence intervals, we do
not reject the null hypothesis: both nutritionists gave satisfactory
performance.
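For reference, a sketch of one way to test Hypothesis 1: a pooled two-proportion z-test on the satisfaction counts (186/200 vs 168/176). The slides do not say which test produced p = 0.38, so this choice of test is an assumption and its p-value need not match the table exactly; it does, however, lead to the same decision.

```python
# Pooled two-proportion z-test for 186/200 vs 168/176 (satisfaction data).
from math import sqrt, erf

x1, n1, x2, n2 = 186, 200, 168, 176
p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error under H0
z = (p1 - p2) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
print(f"z = {z:.2f}, p = {p_value:.2f}")  # p > 0.05, so do not reject H0
```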
