Biostatistics 2b statistical testing theory
First concepts of statistical testing theory (1)
Imagine that a new medicine has been developed to cure patients who have some disease 𝐷. The new medicine has to replace an old medicine. It is known that the old medicine is successful in 60% of all cases.
The new medicine is given to 200 patients, and 145 of them are cured.
Can we prove that the new medicine is better than the old one?
Problem:
𝑝 = the probability that a patient is cured by the new medicine (unknown).
The outcome of the estimator 𝑝̂ = 𝑋⁄𝑛 is 145⁄200 = 72.5%. This is larger than 60%, but it is subject to chance variation. Can we really conclude that the true probability 𝑝 is larger than 60%?
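Our approach:
We formulate a null hypothesis 𝐻0 : 𝑝 = 0.60 (the new medicine is as good as the old one) and an alternative hypothesis 𝐻1 : 𝑝 > 0.60 (the new medicine is better). If we carry out a statistical test, we take one of the following two actions:
(1) We reject the null hypothesis, which means that the alternative hypothesis is proven.
(2) We do not reject the null hypothesis, which means that we did not succeed in proving the alternative hypothesis.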
First concepts of statistical testing theory (2)
Doing statistics is never free from errors. Possible errors in testing theory are:
The error of the first kind (type I error): rejecting the null hypothesis when the null hypothesis is true.
The error of the second kind (type II error): not rejecting the null hypothesis when the null hypothesis is false.
If we don't reject the null hypothesis, we can also say that we accept the null hypothesis.
The decision to reject 𝐻0 is largely determined by the significance level 𝛼: we require that the probability of an error of the first kind is at most 𝛼.
The decision to reject the null hypothesis is based on the test statistic, which to a large extent determines the statistical test.
The binomial test
We only observe: 𝑋 = the number of patients cured by the new medicine.
Assuming a common cure probability 𝑝 for all patients and independence between patients, we are in a binomial situation, with the consequence that 𝑋 ~ 𝐵(200, 𝑝): 𝑋 has the binomial distribution with 𝑛 = 200 and success probability 𝑝.
We test the null hypothesis 𝐻0 : 𝑝 = 0.60 against the alternative hypothesis 𝐻1 : 𝑝 > 0.60.
It is natural to reject the null hypothesis for large values of 𝑋, say if 𝑋 ≥ 𝑐 for some 'critical value' 𝑐, because if 𝑝 > 0.60 then larger values of 𝑋 get higher probabilities.
Under the null hypothesis, 𝑋 has the binomial distribution with 𝑛 = 200 and 𝑝 = 0.60. We select the number 𝑐 such that (1) 𝑃(𝑋 ≥ 𝑐) ≤ 0.05, and (2) 𝑃(𝑋 ≥ 𝑐) approximates 0.05 as closely as possible.
Note that this procedure can be fixed before 𝑋 is observed (before the experiment is carried out).
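The critical value 𝑐 can be found by direct computation. Below is a minimal Python sketch (standard library only) that sums binomial probabilities exactly and returns the smallest 𝑐 with 𝑃(𝑋 ≥ 𝑐) ≤ 𝛼 under 𝐻0. The function names are illustrative choices of our own, not part of any package.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p), summed exactly over k = c, ..., n."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


def critical_value(n: int, p0: float, alpha: float) -> int:
    """Smallest c with P(X >= c) <= alpha when p = p0.

    P(X >= c) decreases as c grows, so the smallest admissible c is the
    one whose tail probability comes closest to alpha from below.
    """
    for c in range(n + 2):  # c = n + 1 gives P(X >= c) = 0, so the loop always returns
        if upper_tail(c, n, p0) <= alpha:
            return c
    raise ValueError("alpha must be non-negative")


c = critical_value(n=200, p0=0.60, alpha=0.05)
print("critical value c =", c)
print("P(X >= c)     =", upper_tail(c, 200, 0.60))      # at most 0.05
print("P(X >= c - 1) =", upper_tail(c - 1, 200, 0.60))  # already above 0.05
```

With 𝑛 = 200, 𝑝 = 0.60 and 𝛼 = 0.05 this should reproduce the critical value 𝑐 = 132 used in the eight steps.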
The eight steps of a statistical test
Besides the choice of 𝛼, a statistical test can be described in eight steps:
1. 𝑋 has the binomial distribution with 𝑛 = 200 and unknown success probability 𝑝.
2. We test 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 > 0.60.
3. Test statistic: 𝑋.
4. Under 𝐻0 : 𝑋 has the binomial distribution with 𝑛 = 200 and 𝑝 = 0.60.
5. Outcome of 𝑋: 145.
6. We reject the null hypothesis if 𝑋 ≥ 132 (see the earlier calculations with 𝛼 = 5%).
7. As 𝑋 = 145, we reject the null hypothesis.
8. We have proven that the probability that patients are cured by the new medicine is larger than 60% (the new medicine is thus better than the old one).
Remark:
If the experiment had given 𝑋 = 125, we would not have rejected the null hypothesis. Then we would not have proven the alternative hypothesis of a higher cure probability (higher than 60%) for the new medicine.
The power of this binomial test
This binomial test can be seen as a procedure for deciding whether the alternative hypothesis is proven.
Working with significance level 𝛼 = 5% means that there is a risk of at most 5% of stating that the alternative hypothesis is true when the null hypothesis is true (error of the first kind).
Conversely, suppose that the null hypothesis is not true. Imagine that the true probability is 𝑝 = 0.70 (hence 𝐻1 is true).
What is now the probability of rejecting the null hypothesis (the right action now)? With the critical value 𝑐 = 132 we compute 𝑃(𝑋 ≥ 132) = 0.904, where 𝑋 ~ 𝐵(200, 0.70): 𝑋 has the binomial distribution with 𝑛 = 200 and 𝑝 = 0.70.
This means that if the true probability of being cured is 70%, the null hypothesis is rejected in favor of the alternative hypothesis with probability 90.4%.
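The power 0.904 can be checked numerically. A minimal Python sketch, assuming the critical value 𝑐 = 132 from the binomial test above; it evaluates 𝑃(𝑋 ≥ 132) under 𝐵(200, 𝑝) for 𝑝 = 0.60 (the type I error probability) and 𝑝 = 0.70 (the power).

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


C = 132  # critical value for n = 200, p0 = 0.60 and alpha = 5%
N = 200

size = upper_tail(C, N, 0.60)       # probability of a type I error (at most 0.05)
power_070 = upper_tail(C, N, 0.70)  # probability of rejecting H0 when p = 0.70
print(f"P(reject | p = 0.60) = {size:.4f}")
print(f"P(reject | p = 0.70) = {power_070:.3f}")
```

Evaluating `upper_tail(C, N, p)` for other values of 𝑝 > 0.60 traces out the power of the test as a function of 𝑝.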
One sided and two sided tests
In our example we tested 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 > 0.60, and we reject the null hypothesis if 𝑋 ≥ 𝑐.
If we tested (for some reason) 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 < 0.60, we would reject the null hypothesis if 𝑋 ≤ 𝑐. The critical value 𝑐 can again be determined using the significance level 𝛼.
If we test 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 ≠ 0.60, we reject the null hypothesis if 𝑋 ≤ 𝑐1 or 𝑋 ≥ 𝑐2. The critical values are determined by 𝑃(𝑋 ≤ 𝑐1) ≤ 𝛼⁄2 and 𝑃(𝑋 ≥ 𝑐2) ≤ 𝛼⁄2, using the binomial distribution with 𝑛 = 200 and 𝑝 = 0.60: select the values 𝑐1 and 𝑐2 such that the probabilities 𝑃(𝑋 ≤ 𝑐1) and 𝑃(𝑋 ≥ 𝑐2) approximate 𝛼⁄2 as closely as possible.
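The two-sided rule can be sketched in the same way. The following Python sketch (standard library only; the function name is our own) walks in from each tail of 𝐵(200, 0.60) and keeps the most extreme values whose cumulative tail probability stays at or below 𝛼⁄2.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_critical_values(n: int, p0: float, alpha: float) -> tuple[int, int]:
    """Return (c1, c2): largest c1 with P(X <= c1) <= alpha/2 and
    smallest c2 with P(X >= c2) <= alpha/2, for X ~ B(n, p0).

    c1 = -1 or c2 = n + 1 means the corresponding tail is empty.
    """
    half = alpha / 2
    c1, cdf = -1, 0.0
    for k in range(n + 1):          # walk up from the left tail
        cdf += binom_pmf(k, n, p0)  # cdf = P(X <= k)
        if cdf > half:
            break
        c1 = k
    c2, sf = n + 1, 0.0
    for k in range(n, -1, -1):      # walk down from the right tail
        sf += binom_pmf(k, n, p0)   # sf = P(X >= k)
        if sf > half:
            break
        c2 = k
    return c1, c2


c1, c2 = two_sided_critical_values(200, 0.60, 0.05)
print(f"reject H0: p = 0.60 if X <= {c1} or X >= {c2}")
```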
P-values (of the binomial test)
In this case the P-value is the probability 𝑃(𝑋 ≥ 145) , computed according to the
𝐵(200, 0.60)-distribution (the distribution of the test statistic under 𝐻0 ).
The general rule for P-values is: Reject the null hypothesis if P-value ≤ 𝜶.
For our example: we again reject the null hypothesis since the P-value is (much) smaller than
𝛼 = 5%.
If we tested 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 < 0.60, we would reject the null hypothesis if 𝑋 ≤ 𝑐, and the P-value would be 𝑃(𝑋 ≤ 145), computed with 𝐵(200, 0.60).
Finally, if we test 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 ≠ 0.60, the two-sided P-value is obtained as follows: compute (in this case) 𝑃(𝑋 ≤ 145) and 𝑃(𝑋 ≥ 145) according to 𝐵(200, 0.60), and take twice the smaller of these two probabilities.
This rule is designed to preserve the rule 'Reject the null hypothesis if P-value ≤ 𝜶.'
Here 𝑃(𝑋 ≥ 145) = 0.00015 is the smaller probability, so the two-sided P-value is 2 × 0.00015 = 0.0003.
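The three P-values can be computed exactly with a few lines of Python. This is a sketch using only the standard library; the cap at 1 for the two-sided value is a common convention that the notes do not mention.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p0, x = 200, 0.60, 145

p_upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= 145), for H1: p > 0.60
p_lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= 145), for H1: p < 0.60
# Two-sided: twice the smaller tail, capped at 1 (the cap is a common convention)
p_two_sided = min(1.0, 2 * min(p_lower, p_upper))

print(f"one-sided (p > 0.60): {p_upper:.5f}")
print(f"one-sided (p < 0.60): {p_lower:.5f}")
print(f"two-sided:            {p_two_sided:.5f}")
```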
The normal approximation of the binomial test
This is, in fact, a second version of the previous test. The binomial distribution can be approximated by a normal distribution if 𝑛𝑝 ≥ 5 and 𝑛(1 − 𝑝) ≥ 5.
For our example, the distribution of the test statistic is the 𝐵(200, 0.60) distribution, which can be approximated by the normal distribution with µ = 𝑛𝑝 = 120 and 𝜎 = √(𝑛𝑝(1 − 𝑝)) = 6.928.
The eight steps of the normal approximation of the binomial test are as follows:
(1) 𝑋 has the binomial distribution with 𝑛 = 200 and unknown success probability 𝑝
(2) We test 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 > 0.60
(3) Test statistic: 𝑍 = (𝑋 − µ)⁄𝜎 = (𝑋 − 120)⁄6.928
(4) Under 𝐻0 : 𝑍 ~ 𝑁(0,1) (approximately)
(5) Outcome of 𝑍: 𝑍 = (145 − 120)⁄6.928 = 3.61
(6) We reject if 𝑍 ≥ 𝑐; with 𝛼 = 5% the standard normal table gives 𝑐 = 1.645
(7) As 𝑍 = 3.61, we reject the null hypothesis
(8) We have proven that the probability that patients are cured by the new medicine is larger than 60% (the new one is thus better than the old one).
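The normal-approximation steps can be reproduced with Python's standard library, where `statistics.NormalDist` supplies the standard normal CDF and quantiles in place of the table. A minimal sketch:

```python
from math import sqrt
from statistics import NormalDist

n, p0, x, alpha = 200, 0.60, 145, 0.05

# Rule of thumb from the text: np >= 5 and n(1 - p) >= 5
assert n * p0 >= 5 and n * (1 - p0) >= 5

mu = n * p0                          # 120
sigma = sqrt(n * p0 * (1 - p0))      # about 6.928
z = (x - mu) / sigma                 # about 3.61
c = NormalDist().inv_cdf(1 - alpha)  # about 1.645 (standard normal table value)
p_value = 1 - NormalDist().cdf(z)    # approximate one-sided P-value

print(f"Z = {z:.2f}, c = {c:.3f}, reject H0: {z >= c}, P-value ~ {p_value:.5f}")
```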
Remarks:
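Again the power for 𝑝 = 0.70
Let us calculate the probability 𝑃(reject the null hypothesis) = 𝑃(𝑍 ≥ 1.645) for 𝑝 = 0.70 for the normal approximation of the binomial test.
Note that in case of 𝑝 = 0.70 the test statistic 𝑍 does not have a standard normal distribution, but 𝑋 has approximately a normal distribution with µ = 𝑛𝑝 = 140 and 𝜎 = √(𝑛𝑝(1 − 𝑝)) = 6.481, hence
$$Z_2 = \frac{X - 140}{6.481} \sim N(0,1) \quad \text{(approximately)}.$$
Since
$$Z = \frac{X - 120}{6.928} = \frac{X - 140 + 140 - 120}{6.481} \times \frac{6.481}{6.928} = (Z_2 + 3.086) \times 0.9355,$$
we have
$$P(Z \ge 1.645) = P\left(Z_2 \ge \frac{1.645}{0.9355} - 3.086\right) = P(Z_2 \ge -1.328) \approx 0.908,$$
close to the exact binomial value 0.904.
Power calculations are often carried out to find the minimal sample size 𝑛 that achieves a desired power.
Suppose we want to raise the power for 𝑝 = 0.70 to 95%. For arbitrary sample size 𝑛 the test statistic of the normal approximation is
$$Z = \frac{X - 0.6n}{\sqrt{0.24n}}, \qquad \text{while for } p = 0.70: \quad Z_2 = \frac{X - 0.7n}{\sqrt{0.21n}} \sim N(0,1) \quad \text{(approximately)}.$$
Requiring
$$P(Z \ge 1.645) = P\left(Z_2 \ge \frac{1.645\sqrt{0.24} - 0.1\sqrt{n}}{\sqrt{0.21}}\right) = 0.95 \quad\Longleftrightarrow\quad \frac{1.645\sqrt{0.24} - 0.1\sqrt{n}}{\sqrt{0.21}} = -1.645,$$
we get
$$\sqrt{n} = \frac{1.645\,(\sqrt{0.24} + \sqrt{0.21})}{0.1} = 15.60, \qquad n = 243.3,$$
so take 𝑛 = 244.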
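The same power and sample-size formulas can be evaluated in Python. This sketch uses the standard library's `statistics.NormalDist` for the standard normal CDF and quantiles; it uses the unrounded quantile 1.6449 instead of 1.645, which does not change the answer. The function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_ALPHA = STD_NORMAL.inv_cdf(0.95)  # about 1.645, one-sided alpha = 5%


def approx_power(n: int, p0: float = 0.60, p1: float = 0.70, z_crit: float = Z_ALPHA) -> float:
    """Normal-approximation power P(Z >= z_crit) when the true probability is p1.

    Z = (X - n p0) / sqrt(n p0 (1 - p0)); under p1, X is approx N(n p1, n p1 (1 - p1)).
    """
    s0 = sqrt(p0 * (1 - p0))
    s1 = sqrt(p1 * (1 - p1))
    threshold = (z_crit * s0 - (p1 - p0) * sqrt(n)) / s1  # bound that Z2 must exceed
    return 1 - STD_NORMAL.cdf(threshold)


def min_sample_size(target_power: float, p0: float = 0.60, p1: float = 0.70,
                    z_crit: float = Z_ALPHA) -> int:
    """Smallest n whose approximate power reaches target_power."""
    z_beta = STD_NORMAL.inv_cdf(target_power)
    root_n = (z_crit * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) / (p1 - p0)
    return ceil(root_n**2)


print(f"approximate power for n = 200: {approx_power(200):.3f}")
print(f"minimal n for power 95%:       {min_sample_size(0.95)}")
```

The closed form in `min_sample_size` is the equation for √𝑛 above, with a general target power in place of 95%.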
The one sample t-test (1)
Consider a group of 𝑛 patients with a high systolic blood pressure.
For each patient the (systolic) blood pressure before and after the treatment has been measured.
We define: 𝑋 = blood pressure before treatment minus blood pressure after treatment.
The data can be summarized as follows: 𝑛 = 25, 𝑋̅ = 5.32 and 𝑆 = 4.74 (sample mean and sample standard deviation of the reductions).
We assume that 𝑋 has a normal distribution with expectation µ and standard deviation 𝜎.
So µ denotes the average blood pressure reduction for the population of patients (long-run average).
The null hypothesis 𝐻0 : µ = 0 states that there is no blood pressure reduction on average, and we want to prove the alternative 𝐻1 : µ > 0, i.e. that there is some blood pressure reduction.
The one sample t-test (2)
Because 𝐻0 and 𝐻1 are statements about the expectation µ, the corresponding estimator 𝑋̅ = (𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)⁄𝑛 should play a dominant role in this testing problem.
We know: 𝑋̅ has a normal distribution with expectation µ and standard deviation 𝜎⁄√𝑛.
From standard theory we furthermore get:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1).$$
In case of independent stochastic variables $X_i \sim N(\mu, \sigma^2)$, one can prove the following result, where 𝑆 is the sample standard deviation:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \ \text{has a t-distribution with } n - 1 \text{ degrees of freedom.}$$
The shape of the t-distributions resembles the shape of the 𝑁(0,1)-distribution: they are again symmetric around 0, but their tails are heavier. For example, with 𝑛 − 1 = 24:
$$P\left(-2.064 < \frac{\bar{X} - \mu}{S/\sqrt{n}} < 2.064\right) = 0.95.$$
Calculations of this kind are needed for performing tests or constructing confidence intervals; here the difference between 2.064 and 1.96 is caused by the estimation of 𝜎, which is also unknown.
The theoretical result about $(\bar{X} - \mu)/(S/\sqrt{n})$ yields the test statistic. Note that under 𝐻0 (µ = 0) the expectation µ disappears and we get
$$T = \frac{\bar{X}}{S/\sqrt{n}},$$
which is observable and hence suitable as a test statistic.
Under 𝐻0 the test statistic 𝑇 has a t-distribution with 𝑛 − 1 degrees of freedom (short notation 𝑑𝑓 = 𝑛 − 1).
The one sample t-test (3)
We are now ready to carry out the eight steps of the one sample t-test for our example. We choose 𝛼 = 5%.
(1) The individual blood pressure reductions are independent and normally distributed with expectation µ and standard deviation 𝜎.
(2) We test 𝐻0 : µ = 0 against 𝐻1 : µ > 0.
(3) Test statistic: 𝑇 = 𝑋̅⁄(𝑆⁄√𝑛)
(4) Under 𝐻0 : 𝑇 has the t-distribution with 𝑛 − 1 = 24 degrees of freedom
(5) Outcome of 𝑇: 𝑇 = 5.32⁄(4.74⁄√25) = 5.61
(6) We reject the null hypothesis if 𝑇 ≥ 𝑐; from the t-table with 𝛼 = 5%: 𝑐 = 1.711
(7) As 𝑇 = 5.61, we reject the null hypothesis.
(8) We conclude that the treatment is successful in reducing the blood pressure.
Remarks:
The one-sided P-value here is 𝑃(𝑇 ≥ 5.61), calculated according to the t-distribution with 𝑑𝑓 = 24. From the t-table we conclude that this P-value is smaller than 0.0005 (since 5.61 is larger than 3.745).
The two-sided P-value here is 2 × 𝑃(𝑇 ≥ 5.61) = 𝑃(𝑇 ≤ −5.61) + 𝑃(𝑇 ≥ 5.61).
If, for example, we test 𝐻0 : µ = 5 against 𝐻1 : µ > 5, the test statistic becomes
$$T = \frac{\bar{X} - 5}{S/\sqrt{n}};$$
under 𝐻0 this new test statistic 𝑇 again has the t-distribution with 𝑛 − 1 = 24 degrees of freedom.
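The test statistic itself is simple arithmetic on the summary statistics. Python's standard library has no t-distribution quantiles, so this sketch takes the critical value 1.711 (𝑑𝑓 = 24, one-sided 𝛼 = 5%) from the t-table in the text; it only applies when 𝑛 = 25. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(xbar: float, s: float, n: int, mu0: float = 0.0) -> float:
    """One sample t statistic T = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))


def t_statistic_from_data(data: list[float], mu0: float = 0.0) -> float:
    """Same statistic from raw observations; stdev uses the n - 1 denominator (S)."""
    return t_statistic(mean(data), stdev(data), len(data), mu0)


C_TABLE = 1.711  # t-table, df = 24, one-sided alpha = 5% (value taken from the text)

T = t_statistic(xbar=5.32, s=4.74, n=25)               # H0: mu = 0
T_mu5 = t_statistic(xbar=5.32, s=4.74, n=25, mu0=5.0)  # H0: mu = 5
print(f"T = {T:.2f}, reject H0: mu = 0: {T >= C_TABLE}")
print(f"T = {T_mu5:.2f}, reject H0: mu = 5: {T_mu5 >= C_TABLE}")
```

For other sample sizes the critical value must be looked up again for 𝑑𝑓 = 𝑛 − 1.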
Confidence intervals for µ (1)
Applying the t-test, we concluded that the treatment has some effect on the blood pressure: there is some blood pressure reduction.
Note that the t-test does not give information about the size of the effect.
The estimator 𝑋̅ gives this information, but the inaccuracy of the estimation should be expressed as well. We therefore look for an interval (𝐿, 𝑅) such that
$$P(L < \mu < R) = 0.95$$
holds. Instead of the confidence level 95% one may choose other levels, e.g. 99% or 90%.
The boundaries 𝐿 and 𝑅 have to be statistics which can be computed from the sample. For the construction of the confidence interval we again need:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \ \text{has a t-distribution with } n - 1 \text{ degrees of freedom (here: } n - 1 = 24\text{).}$$
From the table of the t-distribution we can conclude that the following event occurs with probability 95%:
$$-2.064 < \frac{\bar{X} - \mu}{S/\sqrt{n}} < 2.064$$
$$\Longleftrightarrow\quad 2.064 \times \frac{S}{\sqrt{n}} > -\bar{X} + \mu > -2.064 \times \frac{S}{\sqrt{n}}$$
$$\Longleftrightarrow\quad \bar{X} - 2.064 \times \frac{S}{\sqrt{n}} < \mu < \bar{X} + 2.064 \times \frac{S}{\sqrt{n}}$$
So we should take
$$L = \bar{X} - 2.064 \times \frac{S}{\sqrt{n}} \quad\text{and}\quad R = \bar{X} + 2.064 \times \frac{S}{\sqrt{n}}.$$
Confidence intervals for µ (2)
With 𝑋̅ = 5.32, 𝑆 = 4.74 and 𝑛 = 25 we get 2.064 × 4.74⁄√25 ≈ 1.96, so the 95% confidence interval for µ is (5.32 − 1.96, 5.32 + 1.96) = (3.36, 7.28).
We are 95% confident that the average reduction of blood pressure (population mean) lies between 3.36 and 7.28.
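The interval boundaries 𝐿 and 𝑅 can be computed with a small helper. This is a minimal sketch; the t-value 2.064 (95%, 𝑑𝑓 = 24) is taken from the text, because Python's standard library does not provide t-distribution quantiles.

```python
from math import sqrt


def t_confidence_interval(xbar: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Interval xbar -/+ t_crit * s / sqrt(n); t_crit comes from a t-table with df = n - 1."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# t_crit = 2.064 is the 95% two-sided table value for df = 24 quoted in the text
L, R = t_confidence_interval(xbar=5.32, s=4.74, n=25, t_crit=2.064)
print(f"95% CI for mu: ({L:.2f}, {R:.2f})")
```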
Confidence intervals for p
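In a similar way, confidence intervals for 𝑝 can be constructed if we observe a count 𝑋 that has a binomial distribution with given 𝑛 and unknown success probability 𝑝.
We only consider large 𝑛, such that we can apply the normal approximation of the binomial distribution.
So our starting point is that the distribution of the count 𝑋 is approximately the normal distribution with expectation µ = 𝑛𝑝 and standard deviation 𝜎 = √(𝑛𝑝(1 − 𝑝)).
Hence
$$Z = \frac{X - np}{\sqrt{np(1-p)}} = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \quad \text{has approximately the standard normal distribution, with } \hat{p} = \frac{X}{n}.$$
Replacing the unknown 𝑝 in the denominator by its estimate 𝑝̂, we conclude:
$$\frac{\hat{p} - p}{se(\hat{p})} \ \text{has approximately the standard normal distribution, with } se(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}.$$
Applying the standard normal distribution, we get that the following event has (approximately) probability 95%:
$$-1.96 < \frac{\hat{p} - p}{se(\hat{p})} < 1.96 \quad\Longleftrightarrow\quad \hat{p} - 1.96 \times se(\hat{p}) < p < \hat{p} + 1.96 \times se(\hat{p}).$$
So the boundaries of the 95% confidence interval are given by: 𝑝̂ ± 1.96 × 𝑠𝑒(𝑝̂).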
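The interval 𝑝̂ ± 1.96 × 𝑠𝑒(𝑝̂) (often called the Wald interval) takes only a few lines to compute. As an illustration, this sketch applies it to the medicine example from the start of these notes (145 cured out of 200); the function name is our own.

```python
from math import sqrt


def wald_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate CI p_hat -/+ z * se(p_hat) with se(p_hat) = sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Illustration: 145 cured patients out of 200
lower, upper = wald_interval(145, 200)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

For this illustration the whole interval lies above 0.60, in line with the outcome of the binomial test.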
Assignment of lectures 2a and 2b (CLT and testing theory)
Send your solutions by mail. Use a Word file with your typed solutions, or a Word file
converted to a pdf-file. In case of handwritten solutions: collect your handwritten pages in
one Word file or one pdf-file.
Exercise 1
Consider random numbers 𝑋𝑖 that are independent and all have the uniform distribution on
the interval (0,1). We study the sample mean 𝑋̅ = (𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 )⁄𝑛.
𝑎.
For the sample sizes 𝑛 = 100, 𝑛 = 1000, 𝑛 = 10 000 and 𝑛 = 100 000 approximate the
probability 𝑃(𝑋̅ ≤ 0.508) using the central limit theorem (CLT).
𝑏.
The probability 𝑃(𝑋̅ ≤ 0.508) is an increasing function of 𝑛. Check whether your calculations
are in agreement with this statement and explain why this statement is true.
Exercise 2
A certain tennis player makes a successful first serve 83% of the time. Assume that each
serve is independent of the others. Suppose the tennis player serves 100 times in a match.
Define 𝑋 = number of good first serves.
Use the normal approximation of the binomial distribution to compute/approximate the
following probabilities:
𝑎. 𝑃(𝑋 > 90)
𝑏. 𝑃(𝑋 ≤ 75)
𝑐. 𝑃(75 ≤ 𝑋 < 85)
Exercise 3
The waiting time 𝑋 of patients in some hospital has an exponential distribution with (long-run) average 𝜇 = 14 (in minutes).
𝑎.
Calculate the probability 𝑃(𝑋 > 20), the probability that an arbitrary patient has to wait more
than 20 minutes.
𝑏.
Consider the total waiting time 𝑇 = 𝑋1 + 𝑋2 + ⋯ + 𝑋70 of 70 patients and assume that the 70
waiting times are independent and exponentially distributed with expectation 14.
Approximate the probability that the total waiting time exceeds 1000 minutes.
Exercise 4
Let us return to the binomial test of the lecture notes. We observe
𝑋 = the number of patients cured by the new medicine
which has the binomial distribution with 𝑛 = 200 and unknown 𝑝,
we test 𝐻0 : 𝑝 = 0.60 against 𝐻1 : 𝑝 > 0.60
and we reject the null hypothesis if 𝑋 ≥ 𝑐 . In this exercise we choose 𝜶 = 𝟐%.
𝑎.
Use Excel and the statistical function BINOM.DIST to determine 𝑐.
𝑏.
Calculate the probability 𝑃(𝑋 ≥ 𝑐) for 𝑝 = 0.65 using Excel and BINOM.DIST. This is the
power of the test for 𝑝 = 0.65.
𝑐.
Calculate the probability 𝑃(𝑋 ≥ 𝑐) for 𝑝 = 0.70, 𝑝 = 0.75, … and sketch the graph of the probability 𝑃(𝑋 ≥ 𝑐): this is the power of the test as a function of 𝑝 > 0.60. You should see an increasing curve.
𝑑.
Imagine that we change the value of 𝑛. We take 𝑛 = 500 instead of 𝑛 = 200. The graph of
the power will change as well. Indicate in which way the graph of the power will change and
explain why.
Exercise 5
Let us now consider the one sample t-test of the lecture notes. We study a group of 𝑛
patients.
We assume that the individual blood pressure reductions 𝑋1 , 𝑋2 , … , 𝑋𝑛 are independent and
are all normally distributed with expectation µ and standard deviation 𝜎.
We test 𝐻0 : µ = 0 against 𝐻1 : µ > 0 using the test statistic 𝑇 = 𝑋̅⁄(𝑆⁄√𝑛). We choose 𝛼 = 𝟏%.
Suppose that 𝑋̅ = 0.79 and 𝑆 = 2.70 summarize the data.
𝑎.
Determine the critical value 𝑐 and determine whether we have to reject the null hypothesis in
case of 𝑛 = 25, 𝑛 = 50 and 𝑛 = 100 .
𝑏.
For ‘power calculations’ the t-distribution of the test statistic is approximated by the standard
normal distribution. This can be motivated by the fact that for large 𝑛 the difference between
a t-distribution and the standard normal distribution is rather small.
For large 𝑛 we then reject the null hypothesis if 𝑇 ≥ 2.33 (verify this).
Determine the minimal value for 𝑛 such that the power 𝑃(𝑇 ≥ 2.33) is equal to 0.95 for
µ⁄𝜎 = 0.5.