Wilson Interval
$(1-\alpha)100\%$ CI for $\pi$:

$$\tilde{\pi} \pm \frac{Z_{1-\alpha/2}\, n^{1/2}}{n + Z_{1-\alpha/2}^2} \sqrt{\hat{\pi}(1-\hat{\pi}) + \frac{Z_{1-\alpha/2}^2}{4n}}$$
Where

$$\tilde{\pi} = \frac{\sum_{i=1}^{n} y_i + Z_{1-\alpha/2}^2/2}{n + Z_{1-\alpha/2}^2}$$
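As a minimal sketch, the Wilson interval could be computed in R as follows. The function name `wilson_ci` and its arguments are hypothetical (not from these notes), and $\hat{\pi}$ is assumed to be the sample proportion $w/n$ with $w = \sum y_i$.

```r
# Hypothetical helper: Wilson interval for a binomial proportion.
# w = observed number of successes (sum of the y_i), n = number of trials.
wilson_ci <- function(w, n, alpha = 0.05) {
  z <- qnorm(p = 1 - alpha / 2)            # Z_{1-alpha/2}
  pi_hat <- w / n                          # assumed: sample proportion
  pi_tilde <- (w + z^2 / 2) / (n + z^2)    # adjusted estimate (interval center)
  half_width <- z * sqrt(n) / (n + z^2) *
    sqrt(pi_hat * (1 - pi_hat) + z^2 / (4 * n))
  c(lower = pi_tilde - half_width, upper = pi_tilde + half_width)
}

wilson_ci(w = 4, n = 10)
```

For w = 4 and n = 10, this should agree with the interval reported by `prop.test(x = 4, n = 10, correct = FALSE)`, which uses the same score-based construction.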
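For n > 40, use the Agresti-Coull interval:

$$\tilde{\pi} \pm Z_{1-\alpha/2} \sqrt{\frac{\tilde{\pi}(1-\tilde{\pi})}{n + Z_{1-\alpha/2}^2}}$$

where $\tilde{\pi}$ is defined as for the Wilson interval.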
*** This is essentially a Wald interval where we add $Z_{1-\alpha/2}^2/2$ successes and $Z_{1-\alpha/2}^2/2$ failures to the observed data. In fact, when $\alpha = 0.05$, $Z_{1-\alpha/2} = 1.96 \approx 2$. Then

$$\tilde{\pi} = \frac{\sum_{i=1}^{n} y_i + 2^2/2}{n + 2^2} = \frac{\sum_{i=1}^{n} y_i + 2}{n + 4}$$
++> When n < 40, the Agresti-Coull interval is
generally still better than the Wald interval.
++> The Wilson interval can be used when n < 40 as
well, and it is generally better than the Agresti-Coull.
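The Agresti-Coull interval can be sketched in R in the same way. The function name `agresti_coull_ci` and its arguments are hypothetical, chosen only for illustration; the code adds $Z_{1-\alpha/2}^2/2$ successes and failures and then applies the Wald form.

```r
# Hypothetical helper: Agresti-Coull interval for a binomial proportion.
# w = observed number of successes, n = number of trials.
agresti_coull_ci <- function(w, n, alpha = 0.05) {
  z <- qnorm(p = 1 - alpha / 2)            # Z_{1-alpha/2}
  pi_tilde <- (w + z^2 / 2) / (n + z^2)    # add z^2/2 successes and z^2/2 failures
  half_width <- z * sqrt(pi_tilde * (1 - pi_tilde) / (n + z^2))
  c(lower = pi_tilde - half_width, upper = pi_tilde + half_width)
}

agresti_coull_ci(w = 4, n = 10)
```

Both this interval and the Wilson interval are centered at the same adjusted estimate; only the half-width differs.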
*********TRUE CONFIDENCE LEVEL
One could find the EXACT true confidence level without Monte Carlo
simulation! Below are the steps:
1) Find all possible intervals that one could have with w = 0, 1, ..., n.
2) Form I(w) = 1 if the interval for a w contains $\pi$ and 0 otherwise.
3) Calculate the true confidence level as
$$\sum_{w=0}^{n} I(w) \binom{n}{w} \pi^{w} (1-\pi)^{n-w}$$
This is what Brown et al. (2001) did for their paper. The key to using a
non-simulation-based approach is that there are a finite number of possible
values for the random variable of interest. In other settings beyond confidence
intervals for $\pi$, this will usually not occur and simulation will be the only
approach for a finite sample size n.
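The three steps above can be sketched in R. This is only an illustration under stated assumptions: the function name `true_conf_level` and the argument `pi0` (the true value of $\pi$) are hypothetical, the Wilson interval is used as the interval under study, and $\hat{\pi}$ is taken to be $w/n$.

```r
# Hypothetical helper: exact true confidence level of the Wilson interval
# at a given true proportion pi0 and sample size n (no simulation needed).
true_conf_level <- function(pi0, n, alpha = 0.05) {
  w <- 0:n                                  # step 1: all possible outcomes
  z <- qnorm(p = 1 - alpha / 2)
  pi_hat <- w / n
  pi_tilde <- (w + z^2 / 2) / (n + z^2)
  half_width <- z * sqrt(n) / (n + z^2) *
    sqrt(pi_hat * (1 - pi_hat) + z^2 / (4 * n))
  lower <- pi_tilde - half_width
  upper <- pi_tilde + half_width
  covers <- ifelse(lower <= pi0 & pi0 <= upper, 1, 0)   # step 2: I(w)
  sum(covers * dbinom(x = w, size = n, prob = pi0))     # step 3: weighted sum
}

true_conf_level(pi0 = 0.5, n = 10)
```

Swapping in a different interval only requires changing how `lower` and `upper` are computed; the binomial weighting in the last line stays the same.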
****LIKELIHOOD RATIO TEST
The LRT statistic, $\Lambda$, is the ratio of two likelihood
functions. The numerator is the likelihood
function maximized over the parameter space
restricted under the null hypothesis. The
denominator is the likelihood function maximized
over the unrestricted parameter space. The test
statistic is written as:
$$\Lambda = \frac{\text{Max. lik. when parameters satisfy } H_0}{\text{Max. lik. when parameters satisfy } H_0 \text{ or } H_a}$$
Wilks (1935, 1938) shows that $-2\log(\Lambda)$ can be
approximated by a $\chi^2_{\nu}$ distribution for a large sample and
under $H_0$, where $\nu$ is the difference in dimension
between the alternative and null hypothesis
parameter spaces.
Suppose the hypothesis test
$H_0: \pi = 0.5$ vs. $H_a: \pi \neq 0.5$ is of interest.
Remember that $\sum_{i=1}^{n} y_i = 4$ and $n = 10$.
The numerator of $\Lambda$ is the maximum possible
value of the likelihood function under the null
hypothesis. Because $\pi = 0.5$ is the null
hypothesis, the maximum can be found by just
substituting $\pi = 0.5$ in the likelihood function:
$$L(\pi = 0.5 \mid y_1, \ldots, y_n) = 0.5^{\sum y_i} (1 - 0.5)^{n - \sum y_i}$$
Then
$$L(\pi = 0.5 \mid y_1, \ldots, y_n) = 0.5^{4} (0.5)^{10-4} = 0.0009766$$
The denominator of $\Lambda$ is the maximum possible
value of the likelihood function under the null OR
alternative hypotheses. Because this includes
all possible values of $\pi$ here, the maximum is
achieved when the maximum likelihood estimate
is substituted for $\pi$ in the likelihood function! As
shown previously, the maximum value is
0.001194.
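As a quick check of that value, assuming the maximum likelihood estimate is the sample proportion $\hat{\pi} = \sum y_i / n = 4/10 = 0.4$:

$$L(\hat{\pi} = 0.4 \mid y_1, \ldots, y_n) = 0.4^{4} (1 - 0.4)^{10-4} = 0.0256 \times 0.046656 \approx 0.001194$$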
Therefore,
$$\Lambda = \frac{\text{Max. lik. when parameters satisfy } H_0}{\text{Max. lik. when parameters satisfy } H_0 \text{ or } H_a} = \frac{0.0009766}{0.001194} = 0.8179$$
Then $-2\log(\Lambda) = -2\log(0.8179) = 0.4020$ is the
test statistic value. The critical value is
$\chi^2_{1,0.95} = 3.84$ using $\alpha = 0.05$:
> qchisq(p = 0.95, df = 1)
[1] 3.841459
There is not sufficient evidence to reject the
hypothesis that $\pi = 0.5$.
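The whole calculation can be reproduced in R along the following lines. This is a sketch: the object names (`lik`, `pi0`, `pi_hat`, `lambda`, `lrt_stat`, `crit_value`) are hypothetical, and the MLE is assumed to be the sample proportion. Because no intermediate values are rounded here, the statistic may differ from 0.4020 in the third decimal place.

```r
# Observed data summary from the example: 4 successes in 10 trials
w <- 4
n <- 10
pi0 <- 0.5                                  # value of pi under H0

# Likelihood for independent Bernoulli observations, as a function of pi
lik <- function(pi_val) pi_val^w * (1 - pi_val)^(n - w)

pi_hat <- w / n                             # assumed MLE: sample proportion
lambda <- lik(pi0) / lik(pi_hat)            # LRT statistic (ratio of likelihoods)
lrt_stat <- -2 * log(lambda)                # transformed statistic
crit_value <- qchisq(p = 0.95, df = 1)      # critical value at alpha = 0.05

c(lambda = lambda, lrt_stat = lrt_stat, crit_value = crit_value)
lrt_stat > crit_value                       # FALSE means do not reject H0
```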