TP_stat_inf_104241
Your Name
Abstract
SECTION A
$h = a_{k+1} - a_k$ for $k = 1, 2, \dots, K$ (all bins have the same width).
Group II Inferential statistics Exam 2017-2018
The population mean is
$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Replacing $f$ by the histogram estimate $\text{Hist}(x)$, which equals $v_k/(nh)$ on each interval $B_k = (a_k, a_{k+1}]$, we can write the estimated mean as:
$$\hat\mu_{\text{Hist}} = \int_{a_1}^{a_{K+1}} x \cdot \sum_{k=1}^{K} \frac{v_k}{nh}\,\mathbf{1}_{B_k}(x)\,dx,$$
where $\mathbf{1}_{B_k}(x)$ is the indicator function that is 1 if $x \in B_k$ and 0 otherwise, and $v_k$ is the number of observations falling in $B_k$.
This integral simplifies as:
$$\hat\mu_{\text{Hist}} = \frac{1}{nh}\sum_{k=1}^{K} v_k \int_{a_k}^{a_{k+1}} x\,dx.$$
The inner integral is
$$\int_{a_k}^{a_{k+1}} x\,dx = \left[\frac{x^2}{2}\right]_{a_k}^{a_{k+1}} = \frac{a_{k+1}^2}{2} - \frac{a_k^2}{2}.$$
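Hence
$$\hat\mu_{\text{Hist}} = \frac{1}{nh}\sum_{k=1}^{K} v_k\left(\frac{a_{k+1}^2}{2} - \frac{a_k^2}{2}\right).$$
Since $a_{k+1}^2 - a_k^2 = (a_{k+1} - a_k)(a_{k+1} + a_k) = h\,(a_{k+1} + a_k)$, the factor $h$ cancels:
$$\hat\mu_{\text{Hist}} = \frac{1}{2n}\sum_{k=1}^{K} v_k\,(a_{k+1} + a_k).$$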
This is the desired expression for the estimated mean based on the histogram.
Interval (5, 10] (10, 15] (15, 20] (20, 25] (25, 30] (30, 35]
Frequency 1 11 39 38 10 1
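Here $K = 6$ and $n = 100$. Since $(a_{k+1} + a_k)/2$ is the midpoint of $B_k$,
$$\hat\mu_{\text{Hist}} = \frac{1}{2n}\sum_{k=1}^{6} v_k\,(a_{k+1} + a_k) = \frac{1}{100}\big(1(7.5) + 11(12.5) + 39(17.5) + 38(22.5) + 10(27.5) + 1(32.5)\big).$$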
Simplifying:
$$\hat\mu_{\text{Hist}} = \frac{1990}{100} = 19.90.$$
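The arithmetic above can be checked with a short Python sketch. The function name `histogram_mean` is our own; the bin edges and counts are copied from the frequency table.

```python
# Histogram-based mean: mu_hat = (1 / (2n)) * sum_k v_k * (a_k + a_{k+1}),
# i.e. every observation in B_k = (a_k, a_{k+1}] is replaced by the bin midpoint.

def histogram_mean(edges, freqs):
    """Return (1 / (2n)) * sum_k v_k * (a_k + a_{k+1}) for bin edges a_1..a_{K+1}."""
    if len(edges) != len(freqs) + 1:
        raise ValueError("need exactly one more edge than frequencies")
    n = sum(freqs)
    total = sum(v * (lo + hi) for v, lo, hi in zip(freqs, edges[:-1], edges[1:]))
    return total / (2 * n)


edges = [5, 10, 15, 20, 25, 30, 35]  # a_1, ..., a_7 from the table
freqs = [1, 11, 39, 38, 10, 1]       # v_1, ..., v_6 (n = 100)

print(round(histogram_mean(edges, freqs), 2))  # 19.9
```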
A2
An estimator $\hat\theta$ of a parameter $\theta$ is unbiased if
$$E[\hat\theta] = \theta.$$
In other words, the estimator does not systematically overestimate or underestimate the
true parameter value.
The two sample means are
$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad \bar Y = \frac{1}{m}\sum_{j=1}^{m} Y_j.$$
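1. Expected Value of $\bar X$:
Since $X_1, X_2, \dots, X_n$ are independent and follow the normal distribution $N(\mu, \sigma^2)$, the expected value of $\bar X$ is:
$$E[\bar X] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n}\mu = \mu.$$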
2. Expected Value of $\bar Y$:
Similarly, since $Y_1, Y_2, \dots, Y_m$ are independent and follow the distribution $N(\mu, \tau^2)$, the expected value of $\bar Y$ is:
$$E[\bar Y] = E\left[\frac{1}{m}\sum_{j=1}^{m} Y_j\right] = \frac{1}{m}\sum_{j=1}^{m} E[Y_j] = \frac{1}{m}\sum_{j=1}^{m}\mu = \mu.$$
We have already shown in part (i) that both $\bar X$ and $\bar Y$ are unbiased estimators of $\mu$; specifically, $E[\bar X] = E[\bar Y] = \mu$. This follows from the fact that the $X_i$ and $Y_j$ follow the distributions $N(\mu, \sigma^2)$ and $N(\mu, \tau^2)$, respectively, which share the same mean $\mu$. Thus, both sample means are unbiased estimators of the population mean $\mu$.
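By independence of the $X_i$,
$$\operatorname{Var}(\bar X) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\operatorname{Var}(X_i)}{n}.$$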
Since the $X_i$ are independent and follow $N(\mu, \sigma^2)$, the variance of each $X_i$ is $\sigma^2$. Therefore, the variance of $\bar X$ is:
$$\operatorname{Var}(\bar X) = \frac{\sigma^2}{n}.$$
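Likewise, by independence of the $Y_j$,
$$\operatorname{Var}(\bar Y) = \frac{1}{m^2}\sum_{j=1}^{m}\operatorname{Var}(Y_j) = \frac{\operatorname{Var}(Y_j)}{m}.$$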
Since the $Y_j$ are independent and follow $N(\mu, \tau^2)$, the variance of each $Y_j$ is $\tau^2$. Thus, the variance of $\bar Y$ is:
$$\operatorname{Var}(\bar Y) = \frac{\tau^2}{m}.$$
Summary of Variances:
- The variance of $\bar X$ is $\operatorname{Var}(\bar X) = \sigma^2/n$.
- The variance of $\bar Y$ is $\operatorname{Var}(\bar Y) = \tau^2/m$.

Thus, both $\bar X$ and $\bar Y$ are unbiased estimators of $\mu$, and their variances are $\sigma^2/n$ and $\tau^2/m$, respectively.
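A quick Monte Carlo check of these formulas, as a sketch. The values $\mu = 5$, $\sigma = 2$, $n = 10$ and the seed are arbitrary choices for illustration, not exam data; the simulated values of $\bar X$ should average close to $\mu$ and have variance close to $\sigma^2/n = 0.4$.

```python
import random
from statistics import fmean, pvariance


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]


mu, sigma, n = 5.0, 2.0, 10  # illustrative values only
means = simulate_sample_means(mu, sigma, n, reps=20_000)

print(f"mean of X-bar:     {fmean(means):.3f}  (theory: {mu})")
print(f"variance of X-bar: {pvariance(means):.4f} (theory: {sigma**2 / n})")
```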
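Now consider the combined estimator
$$\hat\mu = w\bar X + (1-w)\bar Y,$$
which is unbiased for every $w$, because $E[\hat\mu] = w\mu + (1-w)\mu = \mu$. We are tasked with finding the value of $w$ that minimizes the variance of $\hat\mu$.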
1. Variance of µ̂:
We know that:
$$\operatorname{Var}(\bar X) = \frac{\sigma^2}{n}, \qquad \operatorname{Var}(\bar Y) = \frac{\tau^2}{m}.$$
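Since $\bar X$ and $\bar Y$ are computed from independent samples, their covariance is zero. Therefore, the variance of $\hat\mu$ becomes:
$$\operatorname{Var}(\hat\mu) = w^2\frac{\sigma^2}{n} + (1-w)^2\frac{\tau^2}{m}.$$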
Differentiating with respect to $w$:
$$\frac{d}{dw}\operatorname{Var}(\hat\mu) = 2w\frac{\sigma^2}{n} - 2(1-w)\frac{\tau^2}{m}.$$
Setting the derivative equal to zero:
$$2w\frac{\sigma^2}{n} - 2(1-w)\frac{\tau^2}{m} = 0.$$
Simplifying:
$$w\frac{\sigma^2}{n} = (1-w)\frac{\tau^2}{m}.$$
Expanding:
$$w\frac{\sigma^2}{n} = \frac{\tau^2}{m} - w\frac{\tau^2}{m}.$$
Collecting terms involving $w$:
$$w\left(\frac{\sigma^2}{n} + \frac{\tau^2}{m}\right) = \frac{\tau^2}{m}.$$
Solving for $w$, then multiplying numerator and denominator by $nm$:
$$w = \frac{\tau^2/m}{\sigma^2/n + \tau^2/m} = \frac{n\tau^2}{m\sigma^2 + n\tau^2}.$$
Thus, the value of $w$ that minimizes the variance of $\hat\mu$ is:
$$w = \frac{n\tau^2}{m\sigma^2 + n\tau^2}.$$
To ensure that this value of w minimizes the variance, we calculate the second derivative
of the variance with respect to w.
The second derivative is:
$$\frac{d^2}{dw^2}\operatorname{Var}(\hat\mu) = 2\frac{\sigma^2}{n} + 2\frac{\tau^2}{m}.$$
Since both $\sigma^2/n$ and $\tau^2/m$ are positive, we have:
$$\frac{d^2}{dw^2}\operatorname{Var}(\hat\mu) = 2\left(\frac{\sigma^2}{n} + \frac{\tau^2}{m}\right) > 0.$$
Therefore, $\operatorname{Var}(\hat\mu)$ is a convex (upward-opening) quadratic in $w$, confirming that $w = \frac{n\tau^2}{m\sigma^2 + n\tau^2}$ is its global minimizer.
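The closed form for $w$ can be sanity-checked numerically. This sketch uses hypothetical helper names and illustrative parameter values (not taken from the exam), and compares the formula with a brute-force grid search over $w \in [0, 1]$. It also checks that the minimal variance equals $1/(n/\sigma^2 + m/\tau^2)$, i.e. the optimal $w$ is inverse-variance weighting.

```python
def optimal_weight(sigma2, n, tau2, m):
    """w* = n*tau2 / (m*sigma2 + n*tau2)."""
    return n * tau2 / (m * sigma2 + n * tau2)


def combined_variance(w, sigma2, n, tau2, m):
    """Var(w*Xbar + (1-w)*Ybar) = w^2 sigma2/n + (1-w)^2 tau2/m for independent samples."""
    return w**2 * sigma2 / n + (1 - w) ** 2 * tau2 / m


sigma2, n, tau2, m = 4.0, 10, 1.0, 20  # illustrative values only

w_star = optimal_weight(sigma2, n, tau2, m)
grid = [i / 10_000 for i in range(10_001)]
w_grid = min(grid, key=lambda w: combined_variance(w, sigma2, n, tau2, m))

print(w_star, w_grid)
print(combined_variance(w_star, sigma2, n, tau2, m), 1 / (n / sigma2 + m / tau2))
```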
A3
The hypotheses are $H_0: \mu = 13.5$ versus $H_1: \mu \neq 13.5$. That is, we test whether the population mean is 13.5 or differs from this value (it could be either greater or less than 13.5).
Now, we will proceed with the hypothesis tests for the following two cases: (i) $\sigma^2 = 1$ known, and (ii) $\sigma^2$ unknown.
Case (i): $\sigma^2$ known. Given that $n = 8$, $\sum_{i=1}^{n} x_i = 113.6627$ and $\sigma^2 = 1$, we use a two-tailed $z$-test. Assuming the observations are normally distributed (as the $t$-test in case (ii) also requires), the statistic below follows $N(0, 1)$ exactly under $H_0$, whatever the sample size. The $t$-distribution is needed only when $\sigma$ is replaced by a sample estimate, as in case (ii); a small $n$ alone does not call for it.

The sample mean is
$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{113.6627}{8} = 14.2078,$$
and the test statistic is
$$z = \frac{\bar x - \mu_0}{\sigma/\sqrt{n}} = \frac{14.2078 - 13.5}{1/\sqrt{8}} \approx 2.00.$$

The critical value for a two-tailed test at the 5% significance level is $z_{0.025} = 1.96$. Since $|z| \approx 2.00$ exceeds 1.96 (two-sided p-value $\approx 0.045$), we reject the null hypothesis $H_0$ at the 5% significance level.
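A sketch of case (i) in Python, using only the standard library (`statistics.NormalDist`, Python 3.8+). It assumes, as above, normal data with $\sigma = 1$ known, and recomputes $\bar x$, $z$, the two-sided 5% critical value and the p-value from the given sum.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x = 8, 113.6627
sigma, mu0, alpha = 1.0, 13.5, 0.05

xbar = sum_x / n
z = (xbar - mu0) / (sigma / sqrt(n))            # standardized statistic, N(0,1) under H0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_crit

print(f"xbar = {xbar:.4f}, z = {z:.3f}, critical = {z_crit:.3f}, "
      f"p = {p_value:.4f}, reject H0: {reject}")
```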
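Case (ii): $\sigma^2$ unknown. We replace $\sigma$ by the sample standard deviation $s'$ and use
$$t = \frac{\bar x - \mu_0}{s'/\sqrt{n}},$$
which under $H_0$ follows a $t$-distribution with $n - 1 = 7$ degrees of freedom, where
$$s'^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right).$$
With $\sum_{i=1}^{n} x_i^2 = 1621.391$:
$$s'^2 = \frac{1}{7}\left(1621.391 - \frac{(113.6627)^2}{8}\right) = \frac{1}{7}\times 6.4898 \approx 0.9271,$$
so $s' \approx 0.9629$ and
$$t = \frac{14.2078 - 13.5}{0.9629/\sqrt{8}} \approx 2.08.$$
The critical value for a two-tailed test at the 5% significance level with 7 degrees of freedom is approximately $\pm 2.3646$. Since $|t| \approx 2.08$ is less than 2.3646, we fail to reject the null hypothesis $H_0$ at the 5% significance level.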
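The same computation for case (ii), as a sketch. The Python standard library has no Student-$t$ quantile function, so the critical value $t_{0.025,7} = 2.3646$ is hard-coded from the table value quoted above.

```python
from math import sqrt

n, sum_x, sum_x2, mu0 = 8, 113.6627, 1621.391, 13.5
t_crit = 2.3646  # t_{0.025, 7}, table value quoted in the text

xbar = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)   # unbiased sample variance s'^2
t = (xbar - mu0) / (sqrt(s2) / sqrt(n))
reject = abs(t) > t_crit

print(f"s'^2 = {s2:.4f}, t = {t:.3f}, reject H0: {reject}")
```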
A4
An interval $I(X) = [a(X), b(X)]$ is a $100(1-\alpha)\%$ confidence interval for $\theta$ if
$$P\big(a(X) \le \theta \le b(X)\big) = 1 - \alpha.$$
Here the endpoints are random and $\theta$ is fixed: the confidence level $100(1-\alpha)\%$ is the probability, before the data are observed, that the random interval covers the true value of $\theta$. In other words, if we performed the same experiment many times and computed $I(X)$ each time, the interval would contain the true value of $\theta$ in a proportion $1 - \alpha$ of the trials in the long run; a single computed interval either contains $\theta$ or does not.
Consider now two independent samples of sizes $n$ and $m$. The estimator of the parameter $\theta = \mu_1 - \mu_2$ is $\hat\theta = \bar X_1 - \bar X_2$, where $\bar X_1$ and $\bar X_2$ are independent and follow normal distributions:
$$\bar X_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n}\right) \quad\text{and}\quad \bar X_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{m}\right).$$
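Since $\bar X_1$ and $\bar X_2$ are independent,
$$\operatorname{Var}(\hat\theta) = \operatorname{Var}(\bar X_1) + \operatorname{Var}(\bar X_2) = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}.$$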
Hence
$$\hat\theta \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right).$$
The data give
$$n = 10,\quad m = 20,\quad \sum_{i=1}^{10} X_{1i} = 96.08,\quad \sum_{j=1}^{20} X_{2j} = 237.09.$$
$$\bar X_1 = \frac{96.08}{10} = 9.608, \qquad \bar X_2 = \frac{237.09}{20} = 11.8545.$$
The population variances are given as $\sigma_1^2 = 2$ and $\sigma_2^2 = 4$, so they do not need to be estimated. The point estimate is $\hat\theta = \bar X_1 - \bar X_2 = 9.608 - 11.8545 = -2.2465$.

To construct a 95% confidence interval for $\theta = \mu_1 - \mu_2$, we use the fact that the sampling distribution of $\hat\theta$ is normal. Because $\sigma_1^2$ and $\sigma_2^2$ are known, the pivot $(\hat\theta - \theta)/\operatorname{SE}(\hat\theta)$ is exactly standard normal, so the critical value comes from $N(0, 1)$ rather than from Student's $t$-distribution, even though the sample sizes $n = 10$ and $m = 20$ are small.

The standard error of $\hat\theta$ is:
$$\operatorname{SE}(\hat\theta) = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} = \sqrt{\frac{2}{10} + \frac{4}{20}} = \sqrt{0.2 + 0.2} = \sqrt{0.4} \approx 0.6325.$$
With $z_{0.025} = 1.96$, the interval is $\hat\theta \pm 1.96\,\operatorname{SE}(\hat\theta) \approx -2.2465 \pm 1.2396$.

Thus, the 95% confidence interval for $\theta = \mu_1 - \mu_2$ is:
$$[-3.4861,\ -1.0069].$$
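A sketch of this interval in Python, assuming (as stated above) known variances $\sigma_1^2 = 2$ and $\sigma_2^2 = 4$, so the normal quantile from `statistics.NormalDist` is used.

```python
from math import sqrt
from statistics import NormalDist

n, m = 10, 20
sum1, sum2 = 96.08, 237.09
var1, var2 = 2.0, 4.0        # known population variances
alpha = 0.05

theta_hat = sum1 / n - sum2 / m
se = sqrt(var1 / n + var2 / m)
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
lower, upper = theta_hat - z * se, theta_hat + z * se

print(f"theta_hat = {theta_hat:.4f}, SE = {se:.4f}, CI = [{lower:.4f}, {upper:.4f}]")
```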
SECTION B
B5
Expanding the expression for the sum of squared deviations from the sample mean:
$$\sum_{i=1}^{n} (X_i - \bar X)^2 = \sum_{i=1}^{n} (X_i - \mu + \mu - \bar X)^2.$$
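Expanding the square, the cross term is $2(\mu - \bar X)\sum_{i=1}^{n}(X_i - \mu) = 2(\mu - \bar X)\,n(\bar X - \mu) = -2n(\bar X - \mu)^2$, so
$$\sum_{i=1}^{n} (X_i - \bar X)^2 = \sum_{i=1}^{n} (X_i - \mu)^2 + n(\bar X - \mu)^2 - 2n(\bar X - \mu)^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar X - \mu)^2.$$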
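With $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$, and since $E[(X_i - \mu)^2] = \sigma^2$ and $E[(\bar X - \mu)^2] = \operatorname{Var}(\bar X) = \sigma^2/n$, we get:
$$E[S^2] = \frac{1}{n-1}\left(n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right) = \sigma^2.$$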
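The identity and the unbiasedness of $S^2$ can be checked by simulation. This sketch uses arbitrary illustrative values ($\sigma = 3$, $n = 5$, fixed seed); it also shows that dividing by $n$ instead of $n - 1$ underestimates $\sigma^2$ on average, by the factor $(n-1)/n$.

```python
import random
from statistics import fmean

rng = random.Random(1)
mu, sigma, n, reps = 0.0, 3.0, 5, 40_000  # illustrative values only

s2_unbiased, s2_divide_by_n = [], []
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    # Identity derived above: sum (x - xbar)^2 = sum (x - mu)^2 - n (xbar - mu)^2
    rhs = sum((x - mu) ** 2 for x in xs) - n * (xbar - mu) ** 2
    assert abs(ss - rhs) < 1e-9
    s2_unbiased.append(ss / (n - 1))
    s2_divide_by_n.append(ss / n)

print(f"E[S^2]  ~ {fmean(s2_unbiased):.3f} (sigma^2 = {sigma**2})")
print(f"E[SS/n] ~ {fmean(s2_divide_by_n):.3f} ((n-1)/n * sigma^2 = {(n - 1) / n * sigma**2})")
```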
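Moreover, for a sample from $N(\mu, \sigma^2)$,
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$
that is, $(n-1)S^2/\sigma^2$ follows a chi-squared distribution with $n - 1$ degrees of freedom. Its distribution does not depend on $\mu$ or $\sigma^2$, so it can serve as a pivot.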
Step 2: Confidence Interval Construction
To construct a confidence interval for $\sigma^2$, we invert this pivot. Because the chi-squared distribution is not symmetric, a single critical value $\pm c$ will not do; instead, we take one quantile from each tail, so that probability $\alpha/2$ lies below the lower one and $\alpha/2$ above the upper one.
We want to find critical values corresponding to the desired confidence level. These critical values are denoted as:
- $\chi^2_{n-1,\alpha/2}$, the critical value with probability $\alpha/2$ below it (lower tail of the distribution);
- $\chi^2_{n-1,1-\alpha/2}$, the critical value with probability $\alpha/2$ above it (upper tail of the distribution).
Step 3: Deriving the Confidence Interval
Using the properties of the chi-squared distribution, we can derive the confidence interval for $\sigma^2$ as follows:
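$$P\left(\chi^2_{n-1,\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{n-1,1-\alpha/2}\right) = 1 - \alpha.$$
Taking reciprocals (which reverses the inequalities) and multiplying through by $(n-1)S^2$, this is equivalent to
$$P\left(\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}\right) = 1 - \alpha,$$
so the $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is $\left[\dfrac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}},\ \dfrac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}\right]$.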
• $n = 10$
• $\sum_{i=1}^{n} x_i = 104.334$
• $\sum_{i=1}^{n} x_i^2 = 1132.207$
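Applying the interval to these data, as a sketch with $\alpha = 0.05$. The Python standard library has no chi-squared quantile function, so the two quantiles for 9 degrees of freedom ($\chi^2_{9,0.025} \approx 2.700$ and $\chi^2_{9,0.975} \approx 19.023$) are hard-coded from standard tables; substitute another source if more precision is needed.

```python
n, sum_x, sum_x2 = 10, 104.334, 1132.207

# Chi-squared quantiles with n - 1 = 9 degrees of freedom (standard table values).
chi2_lower = 2.700    # chi^2_{9, 0.025}: P(chi^2_9 <= 2.700) = 0.025
chi2_upper = 19.023   # chi^2_{9, 0.975}: P(chi^2_9 <= 19.023) = 0.975

ss = sum_x2 - sum_x**2 / n          # (n - 1) * S^2
s2 = ss / (n - 1)
ci = (ss / chi2_upper, ss / chi2_lower)

print(f"S^2 = {s2:.4f}, 95% CI for sigma^2 = [{ci[0]:.3f}, {ci[1]:.3f}]")
```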
B6
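For a geometric sample with probability mass function $p_X(x) = (1-p)^x p$, $x = 0, 1, 2, \dots$, the likelihood is
$$L(p) = \prod_{i=1}^{n} p_X(X_i) = \prod_{i=1}^{n} (1-p)^{X_i} p.$$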
i=1
Since the product involves n terms, we can simplify the expression as:
n
Y
n
L(p) = p (1 − p)Xi
i=1
We want to maximize this likelihood function with respect to $p$. Since $\log$ is increasing, we can maximize the log-likelihood instead:
$$\ell(p) = \log L(p) = n\log p + \left(\sum_{i=1}^{n} X_i\right)\log(1-p).$$
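Taking the derivative of $\ell(p)$ with respect to $p$ and setting it to 0 gives:
$$\ell'(p) = \frac{n}{p} - \frac{\sum_{i=1}^{n} X_i}{1-p} = 0 \quad\Longleftrightarrow\quad \frac{n}{p} = \frac{\sum_{i=1}^{n} X_i}{1-p}.$$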
Cross-multiplying and simplifying:
$$n - np = p\sum_{i=1}^{n} X_i \quad\Longrightarrow\quad n = p\left(n + \sum_{i=1}^{n} X_i\right).$$
Solving for $p$:
$$p = \frac{n}{n + \sum_{i=1}^{n} X_i}.$$
Since $\sum_{i=1}^{n} X_i = n\bar X$,
$$p = \frac{n}{n + n\bar X} = \frac{1}{1 + \bar X}.$$
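Because $\ell''(p) = -\frac{n}{p^2} - \frac{\sum_{i=1}^{n} X_i}{(1-p)^2} < 0$ for $0 < p < 1$, this stationary point is a maximum. Therefore, the maximum likelihood estimator for $p$ is:
$$\hat p = \frac{1}{1 + \bar X}.$$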
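A numerical check of the closed form, as a sketch: for a small hypothetical sample (invented for illustration), the maximizer of $\ell(p)$ on a fine grid should agree with $1/(1 + \bar X)$. The helper names are our own.

```python
from math import log


def geometric_loglik(p, xs):
    """l(p) = n log p + (sum x_i) log(1 - p), for pmf (1 - p)^x p on x = 0, 1, 2, ..."""
    return len(xs) * log(p) + sum(xs) * log(1 - p)


def geometric_mle(xs):
    """Closed-form MLE p_hat = 1 / (1 + xbar)."""
    return 1 / (1 + sum(xs) / len(xs))


xs = [0, 3, 1, 5, 2, 0, 4, 1]                  # hypothetical sample, xbar = 2
p_hat = geometric_mle(xs)                       # 1/3
grid = [i / 10_000 for i in range(1, 10_000)]   # p in (0, 1)
p_grid = max(grid, key=lambda p: geometric_loglik(p, xs))

print(p_hat, p_grid)
```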
(iii) Showing the Relation Between $P(a < \hat p < b)$ and $P\left(\frac{1-b}{b} < \bar X < \frac{1-a}{a}\right)$
From Part (ii), we know that the maximum likelihood estimator for $p$ is given by:
$$\hat p = \frac{1}{1 + \bar X}.$$
Now, we want to compute the probability $P(a < \hat p < b)$ for $0 < a < b < 1$. Substituting the expression for $\hat p$, we have:
$$P(a < \hat p < b) = P\left(a < \frac{1}{1 + \bar X} < b\right).$$
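1. Take reciprocals; since all three quantities are positive, the inequalities reverse:
$$\frac{1}{b} < 1 + \bar X < \frac{1}{a}.$$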
2. Subtract 1 from each part:
$$\frac{1}{b} - 1 < \bar X < \frac{1}{a} - 1.$$
3. Simplify each side:
$$\frac{1-b}{b} < \bar X < \frac{1-a}{a}.$$
Thus, we have:
$$P(a < \hat p < b) = P\left(\frac{1-b}{b} < \bar X < \frac{1-a}{a}\right).$$
Since the two events are identical, this relation lets us compute probabilities for $\hat p$ from the (approximately normal) distribution of the sample mean $\bar X$.
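With $p = 0.25$ and $n = 100$, the moments of a single observation are
$$E(X_1) = \frac{1-p}{p} = \frac{1 - 0.25}{0.25} = 3, \qquad \operatorname{Var}(X_1) = \frac{1-p}{p^2} = \frac{1 - 0.25}{(0.25)^2} = 12,$$
so
$$E(\bar X) = 3, \qquad \operatorname{Var}(\bar X) = \frac{\operatorname{Var}(X_1)}{n} = \frac{12}{100} = 0.12, \qquad \operatorname{SD}(\bar X) = \sqrt{0.12} \approx 0.3464.$$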
Now, using the result from part (iii), we can express the probability $P(0.22 < \hat p < 0.26)$ as:
$$P(0.22 < \hat p < 0.26) = P\left(\frac{1 - 0.26}{0.26} < \bar X < \frac{1 - 0.22}{0.22}\right) = P(2.8462 < \bar X < 3.5455).$$
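By the central limit theorem, the standardized mean
$$Z = \frac{\bar X - E(\bar X)}{\operatorname{SD}(\bar X)}$$
is approximately standard normal. For $\bar X = 2.8462$:
$$Z_{\text{lower}} = \frac{2.8462 - 3}{0.3464} \approx -0.444.$$
For $\bar X = 3.5455$:
$$Z_{\text{upper}} = \frac{3.5455 - 3}{0.3464} \approx 1.575.$$
Using the standard normal distribution table, we find:
$$P(0.22 < \hat p < 0.26) \approx \Phi(1.575) - \Phi(-0.444) \approx 0.9424 - 0.3285 \approx 0.614.$$
Therefore, the approximate probability that $0.22 < \hat p < 0.26$ is 0.614.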
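The whole calculation can be reproduced with `statistics.NormalDist` as a sketch. It uses the exact endpoints $(1-b)/b$ and $(1-a)/a$ rather than their four-digit roundings, which only affects the fourth decimal.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.25, 100
a, b = 0.22, 0.26

mean_x = (1 - p) / p              # E(X_1) = 3
var_x = (1 - p) / p**2            # Var(X_1) = 12
sd_xbar = sqrt(var_x / n)         # SD(Xbar) = sqrt(0.12), about 0.3464

# Part (iii): a < p_hat < b  <=>  (1 - b)/b < Xbar < (1 - a)/a
lo, hi = (1 - b) / b, (1 - a) / a
clt = NormalDist(mean_x, sd_xbar)  # CLT approximation to the distribution of Xbar
prob = clt.cdf(hi) - clt.cdf(lo)

print(f"Xbar in ({lo:.4f}, {hi:.4f}); P(a < p_hat < b) ~ {prob:.4f}")
```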
B7
(i)
Consider a clinical study on a new treatment for Rhinovirus with $n$ patients. Each patient has a probability $p$ of recovering, independently of other patients. Let $X_i$ denote the indicator variable for the $i$-th patient's recovery:
$$X_i = \begin{cases} 1 & \text{if the } i\text{-th patient recovers},\\ 0 & \text{if the } i\text{-th patient does not recover}.\end{cases}$$
We test
$$H_0: p = p_0 \quad\text{vs}\quad H_1: p > p_0.$$
We are interested in finding the expression for the sample proportion $\hat p$ and its approximate distribution under the null hypothesis $H_0$.
The sample proportion $\hat p$ is simply the average of the $X_i$, i.e., the proportion of patients that recover in the sample. The mathematical expression for the sample proportion is:
$$\hat p = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
This quantity p̂ represents the observed proportion of recovered patients in the sample.
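Under $H_0$, the $X_i$ are independent with $X_i \sim \text{Bernoulli}(p_0)$, so $E[X_i] = p_0$ and $\operatorname{Var}(X_i) = p_0(1-p_0)$. Hence
$$E[\hat p] = p_0, \qquad \operatorname{Var}(\hat p) = \frac{p_0(1-p_0)}{n}.$$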
Now, applying the Central Limit Theorem (CLT), which states that the mean of a large number of independent and identically distributed random variables with finite variance is approximately normally distributed, we find that for large $n$ the sample proportion $\hat p$ approximately follows a normal distribution:
$$\hat p \overset{\text{approx}}{\sim} N\left(p_0, \frac{p_0(1-p_0)}{n}\right).$$
Thus, for large $n$, the distribution of $\hat p$ is approximately normal with mean $p_0$ and variance $p_0(1-p_0)/n$.
Conclusion
$$\hat p = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Under the null hypothesis $H_0: p = p_0$, for large $n$, the approximate distribution of $\hat p$ is:
$$\hat p \overset{\text{approx}}{\sim} N\left(p_0, \frac{p_0(1-p_0)}{n}\right).$$
This result allows us to use the normal approximation for hypothesis testing and confidence intervals regarding the recovery proportion $p$.
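To see how good the normal approximation is, this sketch compares an exact binomial tail probability (under $H_0$, $n\hat p \sim \text{Binomial}(n, p_0)$) with its CLT approximation. The values $n = 200$, $p_0 = 0.3$ and cut-off $c = 0.35$ are illustrative, and the function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_upper_tail(n, p0, c):
    """P(p_hat > c) under H0, using n * p_hat ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(n + 1) if k / n > c)


def normal_upper_tail(n, p0, c):
    """CLT approximation: p_hat approximately N(p0, p0 (1 - p0) / n)."""
    return 1 - NormalDist(p0, sqrt(p0 * (1 - p0) / n)).cdf(c)


n, p0, c = 200, 0.3, 0.35  # illustrative values
exact = exact_upper_tail(n, p0, c)
approx = normal_upper_tail(n, p0, c)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```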
Another way of testing $H_0$ vs $H_1$ is to reject $H_0$ when $Z_2 > z_\alpha$. This gives a test with significance level $\alpha_2$, and for large $n$, $\alpha_2 \approx \alpha$. Usually, $\alpha_1$ approximates $\alpha$ more closely than $\alpha_2$.
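It is known that under the testing procedure based on $Z_2$, $H_0$ is rejected if and only if $\hat p > \gamma$, where:
$$\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \quad\text{with}\quad a = 1 + \frac{z_\alpha^2}{n},\quad b = -\left(2p_0 + \frac{z_\alpha^2}{n}\right),\quad c = p_0^2.$$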
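Under this test, we reject $H_0$ if $\hat p > \gamma$, where $\gamma$ is defined by the quadratic formula. When the true proportion is $p$, the CLT gives $\hat p \overset{\text{approx}}{\sim} N(p, p(1-p)/n)$, so the probability of rejecting $H_0$ is approximately:
$$P(\hat p > \gamma) \approx P\left(Z > \frac{\gamma - p}{\sqrt{p(1-p)/n}}\right), \qquad\text{where}\qquad \frac{\gamma - p}{\sqrt{p(1-p)/n}} = \frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}.$$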
Using the cumulative distribution function Φ of the standard normal distribution, we can
write the probability as:
$$P(\hat p > \gamma) \approx 1 - \Phi\left(\frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}\right).$$
Conclusion
The probability of rejecting $H_0$ when the true proportion is $p > p_0$ is therefore approximately:
$$P(\hat p > \gamma) \approx 1 - \Phi\left(\frac{\sqrt{n}\,(\gamma - p)}{\sqrt{p(1-p)}}\right).$$
This expression provides the desired form where the square root of n appears in the
numerator, as requested.
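Given Parameters
- $\alpha = 0.05$, so $z_\alpha = 1.6449$ (the upper 5% point of $N(0, 1)$; only $z_\alpha^2$ enters $\gamma$),
- $p_0 = 0.3$,
- $n = 200$.

Step 1: Compute γ

The coefficients are
- $a = 1 + z_\alpha^2/n$,
- $b = -\left(2p_0 + z_\alpha^2/n\right)$,
- $c = p_0^2$.

Substituting the known values, we compute: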
$$a = 1 + \frac{(1.6449)^2}{200} = 1 + \frac{2.7057}{200} = 1 + 0.01353 = 1.01353,$$
$$b = -\left(2(0.3) + \frac{(1.6449)^2}{200}\right) = -(0.6 + 0.01353) = -0.61353,$$
$$c = (0.3)^2 = 0.09.$$
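Thus, with $b^2 - 4ac = 0.61353^2 - 4(1.01353)(0.09) \approx 0.37642 - 0.36487 = 0.01155$,
$$\gamma = \frac{0.61353 + \sqrt{0.01155}}{2(1.01353)} = \frac{0.61353 + 0.10746}{2.02706} = \frac{0.72099}{2.02706} \approx 0.3557.$$
So, $\gamma \approx 0.3557$.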
Now that we have γ ≈ 0.3557, we compute the probability of rejecting H0 when the true
proportion is p = 0.35.
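The probability of rejecting $H_0$ is approximately:
$$P(\hat p > \gamma) \approx 1 - \Phi\left(\frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}\right) = 1 - \Phi\left(\frac{0.3557 - 0.35}{\sqrt{0.35(1 - 0.35)}}\cdot\sqrt{200}\right),$$
where
$$\sqrt{0.35(1 - 0.35)} = \sqrt{0.35 \times 0.65} = \sqrt{0.2275} \approx 0.4770.$$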
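Now calculate the argument of $\Phi$:
$$\frac{0.3557 - 0.35}{0.4770}\cdot\sqrt{200} \approx 0.01195 \times 14.142 \approx 0.169,$$
so
$$P(\hat p > \gamma) \approx 1 - \Phi(0.169) \approx 1 - 0.567 = 0.433.$$
Conclusion
The approximate power is 0.433. This means that there is approximately a 43.3% chance of rejecting $H_0$ when the true proportion is $p = 0.35$, with $p_0 = 0.3$, sample size $n = 200$ and significance level $\alpha = 0.05$.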
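A sketch of Step 1 and the power calculation in Python. It takes $z_\alpha$ from `NormalDist().inv_cdf(1 - alpha)` (about 1.6449) instead of a table, so the last digits can differ slightly from the hand calculation; the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gamma_threshold(p0, n, z_alpha):
    """Larger root of (1 + z^2/n) x^2 - (2 p0 + z^2/n) x + p0^2 = 0."""
    a = 1 + z_alpha**2 / n
    b = -(2 * p0 + z_alpha**2 / n)
    c = p0**2
    return (-b + sqrt(b**2 - 4 * a * c)) / (2 * a)


def approx_power(p, p0, n, alpha):
    """Return (gamma, 1 - Phi(sqrt(n) (gamma - p) / sqrt(p (1 - p))))."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    gamma = gamma_threshold(p0, n, z_alpha)
    return gamma, 1 - NormalDist().cdf(sqrt(n) * (gamma - p) / sqrt(p * (1 - p)))


gamma, power = approx_power(p=0.35, p0=0.3, n=200, alpha=0.05)
print(f"gamma = {gamma:.4f}, power = {power:.4f}")
```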
$$Z_2 = \frac{\hat p - p_0}{\sqrt{\hat p(1-\hat p)/n}},$$
where:
- $\hat p$ is the sample proportion,
- $p_0$ is the hypothesized proportion under the null hypothesis,
- $n$ is the sample size.
Under the testing procedure, we reject $H_0$ if and only if $Z_2 > z_\alpha$. Since $z_\alpha > 0$, this holds if and only if $\hat p > p_0$ and $Z_2^2 > z_\alpha^2$. To find the equivalent condition in terms of $\hat p$, we start with the expression for $Z_2^2$:
$$Z_2^2 = \left(\frac{\hat p - p_0}{\sqrt{\hat p(1-\hat p)/n}}\right)^2 = \frac{(\hat p - p_0)^2}{\hat p(1-\hat p)/n}.$$
The condition $Z_2^2 > z_\alpha^2$ therefore reads
$$\frac{(\hat p - p_0)^2}{\hat p(1-\hat p)/n} > z_\alpha^2.$$
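Multiplying both sides by $\hat p(1-\hat p)/n$, which is positive for $0 < \hat p < 1$:
$$(\hat p - p_0)^2 > z_\alpha^2\cdot\frac{\hat p(1-\hat p)}{n}.$$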
Next, we find the boundary of this region, which gives the critical threshold $\gamma$, by setting the two sides equal:
$$(\hat p - p_0)^2 = z_\alpha^2\cdot\frac{\hat p(1-\hat p)}{n}.$$
Expanding the left-hand side:
$$\hat p^2 - 2p_0\hat p + p_0^2 = z_\alpha^2\cdot\frac{\hat p(1-\hat p)}{n}.$$
Now, multiply out the right-hand side:
$$\hat p^2 - 2p_0\hat p + p_0^2 = \frac{z_\alpha^2}{n}\left(\hat p - \hat p^2\right),$$
and distribute the factor $z_\alpha^2/n$:
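$$\hat p^2 - 2p_0\hat p + p_0^2 = \frac{z_\alpha^2}{n}\hat p - \frac{z_\alpha^2}{n}\hat p^2.$$
Now, group the terms involving $\hat p^2$ together:
$$\hat p^2 + \frac{z_\alpha^2}{n}\hat p^2 - 2p_0\hat p + p_0^2 = \frac{z_\alpha^2}{n}\hat p,$$
and move every term to the left-hand side:
$$\left(1 + \frac{z_\alpha^2}{n}\right)\hat p^2 - \left(2p_0 + \frac{z_\alpha^2}{n}\right)\hat p + p_0^2 = 0.$$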
This is a quadratic equation $a\hat p^2 + b\hat p + c = 0$ in $\hat p$, which we solve using the quadratic formula:
$$\hat p = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},$$
where:
- $a = 1 + z_\alpha^2/n$,
- $b = -\left(2p_0 + z_\alpha^2/n\right)$,
- $c = p_0^2$.
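Substituting these values into the quadratic formula, we obtain:
$$\hat p = \frac{2p_0 + \frac{z_\alpha^2}{n} \pm \sqrt{\left(2p_0 + \frac{z_\alpha^2}{n}\right)^2 - 4\left(1 + \frac{z_\alpha^2}{n}\right)p_0^2}}{2\left(1 + \frac{z_\alpha^2}{n}\right)}.$$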
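Both roots are positive (their sum $-b/a$ and product $c/a$ are positive), so positivity alone does not identify the threshold. At $\hat p = p_0$ the left-hand side of the quadratic equals $-\frac{z_\alpha^2}{n}p_0(1-p_0) < 0$, so $p_0$ lies strictly between the two roots, and the quadratic is positive only outside them. Since $Z_2 > z_\alpha > 0$ also requires $\hat p > p_0$, the rejection region lies above the larger root, obtained with the $+$ sign, which we define as the critical value:
$$\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a}.$$
Thus, the rejection criterion is $\hat p > \gamma$.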
Conclusion
We have shown that under the testing procedure based on $Z_2$, $H_0$ is rejected if and only if $\hat p > \gamma$, where $\gamma$ is given by the quadratic formula:
$$\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a},$$
with the coefficients:
- $a = 1 + z_\alpha^2/n$,
- $b = -\left(2p_0 + z_\alpha^2/n\right)$,
- $c = p_0^2$.
This shows that the test based on $Z_2$ is equivalent to rejecting $H_0$ when $\hat p > \gamma$, as required.
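The equivalence can be spot-checked numerically. The sketch below uses illustrative values ($p_0 = 0.3$, $n = 200$, $\alpha = 0.05$) and hypothetical helper names; it evaluates both rejection rules on a fine grid of $\hat p$ values in $(0, 1)$ and confirms they agree, and that the smaller root lies below $p_0$.

```python
from math import sqrt
from statistics import NormalDist


def z2(p_hat, p0, n):
    """Z_2 = (p_hat - p0) / sqrt(p_hat (1 - p_hat) / n)."""
    return (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)


def quadratic_roots(p0, n, z_alpha):
    """Both roots of a x^2 + b x + c = 0 with the coefficients a, b, c above."""
    a = 1 + z_alpha**2 / n
    b = -(2 * p0 + z_alpha**2 / n)
    c = p0**2
    disc = sqrt(b**2 - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)


p0, n = 0.3, 200
z_alpha = NormalDist().inv_cdf(0.95)
low_root, gamma = quadratic_roots(p0, n, z_alpha)

grid = [i / 10_000 for i in range(1, 10_000)]
mismatches = [ph for ph in grid if (z2(ph, p0, n) > z_alpha) != (ph > gamma)]

print(f"roots: {low_root:.4f} < p0 = {p0} < gamma = {gamma:.4f}; mismatches: {len(mismatches)}")
```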