
Statistical and Mathematical Analysis

Your Name

November 26, 2024

Abstract

This document presents an in-depth statistical and mathematical analysis of...

SECTION A

A1. Density Histogram and Estimation of the Mean


Suppose that we have a sample of observations x1 , x2 , . . . , xn obtained by random sampling from a continuous distribution with probability density function f (x).
The range [a1 , aK+1 ] is divided evenly into K subintervals, or bins, Bk = (ak , ak+1 ], where
k = 1, 2, . . . , K, each of width h. Recall that the density histogram based on these bins
is defined as:
\[
\mathrm{Hist}(x) =
\begin{cases}
\dfrac{v_k}{nh} & \text{for } x \in B_k, \\[4pt]
0 & \text{otherwise,}
\end{cases}
\]
where vk is the frequency count of the number of data points that fall in bin Bk , and n
is the total number of data points.

(i) Definition of the Notation and Equation for h


In the formula for the histogram, the notation vk refers to the number of observations
that fall within the bin Bk = (ak , ak+1 ]. The width of each bin, h, can be expressed in
terms of ak and ak+1 as:

\[
h = a_{k+1} - a_k \quad \text{for } k = 1, 2, \dots, K.
\]

(ii) Estimating the Mean Using the Histogram


Group II Inferential statistics Exam 2017-2018

The distribution mean (or population mean) is defined as:

\[
\mu = \int_{-\infty}^{\infty} x f(x) \, dx.
\]

One way of estimating the population mean is by substituting Hist(x) as an estimate of the probability density function f (x). This gives an estimate of the mean as:
\[
\hat{\mu}_{\mathrm{Hist}} = \int_{a_1}^{a_{K+1}} x \, \mathrm{Hist}(x) \, dx.
\]

Using the formula for Hist(x) in the intervals Bk = (ak , ak+1 ], we can write this as:

\[
\hat{\mu}_{\mathrm{Hist}} = \int_{a_1}^{a_{K+1}} x \sum_{k=1}^{K} \frac{v_k}{nh} \, \mathbf{1}_{B_k}(x) \, dx,
\]
where 1Bk (x) is the indicator function that is 1 if x ∈ Bk and 0 otherwise.
This integral simplifies as:

\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{nh} \sum_{k=1}^{K} v_k \int_{a_k}^{a_{k+1}} x \, dx.
\]

We now compute the integral for each bin Bk = (ak , ak+1 ]:

\[
\int_{a_k}^{a_{k+1}} x \, dx = \left[ \frac{x^2}{2} \right]_{a_k}^{a_{k+1}} = \frac{a_{k+1}^2}{2} - \frac{a_k^2}{2}.
\]

Thus, the estimate for the mean becomes:

\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{nh} \sum_{k=1}^{K} v_k \left( \frac{a_{k+1}^2}{2} - \frac{a_k^2}{2} \right).
\]

Since \(a_{k+1}^2 - a_k^2 = (a_{k+1} - a_k)(a_{k+1} + a_k) = h(a_{k+1} + a_k)\), the bin width h cancels, and the estimate becomes:

\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{2n} \sum_{k=1}^{K} v_k (a_{k+1} + a_k).
\]

This is the desired expression for the estimated mean based on the histogram; it is the frequency-weighted average of the bin midpoints \((a_k + a_{k+1})/2\).

(iii) Estimating the Mean of the Distribution from the Data


1. Frequency Table:

The data set is given by the following frequency table:

Interval (5, 10] (10, 15] (15, 20] (20, 25] (25, 30] (30, 35]
Frequency 1 11 39 38 10 1

The bin midpoints are as follows:

- For (5, 10], midpoint = (5 + 10)/2 = 7.5.
- For (10, 15], midpoint = (10 + 15)/2 = 12.5.
- For (15, 20], midpoint = (15 + 20)/2 = 17.5.
- For (20, 25], midpoint = (20 + 25)/2 = 22.5.
- For (25, 30], midpoint = (25 + 30)/2 = 27.5.
- For (30, 35], midpoint = (30 + 35)/2 = 32.5.

The bin width is h = 5 for each interval.
Using the formula for the estimated mean:

\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{2n} \sum_{k=1}^{6} v_k (a_{k+1} + a_k),
\]

we substitute the frequencies and the bin midpoints (since \((a_{k+1} + a_k)/2\) is the midpoint of bin k, the estimate is \(\frac{1}{n}\sum_k v_k \cdot \text{midpoint}_k\)):

\[
\hat{\mu}_{\mathrm{Hist}} = \frac{1}{100}\bigl(1(7.5) + 11(12.5) + 39(17.5) + 38(22.5) + 10(27.5) + 1(32.5)\bigr) = \frac{1990}{100} = 19.90.
\]

Thus, the estimated mean of the distribution is 19.90 .
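As a sanity check on the arithmetic, the midpoint formula can be evaluated directly; this is a minimal Python sketch, with bin edges and frequencies copied from the table above.

```python
# Histogram-based mean estimate: the frequency-weighted average of the
# bin midpoints, as derived in part (ii).
edges = [5, 10, 15, 20, 25, 30, 35]      # bin edges a_1, ..., a_{K+1}
freqs = [1, 11, 39, 38, 10, 1]           # frequencies v_k from the table

n = sum(freqs)                           # total sample size (100)
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
mu_hat = sum(v * mid for v, mid in zip(freqs, midpoints)) / n
print(mu_hat)  # 19.9
```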

A2

(i) Definition of an Unbiased Estimator

An estimator θ̂ for a parameter θ is said to be unbiased if its expected value is equal to the true value of the parameter, i.e.,

E[θ̂] = θ.

In other words, the estimator does not systematically overestimate or underestimate the
true parameter value.

Context of the Exercise:

In this exercise, we have two independent samples X1 , X2 , . . . , Xn and Y1 , Y2 , . . . , Ym , drawn from the normal distributions N (µ, σ²) and N (µ, τ²), respectively, where µ is the common population mean, σ² is the population variance of the X's, and τ² is the population variance of the Y's.

The two estimators of the population mean µ are defined as follows:

\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{Y} = \frac{1}{m} \sum_{j=1}^{m} Y_j.
\]

These are sample means of sizes n and m, respectively.

1. Expected Value of X̄:

Since X1 , X2 , . . . , Xn are independent and follow the normal distribution N (µ, σ 2 ), the
expected value of X̄ is:

\[
E[\bar{X}] = E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu.
\]

Thus, X̄ is an unbiased estimator of µ.

2. Expected Value of Ȳ :

Similarly, since Y1 , Y2 , . . . , Ym are independent and follow the distribution N (µ, τ 2 ), the
expected value of Ȳ is:

\[
E[\bar{Y}] = E\!\left[\frac{1}{m}\sum_{j=1}^{m} Y_j\right] = \frac{1}{m}\sum_{j=1}^{m} E[Y_j] = \frac{1}{m}\sum_{j=1}^{m} \mu = \mu.
\]

Therefore, Ȳ is also an unbiased estimator of µ.

(ii) Unbiasedness of X̄ and Ȳ , and Calculation of Their Variances


Unbiasedness of X̄ and Ȳ (Reminder):

We have already shown in part (i) that both X̄ and Ȳ are unbiased estimators of µ.
Specifically,

E[X̄] = µ and E[Ȳ ] = µ.

This follows from the fact that each of the Xi ’s and Yj ’s are independent and follow the
distributions N (µ, σ 2 ) and N (µ, τ 2 ), respectively. Thus, their sample means are unbiased
estimators of the population mean µ.

Calculation of the Variance of X̄:

The variance of the sample mean X̄ is given by the formula:

\[
\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X_i)}{n}.
\]
Since the Xi ’s are independent and follow N (µ, σ 2 ), the variance of each Xi is σ 2 . There-
fore, the variance of X̄ is:

\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}.
\]

Calculation of the Variance of Ȳ :

Similarly, the variance of the sample mean Ȳ is:

\[
\operatorname{Var}(\bar{Y}) = \frac{\operatorname{Var}(Y_j)}{m}.
\]
Since the Yj ’s are independent and follow N (µ, τ 2 ), the variance of each Yj is τ 2 . Thus,
the variance of Ȳ is:

\[
\operatorname{Var}(\bar{Y}) = \frac{\tau^2}{m}.
\]

Summary of Variances:

- The variance of X̄ is Var(X̄) = σ²/n.
- The variance of Ȳ is Var(Ȳ) = τ²/m.

Thus, both X̄ and Ȳ are unbiased estimators of µ, with variances σ²/n and τ²/m, respectively.

(iii) Minimizing the Variance of the Combined Estimator


We are given that the combined estimator for the population mean µ is:

µ̂ = wX̄ + (1 − w)Ȳ ,

where \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\) and \(\bar{Y} = \frac{1}{m}\sum_{j=1}^{m} Y_j\).
We are tasked with finding the value of w that minimizes the variance of µ̂.

1. Variance of µ̂:

Since X̄ and Ȳ are independent, the variance of µ̂ is:

\[
\operatorname{Var}(\hat{\mu}) = \operatorname{Var}\bigl(w\bar{X} + (1 - w)\bar{Y}\bigr) = w^2 \operatorname{Var}(\bar{X}) + (1 - w)^2 \operatorname{Var}(\bar{Y}).
\]
We know that:

\[
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \operatorname{Var}(\bar{Y}) = \frac{\tau^2}{m}.
\]

Therefore, the variance of µ̂ becomes:

\[
\operatorname{Var}(\hat{\mu}) = w^2 \frac{\sigma^2}{n} + (1 - w)^2 \frac{\tau^2}{m}.
\]

2. Minimizing the Variance:

To minimize the variance, we take the derivative of Var(µ̂) with respect to w:

\[
\frac{d}{dw}\operatorname{Var}(\hat{\mu}) = 2w\frac{\sigma^2}{n} - 2(1 - w)\frac{\tau^2}{m}.
\]

Setting the derivative equal to zero:

\[
2w\frac{\sigma^2}{n} - 2(1 - w)\frac{\tau^2}{m} = 0.
\]

Simplifying:

\[
w\frac{\sigma^2}{n} = (1 - w)\frac{\tau^2}{m}.
\]

Expanding:

\[
w\frac{\sigma^2}{n} = \frac{\tau^2}{m} - w\frac{\tau^2}{m}.
\]

Collecting the terms involving w:

\[
w\left(\frac{\sigma^2}{n} + \frac{\tau^2}{m}\right) = \frac{\tau^2}{m}.
\]

Solving for w:

\[
w = \frac{\tau^2/m}{\sigma^2/n + \tau^2/m}.
\]

Multiplying the numerator and denominator by nm, the value of w that minimizes the variance of µ̂ is:

\[
w = \frac{n\tau^2}{m\sigma^2 + n\tau^2}.
\]

3. Verifying that this is a Minimum:

To ensure that this value of w minimizes the variance, we calculate the second derivative
of the variance with respect to w.
The second derivative is:

\[
\frac{d^2}{dw^2}\operatorname{Var}(\hat{\mu}) = 2\frac{\sigma^2}{n} + 2\frac{\tau^2}{m}.
\]

Since both σ²/n and τ²/m are positive, we have:

\[
\frac{d^2}{dw^2}\operatorname{Var}(\hat{\mu}) = 2\left(\frac{\sigma^2}{n} + \frac{\tau^2}{m}\right) > 0.
\]

Therefore, Var(µ̂) is convex in w, confirming that \(w = \frac{n\tau^2}{m\sigma^2 + n\tau^2}\) indeed minimizes the variance.

A3

(i) Definition of Type I and Type II Errors, and Significance Level


1. Type I Error: An error of type I occurs when we reject the null hypothesis H0 when
it is actually true. The probability of committing a type I error is denoted by α, which
is also called the significance level of the test.
2. Type II Error: An error of type II occurs when we fail to reject the null hypothesis
H0 when it is actually false (i.e., when H1 is true). The probability of committing a type
II error is denoted by β.
3. Significance Level α: The significance level α is the probability of rejecting the null
hypothesis when it is true. For example, a 5% significance level corresponds to a 5%
probability of making a type I error.

Hypothesis Testing: Goodness-of-Fit Test


In this exercise, we are testing whether the population mean conforms to a specified value (a conformity test for the mean). Such a test compares the observed data against a hypothesized value of a population parameter, such as the mean or variance; here, we test whether the population mean µ equals a specific value.
We are testing the following hypotheses:

H0 : µ = 13.5 versus H1 : µ ≠ 13.5

This means we want to test whether the population mean is 13.5, or if it differs from this
value (i.e., the mean could be either greater than or less than 13.5).
Now, we will proceed with the hypothesis tests for the following two cases:

• Part (ii): Test with known variance (σ 2 = 1).


• Part (iii): Test with unknown variance, using the sample estimate of variance.

(ii) Testing with Known Variance (σ² = 1) using a z-Test


We are testing the hypotheses:

H0 : µ = 13.5 versus H1 : µ ≠ 13.5

Given that n = 8, \(\sum_{i=1}^{n} x_i = 113.6627\), and σ² = 1, we use a two-tailed z-test. Since the observations are normally distributed and σ² is known, the standardized sample mean follows the standard normal distribution exactly, whatever the sample size; the t-distribution is only needed when σ² has to be estimated from the data.
The test statistic is given by:

\[
z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.
\]

First, calculate the sample mean x̄:

\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{113.6627}{8} = 14.2078.
\]

Now, calculate the test statistic z:

\[
z = \frac{14.2078 - 13.5}{1/\sqrt{8}} = \frac{0.7078}{0.3536} \approx 2.00.
\]

The critical value for a two-tailed test at the 5% significance level is z_{0.025} = 1.96. Since |z| = 2.00 exceeds 1.96, we reject the null hypothesis H0 at the 5% significance level.
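The statistic can be recomputed from the totals given above; a minimal Python sketch:

```python
import math

# Test statistic for H0: mu = 13.5 with known sigma^2 = 1, from the
# totals n = 8 and sum of x_i = 113.6627 given in the problem.
n = 8
sum_x = 113.6627
sigma, mu0 = 1.0, 13.5

xbar = sum_x / n                              # sample mean
stat = (xbar - mu0) / (sigma / math.sqrt(n))  # standardized statistic
print(round(xbar, 4), round(stat, 2))
```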

(iii) Testing with Unknown Variance using t-Test


When σ² is unknown, we use a t-test: for small samples, the t-distribution accounts for the additional uncertainty introduced by estimating the population variance from the data.
The test statistic is given by:

\[
t = \frac{\bar{x} - \mu_0}{s'/\sqrt{n}}.
\]

First, calculate the sample variance s'²:

\[
s'^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)
= \frac{1}{7}\left(1621.391 - \frac{(113.6627)^2}{8}\right) = \frac{6.4898}{7} = 0.9271,
\]

so s' ≈ 0.9629. Now, calculate the t-statistic:

\[
t = \frac{14.2078 - 13.5}{0.9629/\sqrt{8}} = \frac{0.7078}{0.3404} \approx 2.08.
\]

The critical value for a two-tailed test at the 5% significance level with 7 degrees of freedom is approximately ±2.3646. Since |t| = 2.08 is less than the critical value 2.3646, we fail to reject the null hypothesis H0 at the 5% significance level.
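A quick recomputation of s'² and the t-statistic from the given totals (a sketch; the values in the text are rounded):

```python
import math

# Unknown-variance t statistic from the totals given above:
# n = 8, sum x_i = 113.6627, sum x_i^2 = 1621.391.
n = 8
sum_x, sum_x2 = 113.6627, 1621.391
mu0 = 13.5

xbar = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)    # unbiased sample variance
t_stat = (xbar - mu0) / math.sqrt(s2 / n)
```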

A4

(i) Definition of a Confidence Interval


An estimator I(X) = [a(X), b(X)] for a parameter θ is said to be a 100(1−α)% confidence
interval for θ if the following condition is satisfied:

P (a(X) ≤ θ ≤ b(X)) = 1 − α.

This means that, in repeated sampling, the interval [a(X), b(X)] will contain the true value of the parameter θ in a proportion 1 − α of cases. The confidence level 100(1 − α)% is the probability that the random interval covers the true parameter value: if we performed the same experiment many times and calculated the interval I(X) each time, a fraction 1 − α of the resulting intervals would contain θ.

(ii) Expression for E(θ̂) and Var(θ̂)


Let \(\bar{X}_1 = \frac{1}{n}\sum_{i=1}^{n} X_{1i}\) and \(\bar{X}_2 = \frac{1}{m}\sum_{j=1}^{m} X_{2j}\) be the sample means of the two independent samples. The estimator of the parameter θ is defined as θ̂ = X̄1 − X̄2, where X̄1 and X̄2 are independent and normally distributed:

\[
\bar{X}_1 \sim N\!\left(\mu_1, \frac{\sigma_1^2}{n}\right) \quad \text{and} \quad \bar{X}_2 \sim N\!\left(\mu_2, \frac{\sigma_2^2}{m}\right).
\]

Thus, the expected value and variance of θ̂ = X̄1 − X̄2 are:

\[
E[\hat{\theta}] = E[\bar{X}_1] - E[\bar{X}_2] = \mu_1 - \mu_2,
\]
\[
\operatorname{Var}(\hat{\theta}) = \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}.
\]

Hence θ̂ follows a normal distribution:

\[
\hat{\theta} \sim N\!\left(\mu_1 - \mu_2, \; \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}\right).
\]

(iii) 95% Confidence Interval for θ = µ1 − µ2


We are given the following data:

\[
n = 10, \quad m = 20, \quad \sum_{i=1}^{10} X_{1i} = 96.08, \quad \sum_{j=1}^{20} X_{2j} = 237.09,
\]

with known population variances σ1² = 2 and σ2² = 4. From this, we calculate the sample means:

\[
\bar{X}_1 = \frac{96.08}{10} = 9.608, \qquad \bar{X}_2 = \frac{237.09}{20} = 11.8545,
\]

so the point estimate is:

\[
\hat{\theta} = \bar{X}_1 - \bar{X}_2 = 9.608 - 11.8545 = -2.2465.
\]

Since the population variances are known, θ̂ is exactly normally distributed, and we use the standard normal quantile z_{0.025} = 1.96 (no t-distribution is needed here, whatever the sample sizes). The standard error of θ̂ is:

\[
SE(\hat{\theta}) = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} = \sqrt{\frac{2}{10} + \frac{4}{20}} = \sqrt{0.4} \approx 0.6325.
\]

Thus, the 95% confidence interval for θ = µ1 − µ2 is:

\[
\hat{\theta} \pm z_{0.025} \times SE(\hat{\theta}) = -2.2465 \pm 1.96 \times 0.6325.
\]

The margin of error is 1.96 × 0.6325 ≈ 1.2396, so the 95% confidence interval is:

\[
[-2.2465 - 1.2396, \; -2.2465 + 1.2396] = [-3.486, -1.007].
\]

(iv) Is it plausible that µ1 = µ2?


The 95% confidence interval for µ1 − µ2 is [−3.486, −1.007]. Since this interval does not contain 0, it is not plausible that µ1 = µ2 at the 95% confidence level.
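The interval computation can be sketched as follows (with the variances known, the normal quantile 1.96 is the natural critical value):

```python
import math

# 95% interval for mu1 - mu2 with known variances sigma1^2 = 2 and
# sigma2^2 = 4, using the sums given in the problem.
n, m = 10, 20
xbar1, xbar2 = 96.08 / n, 237.09 / m
theta_hat = xbar1 - xbar2                 # point estimate

se = math.sqrt(2 / n + 4 / m)             # sqrt(0.4) ~ 0.6325
z = 1.96                                  # z_{0.025} for 95% coverage
ci = (theta_hat - z * se, theta_hat + z * se)
print(ci)  # the whole interval lies below 0
```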

SECTION B

B5

(i) Let’s show that:


\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.
\]

Expanding the sum of squared deviations from the sample mean by writing \(X_i - \bar{X} = (X_i - \mu) + (\mu - \bar{X})\):

\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \bigl((X_i - \mu) + (\mu - \bar{X})\bigr)^2.
\]

Expanding the square of each term:

\[
(X_i - \bar{X})^2 = (X_i - \mu)^2 + (\bar{X} - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu).
\]

Summing over i, the middle term contributes \(n(\bar{X} - \mu)^2\), and since \(\sum_{i=1}^{n}(X_i - \mu) = n(\bar{X} - \mu)\), the cross term contributes \(-2n(\bar{X} - \mu)^2\); combining them gives:

\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.
\]

(ii) Let’s show that S 2 is an unbiased estimator of σ 2 :


The sample variance is defined as:

\[
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.
\]

We know from part (i) that:

\[
\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.
\]

Taking the expectation:

\[
E[S^2] = \frac{1}{n-1} E\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \frac{1}{n-1}\left(\sum_{i=1}^{n} E[(X_i - \mu)^2] - nE[(\bar{X} - \mu)^2]\right).
\]

Since \(E[(X_i - \mu)^2] = \sigma^2\) and \(E[(\bar{X} - \mu)^2] = \frac{\sigma^2}{n}\), we get:

\[
E[S^2] = \frac{1}{n-1}\left(n\sigma^2 - n \cdot \frac{\sigma^2}{n}\right) = \frac{(n-1)\sigma^2}{n-1} = \sigma^2.
\]

Thus, S 2 is an unbiased estimator of σ 2 .

(iii) We are tasked with showing that the interval estimator:

\[
I(X) = \left( \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}, \; \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}} \right)
\]

is a 100(1 − α)% confidence interval for σ².
To do so, we use the following steps:
Step 1: Chi-Squared Distribution of Sample Variance
The sample variance S² satisfies:

\[
\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},
\]

i.e., \((n-1)S^2/\sigma^2\) follows a chi-squared distribution with n − 1 degrees of freedom.
Step 2: Confidence Interval Construction
To construct a confidence interval for σ 2 , we need to use the fact that the chi-squared
distribution is not symmetric, but its cumulative distribution function (CDF) gives us
the probability of the value falling within a particular range.
We want to find critical values corresponding to the desired confidence level:

- \(\chi^2_{n-1,\alpha/2}\), the critical value corresponding to the lower tail of the distribution,
- \(\chi^2_{n-1,1-\alpha/2}\), the critical value corresponding to the upper tail of the distribution.
Step 3: Deriving the Confidence Interval
Using the properties of the chi-squared distribution, we can derive the confidence interval
for σ 2 as follows:
\[
P\!\left(\chi^2_{n-1,\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{n-1,1-\alpha/2}\right) = 1 - \alpha.
\]

Rearranging the inequality to isolate σ², we obtain:

\[
\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}.
\]

This gives the desired 100(1 − α)% confidence interval for σ².

(iv) Calculation of the 95% Confidence Interval for σ 2


Given:

• n = 10
• \(\sum_{i=1}^{n} x_i = 104.334\)
• \(\sum_{i=1}^{n} x_i^2 = 1132.207\)

Step 1: Calculate the Sample Mean X̄

\[
\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{104.334}{10} = 10.4334.
\]

Step 2: Calculate the Sample Variance S²
Using the formula for the sample variance:

\[
S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right).
\]

Substitute the given values, with \((104.334)^2 = 10885.5836\), so \((104.334)^2/10 = 1088.5584\):

\[
S^2 = \frac{1}{9}\left(1132.207 - 1088.5584\right) = \frac{43.6486}{9} = 4.8499.
\]
Step 3: Find the Critical Values from the Chi-Squared Distribution Table
For a 95% confidence interval, the critical values are taken from the chi-squared distribution table with n − 1 = 9 degrees of freedom:

\[
\chi^2_{9,0.025} \approx 2.700, \qquad \chi^2_{9,0.975} \approx 16.919.
\]

Step 4: Calculate the Confidence Interval for σ²

The 95% confidence interval for σ² is given by:

\[
\left(\frac{(n-1)S^2}{\chi^2_{9,0.975}}, \; \frac{(n-1)S^2}{\chi^2_{9,0.025}}\right) = \left(\frac{9 \times 4.8499}{16.919}, \; \frac{9 \times 4.8499}{2.700}\right) = [2.580, 16.166].
\]

Thus, the 95% confidence interval for σ² is [2.580, 16.166].
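The steps above can be sketched numerically (the chi-squared critical values are hardcoded from the table, not computed):

```python
# Chi-squared confidence interval for sigma^2 from the given totals:
# n = 10, sum x_i = 104.334, sum x_i^2 = 1132.207.
n = 10
sum_x, sum_x2 = 104.334, 1132.207

s2 = (sum_x2 - sum_x**2 / n) / (n - 1)   # sample variance
chi_lo, chi_hi = 2.700, 16.919           # chi2_{9,0.025}, chi2_{9,0.975}
ci = ((n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo)
print(s2, ci)
```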

B6

(i) Likelihood Function


We are given that X1 , X2 , . . . , Xn are independent random samples from a discrete dis-
tribution with the probability mass function (PMF):

\[
p_X(x) = (1-p)^x \, p, \quad \text{for } x = 0, 1, 2, \dots
\]


where p ∈ [0, 1] is the unknown parameter.
The likelihood function L(p) is the product of the PMF for each sample, given by:

\[
L(p) = \prod_{i=1}^{n} p_X(X_i).
\]

Substituting the given expression for \(p_X(x)\), we get:

\[
L(p) = \prod_{i=1}^{n} (1-p)^{X_i} p = p^n \prod_{i=1}^{n} (1-p)^{X_i} = p^n (1-p)^{\sum_{i=1}^{n} X_i}.
\]

Thus, the likelihood function is:

\[
L(p) = p^n (1-p)^{\sum_{i=1}^{n} X_i}.
\]

(ii) Maximum Likelihood Estimator


From Part (i), we obtained the likelihood function:

\[
L(p) = p^n (1-p)^{\sum_{i=1}^{n} X_i}.
\]

We want to maximize this likelihood with respect to p, via the log-likelihood function:

\[
\ell(p) = \log L(p) = n\log(p) + \left(\sum_{i=1}^{n} X_i\right)\log(1-p).
\]

Taking the derivative of ℓ(p) with respect to p and setting it to 0, we find:

\[
\frac{n}{p} = \frac{\sum_{i=1}^{n} X_i}{1-p}.
\]

Cross-multiplying and simplifying:

\[
n - np = p\sum_{i=1}^{n} X_i, \qquad n = p\left(n + \sum_{i=1}^{n} X_i\right),
\]

so:

\[
p = \frac{n}{n + \sum_{i=1}^{n} X_i}.
\]

Writing \(\sum_{i=1}^{n} X_i = n\bar{X}\) and substituting:

\[
p = \frac{n}{n + n\bar{X}} = \frac{1}{1 + \bar{X}}.
\]

Therefore, the maximum likelihood estimator for p is:

\[
\hat{p} = \frac{1}{1 + \bar{X}}.
\]
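A simulation sketch of this estimator (the parameter value p = 0.25 and the sample size are illustrative assumptions, not taken from the exam):

```python
import random

# The MLE p_hat = 1 / (1 + Xbar) for the PMF p_X(x) = (1 - p)^x * p,
# where x counts failures before the first success.
random.seed(0)
p_true = 0.25

def draw(p):
    """One draw: count failures until the first success."""
    x = 0
    while random.random() >= p:
        x += 1
    return x

sample = [draw(p_true) for _ in range(100_000)]
xbar = sum(sample) / len(sample)
p_hat = 1 / (1 + xbar)   # should be close to p_true
```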

(iii) Showing the Relation Between \(P(a < \hat{p} < b)\) and \(P\!\left(\frac{1-b}{b} < \bar{X} < \frac{1-a}{a}\right)\)

We want to show that:

\[
P(a < \hat{p} < b) = P\!\left(\frac{1-b}{b} < \bar{X} < \frac{1-a}{a}\right).
\]

From Part (ii), we know that the maximum likelihood estimator for p is \(\hat{p} = \frac{1}{1 + \bar{X}}\). We therefore compute:

\[
P(a < \hat{p} < b) = P\!\left(a < \frac{1}{1 + \bar{X}} < b\right).
\]

We will now solve this inequality for X̄ (assuming 0 < a < b ≤ 1, so all quantities are positive).

1. Invert the inequality (inverting positive quantities reverses the order):

\[
\frac{1}{b} < 1 + \bar{X} < \frac{1}{a}.
\]

2. Subtract 1 from each side:

\[
\frac{1}{b} - 1 < \bar{X} < \frac{1}{a} - 1.
\]

3. Simplify each side:

\[
\frac{1-b}{b} < \bar{X} < \frac{1-a}{a}.
\]

Thus, we have:

\[
P(a < \hat{p} < b) = P\!\left(\frac{1-b}{b} < \bar{X} < \frac{1-a}{a}\right).
\]

Since X̄ is the sample mean, this relation allows us to calculate probabilities about p̂ in terms of the sample mean X̄.

(iv) Calculation of the Approximate Probability


We are given n = 100 and p = 0.25, and we are asked to calculate the approximate probability that 0.22 < p̂ < 0.26.
For this distribution, the expectation and variance of a single observation X1 are:

\[
E(X_1) = \frac{1-p}{p} = \frac{1 - 0.25}{0.25} = 3, \qquad \operatorname{Var}(X_1) = \frac{1-p}{p^2} = \frac{1 - 0.25}{(0.25)^2} = 12.
\]

Thus, the expectation and variance of the sample mean X̄ are:

\[
E(\bar{X}) = 3, \qquad \operatorname{Var}(\bar{X}) = \frac{12}{100} = 0.12.
\]

The standard deviation of X̄ is:

\[
SD(\bar{X}) = \sqrt{0.12} \approx 0.3464.
\]

Now, using the result from part (iii), we can express the probability P(0.22 < p̂ < 0.26) as:

\[
P(0.22 < \hat{p} < 0.26) = P\!\left(\frac{1 - 0.26}{0.26} < \bar{X} < \frac{1 - 0.22}{0.22}\right) = P(2.8462 < \bar{X} < 3.5455).
\]

Now, we standardize the bounds using the Z-score formula \(Z = \frac{\bar{X} - E(\bar{X})}{SD(\bar{X})}\), with E(X̄) = 3 and SD(X̄) = 0.3464.

For X̄ = 2.8462:

\[
Z_{\text{lower}} = \frac{2.8462 - 3}{0.3464} \approx -0.444.
\]

For X̄ = 3.5455:

\[
Z_{\text{upper}} = \frac{3.5455 - 3}{0.3464} \approx 1.575.
\]
Using the standard normal distribution table, Φ(1.575) ≈ 0.9424 and Φ(−0.444) ≈ 0.3285. Thus, the probability is:

\[
P(-0.444 < Z < 1.575) = 0.9424 - 0.3285 = 0.6139.
\]

Therefore, the approximate probability that 0.22 < p̂ < 0.26 is about 0.614.
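The normal-approximation calculation above can be reproduced with `math.erf` for the standard normal CDF:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return (1 + math.erf(z / math.sqrt(2))) / 2

n, p = 100, 0.25
mean_xbar = (1 - p) / p                        # E(Xbar) = 3
sd_xbar = math.sqrt((1 - p) / p**2 / n)        # sqrt(12/100) ~ 0.3464

lo, hi = (1 - 0.26) / 0.26, (1 - 0.22) / 0.22  # bounds on Xbar
prob = phi((hi - mean_xbar) / sd_xbar) - phi((lo - mean_xbar) / sd_xbar)
print(round(prob, 3))  # about 0.614
```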

B7

(i)
Consider a clinical study on a new treatment for Rhinovirus with n patients. Each
patient has a probability p of recovering, independently of other patients. Let Xi denote
the indicator variable for the i-th patient’s recovery:
\[
X_i =
\begin{cases}
1 & \text{if the } i\text{-th patient recovers}, \\
0 & \text{if the } i\text{-th patient does not recover.}
\end{cases}
\]

We are testing the following hypotheses:

H0 : p = p0 vs H1 : p > p0

We are interested in finding the expression for the sample proportion p̂ and its approxi-
mate distribution under the null hypothesis H0 .

Estimator of the Recovery Proportion

The sample proportion p̂ is simply the average of the Xi ’s, i.e., the proportion of patients
that recover in the sample. The mathematical expression for the sample proportion is:

\[
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i.
\]

This quantity p̂ represents the observed proportion of recovered patients in the sample.

Approximate Distribution of p̂ under H0

Under the null hypothesis H0 : p = p0 , each Xi follows a Bernoulli distribution with


parameter p0 , i.e.,

Xi ∼ Bernoulli(p0 )

The properties of the Bernoulli distribution give:

\[
E[X_i] = p_0 \quad \text{and} \quad \operatorname{Var}(X_i) = p_0(1 - p_0).
\]

The sample proportion p̂ is the mean of the Xi ’s:

\[
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i.
\]

By linearity of expectation, E[p̂] = p0; moreover, by the law of large numbers, p̂ converges to p0 as n increases.

Now, applying the Central Limit Theorem (CLT), which states that the mean of a large number of independent and identically distributed random variables is approximately normally distributed, we find that for large n the sample proportion p̂ approximately follows a normal distribution:
 
\[
\hat{p} \approx N\!\left(p_0, \frac{p_0(1 - p_0)}{n}\right).
\]

Thus, for large n, the distribution of p̂ is approximately normal with mean p0 and variance p0(1 − p0)/n.

Conclusion

In summary, the sample proportion of recovery is given by:

\[
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i.
\]

Under the null hypothesis H0 : p = p0 , for large n, the approximate distribution of p̂ is:
 
\[
\hat{p} \approx N\!\left(p_0, \frac{p_0(1 - p_0)}{n}\right).
\]

This result allows us to use the normal approximation for hypothesis testing and confi-
dence intervals regarding the recovery proportion p.

Define the following notation:

\[
Z_1 = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} \quad \text{and} \quad Z_2 = \frac{\hat{p} - p_0}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}}.
\]

The usual way of testing H0 vs H1 is to reject H0 if Z1 > zα, where zα is the upper α-quantile of the standard normal distribution N(0, 1). This gives a test with significance level α1, and for large n, α1 ≈ α.

Another way of testing H0 vs H1 is to reject H0 when Z2 > zα . This gives a test with
significance level α2 , and for large n, α2 ≈ α. Usually, α1 approximates α more closely
than α2 .
It is known that under the testing procedure based on Z2, H0 is rejected if and only if p̂ > γ, where:

\[
\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \quad \text{with} \quad a = 1 + \frac{z_\alpha^2}{n}, \quad b = -\left(2p_0 + \frac{z_\alpha^2}{n}\right), \quad c = p_0^2.
\]

(ii) Rejection Probability in Terms of γ


We want to find the probability of rejecting H0 based on the condition p̂ > γ, where γ is
determined by the quadratic equation.

Step 1: Initial Expression for the Probability

Under the hypothesis test, we reject H0 if p̂ > γ, where γ is defined by the quadratic formula. Using the normal approximation \(\hat{p} \approx N(p, p(1-p)/n)\) under the true proportion p, the probability of rejecting H0 is:
 
\[
P(\hat{p} > \gamma) = P\!\left(Z > \frac{\gamma - p}{\sqrt{\dfrac{p(1-p)}{n}}}\right),
\]

where Z ∼ N (0, 1) is a standard normal random variable.

Step 2: Simplification of the Expression

To move the square root of n to the numerator, we multiply and divide by √n inside the fraction:

\[
\frac{\gamma - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}.
\]

Step 3: Final Expression for the Probability

Thus, the probability of rejecting H0 becomes:

\[
P(\hat{p} > \gamma) = P\!\left(Z > \frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}\right).
\]

Using the cumulative distribution function Φ of the standard normal distribution, we can write the probability as:

\[
P(\hat{p} > \gamma) = 1 - \Phi\!\left(\frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}\right).
\]

Conclusion

The probability of rejecting H0 under the alternative hypothesis p > p0 is given by:

\[
P(\hat{p} > \gamma) = 1 - \Phi\!\left(\frac{\sqrt{n}(\gamma - p)}{\sqrt{p(1-p)}}\right).
\]

This is the desired form, with √n in the numerator.

(iii) Rejection Probability Based on Z2


We want to compute the approximate probability of rejecting H0 when using the proce-
dure based on Z2 with the given parameters.

Given Parameters

- Significance level α = 0.05,
- Sample size n = 200,
- Hypothesized proportion p0 = 0.3,
- True proportion p = 0.35,
- zα = 1.6449 (the upper 0.05 quantile of the standard normal distribution).

We use the following expression for the probability of rejecting H0 based on Z2:

\[
P(\hat{p} > \gamma) = 1 - \Phi\!\left(\frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}\right),
\]

where γ is the threshold value determined by the quadratic formula.

Step 1: Compute γ

The threshold value γ is given by the solution of the quadratic equation:

\[
\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a},
\]

where \(a = 1 + \frac{z_\alpha^2}{n}\), \(b = -\left(2p_0 + \frac{z_\alpha^2}{n}\right)\), and \(c = p_0^2\). Substituting zα = 1.6449, p0 = 0.3, and n = 200 (note that zα² = 2.7057):

\[
a = 1 + \frac{2.7057}{200} = 1.01353, \qquad b = -\left(0.6 + 0.01353\right) = -0.61353, \qquad c = (0.3)^2 = 0.09.
\]

Now, substitute into the quadratic formula:

\[
\gamma = \frac{0.61353 + \sqrt{(-0.61353)^2 - 4(1.01353)(0.09)}}{2(1.01353)}.
\]

First, calculate the discriminant:

\[
(-0.61353)^2 = 0.37642, \qquad 4(1.01353)(0.09) = 0.36487, \qquad \text{discriminant} = 0.37642 - 0.36487 = 0.01155.
\]

Thus, with \(\sqrt{0.01155} \approx 0.10747\):

\[
\gamma = \frac{0.61353 + 0.10747}{2.02706} = \frac{0.72100}{2.02706} \approx 0.3557.
\]

So, γ ≈ 0.3557.
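The threshold computation can be sketched directly:

```python
import math

# Solving the quadratic for gamma with z_alpha = 1.6449 (alpha = 0.05),
# p0 = 0.3, n = 200, as in the text above.
z_a, p0, n = 1.6449, 0.3, 200

a = 1 + z_a**2 / n
b = -(2 * p0 + z_a**2 / n)
c = p0**2
gamma = (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)
print(round(gamma, 4))  # about 0.3557
```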

Step 2: Compute the Probability of Rejecting H0

Now that we have γ ≈ 0.3557, we compute the probability of rejecting H0 when the true
proportion is p = 0.35.
The probability of rejecting H0 is:

\[
P(\hat{p} > \gamma) = 1 - \Phi\!\left(\frac{\gamma - p}{\sqrt{p(1-p)}}\cdot\sqrt{n}\right).
\]

Substituting the known values, first calculate the denominator:

\[
\sqrt{p(1-p)} = \sqrt{0.35 \times 0.65} = \sqrt{0.2275} \approx 0.4770.
\]

Now calculate the argument of Φ:

\[
\frac{0.3557 - 0.35}{0.4770}\cdot\sqrt{200} = \frac{0.0057}{0.4770}\times 14.142 \approx 0.169.
\]
0.4769 0.4769
Thus, the probability is:

\[
P(\hat{p} > \gamma) = 1 - \Phi(0.169).
\]

From the standard normal distribution, Φ(0.169) ≈ 0.5671, so:

\[
P(\hat{p} > \gamma) = 1 - 0.5671 = 0.4329.
\]

Conclusion

The approximate probability of rejecting H0 using the procedure based on Z2 is about 0.433: there is roughly a 43.3% chance of rejecting H0 when the true proportion is p = 0.35, p0 = 0.3, and the sample size is n = 200, with a significance level of α = 0.05.
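Putting the pieces together, the power calculation reads as follows (γ recomputed from the quadratic; Φ implemented with `math.erf`):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return (1 + math.erf(z / math.sqrt(2))) / 2

# Power of the Z2-based test at p = 0.35, with the threshold gamma
# from the quadratic (z_alpha = 1.6449, p0 = 0.3, n = 200).
z_a, p0, n, p = 1.6449, 0.3, 200, 0.35
a = 1 + z_a**2 / n
b = -(2 * p0 + z_a**2 / n)
gamma = (-b + math.sqrt(b**2 - 4 * a * p0**2)) / (2 * a)

power = 1 - phi(math.sqrt(n) * (gamma - p) / math.sqrt(p * (1 - p)))
print(round(power, 3))  # about 0.433
```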

(iv) Rejection Condition for p


We are given that the test based on Z2 rejects the null hypothesis H0 if and only if Z2 > zα ,
where zα is the critical value of the standard normal distribution corresponding to the
significance level α. The goal of this proof is to show that this condition is equivalent to
rejecting H0 if and only if p̂ > γ, where γ is a specific threshold that depends on p0 , n,
and zα .

Step 1: Expression for Z2

Recall the expression for the test statistic Z2:

\[
Z_2 = \frac{\hat{p} - p_0}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}},
\]

where:

- p̂ is the sample proportion,
- p0 is the hypothesized proportion under the null hypothesis,
- n is the sample size.

Under the testing procedure, we reject H0 if and only if Z2 > zα. Since zα > 0, rejection requires Z2 > 0 (i.e., p̂ > p0), and on that side the condition can be rewritten as:

\[
Z_2^2 > z_\alpha^2.
\]

Step 2: Setting Up the Rejection Criterion

To find the equivalent condition in terms of p̂, we start with the expression for Z2²:

\[
Z_2^2 = \left(\frac{\hat{p} - p_0}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}}\right)^{\!2} = \frac{(\hat{p} - p_0)^2}{\dfrac{\hat{p}(1-\hat{p})}{n}}.
\]

Rewriting the rejection condition Z2² > zα², we get:

\[
\frac{(\hat{p} - p_0)^2}{\dfrac{\hat{p}(1-\hat{p})}{n}} > z_\alpha^2.
\]

Multiplying both sides by \(\frac{\hat{p}(1-\hat{p})}{n}\), we obtain:

\[
(\hat{p} - p_0)^2 > z_\alpha^2 \cdot \frac{\hat{p}(1-\hat{p})}{n}.
\]

Step 3: Solving the Equation

Next, we solve for p̂ by considering the equality case, which defines the critical threshold γ:

\[
(\hat{p} - p_0)^2 = z_\alpha^2 \cdot \frac{\hat{p}(1-\hat{p})}{n}.
\]

Expanding both sides:

\[
\hat{p}^2 - 2p_0\hat{p} + p_0^2 = \frac{z_\alpha^2}{n}\left(\hat{p} - \hat{p}^2\right).
\]

Collecting all terms on one side and grouping by powers of p̂:

\[
\left(1 + \frac{z_\alpha^2}{n}\right)\hat{p}^2 - \left(2p_0 + \frac{z_\alpha^2}{n}\right)\hat{p} + p_0^2 = 0.
\]

This is a quadratic equation in p̂, which we solve using the quadratic formula:

\[
\hat{p} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},
\]

where \(a = 1 + \frac{z_\alpha^2}{n}\), \(b = -\left(2p_0 + \frac{z_\alpha^2}{n}\right)\), and \(c = p_0^2\). Substituting these values, we obtain:

\[
\hat{p} = \frac{\left(2p_0 + \dfrac{z_\alpha^2}{n}\right) \pm \sqrt{\left(2p_0 + \dfrac{z_\alpha^2}{n}\right)^2 - 4\left(1 + \dfrac{z_\alpha^2}{n}\right)p_0^2}}{2\left(1 + \dfrac{z_\alpha^2}{n}\right)}.
\]

The larger of the two roots corresponds to the critical value γ, so we define:

\[
\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a}.
\]

Thus, the rejection criterion is: H0 is rejected if and only if p̂ > γ.

Conclusion

We have shown that under the testing procedure based on Z2, H0 is rejected if and only if p̂ > γ, where γ is given by the quadratic formula:

\[
\gamma = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \quad \text{with} \quad a = 1 + \frac{z_\alpha^2}{n}, \quad b = -\left(2p_0 + \frac{z_\alpha^2}{n}\right), \quad c = p_0^2.
\]
This shows that the test based on Z2 is equivalent to rejecting H0 when p̂ > γ, as required.
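The equivalence can also be verified numerically on a grid of p̂ values (the parameter values below are illustrative):

```python
import math

# Check that the rule Z2 > z_alpha agrees with p_hat > gamma for
# illustrative values p0 = 0.3, n = 200, z_alpha = 1.6449.
z_a, p0, n = 1.6449, 0.3, 200

a = 1 + z_a**2 / n
b = -(2 * p0 + z_a**2 / n)
gamma = (-b + math.sqrt(b**2 - 4 * a * p0**2)) / (2 * a)

def z2(p_hat):
    return (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)

# Both rejection rules should make the same decision everywhere.
agree = all((z2(k / 1000) > z_a) == (k / 1000 > gamma)
            for k in range(50, 951))
assert agree
```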
