Midterm 2022 Sol


DO NOT OPEN THIS EXAM UNTIL INSTRUCTED TO

Name: Student #:

STAT 404 Midterm Exam


September–December, 2022
Instructor: Jiahua Chen
Total marks: 68.

• Put your name and student ID in the upper-right corner of every sheet.

• Correct answers are usually short. Answer questions in brief but complete sentences.
For example, if we ask: Calculate $SS_{trt}$, a satisfactory answer is:
The treatment sum of squares is given by
\[ SS_{trt} = \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = 4 \times (5 - 3.2)^2 + 6 \times (2 - 3.2)^2 = 21.6. \]

An unsatisfactory answer is:


21.6.

• Use R for simple calculations such as the sample mean and sample variance (as in the
assignments). Answers obtained using one-line R functions will not be accepted.

• Save the R code you used in a .doc, .docx, .rtf, or .txt file. Include comments
describing which question the code block is used for. Leave sufficient space between code
for different questions. Submit your code to Canvas when instructed to.

• Unless otherwise specified,

1. assume common notations and model assumptions;

2. use the conventional 5% level for tests, hypothesis for two-sided alternatives, and
95% confidence level.

1. [6] List the three principles of design of experiments we discussed in STAT 404.
Explain each principle in 1–2 complete sentences.

Answer.

(a) Randomization: prevents the effects of lurking factors, or assigns treatments to experimental units at random (either is fine).

(b) Replication: improves the precision of the estimated treatment effects, or repeats the same treatment on several experimental units.

(c) Blocking: removes the effect of a factor that is not of interest, or groups similar experimental units so that different treatments are compared under similar conditions.

2. [8] The standard two-sample t-test is formulated under strict model assumptions.

(a) [4] Name two of the model assumptions. Describe each assumption in one
sentence.

Answer. Any two of the following (or other relevant assumptions) are acceptable:

• Independence: all observations are independent of each other.

• Normality: all observations have normal distributions.

• Identical means: the two populations have the same mean.

• Identical variances: the two populations have the same variance.

• Identically distributed: all observations in the same sample have the same
distribution.

(b) [4] We recommend the Welch test when two populations have different variances.
Yet, we commented that this test is (1) mathematically invalid but (2) statistically
acceptable. Explain these two points.

Answer.

• The test is mathematically invalid because the test statistic for Welch’s test
does not have a t-distribution.

• The test is statistically acceptable (and widely recommended) because the distribution of Welch's test statistic is well approximated by the recommended t-distribution, which leads to null rejection probabilities close to the nominal level.
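
For reference, the Welch statistic and its approximating degrees of freedom are usually written as below. This is a sketch that assumes the "recommended t-distribution" is the standard Welch-Satterthwaite approximation; the symbols $s_i^2$, $n_i$, and $\nu$ are notation introduced here and do not appear in the exam.

```latex
\[
T = \frac{\bar{y}_{1\cdot} - \bar{y}_{2\cdot}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},
\qquad
\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} .
\]
```

Under the null hypothesis, $T$ is only approximately $t_\nu$-distributed, which is the sense in which the test is "mathematically invalid but statistically acceptable".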

3. [20] A linear regression model assumes that the response values in a study can be
expressed as

\[ y_i = x_i^\top \beta + \epsilon_i \quad \text{for } i = 1, 2, \ldots, n, \]

where $x_i^\top \beta$ is the expected value of $y_i$ and the $\epsilon_i$'s are iid $N(0, \sigma^2)$ random variables.

Use the R commands provided in the file “Midterm2022.txt” on the Canvas main
page to load the data. This file also provides a few lines of code to save time.

The dataset contains


• x1 : the assignment mark,
• x2 : the midterm mark, and
• y: the final exam mark
of n = 39 students in some course. Regard x1 and x2 as predictors and y as the
response variable.

Note: in this case, x = (x0 = 1, x1 , x2 )⊤ and β = (β0 , β1 , β2 )⊤ .

(a) [4] Obtain the least squares estimator β̂ of β.

Answer. The LSE of β is

\[ \hat{\beta} = (X^\top X)^{-1} X^\top y = (11.9091, 0.4910, 0.4028). \]

(b) [4] Estimate the error variance $\sigma^2$ (use the method given in class).

Answer. The error variance is estimated as

\[ \hat{\sigma}^2 = \sum_{i} (y_i - \hat{y}_i)^2 / (39 - 3) = 54.59. \]

(c) [4] Estimate the variance matrix of β̂.

Answer. The variance matrix of β̂ is estimated as

\[
\mathrm{Var}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1} =
\begin{pmatrix}
86.8395 & -0.7029 & -0.4398 \\
-0.7029 & 0.0131 & -0.0041 \\
-0.4398 & -0.0041 & 0.0104
\end{pmatrix}.
\]
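
Parts (a) to (c) follow the same three matrix steps, which can be sketched without any one-line fitting function. The block below is a minimal illustration in Python using only the standard library (the exam itself asks for R). The data are a small hypothetical example, not the exam data set, and the helper names `transpose`, `matmul`, `inverse`, and `ols` are introduced here for illustration.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(x_rows, y):
    """Return (beta_hat, sigma2_hat, cov_hat) for y = X beta + error with an intercept."""
    X = [[1.0] + list(r) for r in x_rows]          # prepend x0 = 1
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))               # (X'X)^{-1}
    xty = matmul(Xt, [[v] for v in y])             # X'y as a column
    beta = [row[0] for row in matmul(xtx_inv, xty)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    n, p = len(X), len(X[0])
    sigma2 = rss / (n - p)                         # residual SS over n - p
    cov = [[sigma2 * v for v in row] for row in xtx_inv]
    return beta, sigma2, cov


# Hypothetical marks (assignment, midterm) and final exam marks, for illustration only.
x_rows = [(60, 55), (70, 65), (80, 60), (90, 85), (75, 70), (85, 75)]
y = [58, 66, 68, 84, 70, 78]
beta_hat, sigma2_hat, cov_hat = ols(x_rows, y)
print(beta_hat, sigma2_hat)
```

With the exam data, `n - p` would be 39 - 3 = 36, matching the divisor used in part (b).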

(d) [4] Estimate the variance of β̂2 − β̂1 (both LS estimators).

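Answer. The variance of β̂2 − β̂1 is estimated to be

\[
\begin{aligned}
\mathrm{Var}(\hat{\beta}_2 - \hat{\beta}_1)
&= \hat{\sigma}^2 \left\{ [(X^\top X)^{-1}]_{22} + [(X^\top X)^{-1}]_{33} - 2 [(X^\top X)^{-1}]_{23} \right\} \\
&= 0.0131 + 0.0104 - 2(-0.0041) \\
&= 0.0317.
\end{aligned}
\]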
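
The same number can be read off the estimated covariance matrix as a quadratic form. The short Python check below is a sketch that copies the rounded matrix entries from part (c); the function name `var_of_contrast` and the contrast vector `c` are introduced here for illustration.

```python
# Estimated covariance matrix of the coefficient estimates, copied from part (c).
cov = [
    [86.8395, -0.7029, -0.4398],
    [-0.7029, 0.0131, -0.0041],
    [-0.4398, -0.0041, 0.0104],
]


def var_of_contrast(cov, c):
    """Return c' V c, the variance of the linear combination c' beta-hat."""
    k = len(c)
    return sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))


# beta2-hat minus beta1-hat corresponds to c = (0, -1, 1).
var_diff = var_of_contrast(cov, [0, -1, 1])
print(round(var_diff, 4))
```

Because the quadratic form is symmetric in sign, the contrast (0, 1, -1) for β̂1 − β̂2 gives the same variance, which is what part (e) relies on.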

(e) [4] Construct a two-sided, non-simultaneous 95% CI for β1 − β2 (the difference between the
regression coefficients for the assignment and midterm marks).
Hint: remember the general recipe for constructing CIs.

Answer. The 95% CI for β1 − β2 is given by

\[ (\hat{\beta}_1 - \hat{\beta}_2) \pm \texttt{qt}(0.975, 39 - 3) \sqrt{\mathrm{Var}(\hat{\beta}_1 - \hat{\beta}_2)} = 0.0882 \pm 0.361 = [-0.273, 0.449]. \]
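
The interval arithmetic can be re-traced from the rounded values reported in parts (a) and (d). This Python sketch assumes a t quantile of 2.0281 for 36 degrees of freedom, taken from a t table as a stand-in for `qt(0.975, 36)`; that constant is an assumption of this example rather than a value stated in the solution.

```python
import math

diff = 0.4910 - 0.4028   # beta1-hat minus beta2-hat, from part (a)
var_diff = 0.0317        # estimated variance of the difference, from part (d)
t_quantile = 2.0281      # assumed table value for qt(0.975, 36)

half_width = t_quantile * math.sqrt(var_diff)
lower, upper = diff - half_width, diff + half_width
print(round(diff, 4), round(half_width, 3), round(lower, 3), round(upper, 3))
```

Since the interval contains 0, the data do not distinguish the two coefficients at the 5% level.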

4. [26] Consider a hypothetical one-way layout comparing k = 6 treatments under
standard model assumptions and notations.

Use the R commands provided in the file “Midterm2022.txt” on the Canvas main
page to load the data. This file also provides a few lines of code to save time.

(a) [4] Compute the treatment sum of squares SStrt .

Answer. The mean responses of these treatments are

1.5675, 1.7450, 2.0150, 1.3800, 1.5475, 1.6375

and the grand mean is 1.64875. We find

\[ SS_{trt} = \sum_{i=1}^{6} 4 (\bar{y}_i - \bar{y})^2 = 0.93. \]
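
The treatment sum of squares can be recomputed directly from the six reported means. The Python sketch below assumes the balanced design implied by the factor 4 in the formula (four observations per treatment); variable names are illustrative.

```python
# Treatment means as reported in the answer to part (a).
means = [1.5675, 1.7450, 2.0150, 1.3800, 1.5475, 1.6375]
n_per_group = 4  # balanced design: 4 observations per treatment

# With equal group sizes, the grand mean is the plain average of the group means.
grand_mean = sum(means) / len(means)

# SS_trt = sum over treatments of n_i * (group mean - grand mean)^2.
ss_trt = n_per_group * sum((m - grand_mean) ** 2 for m in means)
print(round(grand_mean, 5), round(ss_trt, 4))
```

The deviations `m - grand_mean` are also the estimated treatment effects requested in part (e).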

(b) [4] Compute the error (or residual) sum of squares SSerr .

Answer. Let $s_i^2$ be the sample variance for treatment $i$. We compute the error sum of squares as

\[ SS_{err} = 3 \sum_{i=1}^{6} s_i^2 = 0.0822. \]
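
The factor 3 comes from the definition of the sample variance with four observations per treatment. A short derivation, written with the usual double-subscript notation $y_{ij}$ (assumed here, since the data are not shown):

```latex
\[
s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2
\quad\Longrightarrow\quad
SS_{err} = \sum_{i=1}^{6} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2
         = \sum_{i=1}^{6} (n_i - 1)\, s_i^2
         = 3 \sum_{i=1}^{6} s_i^2
\quad \text{when } n_i = 4 .
\]
```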
i=1

(c) [4] Complete the one-way layout ANOVA table. Not every cell needs to be filled.

Answer.

Source      DF   SS       MSS       F
Treatment    5   0.9304   0.1861    40.7367
Error       18   0.0822   0.00457
Total       23   1.0127
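
The table entries are linked by simple divisions, sketched below in Python from the rounded sums of squares. Because the inputs are rounded to four decimals, the F ratio computed this way differs from the tabulated 40.7367 in the second decimal place; the table presumably uses unrounded values. Variable names are illustrative.

```python
k, n_total = 6, 24               # 6 treatments, 24 observations (total DF = 23)
ss_trt, ss_err = 0.9304, 0.0822  # rounded sums of squares from parts (a) and (b)

df_trt = k - 1                   # treatment degrees of freedom
df_err = n_total - k             # error degrees of freedom
ms_trt = ss_trt / df_trt         # mean square for treatments
ms_err = ss_err / df_err         # mean square for error
f_obs = ms_trt / ms_err          # observed F ratio

print(df_trt, df_err, round(ms_trt, 4), round(ms_err, 5), round(f_obs, 2))
```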

(d) [4] Test the hypothesis that all treatment means are equal at the 10% level.
State the null and alternative hypotheses, the test statistic and its reference distri-
bution, and your conclusions.

Answer. The null hypothesis is that all treatment means are equal, i.e.,

H0 : τ1 = · · · = τ6 .

The alternative is that at least two of them are not equal, i.e.,

H1 : τi ≠ τj for some i ≠ j .

The test statistic (and its value) is

\[ F_{obs} = \frac{MSS_{trt}}{MSS_{err}} = \frac{0.1861}{0.0046} = 40.73. \]

The reference distribution is F with degrees of freedom (5, 18). The p-value of the
test is

\[ p = 1 - \texttt{pf}(40.73, 5, 18) = 3.38 \times 10^{-9}, \]

which is below the nominal level 0.1. We reject the null hypothesis and conclude that
at least two treatment means are not equal.

(e) [4] Estimate the 6 treatment effects and the error variance (i.e., $\hat{\tau}_j$ and $\hat{\sigma}^2$). Use
complete sentences.

Answer. The estimated effects $\hat{\tau}_i$, $i = 1, 2, \ldots, 6$, were calculated in a previous question and are

−0.081, 0.096, 0.366, −0.269, −0.101, −0.011.

The estimated error variance is

\[ s^2 = MSS_{err} = 0.00457. \]

(f ) [6] Construct simultaneous 90% CIs for the mean differences using Tukey’s
method. Pretend that you are computing all simultaneous CIs but show only the
first 3 (1 vs 2 ; 1 vs 3 ; 2 vs 3 ) in writing.

Answer. The mean differences are estimated as

(ȳ1 − ȳ2 , ȳ1 − ȳ3 , ȳ2 − ȳ3 ) = (−0.1775, −0.4475, −0.2700) .

The estimated error standard deviation is

\[ s = \sqrt{MSS_{err}} = 0.0676. \]

Tukey's 90% quantile is given by

q = qtukey(0.9, 6, 18) = 3.9836 .

We have

\[ \frac{q s}{\sqrt{2}} \sqrt{\frac{1}{4} + \frac{1}{4}} = 0.1347. \]
Hence, the 90% simultaneous CIs are

1 vs 2 : (−0.312, −0.043) ,

1 vs 3 : (−0.582, −0.313) ,

2 vs 3 : (−0.405, −0.135) .
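
The three intervals can be reproduced from the reported means, the rounded error sum of squares, and the Tukey quantile quoted above. The Python sketch below takes q = 3.9836 as given rather than computing it, and because it starts from rounded inputs its half-width may differ from 0.1347 in the last digit; the interval endpoints agree to three decimals. Names such as `half_width` and `intervals` are illustrative.

```python
import math

means = {1: 1.5675, 2: 1.7450, 3: 2.0150}  # first three treatment means
n = 4                                      # observations per treatment
ms_err = 0.0822 / 18                       # error mean square from the ANOVA table
q = 3.9836                                 # qtukey(0.9, 6, 18), as reported above

# Tukey half-width: (q / sqrt(2)) * s * sqrt(1/n_i + 1/n_j).
half_width = q / math.sqrt(2) * math.sqrt(ms_err) * math.sqrt(1 / n + 1 / n)

intervals = {}
for i, j in [(1, 2), (1, 3), (2, 3)]:
    d = means[i] - means[j]
    intervals[(i, j)] = (round(d - half_width, 3), round(d + half_width, 3))

for pair, ci in intervals.items():
    print(pair, ci)
```

None of the three intervals contains 0, so each of these pairs of treatment means differs at the simultaneous 90% level.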

5. [8] The two-sample problem is a special case of the one-way layout. You may find
the following formulas helpful for this question:
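\[ SS_{tot} = \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{\cdot\cdot})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_{\cdot\cdot})^2, \]

\[ SS_{trt} = n_1 (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})^2 + n_2 (\bar{y}_{2\cdot} - \bar{y}_{\cdot\cdot})^2, \]

\[ SS_{err} = \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_{2\cdot})^2. \]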

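(a) [4] Suppose $\mu_1 \neq \mu_2$. Compute $E[(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^2]$.

Answer. By independence and the relationship $E[X^2] = \mathrm{Var}(X) + (E[X])^2$, we have

\[ E\left[(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^2\right] = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} + (\mu_1 - \mu_2)^2. \]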
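
The intermediate steps, spelled out under the standard assumptions of independent samples with a common variance $\sigma^2$, are:

```latex
\begin{align*}
E[\bar{y}_{1\cdot} - \bar{y}_{2\cdot}] &= \mu_1 - \mu_2, \\
\mathrm{Var}(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})
  &= \mathrm{Var}(\bar{y}_{1\cdot}) + \mathrm{Var}(\bar{y}_{2\cdot})
   = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}, \\
E\left[(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})^2\right]
  &= \mathrm{Var}(\bar{y}_{1\cdot} - \bar{y}_{2\cdot})
   + \left(E[\bar{y}_{1\cdot} - \bar{y}_{2\cdot}]\right)^2 .
\end{align*}
```

The variance of the difference is the sum of the two variances because the two sample means are independent.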

(b) [4] Prove the formula for the decomposition of the sum of squares:

SStot = SStrt + SSerr .

Remark: while this is not a bonus question, do not start this problem unless you
have extra time.

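Answer. Using the well-known fact $\sum_i x_i^2 = \sum_i (x_i - \bar{x})^2 + n \bar{x}^2$, we get

\[
\begin{aligned}
\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{\cdot\cdot})^2
&= \sum_{j=1}^{n_1} \{(y_{1j} - \bar{y}_{1\cdot}) + (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})\}^2 \\
&= \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})^2 + n_1 (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})^2 .
\end{aligned}
\]

For the same reason, we have

\[ \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_{\cdot\cdot})^2 = \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_{2\cdot})^2 + n_2 (\bar{y}_{2\cdot} - \bar{y}_{\cdot\cdot})^2 . \]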

We get the desired identity by summing up the two sides. This completes the proof.
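
For readers who want the omitted step for the first sample, the square expands as follows; the cross term vanishes because deviations from a sample mean sum to zero.

```latex
\begin{align*}
\sum_{j=1}^{n_1} \{(y_{1j} - \bar{y}_{1\cdot}) + (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})\}^2
&= \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})^2
 + 2 (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot}) \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})
 + n_1 (\bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})^2, \\
\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})
&= \sum_{j=1}^{n_1} y_{1j} - n_1 \bar{y}_{1\cdot} = 0 .
\end{align*}
```

The same expansion applies to the second sample with $n_2$, $y_{2j}$, and $\bar{y}_{2\cdot}$ in place of $n_1$, $y_{1j}$, and $\bar{y}_{1\cdot}$.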
