
THE UNIVERSITY OF BRITISH COLUMBIA

Sample Final
STAT 404: Design of Experiment

Preamble: These problems are taken, slightly revised, from historical
final exams. Partial solutions (e.g., answers in incomplete sentences) are
provided for only some questions and may be insufficient for full marks on a
real exam. Ask in office hours if something is unclear.

1. We emphasize three design of experiment dogmas in Stat 404: randomization, blocking and replication.
Answer the following questions with FULL sentences.
(a) Which one helps to prevent the influence of lurking factors?
Randomization helps to prevent the influence of lurking factors.
(b) Which one helps to eliminate the influence of known potential factors in the data analysis?
Blocking helps to eliminate the influence of known potential factors in the data analysis.
(c) Which one is most obviously employed in the paired experiment?
Blocking is most obviously employed in the paired experiment.
2. One type of design in this course is called the balanced incomplete block design, as opposed to the randomized complete block design.
(a) What is meant by incomplete in this context?
Not every treatment is observed in each block.
(b) What is meant by balanced in this context?
Every pair of treatments occurs together in the same number of blocks.
3. We wish to compare three treatments in an experiment, applying them to 6, 10, and 15 experimental units after proper randomization.
(a) What is such a design called in this course?
One-way layout.
(b) Suppose the three treatments are applied to 10, 10, and 11 units instead. What might be the benefit or harm of the new scheme? Giving one point with proper justification suffices.

The power of our F-test may increase or decrease, depending on the configuration of the treatment effects.
(c) Suppose, only after the fact, we realize that an attribute (covariate) of the experimental units potentially influences the response value. What is the name of the data analysis strategy we recommend in this course?
ANCOVA.
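As an illustration (a minimal sketch with simulated, hypothetical data, not part of the exam), ANCOVA can be carried out in R by fitting the covariate alongside the treatment factor in lm():

```r
# Minimal ANCOVA sketch; the data here are simulated, not from the exam.
set.seed(404)
trt <- factor(rep(1:3, times = c(6, 10, 15)))     # allocation from part (a)
x   <- rnorm(31)                                  # covariate noticed after the fact
y   <- 2 + as.numeric(trt) + 1.5 * x + rnorm(31)  # simulated response
fit <- lm(y ~ trt + x)                            # treatment effects adjusted for x
anova(fit)                                        # ANCOVA table
```

The covariate absorbs part of the error variance, sharpening the treatment comparison.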

4. Suppose in a one-way layout with k = 4 treatments, it is suspected that treatments 1 and 4 have the most extreme mean responses. Hence, it is decided to have n1 = 2n2 = 2n3 = 2n4. Suppose we have τi = i, i = 1, 2, 3, 4 and σ² = 10.
(a) What is the distribution of the F-statistic for H0 : τ1 = τ2 = τ3 = τ4?
It helps to write n1 = 2n and n2 = n3 = n4 = n for some n. Under H0, the F-statistic follows an F distribution with degrees of freedom 3 and 5n − 4.
(b) What is the minimum required number of experimental runs (equal to the number of experimental units) in order to achieve 85% power for a 5% level F-test?
We first compute the non-centrality parameter δ. Let

τ̄ = (Σᵢ₌₁⁴ nᵢτᵢ) / (Σᵢ₌₁⁴ nᵢ).

We find

δ = (1/σ²) Σᵢ₌₁⁴ nᵢ(τᵢ − τ̄)² = 0.68n.

The powers corresponding to hypothetical sample sizes n = 10 to n = 19 are, respectively, 0.5365, 0.5848, 0.6299, 0.6716, 0.7098, 0.7446, 0.7761, 0.8044, 0.8298, and 0.8523. The minimum sample size required to achieve at least 85% power is attained at n = 19, i.e., the total number of units is N = 38 + 19 + 19 + 19 = 95.
R code:

tau = 1:4
nn = c(2, 1, 1, 1)          # allocation weights
tau.bar = sum(tau*nn) / sum(nn)
sigma2 = 10
delta = sum(nn*(tau-tau.bar)^2) / sigma2   # delta = 0.68 per unit of n
n.seq = 10:20               # candidate values of n
k = 4
N = sum(nn)*n.seq           # total runs: 5n
qq = qf(0.95, k-1, N-k)
pw = pf(qq, k-1, N-k, delta*n.seq, lower.tail=F)
print(round(pw, 4))

5. An experiment was conducted to compare the differences in growth among four different cultivars of a house plant. The greenhouse had three benches in three locations. Two pots of each cultivar were randomly assigned to each bench.
The response variable is the height of the plants after a certain number of days. The effects of two factors, bench and cultivar, are of interest.

Pot I       A      B      C      D
Bench I     42.94  44.88  39.23  38.39
Bench II    37.31  46.36  31.67  28.49
Bench III   40.17  45.10  37.33  28.61

Pot II      A      B      C      D
Bench I     36.57  44.24  36.05  34.26
Bench II    33.25  46.84  31.46  28.10
Bench III   45.05  47.56  38.83  36.20

yy1 = matrix(c(42.94, 44.88, 39.23, 38.39,
               37.31, 46.36, 31.67, 28.49,
               40.17, 45.10, 37.33, 28.61), 3, 4, byrow = T)
yy2 = matrix(c(36.57, 44.24, 36.05, 34.26,
               33.25, 46.84, 31.46, 28.10,
               45.05, 47.56, 38.83, 36.20), 3, 4, byrow = T)

Double-check that the matrices are what you think they are.
(a) Estimate the treatment effects of both factors and their interactions.
Let βi for i ∈ {A, B, C, D} denote the effects of cultivars, let αj for j ∈ {I, II, III} denote the effects of benches, and let ωij denote the interaction effect between cultivar i and bench j. The estimates of the main effects are given by

β̂i = ȳi·· − ȳ···,    α̂j = ȳ·j· − ȳ···,

and the estimates of the interaction effects are given by

ω̂ij = ȳij· − ȳi·· − ȳ·j· + ȳ···.

The estimates are

α̂I = 1.282917,  α̂II = −2.852083,  α̂III = 1.569167,
β̂A = 0.9279167,  β̂B = 7.5429167,  β̂C = −2.5254167,  β̂D = −5.9454167,
ω̂A,I = −0.7429167,  ω̂A,II = −1.0829167,  ω̂A,III = 1.8258333,
ω̂B,I = −2.552917,  ω̂B,II = 3.622083,  ω̂B,III = −1.069167,
ω̂C,I = 0.5954167,  ω̂C,II = −1.3445833,  ω̂C,III = 0.7491667,
ω̂D,I = 2.700417,  ω̂D,II = −1.194583,  ω̂D,III = −1.505833.

R code:

mean.trt = colMeans(rbind(yy1, yy2))    # cultivar means
mean.bench = rowMeans(cbind(yy1, yy2))  # bench means
mean.int = (yy1 + yy2)/2                # cell means
ybar = mean(mean.trt)                   # grand mean
beta = mean.trt - ybar
alpha = mean.bench - ybar
omega = t(t(mean.int - mean.bench) - mean.trt) + ybar

(b) Estimate the variance of the experimental error.
The variance of the experimental error is estimated by σ̂² = MS(Err) = 7.282921.
R code:

I = 4   # number of cultivars
J = 3   # number of benches
n = 2   # pots per cell
SS.trt = n*J*sum((mean.trt-ybar)^2)
SS.bench = n*I*sum((mean.bench-ybar)^2)
SS.int = n*sum((mean.int-
    matrix(rep(mean.bench,I),J,I,byrow=F)-
    matrix(rep(mean.trt,J),J,I,byrow=T)+
    ybar)^2)
SS.total = sum((yy1-ybar)^2) + sum((yy2-ybar)^2)
SS.err = SS.total - SS.bench - SS.trt - SS.int
MS.trt = SS.trt / (I-1)
MS.bench = SS.bench / (J-1)
MS.int = SS.int / ((I-1)*(J-1))
MS.err = SS.err / (I*J*(n-1))

(c) Construct the complete analysis of variance table (including p-values for testing the standard hypotheses).
The ANOVA table is

Source     DF  SS         MSS        F          p-value
Cultivar    3  596.8940   198.9647   27.31935   1.201008e-05
Bench       2   97.94031   48.97015   6.723972  0.01099439
C×B         6   79.09756   13.18293   1.810115  0.1797035
Error      12   87.39505    7.282921
Total      23  861.3269

R code:

F.trt = MS.trt / MS.err
F.bench = MS.bench / MS.err
F.int = MS.int / MS.err
pf(F.trt, I-1, I*J*(n-1), lower.tail=F)
pf(F.bench, J-1, I*J*(n-1), lower.tail=F)
pf(F.int, (I-1)*(J-1), I*J*(n-1), lower.tail=F)
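As a cross-check (a sketch, not part of the exam solution; the long-format data frame layout below is an assumption), the same table can be reproduced with R's built-in aov():

```r
# Rebuild the data in long format and let aov() produce the ANOVA table.
yy1 <- matrix(c(42.94, 44.88, 39.23, 38.39,
                37.31, 46.36, 31.67, 28.49,
                40.17, 45.10, 37.33, 28.61), 3, 4, byrow = TRUE)
yy2 <- matrix(c(36.57, 44.24, 36.05, 34.26,
                33.25, 46.84, 31.46, 28.10,
                45.05, 47.56, 38.83, 36.20), 3, 4, byrow = TRUE)
dat <- data.frame(
  y        = c(as.vector(yy1), as.vector(yy2)),       # column-major: bench fastest
  bench    = factor(rep(rep(1:3, 4), 2)),
  cultivar = factor(rep(rep(1:4, each = 3), 2))
)
summary(aov(y ~ cultivar * bench, data = dat))
```

The degrees of freedom (3, 2, 6, 12) and sums of squares should match the hand computation above.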

(d) Construct a two-sided 95% confidence interval for the effect βA of cultivar A. Remark: not a simultaneous CI.
The variance of β̂A = ȳA·· − ȳ··· is

Var(ȳA·· − ȳ···) = Var(ȳA··) + Var(ȳ···) − 2 Cov(ȳA··, ȳ···)
                 = σ²/(nJ) + σ²/(nIJ) − (2/I) Var(ȳA··)
                 = (I − 1)σ²/(nIJ)
                 = σ²/8.

Hence, a 95% confidence interval for the main effect βA is

β̂A ± t12(0.975) √(σ̂²/8) = (−1.150955, 3.006789).
R code:

t.cv = qt(0.975, 12)
mean.trt[1]-ybar-t.cv*sqrt(MS.err/8)
mean.trt[1]-ybar+t.cv*sqrt(MS.err/8)

6. A fractional factorial design with 2^(9−3) runs is needed to accommodate factors named 1, 2, 3, 4, 5, 6, 7, 8, and 9, all at 2 levels.
Two regular fractional factorial designs are respectively defined by the following two sets of defining relations:
(1) 7 = 124; 8 = 257; 9 = 1457.
(2) 7 = 125; 8 = 246; 9 = 356.
(a) Work out the defining contrast subgroups of both designs.
Design (1): I = 1247 = 2578 = 14579 = 1458 = 259 = 12489 = 789.
Design (2): I = 1257 = 2468 = 3569 = 145678 = 123679 = 234589 = 134789.
(b) Determine the resolutions of these two designs.
Resolution III for Design (1) and resolution IV for Design (2).
(c) Find all effects (main or interaction) that are aliased with factor 2 in Design (2).
2 = 157 = 468 = 23569 = 1245678 = 13679 = 34589 = 1234789.
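The subgroup products can be verified mechanically (a base-R sketch, not from the course notes): each word is a 0/1 vector over the 9 factors, and the subgroup consists of all non-trivial GF(2) sums of the three generator words.

```r
# Enumerate the defining contrast subgroup of Design (2) over GF(2).
word.to.vec <- function(word, k = 9) {
  v <- integer(k)
  v[as.integer(strsplit(word, "")[[1]])] <- 1  # single-digit factor names only
  v
}
vec.to.word <- function(v) paste(which(v == 1), collapse = "")
gens <- rbind(word.to.vec("1257"), word.to.vec("2468"), word.to.vec("3569"))
words <- character(0)
for (a1 in 0:1) for (a2 in 0:1) for (a3 in 0:1) {
  v <- (a1 * gens[1, ] + a2 * gens[2, ] + a3 * gens[3, ]) %% 2
  if (any(v == 1)) words <- c(words, vec.to.word(v))  # skip the identity I
}
print(words)
```

Multiplying each word by factor 2 (i.e., flipping coordinate 2) reproduces the alias set in part (c).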

7. Following the last question, suppose 8 and 9 are two blocking factors, denoted B1 and B2 instead. The defining relations in (2) become
(2*) 7 = 125; B1 = 246; B2 = 356.
(a) How many blocks does this design have?
4 (the two blocking factors, each at 2 levels, partition the runs into 2² = 4 blocks).
(b) How many runs are there in each block?
16 (the 64 runs are split evenly across the 4 blocks).

8. An experimenter obtained eight yields (yield = response) for the design given in the following table. Note that the factors are named 1, 2, 3, 4, and 5 instead of A, B, C, D, and E. There are no replicates.
The defining contrast subgroup is given by

I = 123 = 345 = 1245.

Use the rule that lower-order interactions are more likely to be significant.

run   1  2  3  4  5   yield
  1   −  −  +  −  −    7.429
  2   +  +  +  −  −    1.596
  3   −  +  −  −  +    4.686
  4   +  −  −  −  +   −0.087
  5   −  +  −  +  −   −2.185
  6   +  −  −  +  −    8.304
  7   −  −  +  +  +   −3.140
  8   +  +  +  +  +    9.003

coln1 = c(-1, 1, -1, 1, -1, 1, -1, 1)
coln2 = c(-1, 1, 1, -1, 1, -1, -1, 1)
coln3 = c(1, 1, -1, -1, -1, -1, 1, 1)
coln4 = c(-1, -1, -1, -1, 1, 1, 1, 1)
coln5 = c(-1, -1, 1, 1, -1, -1, 1, 1)
yy = c(7.429, 1.596, 4.686, -0.087, -2.185, 8.304, -3.140, 9.003)

(a) Which interactions are aliased with the two-factor interaction 14?
14 = 234 = 135 = 25.
(b) Create a two-factor interaction plot for factors 1 and 4. (Ignore the other effects aliased with it.) Either way of drawing will be accepted; you may find drawing it by hand less time-consuming. However, be sure to have the axes and scales clearly marked.

R code:

interaction.plot(coln1, coln4, yy,
    type="b", col=c("red","blue"), legend=F,
    lty=c(1,2), lwd=2, pch=c(18,24),
    xlab="Factor 1", ylab="Yield")
legend("bottomright", c("-1","1"), bty="n", lty=c(1,2),
    lwd=2, pch=c(18,24), col=c("red","blue"),
    title="Factor 4", inset = .02)

(c) Obtain the estimates of all effects (or aliased sets of effects).
The estimate of the main effect of factor i is given by

μ̂i = ȳi=+ − ȳi=−.

The estimates of the main effects are therefore

μ̂1 = 3.0065, μ̂2 = 0.1485, μ̂3 = 1.0425, μ̂4 = −0.4105, μ̂5 = −1.1705.

The main effects have the following aliases:

1 = 23 = 1345 = 245
2 = 13 = 2345 = 145
3 = 12 = 45 = 12345
4 = 1234 = 35 = 125
5 = 1235 = 34 = 124

The following interaction terms are not aliased with any main effect:

14 = 234 = 135 = 25
15 = 235 = 134 = 24.

The estimates of these two interaction effects are μ̂14 = 8.3095 and μ̂15 = 0.6785.
R code:

N = 8
n = N / 2
mu1 = sum(yy*coln1) / n
mu2 = sum(yy*coln2) / n
mu3 = sum(yy*coln3) / n
mu4 = sum(yy*coln4) / n
mu5 = sum(yy*coln5) / n

mu14 = sum(yy*coln1*coln4) / n
mu15 = sum(yy*coln1*coln5) / n

(d) If the variance of each observation (i.e., each run) is σ² = 1.2², what is the standard deviation of the effect estimators?
The variance of each effect estimator is

Var(ȳi=+ − ȳi=−) = 2 Var(ȳi=+) = σ²/2.

Hence, the standard deviation is √(σ²/2) = 1.2/√2 = 0.8485.
(e) Which of the effects are significant at the 5% level, assuming σ² = 1.2²?
An effect estimate is significant if its absolute value exceeds 1.96 (the 97.5% quantile of the standard normal distribution) times the standard deviation of the estimator, i.e., 1.96 × 0.8485 = 1.663. Thus, both μ̂1 and μ̂14 are significant at the 5% level.
R code:

muhat = c(mu1, mu2, mu3, mu4, mu5, mu14, mu15)
abs(muhat) / (1.2/sqrt(2)) > qnorm(0.975)

(f) What are the observed value and fitted value at run 4?
The observed value at run 4 is −0.087. Keeping only the significant effects, the fitted value is

ŷ = 3.20075 + (3.0065/2)x1 + (8.3095/2)x1x4.

At run 4, we have x1 = 1 and x4 = −1, and therefore the fitted value is 0.54925.
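The arithmetic can be checked in R (a small self-contained sketch; the data and effect estimates are recomputed here rather than reusing earlier variables):

```r
# Recompute the two significant effects and the fitted value at run 4.
yy    <- c(7.429, 1.596, 4.686, -0.087, -2.185, 8.304, -3.140, 9.003)
coln1 <- c(-1, 1, -1, 1, -1, 1, -1, 1)
coln4 <- c(-1, -1, -1, -1, 1, 1, 1, 1)
mu1   <- sum(yy * coln1) / 4            # 3.0065
mu14  <- sum(yy * coln1 * coln4) / 4    # 8.3095
yhat4 <- mean(yy) + (mu1/2)*coln1[4] + (mu14/2)*coln1[4]*coln4[4]
print(yhat4)   # 0.54925
```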

9. Challenge: attempt only if you have extra time.
In the context of the analysis of covariance discussed in this course, we postulate the model

yij = η + τi + β(xij − x̄··) + εij

with the other conventional assumptions omitted here.
Derive the expression for the least squares estimator of β with sufficient detail and a well-organized presentation.
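For reference, a sketch of the derivation (not part of the provided solutions; it assumes the usual side constraint on the τi so that η and the τi are identifiable):

```latex
\begin{align*}
S(\eta,\tau,\beta)
  &= \sum_{i}\sum_{j}\bigl(y_{ij}-\eta-\tau_i-\beta(x_{ij}-\bar{x}_{\cdot\cdot})\bigr)^2.\\
\intertext{For fixed $\beta$, minimizing over $\eta+\tau_i$ gives}
\widehat{\eta+\tau_i}
  &= \bar{y}_{i\cdot}-\beta(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot}),\\
\intertext{so the profiled criterion becomes}
S(\beta)
  &= \sum_{i}\sum_{j}\bigl[(y_{ij}-\bar{y}_{i\cdot})-\beta(x_{ij}-\bar{x}_{i\cdot})\bigr]^2.\\
\intertext{Setting $dS/d\beta=0$ yields the pooled within-treatment slope}
\hat{\beta}
  &= \frac{\sum_{i}\sum_{j}(x_{ij}-\bar{x}_{i\cdot})(y_{ij}-\bar{y}_{i\cdot})}
          {\sum_{i}\sum_{j}(x_{ij}-\bar{x}_{i\cdot})^2}.
\end{align*}
```

The estimator is the regression slope computed from the within-treatment deviations, pooled across all treatment groups.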
