WST 311 Notes part 2 2024
© Copyright reserved
Contents

1 SIMPLE LINEAR REGRESSION
2 MULTIPLE LINEAR REGRESSION: Estimation
3 MULTIPLE LINEAR REGRESSION: Test of Hypotheses
4 ONE-WAY ANALYSIS-OF-VARIANCE
5 ANALYSIS-OF-COVARIANCE
PART II - THE LINEAR MODEL
1 SIMPLE LINEAR REGRESSION
TEXTBOOK Chapter 6
Pages 127 to 136.
The simple linear regression model is
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \dots, n.$$
1.2 Estimation of $\beta_0$, $\beta_1$ and $\sigma^2$

Define the deviation $\varepsilon$, also referred to as the residual or error, as
$$\varepsilon = y - X\beta$$
or
$$\varepsilon_i = y_i - \beta_0 - \beta_1 x_i$$
for $i = 1, 2, \dots, n$.
Theorem 1 In the least-squares estimation approach, $\hat{\beta}_0$ and $\hat{\beta}_1$ are the values that minimize the sum of squared deviations
$$\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2 = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{n}\hat{\varepsilon}_i^{\,2} = \hat{\varepsilon}'\hat{\varepsilon}.$$
The least-squares estimators are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}.$$
Proof. See the textbook, page 128. Refer to the multiple linear regression section for a general proof.
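As a quick numerical illustration of Theorem 1, here is a minimal Python sketch of the least-squares formulas (the data values are made up and not from the textbook):

```python
import numpy as np

# Illustrative data (not from the textbook)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()

# Least-squares estimates from Theorem 1
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

yhat = b0 + b1 * x      # fitted values
resid = y - yhat        # estimated residuals
print(b0, b1)
```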
Residual plots are used to check the regression assumptions as well as the use of the correct functional form. The estimated residual for observation $i$ is $\hat{\varepsilon}_i = y_i - \hat{y}_i$.

1. A plot of $\hat{\varepsilon}_i$ against $x_i$, $i = 1, 2, \dots, n$, tests for the correct functional form.
2. A plot of $\hat{\varepsilon}_i$ against $\hat{y}_i$ and $x_i$, $i = 1, 2, \dots, n$, tests for constant variance.
3. A plot of $\hat{\varepsilon}_i$ against the time order in which historical data have been observed evaluates independence. Testing for independence is excluded from the scope of this module.

The test for normality can be done by applying the Shapiro-Wilk or Kolmogorov-Smirnov test for normality to the residuals.
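A minimal sketch of how these residual plots and normality tests can be produced in Python (simulated data, purely for illustration; the course itself uses SAS output for this):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Simulated data and a simple linear regression fit (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=x.size)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
resid = y - yhat

# 1.-2. Residuals against x (functional form) and against fitted values (constant variance)
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(x, resid)
ax1.set_xlabel("x")
ax2.scatter(yhat, resid)
ax2.set_xlabel("fitted value")
for ax in (ax1, ax2):
    ax.axhline(0, linestyle="--")
    ax.set_ylabel("residual")
plt.show()

# Normality checks on the residuals
s = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))     # sqrt(MSE), used as the K-S scale
print(stats.shapiro(resid))                        # Shapiro-Wilk statistic and p-value
print(stats.kstest(resid, "norm", args=(0.0, s)))  # Kolmogorov-Smirnov against N(0, s^2)
```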
Take note of equations (6.7) to (6.10), the expected values and variances of the least-squares estimators.
1.3 Hypothesis test and confidence interval for $\beta_1$

The statistic for testing $H_0: \beta_1 = 0$ is
$$t = \frac{\hat{\beta}_1}{s/\sqrt{(n-1)s_{xx}}} \sim t(n-2)$$
where
$$s_{xx} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}.$$
A $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2}\,\frac{s}{\sqrt{(n-1)s_{xx}}}.$$
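A short numerical sketch of this t-test and confidence interval (made-up data, purely illustrative):

```python
import numpy as np
from scipy import stats

# Illustrative data (not from the textbook)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 4.8, 6.2, 7.1, 8.9])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

s = np.sqrt(np.sum(resid ** 2) / (n - 2))       # sqrt(MSE)
sxx = np.sum((x - x.mean()) ** 2) / (n - 1)
se_b1 = s / np.sqrt((n - 1) * sxx)

t_stat = b1 / se_b1                             # test of H0: beta_1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

t_crit = stats.t.ppf(0.975, df=n - 2)           # 95% confidence interval
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(t_stat, p_value, ci)
```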
Calculate and interpret the coefficient of determination in equation (6.16),
$$r^2 = \frac{SSR}{SST} = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}.$$
2 MULTIPLE LINEAR REGRESSION: Estimation
TEXTBOOK Chapter 7
Pages
137-146 Take note of Example 7.3.1b; leave Example 7.3.2a
149-151 Section 7.3.3 until end of Corollary 1 on page 151; Example 7.3.3
157-159 Section 7.6 until end of Theorem 7.6b
161 Equation (7.56)
162 Example 7.7
182-184 Exercise 7.54
2.1 Introduction
The aim of multiple linear regression is to predict the outcome of a dependent or response variable $y$ based on a linear relationship with several independent or predictor variables $x_1, x_2, \dots, x_k$.

The model is linear in the $\beta$'s (parameters) but not necessarily in the $x$'s.

The four assumptions of the model are

1. $E(\varepsilon_i) = 0$ for all $i = 1, 2, \dots, n$, or, equivalently, $E(y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}$.
2. $\operatorname{var}(\varepsilon_i) = \sigma^2$ for all $i = 1, 2, \dots, n$, or, equivalently, $\operatorname{var}(y_i) = \sigma^2$.
3. $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$, or, equivalently, $\operatorname{cov}(y_i, y_j) = 0$.
4. $\varepsilon_i \sim N(0, \sigma^2)$, or, equivalently, the $y_i$'s are independent normal variables.
The regression assumptions expressed in matrix notation are
$$y \sim N_n(X\beta, \sigma^2 I_n) \qquad \text{or} \qquad \varepsilon \sim N_n(0, \sigma^2 I_n).$$
2.3 Estimation of $\beta$ and $\sigma^2$

2.3.1 Least-squares estimator for $\beta$
$$\hat{\beta} = (X'X)^{-1}X'y.$$
Proof: We have that
$$\hat{\varepsilon}'\hat{\varepsilon} = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}.$$
Minimizing this sum of squares (differentiate with respect to $\hat{\beta}$ and set the derivative equal to zero) leads to the normal equations below.
The normal equations, equation (7.8), are
$$X'X\hat{\beta} = X'y.$$
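A minimal numerical sketch of this estimator (made-up data; the code solves the normal equations directly rather than forming the inverse explicitly, which is numerically preferable):

```python
import numpy as np

# Illustrative data (not from the textbook): n = 6 observations, k = 2 predictors
X = np.array([[1, 2.0, 5.0],
              [1, 3.0, 4.0],
              [1, 5.0, 6.0],
              [1, 6.0, 3.0],
              [1, 8.0, 7.0],
              [1, 9.0, 8.0]])   # first column of ones for the intercept
y = np.array([10.0, 12.0, 15.0, 14.0, 20.0, 23.0])

# Solve the normal equations X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)            # b[0] = intercept estimate, b[1], b[2] = slope estimates

yhat = X @ b        # fitted values
resid = y - yhat    # residuals
```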
The model fitted to the data is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k.$$
Interpretation of $\hat{\beta}_0$ (the estimate of $\beta_0$): if all the predictor values are equal to zero, it is estimated that $y$ will on average be equal to $\hat{\beta}_0$.
Interpretation of $\hat{\beta}_j$ (the estimate of $\beta_j$), $j = 1, 2, \dots, k$: for every unit increase in $x_j$, whilst keeping all the other predictors constant, it is estimated that $y$ will on average increase (or decrease) by $\hat{\beta}_j$.
The predicted value of the dependent variable for given values of the independent variables is obtained by substituting those values into the fitted model.

The option /p clm cli in the model statement of PROC GLM can be used in the same way as in simple linear regression to calculate predicted values of $y$ for given values of the predictors. This will also give 95% confidence intervals for the mean predicted value (clm) and for an individual predicted value (cli).

The option /clparm in the model statement of PROC GLM will give 95% confidence intervals for the parameter estimates.
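For readers who want to reproduce these quantities outside SAS, a rough Python analogue using statsmodels is sketched below (the data and variable names x1, x2 are made up; the course itself works through PROC GLM):

```python
import pandas as pd
import statsmodels.api as sm

# Made-up data frame with a response y and two predictors x1, x2
df = pd.DataFrame({
    "y":  [10.0, 12.0, 15.0, 14.0, 20.0, 23.0],
    "x1": [2.0, 3.0, 5.0, 6.0, 8.0, 9.0],
    "x2": [5.0, 4.0, 6.0, 3.0, 7.0, 8.0],
})

X = sm.add_constant(df[["x1", "x2"]])   # adds the intercept column
fit = sm.OLS(df["y"], X).fit()

# Analogue of /clparm: 95% confidence intervals for the parameter estimates
print(fit.conf_int(alpha=0.05))

# Analogue of /p clm cli: predicted value with intervals for the mean response
# (mean_ci_*) and for an individual response (obs_ci_*)
new = sm.add_constant(pd.DataFrame({"x1": [4.0], "x2": [5.0]}), has_constant="add")
print(fit.get_prediction(new).summary_frame(alpha=0.05))
```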
Work through Example 7.2 and Example 7.3.1a (Question a in Example C2).
Theorem 7.3b
If $E(y) = X\beta$, then $E(\hat{\beta}) = \beta$.
Proof:
$$E(\hat{\beta}) = E\left[(X'X)^{-1}X'y\right] = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'X\beta = \beta.$$
Theorem 7.3c
If $\operatorname{cov}(y, y') = \sigma^2 I$, the covariance matrix of $\hat{\beta}$ is given by $\sigma^2(X'X)^{-1}$.
Proof:
$$\begin{aligned}
\operatorname{cov}(\hat{\beta}, \hat{\beta}') &= \operatorname{cov}\left((X'X)^{-1}X'y,\ \left[(X'X)^{-1}X'y\right]'\right) \\
&= (X'X)^{-1}X'\operatorname{cov}(y, y')\,X(X'X)^{-1} \\
&= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} \\
&= \sigma^2(X'X)^{-1}X'X(X'X)^{-1} \\
&= \sigma^2(X'X)^{-1}.
\end{aligned}$$
The results from Theorems 7.3b and 7.3c also follow from the assumption of multiple regression analysis that $y \sim N_n(X\beta, \sigma^2 I_n)$. Therefore
$$\hat{\beta} = (X'X)^{-1}X'y \sim N_{k+1}\left(\beta,\ \sigma^2(X'X)^{-1}\right).$$
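A small simulation (illustrative only, not from the notes) can be used to check these results empirically: over repeated samples the average of $\hat{\beta}$ should be close to $\beta$ and its empirical covariance close to $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed design matrix (intercept plus two made-up predictors) and true parameters
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50), rng.uniform(0, 5, 50)])
beta = np.array([1.0, 2.0, -0.5])
sigma2 = 4.0

XtX_inv = np.linalg.inv(X.T @ X)

# Repeatedly generate y ~ N(X beta, sigma^2 I) and compute the least-squares estimate
betas = np.empty((5000, 3))
for r in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=50)
    betas[r] = XtX_inv @ X.T @ y

print(betas.mean(axis=0))            # close to beta          (Theorem 7.3b)
print(np.cov(betas, rowvar=False))   # close to sigma^2 (X'X)^{-1}  (Theorem 7.3c)
print(sigma2 * XtX_inv)
```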
The regression assumptions as well as the use of the correct functional form can be checked by making residual plots. The residual for observation $i$ is $\hat{\varepsilon}_i = y_i - \hat{y}_i$.

1. A plot of $\hat{\varepsilon}_i$ against $x_{ij}$, $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, k$, tests for the correct functional form.
2. A plot of $\hat{\varepsilon}_i$ against $\hat{y}_i$ and $x_{ij}$, $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, k$, tests for constant variance.
3. A plot of $\hat{\varepsilon}_i$ against the time order in which historical data have been observed tests for independence. This part is excluded from the module.

The test for normality can be done by applying the Shapiro-Wilk or Kolmogorov-Smirnov test for normality to the residuals.
2.3.3 An estimator for $\sigma^2$

The $MSE$ is
$$s^2 = MSE = \frac{SSE}{n - k - 1}.$$
Since $y \sim N_n(X\beta, \sigma^2 I_n)$, $\frac{1}{\sigma}(y - X\beta) \sim N_n(0, I_n)$. From the Lemma given above,
$$\begin{aligned}
&\frac{1}{\sigma^2}(y - X\beta)'\left[I_n - X(X'X)^{-1}X'\right](y - X\beta) \\
&= \frac{1}{\sigma^2}y'\left[I_n - X(X'X)^{-1}X'\right]y + \frac{1}{\sigma^2}\beta'X'\left[I_n - X(X'X)^{-1}X'\right]X\beta - \frac{2}{\sigma^2}\beta'X'\left[I_n - X(X'X)^{-1}X'\right]y \\
&= \frac{1}{\sigma^2}y'\left[I_n - X(X'X)^{-1}X'\right]y \\
&= \frac{SSE}{\sigma^2}
\end{aligned}$$
has a $\chi^2(n - k - 1)$ distribution.
$$E(SSE) = E\left\{y'\left[I_n - X(X'X)^{-1}X'\right]y\right\} = \sigma^2(n - k - 1).$$
An unbiased estimator for $\sigma^2$ is $s^2 = MSE = \dfrac{SSE}{n - k - 1}$; that is, $E(MSE) = \sigma^2$.
Since the covariance matrix of $\hat{\beta}$ is $\sigma^2(X'X)^{-1}$, an unbiased estimator for $\operatorname{cov}(\hat{\beta}, \hat{\beta}')$ is $s^2(X'X)^{-1}$, where $s^2 = MSE$.
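A short sketch of these estimators in Python (continuing the made-up data used earlier; illustrative only):

```python
import numpy as np

# Illustrative data: design matrix with an intercept column and k = 2 predictors
X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 5.0, 6.0],
              [1, 6.0, 3.0], [1, 8.0, 7.0], [1, 9.0, 8.0]])
y = np.array([10.0, 12.0, 15.0, 14.0, 20.0, 23.0])
n, p = X.shape                      # p = k + 1
k = p - 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

SSE = y @ y - b @ X.T @ y           # SSE = y'y - b'X'y
s2 = SSE / (n - k - 1)              # MSE, the unbiased estimator of sigma^2

cov_b = s2 * XtX_inv                # estimated covariance matrix of b
std_err = np.sqrt(np.diag(cov_b))   # standard errors of the estimates
print(s2, std_err)
```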
2.4.2 Maximum likelihood estimators for $\beta$ and $\sigma^2$

Theorem 7.6a
If $y \sim N_n(X\beta, \sigma^2 I_n)$, where $X$ is $n \times (k+1)$ with $\operatorname{rank}(X) = k + 1 < n$, the maximum likelihood estimators of $\beta$ and $\sigma^2$ are given in equations (7.48) and (7.49), that is,
$$\hat{\beta} = (X'X)^{-1}X'y \qquad \text{and} \qquad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$$
Proof:
$$\begin{aligned}
L(\beta, \sigma^2) &= (2\pi)^{-n/2}\left|\sigma^2 I_n\right|^{-1/2}\exp\left\{-\tfrac{1}{2}(y - X\beta)'\left[\sigma^2 I_n\right]^{-1}(y - X\beta)\right\}, \qquad -\infty < y_i < \infty \\
&= (2\pi\sigma^2)^{-n/2}\exp\left\{-\tfrac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right\}
\end{aligned}$$
$$\ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$
Taking the partial derivative of $\ln L(\beta, \sigma^2)$ with respect to $\beta$, setting it equal to zero and solving for $\beta$ gives
$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \beta} = \frac{2}{2\sigma^2}X'(y - X\beta) = \frac{1}{\sigma^2}X'(y - X\beta) = 0$$
$$\Rightarrow X'y - X'X\beta = 0$$
$$\Rightarrow \hat{\beta} = (X'X)^{-1}X'y.$$
For a given $\hat{\beta}$, differentiate $\ln L(\hat{\beta}, \sigma^2)$ partially with respect to $\sigma^2$, set the result equal to zero and solve for $\sigma^2$. This gives
$$\frac{\partial \ln L(\hat{\beta}, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y - X\hat{\beta})'(y - X\hat{\beta}) = 0$$
$$\Rightarrow \frac{n}{2\sigma^2} = \frac{1}{2(\sigma^2)^2}(y - X\hat{\beta})'(y - X\hat{\beta})$$
$$\Rightarrow \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$$
2.4.3 Properties of $\hat{\beta}$ and $\hat{\sigma}^2$

Theorem 7.6b
The maximum likelihood estimators $\hat{\beta}$ and $\hat{\sigma}^2$ have the following distributional properties:

i. $\hat{\beta} \sim N_{k+1}\left(\beta,\ \sigma^2(X'X)^{-1}\right)$.
ii. $\dfrac{n\hat{\sigma}^2}{\sigma^2} = \dfrac{SSE}{\sigma^2} = \dfrac{(n - k - 1)s^2}{\sigma^2} \sim \chi^2(n - k - 1)$.
iii. $\hat{\beta}$ and $\hat{\sigma}^2$ are independent.
Proof:
i. Note: We will use Theorem 13 on page 36 of the WST 311 notes:
Suppose that $X : p \times 1$ has a $N_p(\mu, \Sigma)$ distribution and let the rank of $D : q \times p$ be $q$ $(q \le p)$. Then $Y = DX$ has a $N_q(D\mu, D\Sigma D')$ distribution.
Since $y \sim N_n(X\beta, \sigma^2 I_n)$ and $\hat{\beta} = (X'X)^{-1}X'y$, it follows from Theorem 13 that
$$\hat{\beta} \sim N\left((X'X)^{-1}X'X\beta,\ (X'X)^{-1}X'(\sigma^2 I_n)X(X'X)^{-1}\right) \quad \text{or} \quad \hat{\beta} \sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right).$$
iii. Note: We will use Lemma 3 on page 48 of the WST 311 notes:
Suppose that $X : p \times 1 \sim N_p(0, I_p)$. Let $S = X'AX$ with $A \ge 0$ and $Y = BX$. If $BA = 0$, then $S$ and $Y$ are independent.
Let $\hat{\beta} = (X'X)^{-1}X'y = By$ and $\hat{\sigma}^2 = \dfrac{1}{n}y'\left[I_n - X(X'X)^{-1}X'\right]y = \dfrac{1}{n}y'Ay$.
Since $BA = (X'X)^{-1}X'\left[I_n - X(X'X)^{-1}X'\right] = 0$, $\hat{\beta}$ and $\hat{\sigma}^2$ are independent.
The coefficient of determination or squared multiple correlation, $R^2$, is calculated as follows:
$$R^2 = \frac{SSR}{SST}.$$
Note that the positive square root $R$ is called the multiple correlation coefficient.
This gives the proportion of the variation in y that is explained by the multiple regression model. Alternatively:
100R2 % of the variation in y is explained by the predictors in the multiple regression model.
Adding a variable $x$ to the model increases the value of $R^2$. If $k$ is a relatively large fraction of $n$, it is possible to have a large value of $R^2$ that is not meaningful. To correct for this tendency, an adjusted $R^2$, denoted by $R_a^2$, is calculated. See the PROC REG output. This is briefly explained in Section 7.7 of the textbook. You do not have to know the detail of this section.
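A brief sketch of how $R^2$ and an adjusted $R^2$ can be computed (made-up data; the adjustment shown, $R_a^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$, is the common definition and is only meant as an illustration of the idea in Section 7.7):

```python
import numpy as np

# Illustrative data: design matrix with intercept column and k = 2 predictors
X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 5.0, 6.0],
              [1, 6.0, 3.0], [1, 8.0, 7.0], [1, 9.0, 8.0]])
y = np.array([10.0, 12.0, 15.0, 14.0, 20.0, 23.0])
n, k = X.shape[0], X.shape[1] - 1

b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((yhat - y.mean()) ** 2)

R2 = SSR / SST
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)   # adjusted R^2
print(R2, R2_adj)
```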
Work through Example 7.7 (Question e in Example C2) and Question c in Example C2.
3 MULTIPLE LINEAR REGRESSION: Test of Hypotheses
TEXTBOOK Chapter 8
Pages
185
188 and 189 Example 8.1
198-200 Section 8.4.1 up to end of Theorem 8.4b (only part (ii))
202 Example 8.4.1b
204-205 Section 8.5.1, equation (8.40)
209 Example 8.5.2
The hypothesis for the overall regression test, $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$, can alternatively be written as
$$H_0: \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
or
$$H_0: \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
That is,
$$H_0: C\beta = 0$$
where
$$C = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$
is a $k \times (k+1)$ matrix.
The Analysis of Variance (ANOVA) table is given in Table 8.1 and is given here as it appears in the SAS output from PROC GLM.

Source | df | Sum of Squares (SS) | Mean Square (MS) | F-value | Pr > F
Model | $k$ | $SSR = \hat{\beta}'X'y - n\bar{y}^2$ | $SSR/k$ | $\dfrac{SSR/k}{SSE/(n-k-1)}$ | p-value
Error | $n-k-1$ | $SSE = y'y - \hat{\beta}'X'y$ | $SSE/(n-k-1)$ | |
Total | $n-1$ | $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$ | | |

p-value $= P(F > F\text{-value})$ where $F \sim F(k, n-k-1)$ and $F\text{-value} = \dfrac{SSR/k}{SSE/(n-k-1)}$.

The null hypothesis for the overall regression will be rejected at the $100\alpha\%$ level of significance if $F\text{-value} > F_\alpha(k, n-k-1)$, where $F_\alpha(k, n-k-1)$ is the upper $\alpha$-th percentage point of the $F$ distribution, or if p-value $< \alpha$.
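A minimal numerical sketch of the quantities in this ANOVA table and of the overall F-test (made-up data, for illustration only):

```python
import numpy as np
from scipy import stats

# Illustrative data: intercept plus k = 2 predictors
X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 5.0, 6.0],
              [1, 6.0, 3.0], [1, 8.0, 7.0], [1, 9.0, 8.0]])
y = np.array([10.0, 12.0, 15.0, 14.0, 20.0, 23.0])
n, k = X.shape[0], X.shape[1] - 1

b = np.linalg.solve(X.T @ X, X.T @ y)

SSR = b @ X.T @ y - n * y.mean() ** 2     # model sum of squares
SSE = y @ y - b @ X.T @ y                 # error sum of squares
SST = SSR + SSE                           # equals sum((y - ybar)^2)

F_value = (SSR / k) / (SSE / (n - k - 1))
p_value = stats.f.sf(F_value, k, n - k - 1)   # P(F > F-value)
print(F_value, p_value)
```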
Theorem 8.4a
Let $y \sim N_n(X\beta, \sigma^2 I_n)$ and let $C$ be a $q \times (k+1)$ matrix of rank $q \le k+1$. Then:

i. $C\hat{\beta} \sim N_q\left(C\beta,\ \sigma^2 C(X'X)^{-1}C'\right)$.

ii. Under $H_0: C\beta = 0$,
$$\frac{SSH}{\sigma^2} = \frac{(C\hat{\beta})'\left[C(X'X)^{-1}C'\right]^{-1}C\hat{\beta}}{\sigma^2} \sim \chi^2(q).$$

iii. $$\frac{SSE}{\sigma^2} = \frac{y'\left[I_n - X(X'X)^{-1}X'\right]y}{\sigma^2} \sim \chi^2(n - k - 1).$$

iv. $SSH$ and $SSE$ are independent.
Proof:
i. Note: We will use Theorem 13 on page 36 of the WST 311 notes:
Suppose that $X : p \times 1$ has a $N_p(\mu, \Sigma)$ distribution and let the rank of $D : q \times p$ be $q$ $(q \le p)$. Then $Y = DX$ has a $N_q(D\mu, D\Sigma D')$ distribution.
Since $\hat{\beta} \sim N_{k+1}\left(\beta,\ \sigma^2(X'X)^{-1}\right)$, the result follows from Theorem 13.
ii. Note: We will use Lemma 4 on page 48 of the WST 311 notes:
Let $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$. The quadratic form $S = (X - \mu)'\Sigma^{-1}(X - \mu) = Y'Y \sim \chi^2(p)$.
Since $C\hat{\beta} \sim N_q\left(C\beta,\ \sigma^2 C(X'X)^{-1}C'\right)$, the result follows from Lemma 4.
iv. From Theorem 7.6b(iii), $\hat{\beta}$ and $SSE$ are independent. Since $SSH$ is only a function of $\hat{\beta}$, it follows that $SSH$ and $SSE$ are independent.
Theorem 8.4b(ii)
Let $y \sim N_n(X\beta, \sigma^2 I_n)$ and define the statistic
$$F = \frac{SSH/q}{SSE/(n-k-1)}$$
where
$$SSE = y'y - \hat{\beta}'X'y$$
$$SSH = (C\hat{\beta})'\left[C(X'X)^{-1}C'\right]^{-1}C\hat{\beta}.$$
Then, under $H_0: C\beta = 0$,
$$F = \frac{SSH/q}{SSE/(n-k-1)} \sim F(q, n-k-1).$$
The null hypothesis $H_0: C\beta = 0$ is rejected if
$$F > F_\alpha(q, n-k-1)$$
where $\alpha$ is the significance level of the hypothesis test and $F_\alpha(q, n-k-1)$ the upper $\alpha$-th percentage point of the $F$ distribution. The corresponding p-value for the hypothesis test is
$$\text{p-value} = P\left(F(q, n-k-1) > \frac{SSH/q}{SSE/(n-k-1)}\right).$$
The statistic
$$F = \frac{(C\hat{\beta})'\left[C(X'X)^{-1}C'\right]^{-1}C\hat{\beta}/q}{SSE/(n-k-1)}$$
can be used to test the hypothesis $H_0: C\beta = 0$ in the following cases:
(a) Test for the overall significance of the regression model: $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$.
(b) To test one $\beta_j$: $H_0: \beta_j = 0$, $j = 0, 1, \dots, k$.
(c) Test for specific contrasts: $H_0: C\beta = 0$.
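A minimal sketch of this F statistic for case (a), the overall regression test, using the $C = [0\ \ I_k]$ matrix defined earlier (made-up data; the same code applies to any full-rank $q \times (k+1)$ matrix $C$ for $H_0: C\beta = 0$):

```python
import numpy as np
from scipy import stats

# Illustrative data: intercept plus k = 2 predictors
X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 5.0, 6.0],
              [1, 6.0, 3.0], [1, 8.0, 7.0], [1, 9.0, 8.0]])
y = np.array([10.0, 12.0, 15.0, 14.0, 20.0, 23.0])
n, k = X.shape[0], X.shape[1] - 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
SSE = y @ y - b @ X.T @ y

# C = [0 I_k] tests H0: beta_1 = ... = beta_k = 0 (case (a)); replace C by any
# full-rank q x (k+1) matrix to test other hypotheses of the form C beta = 0
C = np.hstack([np.zeros((k, 1)), np.eye(k)])
q = C.shape[0]

Cb = C @ b
SSH = Cb @ np.linalg.solve(C @ XtX_inv @ C.T, Cb)   # (Cb)'[C(X'X)^-1 C']^-1 (Cb)

F = (SSH / q) / (SSE / (n - k - 1))
p_value = stats.f.sf(F, q, n - k - 1)
print(F, p_value)
```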
The statistic in equation (8.40) can be used to test the hypothesis $H_0: \beta_j = 0$; that is, reject $H_0$ if
$$|t_j| = \frac{|\hat{\beta}_j|}{s\sqrt{g_{jj}}} = \frac{|\hat{\beta}_j|}{\text{stderr of } \hat{\beta}_j} \ge t_{\alpha/2}(n - k - 1),$$
where $g_{jj}$ is the $j$-th diagonal element of $(X'X)^{-1}$. This is a t-test.
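A short numerical sketch of this t-test (made-up data; $g_{jj}$ is read off the diagonal of $(X'X)^{-1}$):

```python
import numpy as np
from scipy import stats

X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 5.0, 6.0],
              [1, 6.0, 3.0], [1, 8.0, 7.0], [1, 9.0, 8.0]])
y = np.array([10.0, 12.0, 15.0, 14.0, 20.0, 23.0])
n, k = X.shape[0], X.shape[1] - 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
s = np.sqrt((y @ y - b @ X.T @ y) / (n - k - 1))   # s = sqrt(MSE)

j = 1                                              # test H0: beta_1 = 0
t_j = b[j] / (s * np.sqrt(XtX_inv[j, j]))          # g_jj = [(X'X)^-1]_jj
p_value = 2 * stats.t.sf(abs(t_j), n - k - 1)
print(t_j, p_value)
```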
Work through Example C3 (Questions h and i). Make sure you know how to calculate and use the p-values to test hypotheses.
Multicollinearity

This occurs when the independent variables in a regression model are highly correlated.

Extreme cases of multicollinearity can cause the least-squares point estimates to be far from the true values of the regression parameters. This is because the point estimates, the $\hat{\beta}_j$'s, each measure a partial influence of $x_j$ upon the mean value of the dependent variable.

In the hypothesis $H_0: \beta_j = 0$, the p-value measures the additional importance of the independent variable $x_j$ over the combined importance of the other independent variables in the regression model. Thus multicollinearity can cause some of the correlated independent variables to appear to be less important than they really are.
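A small illustrative simulation of this effect (not from the notes): two almost identical predictors both truly influence $y$, yet their individual t-tests can look insignificant because the estimated coefficients have inflated standard errors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40

# Two highly correlated predictors that both truly influence y
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)     # almost identical to x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
k = 2
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
s = np.sqrt((y @ y - b @ X.T @ y) / (n - k - 1))

se = s * np.sqrt(np.diag(XtX_inv))           # inflated for x1 and x2
t = b / se
p = 2 * stats.t.sf(np.abs(t), n - k - 1)
print(np.column_stack([b, se, p]))           # large p-values despite real effects
```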
4 ONE-WAY ANALYSIS-OF-VARIANCE
TEXTBOOK Chapter 12
Pages 295-298.
Example D
OBJECTIVES
You have to be able to do the following.
1. Understand and know how the one-way analysis-of-variance model can be written in different ways as a linear model. Understand the use of dummy variables (see the sketch after this list).
2. Work through Example D. Understand all the different procedures applied in this example and the interpretation of the results.
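As a rough illustration of item 1, a minimal sketch of how a one-way ANOVA model can be written as a linear model with dummy variables (made-up groups and responses; the coding shown, an intercept column plus one indicator column per group, is only one of several possible parameterizations):

```python
import numpy as np

# Illustrative one-way ANOVA data: 3 treatment groups with 3 observations each
group = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
y = np.array([5.1, 4.8, 5.3, 6.2, 6.0, 6.4, 7.1, 6.9, 7.3])

# Dummy (indicator) variables: column j is 1 when the observation is in group j
dummies = (group[:, None] == np.unique(group)[None, :]).astype(float)

# One way to write the model as y = X beta + eps: intercept column plus the group
# indicators (this overparameterized X is not of full rank; a full-rank coding
# drops one indicator column and uses that group as the reference)
X = np.column_stack([np.ones(len(y)), dummies])
print(X)
```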
5 ANALYSIS-OF-COVARIANCE
TEXTBOOK Chapter 16
Pages 443-445.
Example E
OBJECTIVES
You have to be able to do the following.
1. Understand and know how the one-way analysis-of-covariance model can be written in different ways as a linear model. Understand the use of dummy variables (see the sketch after this list).
2. Work through Example E. Understand all the different procedures applied in this example and the interpretation of the results.
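Analogously to the one-way ANOVA sketch above, a one-way ANCOVA design matrix simply appends the covariate column to the group dummy variables (made-up values; again only one of several possible parameterizations):

```python
import numpy as np

# Illustrative ANCOVA data: 3 treatment groups, response y and a covariate x
group = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
x = np.array([2.0, 3.5, 2.8, 3.1, 2.2, 3.8, 2.5, 3.0, 3.6])
y = np.array([5.1, 4.8, 5.3, 6.2, 6.0, 6.4, 7.1, 6.9, 7.3])

dummies = (group[:, None] == np.unique(group)[None, :]).astype(float)

# Linear-model form of the ANCOVA model: intercept, group indicators and covariate
X = np.column_stack([np.ones(len(y)), dummies, x])
print(X)
```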