
Department of Statistics

Class notes – Part 2


Linear Model

WST 311

Last revision: April 2024


Revision by: Prof F Kanfer

© Copyright reserved
Contents

I PART II - THE LINEAR MODEL  1

1 SIMPLE LINEAR REGRESSION  2
  1.1 The model  2
  1.2 Estimation of $\beta_0$, $\beta_1$ and $\sigma^2$  2
  1.3 Hypothesis test and confidence interval for $\beta_1$  4
  1.4 Coefficient of determination  4

2 MULTIPLE LINEAR REGRESSION: Estimation  6
  2.1 Introduction  6
  2.2 The model  6
  2.3 Estimation of $\beta$ and $\sigma^2$  7
    2.3.1 Least-squares estimator for $\beta$  7
    2.3.2 Properties of the least-squares estimator $\hat{\beta}$  9
    2.3.3 An estimator for $\sigma^2$  10
  2.4 Normal model  11
    2.4.1 Assumptions  11
    2.4.2 Maximum likelihood estimators for $\beta$ and $\sigma^2$  11
    2.4.3 Properties of $\hat{\beta}$ and $\hat{\sigma}^2$  13
  2.5 Coefficient of determination, $R^2$  13

3 MULTIPLE LINEAR REGRESSION: Test of Hypotheses  15
  3.1 Test of overall regression  15
  3.2 The general linear hypothesis test for $H_0: C\beta = 0$  16
  3.3 Testing one $\beta_j$  18
  3.4 Additional topics  18

4 ONE-WAY ANALYSIS-OF-VARIANCE  19

5 ANALYSIS-OF-COVARIANCE  20

Part I
PART II - THE LINEAR MODEL

1 SIMPLE LINEAR REGRESSION
TEXTBOOK Chapter 6
Pages 127 to 136.

Example C (Part 1, C1)

1.1 The model


The expression for the simple linear regression model, equation (6.1), is
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

The assumptions of the model are

1. $E(\varepsilon_i) = 0$ for all $i = 1, 2, \ldots, n$; or, equivalently, $E(y_i) = \beta_0 + \beta_1 x_i$.
2. $\mathrm{var}(\varepsilon_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$; or, equivalently, $\mathrm{var}(y_i) = \sigma^2$.
3. $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$; or, equivalently, $\mathrm{cov}(y_i, y_j) = 0$.
4. $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$; or, equivalently, $\varepsilon_i \sim N(0, \sigma^2)$.

Assumption 2 is also known as the assumption of homoscedasticity, homogeneous variance or constant variance.
When the $\mathrm{var}(\varepsilon_i)$ are not equal, this is known as heteroscedasticity.

In matrix notation the model can be written as
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 + \varepsilon_1 \\ \beta_0 + \beta_1 x_2 + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 x_n + \varepsilon_n \end{pmatrix}$$
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
$$y = X\beta + \varepsilon,$$
and the assumptions are

1. $E(\varepsilon) = 0$.
2. $\mathrm{cov}(\varepsilon, \varepsilon') = \sigma^2 I$.
3. $\varepsilon \sim N(0, \sigma^2 I)$.

1.2 Estimation of $\beta_0$, $\beta_1$ and $\sigma^2$

Define the deviation $\varepsilon$, also referred to as the residual or error, as
$$\varepsilon = y - X\beta$$
or
$$\varepsilon_i = y_i - (\beta_0 + \beta_1 x_i)$$
for $i = 1, 2, \ldots, n$.

Theorem 1 In the least-squares estimation approach the sum of squared deviations is minimized. That is, $\hat{\beta}_0$ and $\hat{\beta}_1$ are the values that minimize
$$\sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \hat{\varepsilon}'\hat{\varepsilon}.$$

The least-squares estimates for $\beta_0$ and $\beta_1$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_{xy}}{s_{xx}}$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Proof. See textbook on page 128. Refer to Multiple linear regression for a general proof.

The predicted values of the observed $y_i$'s are
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i.$$
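The worked examples in this module use SAS; purely as an informal illustration of the formulas above, here is a minimal numpy sketch with hypothetical data (the vectors x and y are made up for illustration only).

import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

xbar, ybar = x.mean(), y.mean()

# beta1_hat = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2) = s_xy / s_xx
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar

# Fitted values and residuals
y_hat = beta0_hat + beta1_hat * x
resid = y - y_hat

print("beta0_hat:", beta0_hat, "beta1_hat:", beta1_hat)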

Residual plots are used to test the regression assumptions as well as the use of the correct functional form. The estimated residual for observation $i$ is $\hat{\varepsilon}_i = y_i - \hat{y}_i$.

1. A plot of $\hat{\varepsilon}_i$ against $x_i$, $i = 1, 2, \ldots, n$, tests for the correct functional form.
2. A plot of $\hat{\varepsilon}_i$ against $\hat{y}_i$ and $x_i$, $i = 1, 2, \ldots, n$, tests for constant variance.
3. A plot of $\hat{\varepsilon}_i$ against the time order in which historical data were observed evaluates independence. Testing for independence is excluded from the scope of this module.

The test for normality can be done by applying the Shapiro-Wilk or Kolmogorov-Smirnov test for normality to the residuals.
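The residual plots are produced by SAS in the course examples; the sketch below is a rough, self-contained matplotlib/scipy analogue with hypothetical data, showing the residual-versus-predictor and residual-versus-fitted plots, plus a Shapiro-Wilk check on the residuals.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
resid = y - y_hat

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].scatter(x, resid)          # residuals vs predictor: functional form
axes[0].axhline(0, color="grey")
axes[0].set_xlabel("x"); axes[0].set_ylabel("residual")
axes[1].scatter(y_hat, resid)      # residuals vs fitted values: constant variance
axes[1].axhline(0, color="grey")
axes[1].set_xlabel("fitted value"); axes[1].set_ylabel("residual")
plt.tight_layout()
plt.show()

print(stats.shapiro(resid))        # Shapiro-Wilk test for normality of the residuals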

Take note of equations (6.7) to (6.10), the expected values and variances of the least-squares estimators.

The error (or residual) sum of squares (SSE) is
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2}$$
and the mean squared error (MSE) is
$$MSE = s^2 = \frac{SSE}{n-2}.$$
Note that $E(MSE) = E(s^2) = \sigma^2$.

Work through Example 6.2 in Example C1.

1.3 Hypothesis test and confidence interval for $\beta_1$

Understand the properties of $\hat{\beta}_1$ and $s^2$ listed in Section 6.3, that is

1. $\hat{\beta}_1 \sim N\left( \beta_1, \dfrac{\sigma^2}{(n-1)s_{xx}} \right)$, where $s_{xx} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$.
2. $\dfrac{(n-2)s^2}{\sigma^2} \sim \chi^2(n-2)$.
3. $\hat{\beta}_1$ and $s^2$ are independent. These properties will be proven in Chapter 7.

Know that under the null hypothesis $H_0: \beta_1 = 0$,
$$t = \frac{\hat{\beta}_1}{s / \sqrt{(n-1)s_{xx}}} \sim t(n-2)$$
where $s_{xx} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$.

Test the hypothesis $H_0: \beta_1 = 0$ against the alternative $H_1: \beta_1 \neq 0$.

A $100(1-\alpha)\%$ confidence interval for $\beta_1$ is given by
$$\hat{\beta}_1 \pm t_{\alpha/2, n-2} \, \frac{s}{\sqrt{(n-1)s_{xx}}}.$$

Work through Example 6.3 in Example C1.
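As an informal numerical check of the t statistic and confidence interval formulas (hypothetical data, not the course's SAS output):

import numpy as np
from scipy import stats

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(y)

sxx = np.sum((x - x.mean()) ** 2) / (n - 1)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)          # MSE
se_b1 = np.sqrt(s2 / ((n - 1) * sxx))      # standard error of beta1_hat

t_stat = b1 / se_b1                        # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # 100(1 - alpha)% CI for beta1
print(t_stat, p_value, ci)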

1.4 Coefficient of determination

Know and understand equation (6.17), that the total sum of squares $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ can be written as the sum of the regression sum of squares $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ and the error sum of squares $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. That is
$$SST = SSR + SSE$$
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Total variation = Explained variation + Unexplained variation.

Know that in the regression model $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i = n\bar{y}$.
This follows from the fact that $\sum_{i=1}^{n} (y_i - \hat{y}_i) = \sum_{i=1}^{n} \hat{\varepsilon}_i = 0$.
Calculate and interpret the coefficient of determination in equation (6.16),
$$r^2 = \frac{SSR}{SST} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$
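A short numpy check, with hypothetical data, that the decomposition SST = SSR + SSE holds numerically and that $r^2 = SSR/SST$:

import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

print(np.isclose(sst, ssr + sse))   # SST = SSR + SSE
print("r^2 =", ssr / sst)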

Work through Example 6.4 in Example C1.

2 MULTIPLE LINEAR REGRESSION: Estimation
TEXTBOOK Chapter 7
Pages
137-146 Take note of Example 7.3.1b; leave Example 7.3.2a
149-151 Section 7.3.3 until end of Corollary 1 on page 151; Example 7.3.3
157-159 Section 7.6 until end of Theorem 7.6b
161 Equation (7.56)
162 Example 7.7
182-184 Exercise 7.54

Example C (Part 2, C2)

2.1 Introduction
The aim of multiple linear regression is to predict the outcome of a dependent or response variable $y$ based on a linear relationship with several independent or predictor variables $x_1, x_2, \ldots, x_k$.

2.2 The model


The expression for the multiple linear regression model, equation (7.3), is given by
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

The model is linear in the $\beta$'s (parameters) but not necessarily in the $x$'s.
The four assumptions of the model are

1. $E(\varepsilon_i) = 0$ for all $i = 1, 2, \ldots, n$; or, equivalently, $E(y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$.
2. $\mathrm{var}(\varepsilon_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$; or, equivalently, $\mathrm{var}(y_i) = \sigma^2$.
3. $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$; or, equivalently, $\mathrm{cov}(y_i, y_j) = 0$.
4. $\varepsilon_i \sim N(0, \sigma^2)$; or, equivalently, the $y_i$'s are independent normal variables.

In matrix notation the model can be written as
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1 \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n \end{pmatrix}$$
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
$$y = X\beta + \varepsilon.$$
Note that $X : n \times (k+1)$, $n > k+1$ and $\mathrm{rank}(X) = k+1$.
The matrix $X$ is the design matrix.

The regression assumptions expressed in matrix notation are
$$y \sim N_n(X\beta, \sigma^2 I_n) \quad \text{or} \quad \varepsilon \sim N_n(0, \sigma^2 I_n).$$
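As an informal illustration with hypothetical data, the sketch below builds the design matrix $X$ with its leading column of ones and confirms that it is $n \times (k+1)$ of full column rank:

import numpy as np

# Hypothetical predictor values for n = 5 observations and k = 2 predictors
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

n, k = len(x1), 2
X = np.column_stack([np.ones(n), x1, x2])   # design matrix: n x (k+1)

print(X.shape)                       # (5, 3), i.e. n x (k+1)
print(np.linalg.matrix_rank(X))      # should be k+1 = 3 for a full-rank model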

2.3 Estimation of $\beta$ and $\sigma^2$

2.3.1 Least-squares estimator for $\beta$

The derivation of the least-squares estimator for $\beta$ is given in Theorem 7.3a.

Theorem 7.3a
If $y = X\beta + \varepsilon$, where $X : n \times (k+1)$ and $\mathrm{rank}(X) = k+1 < n$, then the value of $\hat{\beta}$ that minimizes the error sum of squares $\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \hat{\varepsilon}'\hat{\varepsilon}$, where $\hat{\varepsilon} = y - X\hat{\beta}$, is
$$\hat{\beta} = (X'X)^{-1} X'y.$$

Proof:
We have that
$$\hat{\varepsilon}'\hat{\varepsilon} = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}.$$

To find the $\hat{\beta}$ that minimizes $\hat{\varepsilon}'\hat{\varepsilon}$, we differentiate $\hat{\varepsilon}'\hat{\varepsilon}$ with respect to $\hat{\beta}$ and set the result equal to zero. That is
$$\frac{\partial \hat{\varepsilon}'\hat{\varepsilon}}{\partial \hat{\beta}} = 0 - 2X'y + 2X'X\hat{\beta} = 0.$$
This gives the normal equations $X'X\hat{\beta} = X'y$ and from this $\hat{\beta} = (X'X)^{-1}X'y$.

We now show that $\hat{\beta} = (X'X)^{-1}X'y$ minimizes $\hat{\varepsilon}'\hat{\varepsilon}$. Let $b$ be an alternative estimator of $\beta$ so that
$$
\begin{aligned}
(y - Xb)'(y - Xb) &= \left[ y - X\hat{\beta} + X\hat{\beta} - Xb \right]' \left[ y - X\hat{\beta} + X\hat{\beta} - Xb \right] \\
&= (y - X\hat{\beta})'(y - X\hat{\beta}) + (X\hat{\beta} - Xb)'(X\hat{\beta} - Xb) + (y - X\hat{\beta})'(X\hat{\beta} - Xb) + (X\hat{\beta} - Xb)'(y - X\hat{\beta}) \\
&= (y - X\hat{\beta})'(y - X\hat{\beta}) + (X\hat{\beta} - Xb)'(X\hat{\beta} - Xb) + 2(\hat{\beta} - b)'X'(y - X\hat{\beta}) \\
&= (y - X\hat{\beta})'(y - X\hat{\beta}) + (\hat{\beta} - b)'X'X(\hat{\beta} - b).
\end{aligned}
$$
The last term in the second last step is equal to 0 since
$$
\begin{aligned}
X'(y - X\hat{\beta}) &= X'\left( y - X(X'X)^{-1}X'y \right) \\
&= X'y - X'X(X'X)^{-1}X'y \\
&= X'y - X'y \\
&= 0.
\end{aligned}
$$
From properties of ranks, $\mathrm{rank}(X'X) = \mathrm{rank}(X) = k+1$. Thus $X'X > 0$ (positive definite) and the quadratic form $(\hat{\beta} - b)'X'X(\hat{\beta} - b) > 0$ for $b \neq \hat{\beta}$. From this it follows that $\hat{\varepsilon}'\hat{\varepsilon}$ will be a minimum if $b = \hat{\beta} = (X'X)^{-1}X'y$.

The normal equations, equation (7.8), are
$$X'X\hat{\beta} = X'y.$$
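A minimal numpy sketch of the normal equations with hypothetical data; note that in practice np.linalg.lstsq is numerically preferable to forming $(X'X)^{-1}$ explicitly:

import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])

# Solve the normal equations X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent (and more stable) least-squares solution
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))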

The model fitted to the data is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k.$$

Interpretation of $\hat{\beta}_0$ (the estimate for $\beta_0$): if all the predictor values are equal to zero, it is estimated that $y$ will on average be equal to $\hat{\beta}_0$.
Interpretation of $\hat{\beta}_j$ (the estimate for $\beta_j$), $j = 1, 2, \ldots, k$: for every unit increase in $x_j$, whilst keeping all the other predictors constant, it is estimated that $y$ will increase (or decrease) on average by $\hat{\beta}_j$.

The predicted value for the dependent variable given values of the independent variables is
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_k x_{ik}, \quad i = 1, 2, \ldots, n.$$

The option /p clm cli in the MODEL statement of PROC GLM can be used in a similar way as was done in simple linear regression to calculate predicted values for $y$ given values of the predictors. This will also give 95% confidence intervals for the mean predicted value (clm) and the individual predicted value (cli).

The option /clparm in the MODEL statement of PROC GLM will give 95% confidence intervals for the parameter estimates.
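For readers who want a rough Python analogue of this SAS output (not part of the module; statsmodels is assumed to be available and the data are hypothetical), prediction and parameter confidence intervals can be obtained as follows:

import numpy as np
import statsmodels.api as sm

# Hypothetical data
X = sm.add_constant(np.column_stack([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]]))
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])

fit = sm.OLS(y, X).fit()
pred = fit.get_prediction(X).summary_frame(alpha=0.05)
# mean_ci_lower / mean_ci_upper  ~ clm (CI for the mean predicted value)
# obs_ci_lower  / obs_ci_upper   ~ cli (interval for an individual predicted value)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])

# 95% confidence intervals for the parameter estimates (~ the clparm option)
print(fit.conf_int(alpha=0.05))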

Work through Example 7.2 and Example 7.3.1a (Question a in Example C2).

Work through Example 7.3.1b and Question f in Example C2.

2.3.2 Properties of the least-squares estimator $\hat{\beta}$

Theorem 7.3b
If $E(y) = X\beta$, then $E(\hat{\beta}) = \beta$.
Proof:
$$
\begin{aligned}
E(\hat{\beta}) &= E\left[ (X'X)^{-1}X'y \right] \\
&= (X'X)^{-1}X' E(y) \\
&= (X'X)^{-1}X'X\beta \\
&= \beta.
\end{aligned}
$$

Theorem 7.3c
If $\mathrm{cov}(y, y') = \sigma^2 I$, the covariance matrix for $\hat{\beta}$ is given by $\sigma^2 (X'X)^{-1}$.
Proof:
$$
\begin{aligned}
\mathrm{cov}(\hat{\beta}, \hat{\beta}') &= \mathrm{cov}\left[ (X'X)^{-1}X'y, \left( (X'X)^{-1}X'y \right)' \right] \\
&= (X'X)^{-1}X' \,\mathrm{cov}(y, y')\, X(X'X)^{-1} \\
&= (X'X)^{-1}X' \,\sigma^2 I\, X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}.
\end{aligned}
$$

The results from Theorems 7.3b and 7.3c also follow from the assumption of multiple regression analysis that $y \sim N_n(X\beta, \sigma^2 I_n)$.
Therefore $\hat{\beta} = (X'X)^{-1}X'y \sim N_{k+1}\left( \beta, \sigma^2 (X'X)^{-1} \right)$.

The regression assumptions as well as the use of the correct functional form can be tested by making residual plots. The residual for observation $i$ is $\hat{\varepsilon}_i = y_i - \hat{y}_i$.

1. A plot of $\hat{\varepsilon}_i$ against $x_{ij}$, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$, tests for the correct functional form.
2. A plot of $\hat{\varepsilon}_i$ against $\hat{y}_i$ and $x_{ij}$, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$, tests for constant variance.
3. A plot of $\hat{\varepsilon}_i$ against the time order in which historical data were observed tests for independence. This part is excluded from the module.

The test for normality can be done by applying the Shapiro-Wilk or Kolmogorov-Smirnov test for normality to the residuals.

Work through Example 7.3.2b (Question d in Example C2).

2.3.3 An estimator for $\sigma^2$

The SSE can be calculated in any of the following ways:
$$
\begin{aligned}
SSE &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
&= (y - X\hat{\beta})'(y - X\hat{\beta}) \\
&= y'y - \hat{\beta}'X'y \\
&= y'\left[ I_n - X(X'X)^{-1}X' \right] y.
\end{aligned}
$$

The MSE is
$$s^2 = MSE = \frac{SSE}{n-k-1}.$$
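A short numpy check, with hypothetical data, that the different expressions for SSE agree, and of the resulting MSE:

import numpy as np

# Hypothetical data: n = 6, k = 2
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])
n, kp1 = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sse_1 = np.sum(resid ** 2)                       # sum of squared residuals
sse_2 = y @ y - beta_hat @ X.T @ y               # y'y - beta_hat' X'y
H = X @ np.linalg.inv(X.T @ X) @ X.T
sse_3 = y @ (np.eye(n) - H) @ y                  # y'(I - X(X'X)^{-1}X')y

mse = sse_1 / (n - kp1)                          # s^2 = SSE / (n - k - 1)
print(np.isclose(sse_1, sse_2), np.isclose(sse_1, sse_3), mse)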

Theorem 7.3f (replace textbook proof with this proof)

$E(MSE) = \sigma^2$, where $MSE = s^2$.

Note before proof: We will use Lemma 2 on page 48 of the WST 311 notes:
Let $X : p \times 1 \sim N_p(0, I_p)$ and $A : p \times p$ be symmetric and idempotent of rank $r$. Then $S = X'AX \sim \chi^2(r)$.

Proof:
Since $I_n - X(X'X)^{-1}X'$ is idempotent,
$$
\begin{aligned}
\mathrm{rank}\left[ I_n - X(X'X)^{-1}X' \right] &= \mathrm{tr}\left[ I_n - X(X'X)^{-1}X' \right] \\
&= n - \mathrm{tr}\left[ X(X'X)^{-1}X' \right] \\
&= n - \mathrm{tr}\left[ X'X(X'X)^{-1} \right] \\
&= n - \mathrm{tr}(I_{k+1}) \\
&= n - k - 1.
\end{aligned}
$$

Since $y \sim N_n(X\beta, \sigma^2 I_n)$, $\frac{1}{\sigma}(y - X\beta) \sim N_n(0, I_n)$. From the Lemma given above,
$$
\begin{aligned}
&\frac{1}{\sigma^2} (y - X\beta)' \left[ I_n - X(X'X)^{-1}X' \right] (y - X\beta) \\
&= \frac{1}{\sigma^2} y' \left[ I_n - X(X'X)^{-1}X' \right] y + \frac{1}{\sigma^2} \beta'X' \left[ I_n - X(X'X)^{-1}X' \right] X\beta - \frac{2}{\sigma^2} \beta'X' \left[ I_n - X(X'X)^{-1}X' \right] y \\
&= \frac{1}{\sigma^2} y' \left[ I_n - X(X'X)^{-1}X' \right] y \\
&= \frac{SSE}{\sigma^2}
\end{aligned}
$$
has a $\chi^2(n-k-1)$ distribution.

Therefore
$$E(SSE) = E\left( y'\left[ I_n - X(X'X)^{-1}X' \right] y \right) = \sigma^2 (n-k-1).$$
An unbiased estimator for $\sigma^2$ is $s^2 = MSE = \dfrac{SSE}{n-k-1}$; that is, $E(MSE) = \sigma^2$.

Since the covariance matrix for $\hat{\beta}$ is $\sigma^2 (X'X)^{-1}$, an unbiased estimator for $\mathrm{cov}(\hat{\beta}, \hat{\beta}')$ is $s^2 (X'X)^{-1}$, where $s^2 = MSE$.
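An informal numpy sketch, with hypothetical data, of the estimated covariance matrix $s^2(X'X)^{-1}$ and the resulting standard errors of the $\hat{\beta}_j$:

import numpy as np

# Hypothetical data: n = 6, k = 2
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])
n, kp1 = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s2 = np.sum((y - X @ beta_hat) ** 2) / (n - kp1)   # MSE

cov_beta_hat = s2 * XtX_inv                        # estimate of cov(beta_hat)
std_err = np.sqrt(np.diag(cov_beta_hat))           # standard errors of beta_hat_j
print(std_err)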

Work through Example 7.3.3 (Question b in Example C2).

2.4 Normal model


2.4.1 Assumptions

The regression assumptions expressed in matrix notation are
$$y \sim N_n(X\beta, \sigma^2 I_n) \quad \text{or} \quad \varepsilon \sim N_n(0, \sigma^2 I_n).$$

2.4.2 Maximum likelihood estimators for $\beta$ and $\sigma^2$

Theorem 7.6a
If $y \sim N_n(X\beta, \sigma^2 I_n)$, where $X : n \times (k+1)$ and $\mathrm{rank}(X) = k+1 < n$, the maximum likelihood estimators of $\beta$ and $\sigma^2$ are given in equations (7.48) and (7.49), that is
$$\hat{\beta} = (X'X)^{-1}X'y \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$$

Proof:
$$
\begin{aligned}
L(\beta, \sigma^2) &= (2\pi)^{-n/2} \left| \sigma^2 I_n \right|^{-1/2} \exp\left[ -\tfrac{1}{2}(y - X\beta)'(\sigma^2 I_n)^{-1}(y - X\beta) \right], \quad -\infty < y_i < \infty \\
&= (2\pi\sigma^2)^{-n/2} \exp\left[ -\tfrac{1}{2\sigma^2}(y - X\beta)'(y - X\beta) \right]
\end{aligned}
$$
$$\ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

Taking the partial derivative of $\ln L(\beta, \sigma^2)$ with respect to $\beta$, setting it equal to zero and solving for $\beta$ gives
$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \beta} = \frac{2}{2\sigma^2} X'(y - X\beta) = \frac{1}{\sigma^2} X'(y - X\beta) = 0$$
$$\Rightarrow X'y - X'X\beta = 0$$
$$\Rightarrow \hat{\beta} = (X'X)^{-1}X'y.$$

For a given $\hat{\beta}$, differentiate $\ln L(\hat{\beta}, \sigma^2)$ partially with respect to $\sigma^2$, set the result equal to zero and solve for $\sigma^2$. This gives
$$\frac{\partial \ln L(\hat{\beta}, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y - X\hat{\beta})'(y - X\hat{\beta}) = 0$$
$$\Rightarrow \frac{n}{2\sigma^2} = \frac{1}{2(\sigma^2)^2}(y - X\hat{\beta})'(y - X\hat{\beta})$$
$$\Rightarrow \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$$

2.4.3 Properties of $\hat{\beta}$ and $\hat{\sigma}^2$

Theorem 7.6b
The maximum likelihood estimators $\hat{\beta}$ and $\hat{\sigma}^2$ have the following distributional properties:

i. $\hat{\beta} \sim N_{k+1}\left( \beta, \sigma^2 (X'X)^{-1} \right)$.

ii. $\dfrac{n\hat{\sigma}^2}{\sigma^2} = \dfrac{SSE}{\sigma^2} = \dfrac{(n-k-1)s^2}{\sigma^2} \sim \chi^2(n-k-1)$.

iii. $\hat{\beta}$ and $\hat{\sigma}^2$ (or $s^2$, or $SSE$) are independent.

Proof:
i. Note: We will use Theorem 13 on page 36 of the WST 311 notes:
Suppose that $X : p \times 1$ has a $N_p(\mu, \Sigma)$ distribution and let the rank of $D : q \times p$ be $q$ ($q \le p$). Then $Y = DX$ has a $N_q(D\mu, D\Sigma D')$ distribution.

Since $y \sim N_n(X\beta, \sigma^2 I_n)$ and $\hat{\beta} = (X'X)^{-1}X'y$, it follows from Theorem 13 that
$$\hat{\beta} \sim N\left( (X'X)^{-1}X'X\beta, \; (X'X)^{-1}X' \sigma^2 I_n X(X'X)^{-1} \right) \quad \text{or} \quad \hat{\beta} \sim N\left( \beta, \sigma^2 (X'X)^{-1} \right).$$

ii. Proven in Theorem 7.3f.

iii. Note: We will use Lemma 3 on page 48 of the WST 311 notes:
Suppose that $X : p \times 1 \sim N_p(0, I_p)$. Let $S = X'AX$ with $A \ge 0$ and $Y = BX$. If $BA = 0$, then $S$ and $Y$ are independent.

Let $\hat{\beta} = (X'X)^{-1}X'y = By$ and $\hat{\sigma}^2 = \dfrac{1}{n} y'\left[ I_n - X(X'X)^{-1}X' \right] y = \dfrac{1}{n} y'Ay$.

Since $BA = (X'X)^{-1}X'\left[ I_n - X(X'X)^{-1}X' \right] = 0$, $\hat{\beta}$ and $\hat{\sigma}^2$ are independent.

2.5 Coefficient of determination, $R^2$

The total sum of squares for the model is
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = (y - \bar{y}\mathbf{1}_n)'(y - \bar{y}\mathbf{1}_n) = y'y - n\bar{y}^2.$$

The regression sum of squares for the model is
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = (X\hat{\beta} - \bar{y}\mathbf{1}_n)'(X\hat{\beta} - \bar{y}\mathbf{1}_n) = \hat{\beta}'X'y - n\bar{y}^2 = SST - SSE.$$

The coefficient of determination, or the squared multiple correlation, $R^2$, is calculated as follows:
$$R^2 = \frac{SSR}{SST}.$$

Note that the positive square root $R$ is called the multiple correlation coefficient.
$R^2$ gives the proportion of the variation in $y$ that is explained by the multiple regression model. Alternatively: $100R^2\%$ of the variation in $y$ is explained by the predictors in the multiple regression model.

Adding a variable $x$ to the model increases the value of $R^2$. If $k$ is a relatively large fraction of $n$, it is possible to have a large value of $R^2$ that is not meaningful. To correct for this tendency, an adjusted $R^2$, denoted by $R_a^2$, is calculated. See the PROC REG output. This is briefly explained in Section 7.7 of the textbook. You do not have to know the detail of this section.
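A small numpy sketch with hypothetical data computing $R^2 = SSR/SST$; the adjusted value uses the standard correction $R_a^2 = 1 - (1 - R^2)(n-1)/(n-k-1)$, stated here from general knowledge rather than from Section 7.7:

import numpy as np

# Hypothetical data: n = 6, k = 2
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])
n, kp1 = X.shape
k = kp1 - 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sst = y @ y - n * y.mean() ** 2
sse = y @ y - beta_hat @ X.T @ y
ssr = sst - sse

r2 = ssr / sst
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted R^2 (assumed standard formula)
print(r2, r2_adj)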

Work through Example 7.7 (Question e in Example C2) and Question c in Example C2.

3 MULTIPLE LINEAR REGRESSION: Test of Hypotheses
TEXTBOOK Chapter 8
Pages
185
188 and 189 Example 8.1
198-200 Section 8.4.1 up to end of Theorem 8.4b (only part (ii))
202 Example 8.4.1b
204-205 Section 8.5.1, equation (8.40)
209 Example 8.5.2

Example C (Part 2, C3)

3.1 Test of overall regression


Suppose $y \sim N_n(X\beta, \sigma^2 I_n)$ and
$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$
where $\beta_1 = (\beta_1, \beta_2, \ldots, \beta_k)'$. We wish to test the overall regression hypothesis that none of the $x$ variables predict $y$. This can be expressed as
$$H_0: \beta_1 = 0 \quad \text{against the alternative} \quad H_1: \beta_1 \neq 0$$
or alternatively
$$H_0: \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
or
$$H_0: \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

That is,
$$H_0: C\beta = 0$$
where
$$C = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$
is a $k \times (k+1)$ matrix.

The Analysis of Variance (ANOVA) table is given in Table 8.1 and is given here as it appears in the SAS output from PROC GLM.

Source   df          Sum of Squares (SS)                        Mean Square (MS)     F-value                         Pr > F
Model    $k$         $SSR = \hat{\beta}'X'y - n\bar{y}^2$       $SSR/k$              $\dfrac{SSR/k}{SSE/(n-k-1)}$    p-value
Error    $n-k-1$     $SSE = y'y - \hat{\beta}'X'y$              $SSE/(n-k-1)$
Total    $n-1$       $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$

p-value $= P(F > F\text{-value})$ where $F \sim F(k, n-k-1)$ and $F\text{-value} = \dfrac{SSR/k}{SSE/(n-k-1)}$.
The null hypothesis for the overall regression will be rejected at a $100\alpha\%$ level of significance if $F\text{-value} > F_\alpha(k, n-k-1)$, where $F_\alpha$ is the upper $\alpha$th percentage point of the $F$ distribution, or if p-value $< \alpha$.
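An informal numpy/scipy sketch, with hypothetical data, of the quantities in the ANOVA table and the overall F test:

import numpy as np
from scipy import stats

# Hypothetical data: n = 6, k = 2
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])
n, kp1 = X.shape
k = kp1 - 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ssr = beta_hat @ X.T @ y - n * y.mean() ** 2     # model sum of squares
sse = y @ y - beta_hat @ X.T @ y                 # error sum of squares

f_value = (ssr / k) / (sse / (n - k - 1))
p_value = stats.f.sf(f_value, k, n - k - 1)      # P(F > F-value)
print(f_value, p_value)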

Work through Example 8.1 (Question g in Example C3).

3.2 The general linear hypothesis test for $H_0: C\beta = 0$

A general linear hypothesis is given as
$$H_0: C\beta = 0 \quad \text{against} \quad H_1: C\beta \neq 0$$
where $C : q \times (k+1)$ is a known coefficient matrix of rank $q \le (k+1)$.

Theorem 8.4a (amended)

Consider the hypothesis $H_0: C\beta = 0$ versus $H_1: C\beta \neq 0$.
If $y \sim N_n(X\beta, \sigma^2 I_n)$ and $C$ is $q \times (k+1)$ of rank $q < k+1$, then

i. $C\hat{\beta} \sim N_q\left( C\beta, \sigma^2 C(X'X)^{-1}C' \right)$.

ii. Under $H_0$, $\dfrac{SSH}{\sigma^2} = \dfrac{(C\hat{\beta})'\left[ C(X'X)^{-1}C' \right]^{-1} C\hat{\beta}}{\sigma^2} \sim \chi^2(q)$.

iii. $\dfrac{SSE}{\sigma^2} = \dfrac{y'\left[ I_n - X(X'X)^{-1}X' \right] y}{\sigma^2} \sim \chi^2(n-k-1)$.

iv. $SSH$ and $SSE$ are independent.

Proof:
i. Note: We will use Theorem 13 on page 36 of the WST 311 notes:
Suppose that $X : p \times 1$ has a $N_p(\mu, \Sigma)$ distribution and let the rank of $D : q \times p$ be $q$ ($q \le p$). Then $Y = DX$ has a $N_q(D\mu, D\Sigma D')$ distribution.

Since $\hat{\beta} \sim N_{k+1}\left( \beta, \sigma^2 (X'X)^{-1} \right)$, the result follows from Theorem 13.

ii. Note: We will use Lemma 4 on page 48 of the WST 311 notes:
Let $Y = \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p)$. The quadratic form $S = (X - \mu)'\Sigma^{-1}(X - \mu) = Y'Y \sim \chi^2(p)$.

Since $C\hat{\beta} \sim N_q\left( C\beta, \sigma^2 C(X'X)^{-1}C' \right)$, the result follows from Lemma 4.

iii. Proven in Theorem 7.3f.

iv. From Theorem 7.6b(iii), $\hat{\beta}$ and $SSE$ are independent. Since $SSH$ is only a function of $\hat{\beta}$, it follows that $SSH$ and $SSE$ are independent.

Theorem 8.4b(ii)
Let $y \sim N_n(X\beta, \sigma^2 I_n)$ and define the statistic
$$F = \frac{SSH/q}{SSE/(n-k-1)}$$
where
$$SSE = y'y - \hat{\beta}'X'y$$
and $SSH$, the sum of squares due to the hypothesis, is given by
$$SSH = (C\hat{\beta})'\left[ C(X'X)^{-1}C' \right]^{-1} C\hat{\beta}.$$
Furthermore, $C$ is $q \times (k+1)$ of rank $q < k+1$. If $H_0: C\beta = 0$ is true, then
$$F = \frac{SSH/q}{SSE/(n-k-1)} \sim F(q, n-k-1).$$

This is an $F$-test and $H_0$ will be rejected if
$$F \ge F_\alpha(q, n-k-1)$$
where $\alpha$ is the significance level of the hypothesis test and $F_\alpha$ the upper $\alpha$th percentage point of the $F$ distribution.
The corresponding p-value for the hypothesis test is
$$\text{p-value} = P\left( F(q, n-k-1) > \frac{SSH/q}{SSE/(n-k-1)} \right).$$
The null hypothesis will be rejected if p-value $< \alpha$.

The statistic
$$F = \frac{(C\hat{\beta})'\left[ C(X'X)^{-1}C' \right]^{-1} C\hat{\beta} \,/\, q}{SSE/(n-k-1)}$$
can be used to test the hypothesis $H_0: C\beta = 0$ in the following cases:

(a) Test for the overall significance of the regression model: $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$.
(b) Test one $\beta_j$: $H_0: \beta_j = 0$, $j = 0, 1, \ldots, k$.
(c) Test specific contrasts: $H_0: C\beta = 0$.
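A numpy/scipy sketch of SSH and the F statistic, using hypothetical data and a hypothetical C chosen for case (a) above, that is $H_0: \beta_1 = \beta_2 = 0$:

import numpy as np
from scipy import stats

# Hypothetical data: n = 6, k = 2
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])
n, kp1 = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
sse = y @ y - beta_hat @ X.T @ y

# Hypothetical C for H0: beta1 = beta2 = 0 (overall regression), q = 2
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = C.shape[0]

Cb = C @ beta_hat
ssh = Cb @ np.linalg.inv(C @ XtX_inv @ C.T) @ Cb
f_value = (ssh / q) / (sse / (n - kp1))
p_value = stats.f.sf(f_value, q, n - kp1)
print(ssh, f_value, p_value)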

Work through Example 8.4.1b.

Work through Example C3 (Questions g-j).

3.3 Testing one $\beta_j$

The statistic in equation (8.40) can be used to test the hypothesis $H_0: \beta_j = 0$; that is, reject $H_0$ if
$$|t_j| = \left| \frac{\hat{\beta}_j}{s\sqrt{g_{jj}}} \right| = \left| \frac{\hat{\beta}_j}{\text{stderr of } \hat{\beta}_j} \right| \ge t_{\alpha/2}(n-k-1),$$
where $g_{jj}$ is the $j$th diagonal element of $(X'X)^{-1}$.

This is a t-test.
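An informal numpy/scipy sketch, with hypothetical data, of the t statistic for $H_0: \beta_j = 0$, taking $g_{jj}$ as the $j$th diagonal element of $(X'X)^{-1}$:

import numpy as np
from scipy import stats

# Hypothetical data: n = 6, k = 2
X = np.column_stack([np.ones(6),
                     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                     [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]])
y = np.array([3.0, 3.5, 6.1, 5.9, 8.8, 8.2])
n, kp1 = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
s = np.sqrt((y @ y - beta_hat @ X.T @ y) / (n - kp1))   # s = sqrt(MSE)

j = 1                                      # test H0: beta_1 = 0
t_j = beta_hat[j] / (s * np.sqrt(XtX_inv[j, j]))
p_value = 2 * stats.t.sf(abs(t_j), df=n - kp1)
print(t_j, p_value)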

Work through Example C3 (Questions h and i). Make sure you know how to calculate and use the p-values to test hypotheses.

3.4 Additional topics


Type I SS and Type III SS
Type I sums of squares (SS), also called sequential sums of squares, are the incremental improvements in the error sum of squares as each effect is added to the model. The Type I sums of squares for all effects add up to the model sum of squares (SSR).
Type III sums of squares (SS), also referred to as partial sums of squares, give the sum of squares that would be obtained for each variable if it were entered last into the model. That is, the effect of each variable is evaluated after all other factors have been accounted for.

Multicollinearity
This occurs when the independent variables in a regression model are highly correlated.
Extreme cases of multicollinearity can cause the least-squares point estimates to be far from the true values of the regression parameters. This is because the point estimates, the $\hat{\beta}_j$'s, measure a partial influence of $x_j$ upon the mean value of the dependent variable.
In the hypothesis $H_0: \beta_j = 0$, the p-value measures the additional importance of the independent variable $x_j$ over the combined importance of the other independent variables in the regression model. Thus multicollinearity can cause some of the correlated independent variables to appear to be less important than they really are.

4 ONE-WAY ANALYSIS-OF-VARIANCE
TEXTBOOK Chapter 12
Pages 295-298.

Example D

OBJECTIVES
You have to be able to do the following.

1. Understand and know how the one-way analysis-of-variance model can be written in different ways as a linear model. Understand the use of dummy variables.

2. Work through Example D. Understand all the different procedures applied in this example and the interpretation of the results.

5 ANALYSIS-OF-COVARIANCE
TEXTBOOK Chapter 16
Pages 443-445.

Example E

OBJECTIVES
You have to be able to do the following.

1. Understand and know how the one-way analysis-of-covariance model can be written in different ways as a linear model. Understand the use of dummy variables.

2. Work through Example E. Understand all the different procedures applied in this example and the interpretation of the results.
