Chapter 1 Simple Linear Regression (Part 6: Matrix Version)
1 Overview
• Simple linear regression model: response Y , one independent variable X
Y = β0 + β1 X + ε
• Multiple linear regression model: response Y , more than one independent variable
X1 , X2 , ..., Xp
Y = β0 + β1 X1 + β2 X2 + ... + βp Xp + ε
2 Review of Matrices
• an example (a matrix with 3 rows and 2 columns):
\[
\begin{bmatrix}
100 & 22 \\
300 & 46 \\
600 & 81
\end{bmatrix}
\]
•
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1c} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2c} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{ic} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rj} & \cdots & a_{rc}
\end{bmatrix}.
\]
• r × c (the number of rows and the number of columns) is called the dimension of the matrix
• two matrices are equal if they have the same dimension and all the corresponding
elements are equal. Suppose
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix},
\qquad
B = \begin{bmatrix} 17 & 2 \\ 14 & 5 \\ 13 & 9 \end{bmatrix}.
\]
If A = B, then a_{11} = 17, a_{12} = 2, ...
• Adding or subtracting two matrices requires that they have the same dimension.
\[
A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix},
\qquad
B = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix},
\]
\[
A + B = \begin{bmatrix} 1+1 & 4+2 \\ 2+2 & 5+3 \\ 3+3 & 6+4 \end{bmatrix}
      = \begin{bmatrix} 2 & 6 \\ 4 & 8 \\ 6 & 10 \end{bmatrix},
\qquad
A - B = \begin{bmatrix} 1-1 & 4-2 \\ 2-2 & 5-3 \\ 3-3 & 6-4 \end{bmatrix}
      = \begin{bmatrix} 0 & 2 \\ 0 & 2 \\ 0 & 2 \end{bmatrix}.
\]
2.5 Matrix multiplication
Multiplication by a scalar:
\[
A = \begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix},
\qquad
4A = A4 = 4 \begin{bmatrix} 2 & 7 \\ 9 & 3 \end{bmatrix}
   = \begin{bmatrix} 8 & 28 \\ 36 & 12 \end{bmatrix}
\]
• if A is an r × c matrix and B is a c × s matrix, the (i, j) element of the product AB is
\[
\sum_{k=1}^{c} a_{ik} b_{kj}
\]
•
\[
A \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
= \begin{bmatrix} 4 & 2 \\ 5 & 8 \end{bmatrix}
  \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
= \begin{bmatrix} 4a_1 + 2a_2 \\ 5a_1 + 8a_2 \end{bmatrix}
\]
• It is easy to check
\[
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 X_1 \\ \beta_0 + \beta_1 X_2 \\ \vdots \\ \beta_0 + \beta_1 X_n \end{bmatrix}
\]
• Let
\[
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
E = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.
\]
Then
\[
Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1, \quad
Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2, \quad
\ldots, \quad
Y_n = \beta_0 + \beta_1 X_n + \varepsilon_n
\]
can be written as
\[
Y = X\beta + E
\]
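To make the matrix form concrete, here is a minimal numpy sketch (all data values and coefficients below are made up for illustration) that stacks a column of ones next to the predictor to build X and then generates Y = Xβ + E:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up predictor values and "true" coefficients, purely for illustration
x = np.array([4.0, 1.0, 2.0, 3.0, 3.0, 4.0])
beta = np.array([1.0, 2.0])            # (beta_0, beta_1)
sigma = 1.0

# Design matrix X: a column of ones next to the predictor column
X = np.column_stack([np.ones_like(x), x])

# Y = X beta + E, with E ~ N(0, sigma^2 I)
E = rng.normal(0.0, sigma, size=x.size)
Y = X @ beta + E

print(X.shape)   # (6, 2)
print(Y)
```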
• Other calculations
\[
X'X =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
=
\begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
\]
\[
X'Y =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
\]
\[
Y'Y =
\begin{bmatrix} Y_1 & Y_2 & \cdots & Y_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \sum_{i=1}^{n} Y_i^2
\]
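A quick numpy check of these identities (using small made-up data; only the summation forms above are being verified) could look like this:

```python
import numpy as np

# Small made-up data set, just to verify the summation formulas above
x = np.array([4.0, 1.0, 2.0, 3.0, 3.0, 4.0])
Y = np.array([16.0, 5.0, 10.0, 15.0, 13.0, 22.0])
n = Y.size
X = np.column_stack([np.ones(n), x])

# X'X as a matrix product vs. its summation form
XtX = X.T @ X
XtX_sums = np.array([[n,       x.sum()],
                     [x.sum(), (x ** 2).sum()]])
print(np.allclose(XtX, XtX_sums))                      # True

# X'Y and Y'Y
print(np.allclose(X.T @ Y, [Y.sum(), (x * Y).sum()]))  # True
print(np.isclose(Y @ Y, (Y ** 2).sum()))               # True
```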
• Diagonal Matrix: a square matrix whose off-diagonal elements are all zeros, e.g.
\[
\begin{bmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{bmatrix}
\]
• Identity Matrix
\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
facts: for any matrices A and B of appropriate dimension, AI = A and IB = B
• vectors of all zeros and all ones:
\[
0 = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},
\qquad
1 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\]
2.8 Inverse of a square matrix
• the inverse of a square matrix A is another square matrix, denoted by A^{-1}, such that
AA^{-1} = A^{-1}A = I
\[
A = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\]
Since
\[
\begin{bmatrix} -0.1 & 0.4 \\ 0.3 & -0.2 \end{bmatrix}
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]
and
\[
\begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}
\begin{bmatrix} -0.1 & 0.4 \\ 0.3 & -0.2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I,
\]
we have
\[
A^{-1} = \begin{bmatrix} -0.1 & 0.4 \\ 0.3 & -0.2 \end{bmatrix}.
\]
• If
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},
\]
then
\[
A^{-1} = \begin{bmatrix} \frac{d}{D} & \frac{-b}{D} \\[4pt] \frac{-c}{D} & \frac{a}{D} \end{bmatrix},
\]
where D = ad − bc
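Here is a small sketch of this 2 × 2 formula in numpy; `inverse_2x2` is a hypothetical helper written just for this note, checked against the earlier example A and against `np.linalg.inv`:

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix using the formula above, with D = ad - bc."""
    (a, b), (c, d) = A
    D = a * d - b * c
    if D == 0:
        raise ValueError("matrix is singular (D = 0)")
    return np.array([[d, -b],
                     [-c, a]]) / D

A = np.array([[2.0, 4.0],
              [3.0, 1.0]])
A_inv = inverse_2x2(A)

print(A_inv)                                  # [[-0.1  0.4] [ 0.3 -0.2]]
print(np.allclose(A @ A_inv, np.eye(2)))      # True
print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```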
• For a high-dimensional matrix, the inverse is not easy to calculate by hand
2.11 Use of Inverse Matrix
• a system of linear equations such as
\[
2y_1 + 4y_2 = 20, \qquad 3y_1 + y_2 = 10
\]
can be written in matrix form; note that the coefficient matrix is the matrix A from Section 2.8, so the solution is y = A^{-1}c (see the sketch below)
• Estimating a regression model requires solving linear equations, and the inverse matrix is very
useful for this.
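A minimal sketch of solving this system with the inverse matrix (the coefficient matrix here is the same A as in Section 2.8):

```python
import numpy as np

# 2*y1 + 4*y2 = 20
# 3*y1 + 1*y2 = 10
A = np.array([[2.0, 4.0],
              [3.0, 1.0]])
c = np.array([20.0, 10.0])

# y = A^{-1} c, using the inverse found in Section 2.8
y = np.linalg.inv(A) @ c
print(y)                       # [2. 4.]  i.e. y1 = 2, y2 = 4

# In practice, np.linalg.solve is preferred over forming the inverse explicitly
print(np.linalg.solve(A, c))   # [2. 4.]
```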
• A + B = B + A
• C(A + B) = CA + CB
• (A')' = A
• (AB)' = B'A'
• (A^{-1})^{-1} = A
• (A')^{-1} = (A^{-1})'
• Random vector
\[
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
\]
• Expectation of random vector
\[
E\{Y\} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ E\{Y_3\} \end{bmatrix}
\]
• Random vectors
\[
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix},
\qquad
Z = \begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \end{bmatrix}.
\]
Then
\[
E(Y + Z) = E\{Y\} + E\{Z\}.
\]
• For a constant matrix A and W = AY, we have
\[
E\{W\} = A E\{Y\}, \qquad
Var\{W\} = Var\{AY\} = A\,Var\{Y\}\,A',
\]
and, for a constant vector c,
\[
Var(c + AY) = Var(AY) = A\,Var\{Y\}\,A'.
\]
3.2 An illustration
•
\[
\begin{bmatrix} W_1 \\ W_2 \end{bmatrix}
= \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}
= \begin{bmatrix} Y_1 - Y_2 \\ Y_1 + Y_2 \end{bmatrix}
\]
\[
E\begin{bmatrix} W_1 \\ W_2 \end{bmatrix}
= \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \end{bmatrix}
= \begin{bmatrix} E\{Y_1\} - E\{Y_2\} \\ E\{Y_1\} + E\{Y_2\} \end{bmatrix}
\]
•
\[
Var\left\{\begin{bmatrix} W_1 \\ W_2 \end{bmatrix}\right\}
= \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} Var\{Y_1\} & Cov\{Y_1, Y_2\} \\ Cov\{Y_2, Y_1\} & Var\{Y_2\} \end{bmatrix}
  \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
\]
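This illustration can be checked numerically; the variance-covariance matrix below is made up for the demonstration, and the sample covariance of simulated W = AY should come out close to A Var{Y} A':

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])

# Made-up variance-covariance matrix for (Y1, Y2), for illustration only
Sigma_Y = np.array([[2.0, 0.5],
                    [0.5, 1.0]])

# Theoretical Var{W} = A Var{Y} A'
print(A @ Sigma_Y @ A.T)

# Monte Carlo check: simulate Y, form W = A Y, compare the sample covariance
Y = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma_Y, size=200_000)
W = Y @ A.T
print(np.cov(W, rowvar=False))   # close to the matrix above
```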
The model
\[
Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1, \quad
Y_2 = \beta_0 + \beta_1 X_2 + \varepsilon_2, \quad
\ldots, \quad
Y_n = \beta_0 + \beta_1 X_n + \varepsilon_n
\]
with assumptions
1. E(εi ) = 0,
2. Var(εi ) = σ² and Cov(εi , εj ) = 0 for i ≠ j,
3. εi are independent N (0, σ²),
can be written as
\[
Y = X\beta + E
\]
Note that
\[
E\{E\} = 0, \qquad
Var\{E\} = \begin{bmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{bmatrix} = \sigma^2 I.
\]
The assumptions can be rewritten as
1. E(E) = 0,
2. Var(E) = σ² I,
3. E ∼ N (0, σ² I)
Thus E(Y) = Xβ and Var(Y) = σ² I. The model (with assumptions 1, 2, and 3) can also
be written as
\[
Y \sim N(X\beta, \sigma^2 I)
\]
or
\[
Y = X\beta + E, \qquad E \sim N(0, \sigma^2 I)
\]
•
\[
X'X =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}
=
\begin{bmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{bmatrix}
\]
\[
X'Y =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{bmatrix}
\]
• let
\[
b = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}.
\]
Then the normal equation is
\[
X'X b = X'Y
\]
• we can find b by
\[
b = (X'X)^{-1} X'Y
\]
4.2 An example
•
\[
Y = \begin{bmatrix} 16 \\ 5 \\ 10 \\ 15 \\ 13 \\ 22 \end{bmatrix};
\qquad
X = \begin{bmatrix} 1 & 4 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}
\]
•
\[
X'X = \begin{bmatrix} 6 & 17 \\ 17 & 55 \end{bmatrix},
\qquad
X'Y = \begin{bmatrix} 81 \\ 261 \end{bmatrix}
\]
•
\[
b = \begin{bmatrix} 6 & 17 \\ 17 & 55 \end{bmatrix}^{-1} \begin{bmatrix} 81 \\ 261 \end{bmatrix}
\]
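Carrying out this calculation in numpy (a sketch of the arithmetic only; `np.linalg.solve` is used instead of forming the inverse explicitly):

```python
import numpy as np

# Data from the example above
Y = np.array([16.0, 5.0, 10.0, 15.0, 13.0, 22.0])
X = np.array([[1, 4], [1, 1], [1, 2],
              [1, 3], [1, 3], [1, 4]], dtype=float)

XtX = X.T @ X        # [[ 6. 17.] [17. 55.]]
XtY = X.T @ Y        # [ 81. 261.]

# b = (X'X)^{-1} X'Y
b = np.linalg.solve(XtX, XtY)
print(b)             # approximately [0.439 4.610], i.e. b0 = 18/41, b1 = 189/41
```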
•
\[
\hat{Y} = \begin{bmatrix} \hat{Y}_1 \\ \hat{Y}_2 \\ \vdots \\ \hat{Y}_n \end{bmatrix}
= \begin{bmatrix} b_0 + b_1 X_1 \\ b_0 + b_1 X_2 \\ \vdots \\ b_0 + b_1 X_n \end{bmatrix}
= Xb
\]
•
\[
e = Y - \hat{Y} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
\]
•
\[
\hat{Y} = X(X'X)^{-1}X'Y
\]
• Denote X(X'X)^{-1}X' by H; we have
\[
\hat{Y} = HY, \qquad e = (I - H)Y
\]
• Var{Y} = Var{E} = σ² I
• (I − H)' = I' − H' = I − H
• HH = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = H
• (I − H)(I − H) = I − 2H + HH = I − H
• Var{e} = σ²(I − H), estimated by σ̂²(I − H)
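These hat-matrix facts can be verified numerically, here with the data from the example in Section 4.2 (a sketch only):

```python
import numpy as np

Y = np.array([16.0, 5.0, 10.0, 15.0, 13.0, 22.0])
X = np.array([[1, 4], [1, 1], [1, 2],
              [1, 3], [1, 3], [1, 4]], dtype=float)
n = Y.size

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

Y_hat = H @ Y                     # fitted values
e = (np.eye(n) - H) @ Y           # residuals

print(np.allclose(H @ H, H))      # True: H is idempotent
print(np.allclose(H, H.T))        # True: H is symmetric
print(np.allclose(Y_hat + e, Y))  # True: Y = fitted values + residuals
```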
4.5 Analysis of variance in matrix form
• Let
\[
1 = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix},
\]
then
\[
J = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix} = 1\,1'.
\]
•
\[
SST = Y'Y - \frac{1}{n} Y'JY = Y'\Bigl(I - \frac{1}{n}J\Bigr)Y
\]
[Proof:
\[
SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
    = \sum_{i=1}^{n} Y_i^2 - \frac{\bigl(\sum_{i=1}^{n} Y_i\bigr)^2}{n},
\]
\[
Y'Y = \sum_{i=1}^{n} Y_i^2, \qquad
1'Y = Y'1 = \sum_{i=1}^{n} Y_i, \qquad
\Bigl(\sum_{i=1}^{n} Y_i\Bigr)^2 = Y'1\,1'Y = Y'JY.]
\]
•
\[
SSE = \sum_{i=1}^{n} e_i^2 = e'e = (Y - Xb)'(Y - Xb) = Y'(I - H)Y
\]
[Proof:
\[
SSE = (Y - Xb)'(Y - Xb) = Y'Y - 2b'X'Y + b'X'Xb
\]
\[
= Y'Y - 2b'X'Y + b'X'X(X'X)^{-1}X'Y
\]
\[
= Y'Y - 2b'X'Y + b'X'Y = Y'Y - b'X'Y
\]
\[
= Y'(I - H)Y,
\]
where the last step uses b'X'Y = \bigl((X'X)^{-1}X'Y\bigr)'X'Y = Y'X(X'X)^{-1}X'Y = Y'HY.]
•
\[
SSR = SST - SSE = Y'\Bigl(H - \frac{1}{n}J\Bigr)Y
\]
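The quadratic-form versions of SST, SSE, and SSR can be checked against the direct sums, again using the Section 4.2 data (a sketch only):

```python
import numpy as np

Y = np.array([16.0, 5.0, 10.0, 15.0, 13.0, 22.0])
X = np.array([[1, 4], [1, 1], [1, 2],
              [1, 3], [1, 3], [1, 4]], dtype=float)
n = Y.size

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))                     # J = 1 1'

SST = Y @ (np.eye(n) - J / n) @ Y
SSE = Y @ (np.eye(n) - H) @ Y
SSR = Y @ (H - J / n) @ Y

print(np.isclose(SST, ((Y - Y.mean()) ** 2).sum()))   # True
print(np.isclose(SST, SSE + SSR))                     # True
```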
4.6 Variance-covariance matrix for b
•
\[
b = (X'X)^{-1}X'Y
\]
\[
Var\{b\} = (X'X)^{-1}X'\,Var\{Y\}\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}
= \sigma^2 \begin{bmatrix}
\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \frac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \\[8pt]
\frac{-\bar{X}}{\sum_{i=1}^{n}(X_i - \bar{X})^2} & \frac{1}{\sum_{i=1}^{n}(X_i - \bar{X})^2}
\end{bmatrix}
\]
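In the usual way σ² is estimated by MSE = SSE/(n − 2), which gives the estimated variance-covariance matrix MSE · (X'X)^{-1}; a sketch with the Section 4.2 data:

```python
import numpy as np

Y = np.array([16.0, 5.0, 10.0, 15.0, 13.0, 22.0])
X = np.array([[1, 4], [1, 1], [1, 2],
              [1, 3], [1, 3], [1, 4]], dtype=float)
n, p = X.shape                           # here p = 2

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)                  # estimates sigma^2

# Estimated variance-covariance matrix of b
var_b = MSE * XtX_inv
print(var_b)
print(np.sqrt(np.diag(var_b)))           # standard errors of b0 and b1
```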