Multiple Regression Model
The Data Generating Process (DGP), or the population, is described by the following linear model.
Y_j = β_0 + β_1 X_{j,1} + β_2 X_{j,2} + ε_j
where:
Y_j is the j-th observation of the dependent variable Y (it is known)
X_{j,1} and X_{j,2} are the j-th observations of the independent variables X_1 and X_2 (they are known)
β_0 is the intercept term (it is unknown)
β_1 and β_2 are parameters (they are unknown)
ε_j is the j-th error, the j-th unobserved factor that, besides X_1 and X_2, affects Y (it is unknown)
So β_1 is the sensitivity of Y_j to variations in X_{j,1}, assuming that the other factors remain constant:
β_1 = ΔY_j / ΔX_{j,1}
If X_{j,2} varies by ΔX_{j,2}, then Y_j varies by
ΔY_j = β_2 ΔX_{j,2}
So β_2 is the sensitivity of Y_j to variations in X_{j,2}, assuming that the other factors remain constant:
β_2 = ΔY_j / ΔX_{j,2}
In other words, β_j measures the effect of a unit variation of the j-th regressor on Y, keeping the other regressors (and the unknown factors) constant.
Then
E[Y_j | X_{j,1}, X_{j,2} = 0] = β_0 + β_1 X_{j,1} + β_2 · 0 = β_0 + β_1 X_{j,1}
E[Y_j | X_{j,1}, X_{j,2} = 1] = β_0 + β_1 X_{j,1} + β_2 · 1 = β_0 + β_1 X_{j,1} + β_2
β_2 measures the difference between the expected value of Y_j when X_{j,2} = 1 and the expected value of Y_j when X_{j,2} = 0, keeping the other regressors constant.
Multiple regression model: the matrix notation
Collect the observations {Y_1, …, Y_T}, {X_{1,1}, …, X_{T,1}} and {X_{1,2}, …, X_{T,2}} into vectors, so that the model can be written compactly as
Y = Xβ + ε
where Y and ε are T×1 vectors, X is the T×3 matrix whose first column is a column of ones (the constant regressor associated to β_0) and β = (β_0, β_1, β_2)'.
This is our new DGP. Let’s derive the OLS estimators by minimizing the sum of the squared errors.
First of all, the objective function is
O(β_0, β_1, β_2) = Σ_{t=1}^{T} ε_t²
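The notes jump from the objective function to the invertibility of X'X, so here is the standard derivation of the closed form in matrix notation (a textbook step added for completeness, not taken from the notes):

O(\beta) = \sum_{t=1}^{T} \varepsilon_t^2 = (Y - X\beta)'(Y - X\beta)
\frac{\partial O}{\partial \beta} = -2\,X'Y + 2\,X'X\,\beta = 0
\;\Rightarrow\; X'X\,\hat{\beta} = X'Y
\;\Rightarrow\; \hat{\beta} = (X'X)^{-1} X'Y \quad \text{(provided that } X'X \text{ is invertible)}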
You always have to verify that det(X’X) ≠ 0, because if it is not, the matrix is not invertible.
Sometimes the determinant can be very close to 0 (e.g. when there are regressors that are collinear, i.e. highly correlated), so even if the inverse can be computed and Matlab computes it, there will be huge numerical errors and very poor results (the closer the determinant is to 0, the worse the performance of the algorithm that Matlab uses to calculate the inverse). For this reason, when the determinant is very small, Matlab gives you a “Warning”.
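To make the check concrete, here is a minimal Matlab sketch, assuming made-up data and coefficients (T, X1, X2, the "true" β and the 1e-12 threshold are illustrative, not values from the notes); rcond is used as a scaled alternative to looking at det(X'X) directly.

% Minimal sketch (hypothetical data): build the regressor matrix, check that X'X is
% numerically invertible, then compute the OLS estimator.
T  = 200;
X1 = randn(T,1);
X2 = randn(T,1);
eps_t = 0.5*randn(T,1);
Y  = 1 + 2*X1 - 0.7*X2 + eps_t;      % illustrative "true" beta = [1; 2; -0.7]

X = [ones(T,1) X1 X2];               % T x (K+1) matrix, first column = constant regressor

if rcond(X'*X) < 1e-12               % rcond near 0 <=> X'X close to singular
    warning('X''X is close to singular: the inverse is numerically unreliable');
end

beta_hat = (X'*X) \ (X'*Y);          % OLS estimator, a (K+1) x 1 vector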
Unbiasedness of the OLS estimators
Assume that
1) the DGP is Y = Xβ + ε
2) det(X’X) ≠ 0, so X’X is invertible
3) E[ε_t] = 0
4) E[ε_t | X_1, …, X_K] = E[ε_t] = 0
These 4 assumptions are enough to prove that the OLS estimator is unbiased.
β̂ can also be expressed as
β̂ = β + (X'X)^{-1}(X'ε)
Taking the X-conditional expectation and using assumption 4), E[(X'X)^{-1} X'ε | X] = (X'X)^{-1} X' E[ε | X] = 0, so
E[β̂] = β
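A quick way to see unbiasedness in practice is a Monte Carlo sketch (the true β, the sample size and the number of simulations below are arbitrary illustrative choices); X is kept fixed across simulations, i.e. we work conditionally on X.

% Monte Carlo sketch: the average of beta_hat over many simulated samples should be
% close to the true beta, illustrating E[beta_hat] = beta.
T = 100;  nSim = 5000;
beta_true = [1; 2; -0.7];
X = [ones(T,1) randn(T,1) randn(T,1)];   % fixed design: we condition on X
beta_hats = zeros(3, nSim);
for s = 1:nSim
    eps_t = randn(T,1);                  % errors with E[eps | X] = 0
    Y = X*beta_true + eps_t;
    beta_hats(:,s) = (X'*X) \ (X'*Y);
end
disp(mean(beta_hats, 2))                 % close to beta_true = [1; 2; -0.7]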
The X-conditional variance-covariance matrix of β̂, V[β̂ | X], is a 3x3 symmetric matrix which has on the main diagonal the conditional variances of the estimators (note the “|X” inside the square brackets) and in all the other positions the conditional covariances between the estimators.
If we assume that
1) the DGP is Y = Xβ + ε
2) det(X’X) ≠ 0, so X’X is invertible
3) E[ε_t] = 0
4) E[ε_t | X_1, …, X_K] = E[ε_t] = 0
5) V[ε_j | X_1, …, X_n] = E[ε_j² | X_1, …, X_n] − (E[ε_j | X_1, …, X_n])² = E[ε_j² | X_1, …, X_n] = E[ε_j²] = σ_ε²
Then, the X-conditional variance-covariance matrix of the estimator β̂ coincides with
Σ_{β̂|X} = E[(β̂ − β)(β̂ − β)' | X] = σ_ε² (X'X)^{-1}
If we call ε̂ = Y − Xβ̂ = Y − Ŷ the residuals of the OLS regression, then the random variable
σ̂_ε² = ( Σ_{j=1}^{T} ε̂_j² ) / ( T − (K+1) ) = SSR / ( T − (K+1) )
(where K is the number of known regressors, and we add 1 for the known constant regressor associated to β_0) is an unbiased estimator of the error’s variance σ_ε², in the sense that
E[σ̂_ε²] = σ_ε²
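As a sketch, reusing the hypothetical X, Y and beta_hat from the snippets above, the unbiased variance estimator and the estimated variance-covariance matrix can be computed as follows.

% Sketch: sigma_hat^2 = SSR / (T - (K+1)) and the estimated var-cov matrix of beta_hat.
[T, Kp1] = size(X);                  % Kp1 = K + 1 (constant column included)
Y_hat   = X*beta_hat;                % fitted values
eps_hat = Y - Y_hat;                 % OLS residuals
SSR     = sum(eps_hat.^2);           % sum of squared residuals
sigma2_hat = SSR / (T - Kp1);        % unbiased estimator of sigma_eps^2

V_hat = sigma2_hat * inv(X'*X);      % estimate of sigma_eps^2 * (X'X)^(-1)
se    = sqrt(diag(V_hat));           % standard errors of the estimated coefficients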
If we have the additional assumption that the errors are i.i.d. r.v.s with normal distribution ε_j ~ N(0, σ_ε²), then, from the fact that β̂ is a linear combination of the unknown errors ε, it follows that β̂ is itself normally distributed (conditionally on X).
Perfect multicollinearity
M out of K regressors X_{t,j} (where j = 1, …, M) have perfect multicollinearity if you can find coefficients, not all equal to 0, such that a linear combination of the M regressors (plus a constant) is equal to 0 for all t (namely, if one regressor can be expressed as a linear function of the others):
∃ λ_0, …, λ_M ∈ R, not all zero, such that λ_0 + λ_1 X_{t,1} + … + λ_M X_{t,M} = 0 for t = 1, …, T
If among the regressors X_1, …, X_K two or more of them have perfect multicollinearity, then assumption 2) is not respected, because det(X’X) = 0 and (X’X)^{-1} does not exist.
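A toy Matlab example (invented numbers; W is a hypothetical regressor matrix, kept separate from the X used above) showing how perfect multicollinearity makes the cross-product matrix singular:

% Toy sketch: W2 is an exact linear function of the constant and W1, so W'W is singular.
T0 = 50;
W1 = randn(T0,1);
W2 = 3*W1 + 2;                       % perfect multicollinearity
W  = [ones(T0,1) W1 W2];
disp(rank(W))                        % 2 instead of 3: the columns are linearly dependent
disp(det(W'*W))                      % numerically (close to) 0, so (W'W)^(-1) does not exist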
Some observations:
- Multicollinearity is a data problem
- Multicollinearity is more recurrent in small samples (micronumerosity) -> Solution: increase the number of observations
- The more regressors you include in your regression, the higher the probability of having multicollinearity (if a model with 10 regressors gives more or less the same results as a model with 100 regressors, it’s better to choose the more parsimonious model, but you pay a price: the SSR increases)
- Multicollinearity implies inflated variances, and inflated variances imply deflated t-statistics (therefore, you could conclude that a regressor isn’t significant while it actually is)
In order to find out if there is multicollinearity between the regressors, take the j-th regressor and regress it
on the other explanatory variables:
X_{t,j} = β_0 + β_1 X_{t,1} + … + β_{j−1} X_{t,j−1} + β_{j+1} X_{t,j+1} + … + β_K X_{t,K} + ε_t
If the regressor that I regress on the others is multicollinear with one of the other explanatory variables, the
coefficient of determination of the regression will be very high, because a huge part of the variance of the
multicollinear regressor is explained by another one.
Then the OLS estimator minus its expected value is a Gaussian distributed r.v. with variance equal to the j-th element on the diagonal of the variance-covariance matrix:
β̂_j − β_j ~ N(0, σ²_{β̂_j})
where
σ²_{β̂_j} = σ_ε² / ( SST_j (1 − R_j²) )
SST_j = Σ_{t=1}^{T} ( X_{t,j} − (1/T) Σ_{s=1}^{T} X_{s,j} )²   (the SST of the j-th regressor)
R_j² is the R² of the regression X_{t,j} = β_0 + β_1 X_{t,1} + … + β_{j−1} X_{t,j−1} + β_{j+1} X_{t,j+1} + … + β_K X_{t,K} + ε_t, i.e. the R² obtained regressing one regressor on the others.
If there is multicollinearity, R_j² → 1 and σ²_{β̂_j} → +∞, so the estimates of β_j are very noisy!
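A sketch of these quantities in Matlab, reusing the hypothetical X and sigma2_hat from the OLS snippets above (the choice j = 2 is arbitrary; 1/(1 − R_j²) is the usual variance inflation factor, a name the notes do not use explicitly):

% Sketch: auxiliary regression of the j-th regressor on the others, then R_j^2, SST_j
% and the variance of beta_hat_j from the formula above.
j      = 2;                                    % pick the j-th non-constant regressor
col    = j + 1;                                % its column in X (column 1 is the constant)
Xj     = X(:, col);
others = X;  others(:, col) = [];              % constant + all the other regressors

gamma  = (others'*others) \ (others'*Xj);      % auxiliary OLS regression
Xj_hat = others*gamma;
R2_j   = 1 - sum((Xj - Xj_hat).^2) / sum((Xj - mean(Xj)).^2);

SST_j      = sum((Xj - mean(Xj)).^2);          % total sum of squares of the j-th regressor
var_beta_j = sigma2_hat / (SST_j * (1 - R2_j));% estimated sigma^2_{beta_hat_j}
VIF_j      = 1 / (1 - R2_j);                   % variance inflation factor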
The OLS estimator β̂ is BLUE (Best Linear Unbiased Estimator): among all the linear unbiased estimators, it is the one with the least variance.